星期二 十二月 13, 2011

a dtrace example to troubleshoot 'cp -r' hang


Yesterday when I was tring some test on a zpool, I found a problem and then solved it with dtrace.

The test was a simple 'cp -r /etc /tank' where /tank is a zfs i created. However soon after the command was started, it just hang there.
===============================================================
root@vbs10 # df -h /tank
Filesystem             size   used  avail capacity  Mounted on
tank                    31G    31K    31G     1%    /tank
root@vbs10 # cp -r /etc /tank

===============================================================

df -h /tank showed that it almost got no growing after a while.
================================================================
root@vbs10 # df -h /tank
Filesystem             size   used  avail capacity  Mounted on
tank                    31G   824K    31G     1%    /tank
================================================================

Let's check the stack backtrace of the cp process.
=================================================================
root@vbs10 # ps -ef|grep cp
    root  2032  1964   0 10:04:57 pts/2       0:00 cp -r /etc /tank
root@vbs10 # pstack 2032
2032:   cp -r /etc /tank
 feefb597 read     (4, 803e1a0, 8000)
 08052bd9 ???????? (4, 5, 8046b40, 8068e58, 80662e0, 8066370)
 080526ab ???????? (8046b40, 8068e58)
 08053678 ???????? (8047d66, 8067100)
 08053b03 ???????? (8047d66, 8067100)
 08052777 ???????? (8047d66, 8067100)
 08051b73 main     (1, 8047c98, 8047ca4) + 430
 08051696 ???????? (4, 8047d60, 8047d63, 8047d66, 8047d6b, 0)
root@vbs10 #
=======================================================================

looks like it's waiting for some 'read' to complete.

well, take a closer look by mdb
======================================================================
root@vbs10 # mdb -p 2032
Loading modules: [ ld.so.1 libc.so.1 libavl.so.1 libcmdutils.so.1 ]
> $C
0803e17c libc.so.1`_read+7(4, 803e1a0, 8000)
080461b8 0x8052bd9(4, 5, 8046b40, 8068e58, 80662e0, 8066370)
08046a00 0x80526ab(8046b40, 8068e58)
08046f50 0x8053678(8047d66, 8067100)
08047010 0x8053b03(8047d66, 8067100)
08047848 0x8052777(8047d66, 8067100)
08047c64 main+0x430(1, 8047c98, 8047ca4)
08047c84 _start+0x7a(4, 8047d60, 8047d63, 8047d66, 8047d6b, 0)
> libc.so.1`_read+7::dis
libc.so.1`_read:                movl   $0x3,%eax
libc.so.1`_read+5:              syscall
libc.so.1`_read+7:              jae    +0xc     <libc.so.1`_read+0x13>
libc.so.1`_read+9:              cmpl   $0x5b,%eax
libc.so.1`_read+0xc:            je     -0xc     <libc.so.1`_read>
libc.so.1`_read+0xe:            jmp    -0x8abee <libc.so.1`__cerror>
libc.so.1`_read+0x13:           ret
0xfeefb5a4:                     addb   %al,(%eax)
0xfeefb5a6:                     addb   %al,(%eax)
0xfeefb5a8:                     addb   %al,(%eax)
0xfeefb5aa:                     addb   %al,(%eax)
0xfeefb5ac:                     addb   %al,(%eax)
0xfeefb5ae:                     addb   %al,(%eax)
>
======================================================================

yes it's stuck in libc.so`_read.

My idea was to find out what's the file being read at that time.

Well the best tool I can think of is the opensnoop in DTraceToolkit

Fortunately the issue was reproducible, so I did it again, this time with opensnoop.

I ran 'opensnoop -n cp' on one terminal and then 'cp -r /etc /tank' on another terminal.

==================================================================================
...

    0   2269 cp             8 /etc/webconsole/console/prereg/console/userlogin.reg
    0   2269 cp             8 /etc/webconsole/console/prereg/console/jato_jar.reg
    0   2269 cp             6 /etc/webconsole/console/config.properties
    0   2269 cp             6 /etc/webconsole/console/.pswd
    0   2269 cp             6 /etc/webconsole/console/status.properties
    0   2269 cp             7 /etc/webconsole/console/regcache/cache.bkp
    0   2269 cp             7 /etc/webconsole/console/regcache/registry.properties
    0   2269 cp             6 /etc/webconsole/console/default.properties
    0   2269 cp             4 /etc/utmppipe

...
=================================================================================

 The last file being accessed was '/etc/utmppipe' and yes cp was stuck here.

 So what's it?

================================================================
root@vbs10 # ls -lh utmppipe
prw-------   1 root     root           0 Dec 12 17:31 utmppipe
================================================================

 It's a pipe file.


 That's it. When you 'cp -r' on a pipe file, it tries to read from pipe instead of replicating the pipe file itself and this keeps it waiting.

  Let's see if there's any option regarding pipe in cp manpage. And indeed there's one

=======================================================================
     ‐R    Same as ‐r, except  pipes  are  replicated,  not  read from.
=======================================================================

   Now we know that the problem can be overcome by using 'cp -R /etc /tank' instead of 'cp -r /etc /tank'

====================================================================
root@vbs10 # time cp -R /etc /tank
cp: cannot create special file /tank/etc/utmppipe: File exists
cp: /etc/.syslog_door: read: Operation not applicable
cp: /etc/sysevent/syseventconfd_event_channel/reg_door: read: Operation not applicable
cp: /etc/sysevent/devfsadm_event_channel/1: read: Operation not applicable
cp: /etc/sysevent/devfsadm_event_channel/reg_door: read: Operation not applicable
cp: /etc/sysevent/piclevent_door: read: Operation not applicable
cp: /etc/sysevent/sysevent_door: read: Operation not applicable
cp: /etc/svc/volatile/repository_door: read: Operation not applicable

real    0m7.095s
user    0m0.331s
sys     0m3.130s

root@vbs10 # du -sh /etc
  70M   /etc

root@vbs10 # df -h /tank
Filesystem             size   used  avail capacity  Mounted on
tank                    31G    80M    31G     1%    /tank
====================================================================

    it only took me 7s.

    This problem may be also investigated by other tools such as truss. However as you can see dtrace is much powerful and quite easy to use especially with the dtracetoolkit 'killer-app'.

-EOF-

About

samwan

Search

Categories
Archives
« 七月 2014
星期日星期一星期二星期三星期四星期五星期六
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  
       
今天