[Dirvish] dirvish hanging on find

Dave Howorth dhoworth at mrc-lmb.cam.ac.uk
Thu Jun 16 04:55:58 PDT 2005


I have a weird problem and am wondering if anybody has seen it before. 
One of my dirvish jobs, which has previously worked fine, has hung 
waiting on a subprocess. dirvish-runall is running a dirvish job which 
has hung waiting for /usr/bin/find to finish. find seems to be in an 
infinite loop. Here's what I can see:

From ps, here are the commands:

root      4820  4755  0 Jun15 ?        00:00:00 /usr/bin/perl /usr/local/sbin/dirvish-runall

root      5042  4820  0 Jun15 ?        00:00:01 /usr/bin/perl /usr/local/sbin/dirvish --vault pcx36-var --image-time 22:00

root      5106  5042 99 Jun15 ?        1-05:49:36 find /backup/pcx36/pcx36-var/2005-06-14/tree -ls

From top, find is very busy doing something:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5106 root      39  15  2928  528 2752 R 99.9  0.0   1792:15 find

From strace, the two perl processes are blocked in system calls, and find isn't making any system calls at all:

suse1:~ # strace -p 4820
Process 4820 attached - interrupt to quit
wait4(5042,

suse1:~ # strace -p 5042
Process 5042 attached - interrupt to quit
read(11,

suse1:~ # strace -p 5106
Process 5106 attached - interrupt to quit

(no output despite leaving it for quite a while, so my conclusion is 
that find is in an infinite loop)
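
One further check that would back up this conclusion, assuming a Linux /proc filesystem: sample the process state and CPU tick counters twice, some seconds apart. This is only a sketch. The helper names parse_stat_line and sample_pid are made up for illustration; the field positions are the ones documented in proc(5).

```shell
#!/bin/sh
# Sketch: read a process's state and CPU tick counters from /proc.
# Assumes Linux procfs; parse_stat_line and sample_pid are hypothetical names.

# parse_stat_line LINE: print "state utime stime" from one /proc/PID/stat line.
parse_stat_line() {
    # Drop everything up to the last ") " so that a command name containing
    # spaces or parentheses cannot shift the remaining fields.
    rest=${1##*") "}
    # Word-split what is left: state is $1, utime is ${12}, stime is ${13}.
    set -- $rest
    echo "$1 ${12} ${13}"
}

# sample_pid PID: print the current "state utime stime" for a live process.
sample_pid() {
    parse_stat_line "$(cat "/proc/$1/stat")"
}
```

Running sample_pid 5106 twice, a few seconds apart, should show state R with utime climbing and stime staying nearly flat if find really is spinning in user space rather than waiting in the kernel.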

From lsof, to be sure it really is /usr/bin/find:

suse1:~ # lsof -p 5106
COMMAND  PID USER   FD   TYPE DEVICE    SIZE     NODE NAME
find    5106 root  cwd    DIR  253,5     744   793554 /backup/pcx36/pcx36-var/2005-06-14/tree/www
find    5106 root  rtd    DIR    3,1     680        2 /
find    5106 root  txt    REG    3,1   66216    36947 /usr/bin/find
find    5106 root  mem    REG    3,1  106608    20736 /lib64/ld-2.3.3.so
find    5106 root  mem    REG    3,1  217016   164645 /var/run/nscd/passwd
find    5106 root  mem    REG    3,1  217016   164652 /var/run/nscd/group
find    5106 root  mem    REG    3,1 1412174    20761 /lib64/tls/libc.so.6
find    5106 root    0r  FIFO    0,7         32522573 pipe
find    5106 root    1w  FIFO    0,7         33987450 pipe
find    5106 root    2w   REG    3,1    2256   164546 /tmp/run-crons.ZY4693/run-crons.daily.4691
find    5106 root    3r   DIR    3,1     848    13368 /root

This is the line of code in dirvish that executes find, whilst trying to 
index the backup tree:

         open(FIND, "find $destree -ls|") or seppuku 21, "dirvish $vault:$image cannot build index";
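
For comparison, here is a minimal sketch of running the same listing with a time limit, so that a find that never finishes cannot block its reader forever. This is not what dirvish does: index_with_limit is a hypothetical helper, and it assumes timeout(1) from GNU coreutils is available.

```shell
#!/bin/sh
# Sketch only: dirvish itself runs "find $destree -ls" with no time limit.
# index_with_limit is a hypothetical helper; timeout(1) is from GNU coreutils.

# index_with_limit SECONDS TREE: list TREE like "find TREE -ls", but send
# find a SIGTERM after SECONDS. timeout exits with status 124 in that case.
index_with_limit() {
    timeout "$1" find "$2" -ls
}
```

A caller could check for exit status 124 to tell a timed-out listing apart from an ordinary find failure.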

Has anyone seen anything like this?

To me it looks like find is in an infinite loop, which would have to be 
a bug in find. (It's on SUSE 9.2, BTW: kernel 2.6.8-24.14-smp, x86_64.) 
Any ideas on how to extract more information before I finally give up 
and kill the process?

Cheers, Dave


