[Dirvish] a couple of issues

Keith Lofstrom keithl at kl-ic.com
Sun Mar 27 23:40:38 PST 2005


On Fri, Mar 25, 2005 at 10:33:22AM +0100, Paul Slootman wrote:

> Secondly,
> (looking at dirvish_1_3_khl):
> 
>          # 24 changed from error to warning.  This keeps dirvish from failing
>          # if the client deletes a file between list creation and the file
>          # movement  KHL  (from note on 2004-08-07 )
> 
> But in dirvishlib.pl this is present:
> 
>     [ 'error',      '^file has vanished: ',                 ],
> 
> I'd have expected the 'error' to be 'warning' here...

Paul:

Right you are.  I took the time to decipher the spaghetti around that,
and then fixed it following your suggestion.

For those of you wondering what the heck we are talking about, this
is code in the  @erraction  array in the  errorscan  subroutine in the
dirvishlib.pl  library, code which is called by all the applications.
The second field in each entry of the table is matched to one of the
error messages that rsync can throw at STDERR (along with the status
return, which we are assuming is "24" for this error).   This is one
of the places that we decide that this status result is a  "warning"
and don't try again, as we would with an "error".  If the file goes
away, what do we need to try again for?  If it mysteriously re-appears,
we can grab it the next time around.
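
To make the effect of the change concrete, here is a minimal sketch in
Python (not dirvish's Perl) of a table-driven scan of this kind.  Only the
"file has vanished" pattern comes from dirvishlib.pl; the names OLD_TABLE,
NEW_TABLE, classify and needs_retry are hypothetical, and the rule "retry
only on 'error'" is an assumption based on the description above, not a
copy of what errorscan actually does.

```python
import re

# Hypothetical, cut-down stand-ins for the @erraction table: (action, regex).
# Only the "file has vanished" pattern is taken from dirvishlib.pl.
OLD_TABLE = [("error", re.compile(r"^file has vanished: "))]
NEW_TABLE = [("warning", re.compile(r"^file has vanished: "))]


def classify(line, table):
    """Return the action of the first entry whose regex matches, else None."""
    for action, pattern in table:
        if pattern.search(line):
            return action
    return None


def needs_retry(stderr_lines, table):
    """Assumed policy: run another cycle only if some line is an 'error'."""
    return any(classify(line, table) == "error" for line in stderr_lines)


if __name__ == "__main__":
    # "<filename>" is a placeholder, not a real log entry.
    lines = ['file has vanished: "<filename>"']
    print(needs_retry(lines, OLD_TABLE))  # True
    print(needs_retry(lines, NEW_TABLE))  # False
```

With the entry marked 'warning', a vanished file no longer asks for
another pass in this sketch, which is the behaviour described above.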

I understand the code better now, so it is good that you pointed this out. 

------ related subject -------

We should also consider rsync error 23 in dirvish:
    23 => [ 'error',   "partial transfer"              ],
and this line in the @erraction array:
    [ 'fatal',   '^\S*sh: .* No such file',            ],

These are not consistent - I sometimes get error 23s that look like:

send_files failed to open <filename>: No such file or directory

... in the dirvish rsync_error file, typically when my firewall machine
is diddling with /var/spool/postfix/deferred during a backup ( and
postfix will get deferred when the hose is full of rsync packets ) . 
This means that I sometimes do a second cycle because of the error 23,
but I do not trip the fatal entry, which is triggered by a similar
but slightly different "No such file" message.

The "send_files" lines do not match the entry in @erraction,
because it is looking for some word that begins in column one and
ends with "sh:", followed somewhere on the line by "No such file".
I have no idea what error jw was looking for; perhaps something
thrown by ssh or rsh.  There is nothing I can find in the rsync
code ( http://rsync.samba.org/ftp/rsync/rsync-2.6.3.tar.gz ) that
would print that out.
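
The mismatch is easy to check by hand.  The sketch below, in Python rather
than Perl, applies the same regular expression to the "send_files" line
quoted above and to a shell-style line.  The shell-style line is purely
hypothetical, made up to show the shape of message the pattern would
accept; it is not taken from any log.

```python
import re

# The pattern from the 'fatal' entry in @erraction, copied verbatim.
FATAL_PATTERN = re.compile(r"^\S*sh: .* No such file")

# The error 23 line as quoted above, with its <filename> placeholder.
RSYNC_LINE = "send_files failed to open <filename>: No such file or directory"

# Hypothetical shell-style message, invented only for illustration.
SHELL_LINE = "bash: rsync: No such file or directory"


def is_fatal(line):
    """True if the line matches the 'fatal' pattern."""
    return FATAL_PATTERN.search(line) is not None


if __name__ == "__main__":
    print(is_fatal(RSYNC_LINE))  # False
    print(is_fatal(SHELL_LINE))  # True
```

The first line fails because nothing ending in "sh: " starts at column
one, so the "No such file" part is never even considered.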

Help me out here.  Please look through your own log files for
<BANK>/<VAULT>/<IMAGE>/rsync_error files containing  "cycle 1"
on your backup disks, perhaps with a

   grep "cycle 1" */*/*/rsync_error 

... in the directory containing your banks.  Look at the files
this catches and see what kind of things are causing loops. 
Perhaps we can speed up dirvish a bit.  
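
If you would rather script the search than eyeball grep output, a rough
Python equivalent of the command above might look like this.  The
function name files_with_retry is hypothetical, and the sketch assumes
only the <BANK>/<VAULT>/<IMAGE>/rsync_error layout already mentioned; it
makes no assumption about what else the files contain.

```python
import glob
import os


def files_with_retry(banks_dir, marker="cycle 1"):
    """Return rsync_error paths under banks_dir that mention the marker."""
    hits = []
    pattern = os.path.join(banks_dir, "*", "*", "*", "rsync_error")
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps odd bytes in a log from aborting the scan.
        with open(path, errors="replace") as handle:
            if any(marker in line for line in handle):
                hits.append(path)
    return hits


if __name__ == "__main__":
    # Run from the directory containing your banks.
    for path in files_with_retry("."):
        print(path)
```

Like the grep, this is a plain substring match, so it lists a file as
soon as any line in it contains the marker text.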

In general it is a good idea to look at any rsync_error files you
might have, and let us know if there is anything in them you can't
figure out from the dirvish documentation and wiki.

Keith

-- 
Keith Lofstrom          keithl at keithl.com         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs

