[Dirvish] Re: Fatal errors with no apparent cause

Matthew Palmer mpalmer at hezmatt.org
Tue Mar 29 20:18:33 PST 2005


On Tue, Mar 29, 2005 at 09:32:33AM -0800, Keith Lofstrom wrote:
> Tell us a little more about your local machine.  Stock kernel or hacked?
> 2.4 or 2.6?  Distro?  ( uname -a )  What are you using for source and
> target disks, and on what type of interfaces?  Which version of rsync
> ( rsync --version )?  Which version of perl ( perl --version )?  Which
> version of ssh ( ssh -V )?  And THAT points out the need for ANOTHER
> dirvish feature, one that reports not just the dirvish version, but the
> versions of all the components dirvish relies on. 

Debian Woody+backports, 2.4.23 hand-compiled kernel, on a 2GHz P4 (no
hyperthreading enabled).  The source discs are mostly SCSI of some sort (I'm
not a SCSI expert; they're using the D-type connectors though, so it's not
excruciatingly old); the bank disk is a basic EIDE disc on the on-board
controller.

rsync --version: 2.6.3 protocol version 28
perl --version: v5.6.1
ssh -V: OpenSSH_3.8.1p1 Debian 1:3.8.1p1-6.backports.org.1, OpenSSL 0.9.6c

Although the ssh version shouldn't matter, since it's failing locally...
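
The version questions quoted above lend themselves to a small script.  What
follows is only a minimal sketch, assuming Python 3 is available on the
backup host; it is not an existing dirvish feature, and the names
VERSION_COMMANDS, first_output_line and version_report are made up for
illustration.  It runs the four commands mentioned in the quote and keeps
the first non-empty line of each:

```python
import subprocess

# The commands are the ones named in the quoted questions; the dictionary
# keys are illustrative labels only.
VERSION_COMMANDS = {
    "kernel": ["uname", "-a"],
    "rsync": ["rsync", "--version"],
    "perl": ["perl", "--version"],
    "ssh": ["ssh", "-V"],
}


def first_output_line(command):
    """Run a command and return its first non-empty output line.

    Returns a short note instead of raising if the command cannot be run.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # ssh -V prints its version on stderr
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "unavailable ({})".format(exc.__class__.__name__)
    for line in result.stdout.splitlines():
        if line.strip():
            return line.strip()
    return "no output"


def version_report(commands=VERSION_COMMANDS):
    """Map each label to the first output line of its version command."""
    return {name: first_output_line(cmd) for name, cmd in commands.items()}


if __name__ == "__main__":
    for name, line in version_report().items():
        print("{}: {}".format(name, line))
```

A missing tool shows up as "unavailable (FileNotFoundError)" rather than
aborting the report, so the output can be pasted into a bug report as-is.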

On the whole, though, the lack of graceful degradation in rsync/dirvish
when resources are insufficient seems a bit... fragile.

Last night, with the -vv option enabled, I got a bit more in the e-mailed
logs but nothing particularly indicative in the rsync_error files:

dirvish vulcan/etc:default fatal error: write error, filesystem probably full
dirvish vulcan/etc:default fatal error: write error, filesystem probably full
dirvish vulcan/etc:default fatal error: write error, filesystem probably full
dirvish vulcan/etc:default fatal error: write error, filesystem probably full
dirvish vulcan/etc:default error (30) -- timeout in data send/receive
dirvish vulcan/home:default fatal error: write error, filesystem probably full
dirvish vulcan/home:default fatal error: write error, filesystem probably full
dirvish vulcan/home:default fatal error: write error, filesystem probably full
dirvish vulcan/home:default fatal error: write error, filesystem probably full
dirvish vulcan/home:default error (30) -- timeout in data send/receive

Since vulcan is the backup host, and /etc is pretty pifflingly small
(4.5MB), if dirvish has resource issues at that scale, I'm a bit worried.
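
Since the message blames a full filesystem, one cheap thing to rule out is
the bank running short of inodes rather than blocks; a filesystem can
refuse writes in either case.  The sketch below is an assumption-laden
illustration, not something from the dirvish tools: filesystem_headroom is
a made-up helper, and BANK_PATH is a placeholder that would have to be set
to the real bank mount point.

```python
import os


def filesystem_headroom(path):
    """Return the fraction of blocks and inodes still available to
    unprivileged users on the filesystem that holds `path`.

    A value of None means the filesystem does not report that total.
    """
    stats = os.statvfs(path)

    def fraction(free, total):
        return free / total if total else None

    return {
        "blocks_free": fraction(stats.f_bavail, stats.f_blocks),
        "inodes_free": fraction(stats.f_favail, stats.f_files),
    }


if __name__ == "__main__":
    BANK_PATH = "/"  # placeholder; point this at the bank's mount point
    for name, value in filesystem_headroom(BANK_PATH).items():
        shown = "unknown" if value is None else "{:.1%}".format(value)
        print("{}: {}".format(name, shown))
```

If both figures show plenty of headroom, the "filesystem probably full"
wording is more likely a guess made from the rsync exit code than a
literal description of the disk.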

> I am more curious than ever what top -b > top_log_file , run overnight,
> would report.  

That's going to go on in tonight's run.  I left it off last night to see
whether the other things you suggested had fixed it.

Note that dirvish doesn't run at any of the regular system job times,
though.  Dirvish kicks off at 2200 and is finished by midnight; the tape
backup system (which I'm hoping to be able to shaft in the nearish future)
runs at 2am, and the daily system tasks run at around 6am.

> Lastly, since the problem is mediated by rsync, you should google the
> rsync list for clues.  try this google search string:
> 
> site:lists.samba.org "rsync error: error in rsync protocol data stream (code 12)"
> 
> A lot of people on the rsync list have seen these errors, and they often
> resolve to a machine resource issue ( early 2.4 kernels unable to handle
>  >1GB RAM, for example ).  

I'm not finding anything useful from that google search.  A lot of "I have a
problem, it throws this error" posts, but not a lot of "this was the
problem, here's how I fixed it" followups.

- Matt