[Dirvish] Repair failed image.. is it possible?

Keith Lofstrom keithl at kl-ic.com
Tue Jan 9 07:55:51 PST 2007


On Tue, Jan 09, 2007 at 08:19:35AM -0700, hanj wrote:
> Hello All
> 
> I have an interesting and annoying situation. On one remote server, I'm
> having issues with corrupted MAC via SSH and my session disconnects.
> This appears to be a hardware problem somewhere on my route.. and I'm
> working with my ISP's network admins on the problems. Now.. the
> question to you. When this happens, my dirvish image fails since I'm
> disconnected in the middle of the backup.
> 
> Is it possible to repair the image? Currently, I have to delete the
> dated folder and try again, and cross my fingers it doesn't fail on
> this try. I would really like to just repair the image from the point
> it failed.
> 
> I tried copying files, etc from 'good' images, but it doesn't see them
> for the next pass the following day.

This is more a network question than a dirvish question - dirvish needs
rsync to be working, and rsync needs the underlying network transport to
be working.  I don't think you should be trying to run dirvish (or any
other backup tool) over a network until you can get the network operating
properly.  Failures like yours can have many causes, most likely a
configuration problem, since typical IP transport protocols are tolerant
of lost packets but intolerant of configuration errors that continuously
misdirect them.

Sometimes the "configuration error" is a zombied machine somewhere on
the path.  Do not rule out enemy action.  While ssh can tunnel through
hostile networks, it will get confused and have to restart a lot if
another machine is pretending to be one of two legitimate endpoints.
However, it is more likely to be something like an iptables and NAT
misconfiguration - this has happened to me, and I fixed it mostly by
careful reading of the iptables docs and proper configuration rather
than by observing packets.

You will need a network guru, not an rsync guru, for now.  If you need
to build test cases that stress a probably-working network, rsync can
be good for that, but avoid the complexities of dirvish and build some
simplified test cases.  For example, use rsync alone to copy directories
between two machines, identical process each time (same initial source
and target data).  If you get varying results from identical rsync copy
processes and you cannot figure out what is happening from the tcpdump
logs, then pick an easier-to-understand application than rsync.  
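
As a rough illustration of that kind of simplified test case, here is a
minimal Python sketch.  It is only one way to do it, under some
assumptions of mine: the function names and file layout are hypothetical,
and I use rsync's --ignore-times and --whole-file options so that every
trial pushes the same bytes over the wire regardless of what is already
on the target.  It builds a fixed set of files, runs the same rsync copy
several times, and records the exit status and error text of each run.

```python
import hashlib
import os
import subprocess


def make_test_tree(root, n_files=20, size=64 * 1024):
    """Create a fixed, reproducible set of files under root."""
    os.makedirs(root, exist_ok=True)
    for i in range(n_files):
        # Deterministic content: the same bytes on every run.
        block = hashlib.sha256(str(i).encode()).digest()
        data = block * (size // len(block))
        with open(os.path.join(root, f"file{i:03d}.bin"), "wb") as f:
            f.write(data)


def tree_digest(root):
    """Hash relative paths and file contents so two trees can be compared."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order stable
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
    return h.hexdigest()


def run_trial(src, dest):
    """Run one rsync copy; dest may be a local path or 'host:/path'."""
    cmd = [
        "rsync", "-a", "--delete",
        "--ignore-times",  # do not skip files that already look identical
        "--whole-file",    # send full files, not deltas, on every trial
        src.rstrip("/") + "/", dest,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


def run_trials(src, dest, count=5):
    """Repeat the identical copy and collect (exit code, stderr) per run."""
    return [run_trial(src, dest) for _ in range(count)]
```

You would call make_test_tree("/tmp/rsync-src") once, then something
like run_trials("/tmp/rsync-src", "backuphost:/tmp/rsync-test/") where
"backuphost" is a placeholder for your remote server.  If the exit codes
or error messages differ between runs, the network is the suspect.
Running tree_digest on both ends (on a copy of the script on the target)
tells you whether a run that claimed success really delivered the same
data.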

Good luck.  Network problems are a pain.

Keith

-- 
Keith Lofstrom          keithl at keithl.com         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs

