[Dirvish] Copying banks

Keith Lofstrom keithl at kl-ic.com
Wed Jan 20 16:43:53 UTC 2010

On Wed, Jan 20, 2010 at 03:31:19PM +0100, Bernd Haug wrote:
> What do you use when you need to move whole banks to other hosts (or
> other file systems)?
> rsync -e ssh -aAHXx /mount-point root@remote:/new-mountpoint is very
> slow (due to hard link preservation, I presume).
> Just dd'ing is out of the question. (E.g. because, in my case, the new
> device is slightly (i.e., a few MiB, but still) smaller.)

I looked for tools for this a few years ago, and did not find
anything.  I like to keep old images - the expense of running expire
(CPU time, disk "wearout", and chances for error) is not normally
worth the extra space gained.  However, this results in lots of
images, and some files with hundreds of hard links.  If I am using
ext2/3/4 and run out of inodes ... disaster.  I have a file
system that is partly full of images and heavily hardlinked.
Copying the data to another file system built with a proper number
of inodes involves too much data movement, because the known
copying processes (and rsync at the time I looked) do not
efficiently copy hardlinked files.  Perhaps that is better now.
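
For reference, the usual way a copier preserves hard links is to
remember, for every source file whose link count is above one, where
that inode was first copied, keyed by (device, inode number); every
later name for the same inode becomes a link() call instead of a data
copy.  Below is a minimal Python sketch of that idea, assuming a plain
local-to-local copy.  The function name copy_tree_preserving_hardlinks
is made up for illustration, and it ignores ownership, ACLs, extended
attributes and special files (what -A and -X cover in rsync).  Note
that the "seen" table grows with every multiply-linked inode in the
tree, which is exactly the "huge table" problem for a big bank.

```python
import os
import shutil
import stat


def copy_tree_preserving_hardlinks(src: str, dst: str) -> dict:
    """Copy src into a new tree dst, recreating hard links instead of
    copying the same data twice. Returns counts of copied vs linked files."""
    seen = {}  # (st_dev, st_ino) of a source file -> its first destination path
    stats = {"copied": 0, "linked": 0}
    for root, dirs, files in os.walk(src):  # does not follow symlinked dirs
        target_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(target_dir, exist_ok=True)
        for name in dirs:
            s = os.path.join(root, name)
            if os.path.islink(s):  # os.walk lists symlinks to dirs under dirs
                os.symlink(os.readlink(s), os.path.join(target_dir, name))
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            st = os.lstat(s)
            if stat.S_ISLNK(st.st_mode):
                os.symlink(os.readlink(s), d)
            elif stat.S_ISREG(st.st_mode):
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    os.link(seen[key], d)  # another name for an inode already copied
                    stats["linked"] += 1
                else:
                    shutil.copy2(s, d)  # data plus permission bits and timestamps
                    stats["copied"] += 1
                    if st.st_nlink > 1:
                        seen[key] = d  # only multiply-linked inodes are remembered
            # devices, FIFOs and sockets are skipped in this sketch
    return stats
```

Each inode's data crosses the wire (or the disk) once; the cost is
one dictionary entry per multiply-linked inode, held until the end.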

Something that copied the data once, and kept track of hardlinks,
without huge tables somewhere, might need to be aware of the
underlying structure of the filesystem to do the job
efficiently.  It may be necessary to keep track of the hardlinks
going the other direction, from data inode to directory entry.
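
One partial way to shrink the table, offered as an assumption and not
as any existing tool's method: st_nlink says how many names an inode
has, so its entry can be dropped as soon as that many names have been
visited.  The hypothetical LinkTracker class below does this.  For a
dirvish bank walked one image at a time it only partly helps: a file
unchanged across all images stays in the table until the last image is
walked, so the table still holds roughly one image's worth of
long-lived files, and an inode with links outside the copied tree
never drops out at all.

```python
import os


class LinkTracker:
    """Remember where a multiply-linked inode was first copied, and forget
    it as soon as every one of its names has been visited."""

    def __init__(self):
        # (st_dev, st_ino) -> [destination path of first copy, names still unseen]
        self._open = {}
        self.peak = 0  # largest number of entries held at any one time

    def lookup(self, st: os.stat_result):
        """Return the earlier destination path to hard-link to, or None if
        the caller must copy the data (and then call record())."""
        if st.st_nlink < 2:
            return None  # single-link files never need a table entry
        key = (st.st_dev, st.st_ino)
        entry = self._open.get(key)
        if entry is None:
            return None
        entry[1] -= 1
        if entry[1] == 0:
            del self._open[key]  # last name seen: the entry can never be needed again
        return entry[0]

    def record(self, st: os.stat_result, dest_path: str) -> None:
        """Note where the first name of a multiply-linked inode was copied."""
        if st.st_nlink < 2:
            return
        self._open[(st.st_dev, st.st_ino)] = [dest_path, st.st_nlink - 1]
        self.peak = max(self.peak, len(self._open))

    def __len__(self):
        return len(self._open)
```

A copier would call lookup() for each regular file, and on None copy
the data and call record().  This assumes link counts do not change
while the copy runs.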

Beyond that, a simple copy might not be as efficient as keeping
track of the actual file data, and merging hardlink trees where the
data permits it.  That would make the filesystem copies much more
compact than the original source filesystem, and help with keeping
evolving branches compact.  This would be helpful for rsync-based
backup, but it would not be a generally useful tool for active file
systems, because in some cases you might want the two hardlink trees
to evolve separately.  Evolution does not need to happen with backups.
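
The merging idea can be sketched with a common deduplication approach:
group files by size and by the metadata a shared inode would force
them to share (mode, owner, group, mtime), hash only groups containing
more than one inode, confirm byte for byte, then replace the duplicate
names with links.  merge_identical_files and the ".merge-tmp" suffix
are hypothetical names.  This sketch works in place on one file system
rather than during a copy, keeps all its tables in RAM, and refuses to
merge files whose metadata differs, since that would silently rewrite
some image's history.

```python
import filecmp
import hashlib
import os
import stat
from collections import defaultdict


def _sha256(path: str) -> bytes:
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()


def merge_identical_files(root: str) -> int:
    """Re-point duplicate regular files under root at a single inode.
    Returns how many names were replaced by hard links."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode):
                # st_dev: links cannot cross file systems; the rest is metadata
                # that all names of one inode would be forced to share.
                key = (st.st_dev, st.st_size, st.st_mode, st.st_uid,
                       st.st_gid, st.st_mtime_ns)
                groups[key].append((path, st.st_ino))

    merged = 0
    for members in groups.values():
        if len({ino for _path, ino in members}) < 2:
            continue  # only one distinct inode: nothing to merge, skip hashing
        by_hash = defaultdict(list)
        for path, ino in members:
            by_hash[_sha256(path)].append((path, ino))
        for same in by_hash.values():
            keep_path, keep_ino = same[0]
            for path, ino in same[1:]:
                if ino == keep_ino or not filecmp.cmp(keep_path, path, shallow=False):
                    continue  # already linked, or a (very unlikely) hash collision
                tmp = path + ".merge-tmp"  # hypothetical temporary name
                os.link(keep_path, tmp)
                os.replace(tmp, path)  # atomic rename onto the duplicate's name
                merged += 1
    return merged
```

Running it twice is harmless: the second pass finds the names already
sharing an inode and re-points nothing.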

If you can invent an efficient way to (with finite time and finite
RAM) copy large hardlinked data trees, and especially if you can
(optionally) merge the hardlinks of identical files, you would be a
hero to me.  I just don't know whether there is a good way to do it.


Keith Lofstrom          keithl at keithl.com         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs
