[Dirvish] Copying banks

Paul Slootman paul at debian.org
Wed Jan 20 15:08:46 UTC 2010

On Wed 20 Jan 2010, Bernd Haug wrote:

> What do you use when you need to move whole banks to other hosts (or
> other file systems)?

I usually just move the latest image, and let time take care of the rest :)

> • the BSD dump port - it needs too much temp space (i.e., multi-GiB)
> for filesystems with a large directory structure and is also quite
> slow

Why temp space? You can pipe from dump straight into restore.

> I can imagine that a faster tool is possible, one that does not have
> to search for the other files that link there -- in Python-like pseudocode:
> multilinkers = {}
> for file in files:
>     if multilinked(file):
>         if inode(file) in multilinkers:
>             link(multilinkers[inode(file)], newname(file))
>         else:
>             copy(file, newname(file))
>             multilinkers[inode(file)] = newname(file)
> This should end up using < 1GiB of VM even on pretty big filesystems,
> which should be well worth the overhead for faster sync on modern
> servers...

I can't imagine that this would be faster than just using rsync.
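
For reference, the quoted pseudocode could be made runnable roughly as
follows. This is only a sketch, not a tested migration tool: the function
name copy_tree_preserving_hardlinks is made up for illustration, it assumes
the tree holds only directories, regular files and symlinks, it keys on
(device, inode) rather than inode alone, and it does not preserve ownership
the way rsync -a --numeric-ids does.

```python
import os
import shutil
import stat


def copy_tree_preserving_hardlinks(src_root, dst_root):
    """Copy src_root into dst_root, recreating hard links between files.

    Hypothetical helper: keeps one dict entry per multiply-linked inode,
    as in the quoted pseudocode. Symlinked directories are not copied.
    """
    seen = {}  # (st_dev, st_ino) -> path of the first copy under dst_root
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            st = os.lstat(src)
            key = (st.st_dev, st.st_ino)
            multilinked = stat.S_ISREG(st.st_mode) and st.st_nlink > 1
            if multilinked and key in seen:
                # Inode already copied once: link to that copy instead.
                os.link(seen[key], dst)
            else:
                # First sighting (or singly linked file): copy data,
                # mode and timestamps; symlinks are copied as symlinks.
                shutil.copy2(src, dst, follow_symlinks=False)
                if multilinked:
                    seen[key] = dst
    return seen
```

The dict only grows for files with a link count above one, which is where
the memory estimate in the quoted text comes from; in a dirvish bank that
is still most files, so whether this beats rsync -H would need measuring.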

You could copy each image one by one, which might be quicker:
(untested :-)

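cd "$oldvault"
previous="name of 1st available image"
rsync -aH --numeric-ids "$previous/" "$newvault/$previous/"
for i in *; do
    # skip the 1st image (already copied above) and the dirvish dir
    if [ "$i" != "$previous" ] && [ "$i" != "dirvish" ]; then
        # --link-dest is relative to the destination dir, i.e. $newvault/$i/
        rsync -aH --numeric-ids --link-dest="../$previous/" "$i/" "$newvault/$i/"
        previous="$i"   # next image links against the one just copied
    fi
done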


