[Dirvish] hardlink command

Loren M. Lang lorenl at alzatex.com
Thu Sep 12 03:07:00 UTC 2013

On 9/10/2013 10:27 AM, Vortex wrote:
> On 06.09.2013 11:17, Dave Howorth wrote:
>> I suppose that it should not be necessary to run hardlink after dirvish,
>> in theory. dirvish uses rsync and instructs it to make hard links
>> between the backups. Any dupes in the original data are better fixed by
>> running hardlink or similar on the original data, not the backup.
> I suppose that dirvish only links identical files across images, NOT
> multiples inside the same image. For that, hardlink may be useful.

Yes, dirvish/rsync will not hardlink duplicate files within an image, but
it's a bit more complicated than that. Hard links across images exist
because each new image starts out 100% identical to the previous one
before the sync part of rsync happens. If I understand rsync correctly,
any data or metadata change will replace that hard link with a complete,
independent copy of the current version of the file. That means, IIRC,
that even an mtime change could cause a duplicate file while the contents
may still be identical. Even if mtime doesn't do it, a change in
permissions/ownership surely will, regardless of whether the file
contents are identical.

So, here's the big question that needs to be answered. Does hardlink(1)
check for metadata differences? And if it does, how does it determine
which version to keep? Since all hard links to the same file share the
same inode, they also share the same metadata (permissions, times,
ownership, etc.). A good backup solution should preserve that metadata
for every successful image.
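
To make the shared-inode point concrete, here is a minimal Python sketch.
It is only an illustration and is not part of dirvish, rsync or
hardlink(1); the function name and the file names "a" and "b" are made up
for the example. It creates two names for one file, changes permissions
through one name, and reads the metadata back through both.

```python
import os
import tempfile

def linked_names_share_metadata():
    """Create two names for one inode, chmod via one, stat both."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")   # hypothetical file names
        b = os.path.join(d, "b")
        with open(a, "w") as f:
            f.write("payload\n")
        os.link(a, b)              # b becomes a second name for a's inode
        os.chmod(a, 0o600)         # change permissions through a only
        sa, sb = os.stat(a), os.stat(b)
        return {
            "same_inode": sa.st_ino == sb.st_ino,
            "same_mode": sa.st_mode == sb.st_mode,
            "link_count": sa.st_nlink,
        }

if __name__ == "__main__":
    print(linked_names_share_metadata())
```

The change made through one name is visible through the other, which is
why linking two files with different metadata necessarily discards one
version of it.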

There are also three levels of duplication I can see. One is duplication
of files on a single filesystem. If there are duplicated files on a
server, they should be hard-linked on the original filesystem, which will
then carry over to dirvish/rsync automatically, but that can only be done
if it's acceptable for them to have the same metadata. It doesn't work if
they have different ownership, for example, because of some kind of
per-user jail setup.

The second is duplication between images. Dirvish/rsync should handle
this automatically and only create duplicate copies of data when there
is a metadata change.

The third duplication is between vaults due to identical software
installed on multiple servers. This will result in duplication when
files are changed or added, but can be squashed post-rsync with a
command like hardlink(1). But again, this will squash metadata to one
version. While permissions will probably be the same, times may not be.
Is it OK to squash this information?
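
One way to judge how much would be lost is to survey the duplicates
before linking anything. The sketch below is an assumption-laden
illustration, not a description of how hardlink(1) works: it groups
regular files by SHA-256 of their contents and then separates groups
whose mode, owner, group and mtime all match ("safe") from groups where
linking would squash some metadata ("lossy"). The names link_candidates,
meta_key and file_digest are invented here, and which fields count as
metadata is a choice you may want to change. It only reports; it never
creates links.

```python
import hashlib
import os
import stat
from collections import defaultdict

def file_digest(path, chunk=1 << 20):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def meta_key(st):
    """Metadata that a hard link would force to be shared (assumed set)."""
    return (st.st_mode, st.st_uid, st.st_gid, st.st_mtime_ns)

def link_candidates(root):
    """Return (safe, lossy): lists of path groups with identical content."""
    by_content = defaultdict(dict)   # (size, digest) -> {(dev, ino): (path, stat)}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue             # skip symlinks, devices, etc.
            key = (st.st_size, file_digest(path))
            # Names that already share an inode are counted once.
            by_content[key].setdefault((st.st_dev, st.st_ino), (path, st))
    safe, lossy = [], []
    for entries in by_content.values():
        if len(entries) < 2:
            continue                 # nothing to link
        paths = sorted(p for p, _ in entries.values())
        metas = {meta_key(s) for _, s in entries.values()}
        (safe if len(metas) == 1 else lossy).append(paths)
    return safe, lossy
```

Run against a vault (or a copy of one), the size of the "lossy" list
gives a rough idea of how often times or ownership would actually be
squashed.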

> Cheers
> V.

Loren M. Lang
lorenl at alzatex.com

Public Key: ftp://ftp.tallye.com/pub/lorenl_pubkey.asc
Fingerprint: 10A0 7AE2 DAF5 4780 888A  3FA4 DCEE BB39 7654 DE5B
