[Dirvish] Too much disk space for images

foner-dirvish at media.mit.edu
Fri Oct 2 19:21:10 UTC 2009

    > Date: Fri, 02 Oct 2009 15:02:06 -0400
    > From: Jason Boxman <jasonb at edseek.com>

    > Yes, I've been using it for years.  It's rather nice.  If you have many 
    > files, it eats up tons of space in /tmp for sort's temporary files.  I 
    > had to limit it to running against the next most recent snapshot since 
    > it runs nightly and limit it to checking for duplicates only per vault, 
    > not bank wide.

You can also (a) put its temporary files elsewhere if you have space
elsewhere (it lets you pass args to sort, and -T is useful there),
(b) tell it to only examine files larger than a particular cutoff,
which saves enormous amounts of time (file sizes are probably
exponentially distributed, so trying only the largest ones will give
you almost all of the benefit), or (c) use transitivity, checking
across all vaults at a given date and then doing the latest pair of
backups vault-by-vault (though that spends a lot of time rescanning
dirs "vertically" in the run you just did "horizontally", so to
speak).

See http://www.dirvish.org/pipermail/dirvish/2005-November/000525.html
for some benchmarking on how (b) works out for my particular workload.
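
As a rough illustration of why the size cutoff in (b) is cheap, here is
a minimal Python sketch. It is not the tool discussed in this thread and
does not reproduce its options; the names find_large_duplicates,
sha256_of and min_size are made up for the example. It only reports
candidate duplicates (it does not hardlink anything), ignores files
below the cutoff before doing any I/O on their contents, and counts
files that are already hardlinked together only once.

```python
import hashlib
import os
import stat
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_large_duplicates(root, min_size):
    """Return sorted groups of paths under root that are >= min_size bytes and identical.

    Hypothetical helper for illustration only: min_size is the cutoff from (b).
    """
    by_size = defaultdict(list)
    seen_inodes = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode) or info.st_size < min_size:
                continue  # small files are dropped here, before any hashing
            inode = (info.st_dev, info.st_ino)
            if inode in seen_inodes:
                continue  # already hardlinked to a file we have seen
            seen_inodes.add(inode)
            by_size[info.st_size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return sorted(groups)
```

Calling find_large_duplicates("/some/vault", 1024 * 1024) would only
ever hash files of at least 1 MiB; the path and the cutoff here are
placeholders, not recommendations.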

