[Dirvish] How big of a file can be rsync'ed?

Paul Slootman paul at debian.org
Fri Apr 16 10:51:36 UTC 2010


On Thu 15 Apr 2010, Richard wrote:

> > wouldn't it be possible to touch the file to update the modification
> > time prior to running dirvish? 
> I'll have to try something like that.  I avoided that because I didn't 
> want to update the entire file.

Eh, you will *always* have to update the entire file! I would be majorly
pissed off if rsync decided to e.g. only update the first half.
Even if only one byte of a 500GB file is changed, rsync will still have
to update the entire file. Of course, its delta algorithm will prevent
it from *transferring* the entire file.
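Here is a toy sketch in Python (not rsync's actual code) that makes the "update" versus "transfer" distinction concrete. It rests on several assumptions. The block size is a hypothetical 4 bytes, MD5 is the block hash, and matching is block-aligned only. Real rsync also uses a rolling weak checksum, which lets it find matching blocks at any byte offset. Even so, the receiver ends up writing every byte of the new file, while only the changed block travels as literal data.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical tiny block size so the example stays readable


def signature(old: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Receiver side: map a strong hash of each block of the old file to its index."""
    sig = {}
    for i in range(0, len(old), block_size):
        digest = hashlib.md5(old[i:i + block_size]).hexdigest()
        sig.setdefault(digest, i // block_size)
    return sig


def delta(new: bytes, sig: dict, block_size: int = BLOCK_SIZE) -> list:
    """Sender side: ("copy", index) for blocks the receiver already has,
    ("data", bytes) for everything else. Block-aligned matching only."""
    ops = []
    for i in range(0, len(new), block_size):
        block = new[i:i + block_size]
        idx = sig.get(hashlib.md5(block).hexdigest())
        ops.append(("copy", idx) if idx is not None else ("data", block))
    return ops


def patch(old: bytes, ops: list, block_size: int = BLOCK_SIZE) -> bytes:
    """Receiver side: build the complete new file from old blocks plus literal data."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            start = value * block_size
            out += old[start:start + block_size]
        else:
            out += value
    return bytes(out)


old = b"aaaabbbbccccdddd"
new = b"aaaabbXbccccdddd"  # one byte changed
ops = delta(new, signature(old))
literal = sum(len(v) for kind, v in ops if kind == "data")
rebuilt = patch(old, ops)
print(literal, len(rebuilt))  # 4 16: one block sent, all 16 bytes rewritten
```

Scaled up to the 500GB case, the same idea holds. The literal data is tiny, but the destination file is still rebuilt in full.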

> Speaking of updating the entire file, I found a major-major gotcha.  
> When rsync copies a file without leaving the local system, it NEVER does 
> a delta copy.  In its infinite knowledge and wisdom it just does a 
> straight copy-replace.  When backing up a local filesystem to a local 

That's by design and a very good design decision it is too.

To use the delta algorithm, rsync needs the entire source file *and*
destination file to be read to determine what blocks need updating.
Then, while creating the new version, it will read the destination file
*again* to extract those blocks that haven't changed.
A plain local copy, by contrast, reads the source once and writes the
new destination once, so there is *way* less IO load.
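The IO accounting above can be sketched as a rough model. The function `local_io_bytes` is hypothetical. It assumes the old and new files are about the same size and that nearly all blocks are unchanged, so the second read of the destination covers almost the whole file. It ignores checksum traffic and the page cache. rsync's manual documents this choice as `--whole-file` (`-W`), which is the default when source and destination are both local paths.

```python
def local_io_bytes(file_size: int, whole_file: bool) -> dict:
    """Rough byte counts for updating one file when source and destination
    are on the same machine (hypothetical model, not measured)."""
    if whole_file:
        # Plain copy: read the source once, write the new destination once.
        return {"read": file_size, "written": file_size}
    # Delta mode: read the old destination to build block checksums,
    # read the source to find matching blocks,
    # re-read the old destination to copy unchanged blocks,
    # then write the new destination in full.
    return {"read": 3 * file_size, "written": file_size}


# The 500GB file from the thread, in GB:
print(local_io_bytes(500, whole_file=True))   # {'read': 500, 'written': 500}
print(local_io_bytes(500, whole_file=False))  # {'read': 1500, 'written': 500}
```

Under these assumptions, delta mode triples the reads with no network to save, which is the trade-off described here.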

Rsync is designed to decrease network traffic at the expense of more IO.
When the additional IO makes no sense (i.e., there is no network
traffic), the delta algorithm is disabled.

> encrypted dirvish vault, it is important that the vault be set to a 
> remote ip address such as *cough* 127.0.0.1 so that rsync will use 
> deltas. I saw a reduction from 30gig a night of data change to 6 gig a 
> night of delta changes. (1/5th the size)

How did you measure this?  Did you also check how the total duration of
the run was influenced?

I would never contemplate forcing rsync into delta mode by using
127.0.0.1 as the "remote" IP.


Paul

