[Dirvish] Large amount of data

Paul Slootman paul at debian.org
Thu Dec 8 05:03:46 EST 2005


On Wed 07 Dec 2005, Steve Ramage wrote:

> iirc. He said he had no luck with rsync and that it pukes, but my
> understanding is that dirvish locally (over NFS mount anyway) doesn't
> actually use rsync and so I was curious how well dirvish would perform

Wrong: dirvish uses rsync for local transfers as well (as already explained).

> in these cases. I assume that if it's just a few HUGE files it would be
> okay, but any changes would result in HUGE overhead. But I don't think
> rsync would puke over a few HUGE files. How well can dirvish handle
> millions of files (let's say).

I use dirvish daily to back up millions of image files. These are
distributed over 90 directories: 10, 11, ..., 98, 99.
I just checked the '10' directory; this is the tail of the dirvish
log:

Number of files: 3780276
Number of files transferred: 24
Total file size: 32779088204 bytes
Total transferred file size: 258631 bytes
Literal data: 258631 bytes
Matched data: 0 bytes
File list size: 95743818
File list generation time: 9211.475 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 2717
Total bytes received: 95998267

sent 2717 bytes  received 95998267 bytes  8369.38 bytes/sec
total size is 32779088204  speedup is 341.45

It takes a long time, but the job gets done.
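
As a quick check on the summary lines: rsync's reported "speedup" is the total size of the files divided by the bytes actually sent plus received. A minimal Python sketch recomputing it from the figures above (the helper name rsync_speedup is hypothetical, not part of rsync or dirvish):

```python
def rsync_speedup(total_size: int, sent: int, received: int) -> float:
    """rsync's 'speedup': bytes of file data covered / bytes on the wire."""
    return total_size / (sent + received)


# Figures copied from the log tail for the '10' directory.
TOTAL_SIZE = 32779088204   # "Total file size" / "total size is"
SENT = 2717                # "Total bytes sent"
RECEIVED = 95998267        # "Total bytes received"
FILES = 3780276            # "Number of files"
FILE_LIST_SIZE = 95743818  # "File list size", in bytes

if __name__ == "__main__":
    print(f"speedup: {rsync_speedup(TOTAL_SIZE, SENT, RECEIVED):.2f}")
    # The file list is the dominant cost with millions of small files.
    print(f"file list bytes per file: {FILE_LIST_SIZE / FILES:.1f}")
    print(f"file list size relative to bytes received: {FILE_LIST_SIZE / RECEIVED:.1%}")
```

The result matches the 341.45 in the log, and the file list (about 25 bytes per file) is comparable in size to everything received, which is why the time goes into building the list rather than moving data.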

If you can split the transfer up into subdirectories (as I did), it's
easier for rsync to handle the memory needed for the list of files
(which is the problem with rsync when handling many files). As long as
there are no hard links across these subdirectories, splitting it up is
not a problem, since rsync can only preserve hard links among the files
of a single run.
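
One way to apply this with plain rsync is one run per top-level subdirectory, after first checking that no hard-linked file has names in two different subdirectories. A minimal Python sketch; the paths, helper names and the 10..99 layout are illustrative assumptions based on the example above, not dirvish configuration:

```python
import os
import sys
from collections import defaultdict


def subdir_names(first: int = 10, last: int = 99) -> list[str]:
    """Hypothetical layout from the post: directories named 10, 11, ..., 99."""
    return [str(n) for n in range(first, last + 1)]


def rsync_commands(src_root: str, dest_root: str, names: list[str]) -> list[list[str]]:
    """Build (but do not run) one rsync argv per subdirectory."""
    return [
        ["rsync", "-aH", "--delete",
         os.path.join(src_root, name) + "/", os.path.join(dest_root, name) + "/"]
        for name in names
    ]


def cross_subdir_hardlinks(root: str) -> dict[tuple[int, int], set[str]]:
    """Return inodes whose hard links sit in more than one top-level subdirectory."""
    seen: dict[tuple[int, int], set[str]] = defaultdict(set)
    for top in sorted(os.listdir(root)):
        top_path = os.path.join(root, top)
        if not os.path.isdir(top_path) or os.path.islink(top_path):
            continue
        for dirpath, _dirnames, filenames in os.walk(top_path):
            for fname in filenames:
                st = os.lstat(os.path.join(dirpath, fname))
                if st.st_nlink > 1:  # only multiply-linked files can span subdirs
                    seen[(st.st_dev, st.st_ino)].add(top)
    return {key: tops for key, tops in seen.items() if len(tops) > 1}


if __name__ == "__main__":
    src = "/srv/images"      # hypothetical source tree
    dest = "/backup/images"  # hypothetical destination
    if os.path.isdir(src) and cross_subdir_hardlinks(src):
        print("hard links span subdirectories; transfer the tree in one run")
        sys.exit(1)
    for argv in rsync_commands(src, dest, subdir_names()):
        print(" ".join(argv))
```

Each run then only has to hold the file list for one subdirectory in memory. With dirvish the same effect would come from a separate vault per subdirectory rather than from a script like this.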


Paul Slootman

