[Dirvish] A few small bugs

Paul Slootman paul at debian.org
Wed Nov 23 06:27:41 EST 2005


On Tue 22 Nov 2005, foner-dirvish at media.mit.edu wrote:
> 
> (2) When requesting logfile and index compression, it looks like
>     dirvish creates both in their uncompressed form, and -then-
>     compresses them.  This means, for example, that in my runs over a
>     filesystem of about two million files, dirvish first has to write
>     out a half-gig file, and -then- compresses it.  This (a) makes it
>     more likely that the vault might run out of space, and (b) is very
>     slow, because the disk must thrash all over the place doing the
>     find while simultaneously writing this enormous file, and must
>     then thrash some more while reading this enormous (hence uncached)
>     file and then writing out its compressed version.  If compression
>     is requested, it should happen in a pipe in between the find
>     that's generating the data and the disk; this is presumably a
>     one-line change.  Doing so means that most of this data never hits
>     the disk in the first place and thus speeds things up enormously,
>     since the actual file compresses by about 95%.

For the log file I don't think compressing on the fly is the smart
thing to do: if something goes wrong and the process gets killed, you
lose whatever logging the compressor had not yet written out.

For the index it's certainly worth doing.
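
As a rough illustration of the idea (not dirvish's actual code, which is
Perl and runs find), here is a minimal Python sketch that streams an
index directly into a gzip file so the uncompressed form never touches
the disk. The function name write_compressed_index and the
tab-separated line layout are made up for this example and do not
match dirvish's real index format.

```python
import gzip
import os


def write_compressed_index(root, index_path):
    """Walk `root` and stream one line per file straight into a gzip file.

    Hypothetical helper: the "size<TAB>mtime<TAB>path" layout is an
    assumption for illustration only. Returns the number of lines written.
    """
    count = 0
    # In text mode, gzip.open compresses the data as it is written, so
    # only the compressed stream is ever stored on disk.
    with gzip.open(index_path, "wt", encoding="utf-8",
                   errors="surrogateescape") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # make the traversal order deterministic
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                st = os.lstat(path)  # do not follow symlinks
                out.write(f"{st.st_size}\t{int(st.st_mtime)}\t{path}\n")
                count += 1
    return count
```

The trade-off is the same one as for the log file: if the process is
killed partway through, the compressed index may be truncated. For the
index that seems acceptable, since it can be regenerated from the image.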


Paul Slootman

