[Dirvish] A few small bugs

foner-dirvish at media.mit.edu
Tue Nov 22 23:15:21 EST 2005

[These are all relative to version 1.2; I'm not using 1.3.1 since it's
currently marked experimental, and I'm not using 1.2.1 because 1.2 is
what Ubuntu Breezy packaged up for me and I don't know if it's worth
special-casing my system to not use the prepackaged version for
whatever might be in 1.2.1.  See below.]


(0) I just noticed that I can't seem to find any bug-tracking database
    for dirvish, so I'm sending mail here instead.  It might be nice
    to set up something like this.  (Or we can hope that the bug
    volume continues to stay so low that tracking them isn't worth the
    bother... :)  (Dunno if this will stay the same when 1.3.1 becomes
(1) There aren't any easy-to-find changelogs between releases.  It
    looks like one has to download the new source, unpack it, and look
    in its CHANGELOG file, which only lists CVS/SVN file versions;
    -then- one has to go into the actual tree to figure out what
    actually changed.  That's way too much work, especially if a
    change impacts multiple files but could be described via a
    one-liner.  Since I'm running 1.2 but the current version out is
    1.2.1, it's hard for me to even guess if the bugs I'm actually
    trying to report are already fixed, though I'm betting they aren't.
    It'd be nice for a real NEWS file to exist, which lists user-visible
    changes, and for that file to be readable directly at dirvish.com
    without having to download the entire source tarball to get it.

Actual bugs:

(2) When requesting logfile and index compression, it looks like
    dirvish creates both in their uncompressed form, and -then-
    compresses them.  This means, for example, that in my runs over a
    filesystem of about two million files, dirvish first has to write
    out a half-gig file, and -then- compress it.  This (a) makes it
    more likely that the vault might run out of space, and (b) is very
    slow, because the disk must thrash all over the place doing the
    find while simultaneously writing this enormous file, and must
    then thrash some more while reading this enormous (hence uncached)
    file and then writing out its compressed version.  If compression
    is requested, it should happen in a pipe in between the find
    that's generating the data and the disk; this is presumably a
    one-line change.  Doing so means that most of this data never hits
    the disk in the first place and thus speeds things up enormously,
    since the actual file compresses by about 95%.
(3) There's no easy way to specify arguments to the compressor, e.g.,
    to say "--best" or "--fast" to gzip.  Currently, I make sure to
    set the appropriate environment variable before the dirvish script
    runs, but it'd be cleaner to just allow passing this in as an
    argument, in the same way that args to find are allowed---this
    centralizes all the configuration into the dirvish.conf file
    instead of spreading it out across various scripts, and also
    allows adding arbitrary other args, in case the compressor takes
    command-line options that it can't read from environment
    variables.
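
To make (2) and (3) concrete, here's a rough sketch of the idea in
Python (not dirvish's own Perl, and not a patch against it).  The
names index_lines and write_compressed are made up for illustration,
and the "size path" line format is just a stand-in for the real index.
The point is only that the data goes through the compressor in a pipe,
so the uncompressed form never lands on disk, and that the compressor
is given as a whole command line, so "--best" or "--fast" can come
from the config rather than from an environment variable.

```python
import os
import shlex
import subprocess


def index_lines(root):
    """Yield one 'size path' line per file under root (stand-in index data)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            yield f"{os.lstat(path).st_size} {path}\n"


def write_compressed(lines, dest, compressor="gzip --best"):
    """Stream lines through an external compressor straight into dest.

    `compressor` is a full command line (string or argv list), so any
    options travel with it; nothing uncompressed is written to disk.
    """
    if isinstance(compressor, str):
        cmd = shlex.split(compressor)
    else:
        cmd = list(compressor)
    with open(dest, "wb") as out:
        # The compressor reads from our pipe and writes directly to dest.
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=out)
        try:
            for line in lines:
                proc.stdin.write(os.fsencode(line))
        finally:
            proc.stdin.close()
        status = proc.wait()
    if status != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {status}")
```

Something like write_compressed(index_lines("/some/tree"),
"index.gz", "gzip --fast") would then trade compression ratio for
speed without touching anything outside the one setting.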

