[Dirvish] dirvish and weeding out duplicates

Paul Slootman paul at debian.org
Tue Nov 22 11:53:20 EST 2005

On Thu 17 Nov 2005, Shane R. Spencer wrote:
> On Thu, 2005-11-17 at 11:32 +0100, Paul Slootman wrote:
> > 
> > Basically it's a way of quickly finding a file with a given md5sum.

> This is the kind of concept I like :) I suppose reading an md5sum dir
> full of inodes and filenames would be just as easy as reading a
> flat file.. and hopefully less corruptible.

As the files in my situation should never change without changing their
filename (i.e. the files are never updated in place), this should stay

> And you run this per day I take it.. skipping files with inodes that
> exist in the md5sums dir.

Yes, and that skipping was an optimization I hadn't done yet at that
time. Now I have, and it's a lot quicker this way :)  I should probably
schedule a job to check that the list of linked inodes is still

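For readers following the thread, the scheme discussed above (a directory
of hard links named by md5sum, plus skipping files whose inode is already
in that directory) could be sketched roughly as below. This is only an
illustration under assumptions, not the actual script: the names `md5_of`,
`dedup`, `tree` and `md5dir` are made up here, and it assumes a POSIX
filesystem with `md5dir` on the same filesystem as `tree`, and files that
are never updated in place.

```python
import hashlib
import os
import stat


def md5_of(path, bufsize=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup(tree, md5dir):
    """Hard-link duplicate files under `tree`, using `md5dir` as an index.

    Hypothetical sketch: each entry in `md5dir` is a hard link named after
    the md5sum of its content. Returns the number of files replaced.
    """
    os.makedirs(md5dir, exist_ok=True)
    # Inodes already linked into the index were handled on an earlier run,
    # so they can be skipped without reading the file again.
    known = {e.stat(follow_symlinks=False).st_ino for e in os.scandir(md5dir)}
    replaced = 0
    for root, _dirs, files in os.walk(tree):
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode) or st.st_ino in known:
                continue
            entry = os.path.join(md5dir, md5_of(path))
            if os.path.exists(entry):
                # Same content is already indexed: link to it under a
                # temporary name, then rename over the duplicate.
                tmp = path + ".dedup-tmp"
                os.link(entry, tmp)
                os.replace(tmp, path)
                replaced += 1
            else:
                # First time this content is seen: add it to the index.
                os.link(path, entry)
                known.add(st.st_ino)
    return replaced
```

Run once per backup, something like `dedup("/backup/vault", "/backup/md5sums")`
(both paths hypothetical) would only checksum files that are not yet linked
into the index. Note that it compares md5sums only and does not look at
ownership, permissions or timestamps before linking.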
Paul Slootman

