keithl at kl-ic.com
Sat Mar 5 18:34:40 PST 2005
> Keith Lofstrom said:
... about using Bill Stearns' "freedups" to locate identical files...
> This looks useful. The quality of the software is unknown, and it
> could use some documentation (limited to a README and the option
> --usage). I will make a dd copy of a backup drive and see if this
> works over the next few days.
On Sat, Mar 05, 2005 at 08:34:28PM -0500, Jason Boxman wrote:
> It seems to keep a structure in memory with md5s and inodes. I ran it
> against my vault, which has over a million hard links and likely at least
> 200,000 files. It seemed to choke rather badly.
Sigh. Jason, thanks for being courageous (and saving me the
time)! A response on the VaultBranch wiki page is called for;
do you want to do it, or shall I? I would probably refactor
the information to a FreeDups page, anyway.
I can write a nice email back to the author (assuming he is
the source of the comment) asking him to consider a hash table
on disk or something. Still, even with an on-disk table, he
will have to traverse the directories of the whole dirvish
vault, and that could take a VERY long time!
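
To make the on-disk table idea concrete, here is a rough sketch in Python. It is not how freedups works; the SQLite table layout and the function names index_tree and duplicate_groups are assumptions made up for illustration. It keeps one row per distinct (device, inode) on disk instead of in memory, so the million hard links collapse to one row per real file. It still has to walk and checksum the whole tree, so it does nothing for the traversal time.

```python
import hashlib
import itertools
import os
import sqlite3
import stat


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root, db_path):
    """Record one row per distinct inode under root in an SQLite file.

    db_path should live outside root, or the index will index itself.
    """
    conn = sqlite3.connect(db_path)
    # dev and ino are stored as text so very large inode numbers cannot
    # overflow SQLite's signed 64-bit integers; paths are stored as raw bytes.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "dev TEXT, ino TEXT, size INTEGER, md5 TEXT, path BLOB, "
        "PRIMARY KEY (dev, ino))"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS by_md5 ON files (md5, size)")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets
            key = (str(st.st_dev), str(st.st_ino))
            seen = conn.execute(
                "SELECT 1 FROM files WHERE dev = ? AND ino = ?", key
            ).fetchone()
            if seen:
                continue  # another hard link to an inode already indexed
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
                key + (st.st_size, file_md5(path), os.fsencode(path)),
            )
    conn.commit()
    conn.close()


def duplicate_groups(db_path):
    """Return lists of paths whose inodes differ but whose size and MD5 match.

    These are only candidates: compare bytes and metadata before linking.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT f.md5, f.size, f.path FROM files AS f "
        "JOIN (SELECT md5, size FROM files GROUP BY md5, size "
        "HAVING COUNT(*) > 1) AS d ON f.md5 = d.md5 AND f.size = d.size "
        "ORDER BY f.md5, f.size, f.path"
    ).fetchall()
    conn.close()
    return [
        [os.fsdecode(path) for _md5, _size, path in group]
        for _key, group in itertools.groupby(rows, key=lambda row: row[:2])
    ]
```

Something like index_tree("/path/to/vault", "/var/tmp/vault-index.db") followed by duplicate_groups("/var/tmp/vault-index.db") would list the candidates; both paths are placeholders.
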
I wonder, though, if a specialized version of freedups could be
designed to join two sets of similarly-named images (reducing the
search time)? This would have two applications: (1) Healing a
large set of branches after a major multiple-machine upgrade, and
(2) fixing the kind of problem Steve Ramage had when he accidentally
created a second image set. Even for that, freedups2 would need an
on-disk table! Well, that is Mr. Stearns' problem.
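
For the narrower two-image case, the pairing by name might look like the Python sketch below. This is my own illustration, not anything freedups offers; the function name link_matching, the ".linktmp" suffix, and the choice of which metadata to compare are assumptions. It only compares a file in the first image with the file at the same relative path in the second, so no global table is needed for this step.

```python
import filecmp
import os
import stat


def same_metadata(st_a, st_b):
    """True if the fields a hard link would force to be shared already agree."""
    fields = ("st_mode", "st_uid", "st_gid", "st_size", "st_mtime_ns")
    return all(getattr(st_a, f) == getattr(st_b, f) for f in fields)


def link_matching(image_a, image_b, dry_run=True):
    """Hard-link files in image_b to identical same-named files in image_a.

    Returns the relative paths that were (or, with dry_run, would be) linked.
    """
    linked = []
    for dirpath, _dirnames, filenames in os.walk(image_a):
        for name in filenames:
            path_a = os.path.join(dirpath, name)
            rel = os.path.relpath(path_a, image_a)
            path_b = os.path.join(image_b, rel)
            try:
                st_a = os.lstat(path_a)
                st_b = os.lstat(path_b)
            except OSError:
                continue  # no counterpart in the other image
            if not (stat.S_ISREG(st_a.st_mode) and stat.S_ISREG(st_b.st_mode)):
                continue
            if st_a.st_dev != st_b.st_dev:
                continue  # hard links cannot cross filesystems
            if st_a.st_ino == st_b.st_ino:
                continue  # already the same inode
            if not same_metadata(st_a, st_b):
                continue
            if not filecmp.cmp(path_a, path_b, shallow=False):
                continue  # same size and times, different bytes
            if not dry_run:
                # Link under a temporary name, then rename over the old
                # file, so path_b never goes missing part-way through.
                tmp = path_b + ".linktmp"
                os.link(path_a, tmp)
                os.replace(tmp, path_b)
            linked.append(rel)
    return linked
```

Running it first with dry_run=True on a dd copy of the backup drive would show what it intends to do before anything is touched.
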
Keith Lofstrom keithl at keithl.com Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs