[Dirvish] How I restored a laptop
keithl at kl-ic.com
Sat Jun 24 05:45:30 UTC 2006
This is not a request for help, but a description of how I restored
a laptop hard drive - the hard way. I thought it would be educational.
This morning, just before she left for work, my wife's Thinkpad
T30 started making a loud click-click, click-click noise, the sound
of a hard drive that has lost control of the servo.
It was unresponsive, so I immediately shut it down, let it cool,
and rebooted. Click-click again. Oh joy, time to do a drive
replacement and a restore on dirvish.
I have an identical spare drive, an IBM (now Hitachi) Travelstar
40GB IC25N040ATCS04. It is good to keep an identical spare, so
the partition table copies exactly. I had used it as a "dd spare"
before (on the road, I bring along the spare drive and an
ultra-bay drive holder, and use "dd if=/dev/hda of=/dev/hdc bs=1M"
for overnight backups).
I decided to use that drive to do an "almost complete" restore
using a Knoppix disk and rsync. I could have put the drive into
a Vipower swap cage, plugged it into my backup server, and
done a disk copy restore, as I have done before. However, I
decided instead to try a restore over the network, to see how
that worked out, and so I could share the results. I made some
mistakes and it took far too long, but perhaps you can learn
from my mistakes.
The first problem: I rotate backup target disks, approximately
daily. The most recent drive was "J", and the drive from the day
before was "K". When I went to restore the root partition from J,
I realized I had misconfigured dirvish on that drive to back up
"/root" instead of "/". Not very useful. Fortunately, drive "K"
had a proper configuration backing up "/", so I had a 36 hour old
root partition. That doesn't change very fast, fortunately. All
the rest of the partitions were fine.
So, boot the laptop from a live CD, first with Ubuntu 5.10.
Oops, no sshd on Ubuntu. I will be needing sshd, so back to good
old Knoppix (I used an available 3.9 disk). Knoppix makes a
slightly better recovery disk than Ubuntu; Knoppix STD might
be even better, but I don't know that one very well. Booted,
I do an "su - root" in a terminal window and start building
the new drive. I decide to cheat and keep the old partitions,
so I can keep the GRUB setup and save some time.
I made a second blunder when I rebuilt the partitions using
straight "mkfs" instead of the proper "mkfs -t ext3" or
"mkfs.ext3". Since the system is configured for ext3, this leads
to problems later on when I try to boot, specifically these errors:
    ext3: No journal on filesystem on ide0(3,7)
    mount: error 22 mounting ext3
    Kernel panic: No init found, try ...
I encountered this problem later in this narrative, but the
"mkfs.ext3" should have happened at this point. If you make
the same blunder, you can fix it with a "tune2fs -j /dev/hdaXX"
for each partition, changing ext2 to ext3.
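
Before running tune2fs, it can help to check which partitions actually lack a journal. The following is a minimal sketch, not part of the original procedure: the helper name has_journal is made up for illustration, and it only inspects the "Filesystem features:" line that "tune2fs -l" prints.

```shell
# Hypothetical helper: reads "tune2fs -l" output on stdin and succeeds
# only if the "Filesystem features:" line lists has_journal (i.e. ext3).
has_journal() {
    grep '^Filesystem features:' | grep -qw has_journal
}

# Possible use, as root, on one partition (the device name is an example):
#   tune2fs -l /dev/hda7 | has_journal || tune2fs -j /dev/hda7
```

The check is read-only; only the commented-out usage line would change the filesystem.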
So, at this point I have a drive ready for some restores, with
empty filesystems. However, my "pull" backup server must "push"
the data back onto the laptop drive; I cannot get at it to
"pull" files from the laptop. But I don't have sshd configured
for the laptop yet, so I can't do a server push to it. So, I scp
the files in /backup/.../*0622*/tree/etc/ssh into a temporary
directory on a third machine, then use:
    scp third:/tmp/sshfiles/* /etc/ssh/
on the laptop to move the files into the /etc directory in
Knoppix. My DHCP server assigns the laptop a fixed address on
the network and a DNS name. Since I was going to be using root a
lot, I set the root password on Knoppix to "a". Ugly but quick; I
assumed my internal network was secure for the duration of this
procedure. At the end of all this, I could ssh in from the backup
server.
Ready to start moving files. I set up a directory "/a" on the
Knoppix laptop, and mounted the /dev/hda7 root partition on it.
On the backup server, I did a cd to the laptop root tree directory
on the backup drive, and used: "rsync -axc * laptop:/a/" to copy
the files to the root partition on the laptop.
Unfortunately, partway through the procedure, the network failed.
I had to do an "ifconfig eth1 down; ifconfig eth1 up" on the
laptop a few times before the files were moved. The laptop was
connected on the other side of 3 linksys switches from the server;
I moved it to the same switch and the problems seemed to go away.
It might have been associated with Knoppix also, because I can
rsync nightly backups through 3 switches without problems.
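
Because rsync skips files that already match on the destination, an interrupted transfer can simply be rerun. The following is a rough sketch of a retry wrapper, which is not what I actually did (I bounced the interface by hand); the function name retry and the RETRY_DELAY variable are made up for illustration.

```shell
# Hypothetical helper: run a command up to N times, pausing between tries.
# Usage: retry N command [args...]
retry() {
    max=$1
    shift
    n=1
    while :; do
        "$@" && return 0                     # success: stop retrying
        echo "attempt $n of $max failed: $*" >&2
        [ "$n" -ge "$max" ] && return 1      # out of attempts
        n=$((n + 1))
        sleep "${RETRY_DELAY:-5}"            # seconds between attempts (assumed default)
    done
}

# Possible use on the backup server, from the tree directory:
#   retry 5 rsync -axc * laptop:/a/
```

This does not reset the network interface on the laptop; it only reruns the copy once the link is back.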
I repeated the mounts and rsyncs for the /boot, /var, and /usr
partitions. After each was complete, I did an "ls -R | wc"
and a "du -bxs ." for each partition, and compared results to
the same operations on the backup drive. There were some
differences associated with not having a "lost+found" on the
backups, but otherwise they were file count and byte count
identical (after fixing the ext3 problem). I did a "mkswap" on
the swap partition, and a "mkdir /proc" and a "mkdir /initrd".
I finally unmounted the new partitions, and did "fsck -f" on all
of them.
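
The file-count and byte-count comparison described above can be scripted so that lost+found is ignored automatically. This is a sketch of a variation, not the exact "ls -R | wc" and "du -bxs" commands I ran: it counts regular files only and sums their sizes, the function name tree_summary is made up, and it assumes no file names contain newlines.

```shell
# Hypothetical helper: print "<file count> <total bytes>" for the regular
# files under directory $1, staying on one filesystem and skipping lost+found.
tree_summary() {
    find "$1" -xdev -name 'lost+found' -prune -o -type f -print |
        while IFS= read -r f; do wc -c < "$f"; done |
        awk '{ n++; s += $1 } END { printf "%d %d\n", n, s }'
}

# Possible use: run on both sides and compare the two output lines, e.g.
#   tree_summary /a                  (on the laptop under Knoppix)
#   tree_summary .                   (in the tree directory on the backup drive)
```

Matching counts and sizes do not prove the contents are identical; the -c option on the rsync already compares checksums during the copy.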
I did not copy my big partitions /home and /opt; instead I
rebooted from the partially restored hard drive. After fixing
the ext2/ext3 problems noted before, the laptop came up, and
I logged in as root. I could now finish the restore using the
laptop OS rather than slow Knoppix. This also meant that I did
not have to enter a password each time I did an rsync. I wrote
a shell script to do these last two rsync restores, and about
two hours later all the remaining files were moved. I took the
two hours off to go to a movie ("United 93" - makes restoring
a hard disk quite a trivial problem indeed).
After doing this, I encountered a third blunder - I had improperly
configured the .gnome files to look for setups and icons in
/home/keithl rather than in /root; I could not unmount /home
while X was running as root. I fixed that. I did the ls and
du again. All the files were transferred. I unmounted /home
and /opt, did an "fsck -f" on them, and rebooted the machine.
It looks like the machine is back where it was; I will let
my wife decide. She just got home, and has email to look at!
I hope this helps you, if you ever need to restore over the
network. After I get some feedback from you folks, I will copy
this to the wiki so the rest of you can improve the procedure.
Yes, we need a more automated approach. Who wants to write it?
Keith Lofstrom keithl at keithl.com Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs