Two weeks ago, due to a power failure, a hard drive in a (UPS-protected) server at VTK crashed. The machine was unable to boot: unrecoverable IDE errors,...
The machine was only used as a database server for now (both MySQL and PostgreSQL), and did not make external backups (yet). Yeah, I know, stupid, but managing our network isn't that easy.
Anyway, the hard drive consisted of 3 partitions:
- /boot, ext3
- /, ext3
- an LVM2 PV
Neither /boot nor / was of any use (except maybe some files in /etc), but the database files inside the LVM2 VG were pretty important, so they had to be recovered.
Do note all 3 partitions were the first half of a RAID1 volume; a second disk still had to be added.
Here's the recovery procedure, could be useful for some people out there:
- Get the hard drive out of the machine (other parts of it might be broken too; we want to use other hardware)
- Put the disk in a second machine, together with another hard drive big enough to contain images of all recovered partitions. This means that hard drive might need to be bigger than the original one!
- Run Linux on the machine. I had no Linux box around, so I used Knoppix (as I knew it had all the tools I needed)
- Make sure you have dd_rescue on the box (Knoppix has it), as well as dd_rhelp. In Knoppix, just download the dd_rhelp tarball, extract it, and run ./configure && make && sudo make install
- Become root: sudo su -
- Mount the new hard drive, first creating partitions/filesystems as necessary. I mounted it on /mnt/sda1
- cd /mnt/sda1
- dd_rhelp /dev/hda4 hda4-lvm-pv.img
(where hda4 is the PV's partition)
- This can take an enormous amount of time, get some coffee, sleep, a life, whatever
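If you don't have dd_rhelp or dd_rescue at hand, plain dd with conv=noerror,sync gives a cruder but similar result: it skips unreadable blocks and pads them with zeros, so the data after a bad spot stays at the correct offset (important for LVM/filesystem structures). A minimal sketch on a scratch file; src.img stands in for the real device (here that would be /dev/hda4):

```shell
# Create a scratch "partition" so the sketch is self-contained;
# in a real recovery, if= would point at the broken device.
dd if=/dev/urandom of=src.img bs=1024 count=16 2>/dev/null

# noerror: don't abort on read errors; sync: pad short/failed
# reads with zeros so offsets in the image stay correct.
dd if=src.img of=hda4-sketch.img bs=1024 conv=noerror,sync 2>/dev/null

cmp src.img hda4-sketch.img && echo "image matches source"
```

Note that plain dd retries nothing and reads bad areas slowly; dd_rescue/dd_rhelp are much smarter about skipping around damaged regions, which is why they're preferred here.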
- When the process is done, power off the machine (in a clean way). At this point, the broken hard drive was even more broken: fdisk -l /dev/hda didn't even show hda4 any longer.
- Take the broken hard drive out of the machine, and make a backup of the image you created on another hard drive, a network share,... I used another hard drive.
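Before relying on that backup copy, it's worth verifying it is byte-identical to the original image, since all further recovery depends on it. A quick sketch using checksums; the file names are just examples:

```shell
# Stand-in for the recovered image (in reality this is the
# multi-GB file produced by dd_rhelp).
echo "recovered partition data" > hda4-lvm-pv.img

# The backup copy on the other disk/share.
cp hda4-lvm-pv.img /tmp/hda4-lvm-pv.img.bak

# Both checksums should be identical before you trust the copy.
sha1sum hda4-lvm-pv.img /tmp/hda4-lvm-pv.img.bak
```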
- Remove the hard drive on which you made the backup copy from the system (just in case).
- Now real recovery can start. Boot the machine, mount the partition on which you made the image.
- From this point, all depends on how broken the disk was. As only some minor part of the 120GB PV contained important data, I was able to recover everything necessary.
- Export the image as if it were a block device:
losetup /dev/loop0 /mnt/sda1/hda4-lvm-pv.img
Now /dev/loop0 is just like the old /dev/hda4
- As the image was taken from one half of a RAID volume in this case, I had to remove the RAID signature so the LVM2 tools would recognize it as an LVM2 PV:
mdadm --zero-superblock /dev/loop0
- Let the system search for LVM2 PVs:
pvscan
This should show you it found a PV
- Activate the volume groups:
vgchange -a y
- Now the old LVs should be back. If your VG used to be called "vg", they'll appear under /dev/vg/ (lvscan lists them)
- fsck the necessary filesystems:
fsck /dev/vg/var-lib-mysql
If this passes nicely, you're almost saved
- mkdir /mnt/mysql
mount /dev/vg/var-lib-mysql /mnt/mysql
- Now you can use the files in /mnt/mysql to regenerate all necessary services. As I needed the databases up and running ASAP, I did a quick Debian install with MySQL 4.1 (which was also the version running on the crashed server), and moved the recovered MySQL database files to the correct locations. Some "CHECK TABLE", "myisamchk", "myisamchk -r" and "REPAIR TABLE" runs later, the databases were up and running nicely
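The check/repair pass on the recovered MyISAM files can be sketched like this. Treat it as an outline, not a recipe: the data directory, database name, and init script path are examples from a Debian-era setup, and the right myisamchk flags depend on the damage:

```shell
# Stop mysqld first so the table files aren't in use.
/etc/init.d/mysql stop

# Check every MyISAM index file in the recovered database
# directory; repair the ones that fail the check.
for t in /var/lib/mysql/mydb/*.MYI; do
    myisamchk "$t" || myisamchk -r "$t"
done

/etc/init.d/mysql start
# Anything still broken can then be handled from the client
# with CHECK TABLE / REPAIR TABLE.
```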
That's about it... Obviously, everything should be adapted to your specific setup/situation. The MySQL recovery might not work at all when the MySQL versions and binary formats differ, or when the files are damaged too much. See the application-specific documentation for recovery techniques.
The new hard drive we bought to put the images on crashed too, after 5 hours of operation. Luckily it could be replaced ;-)
For the record: the first broken hard drive was a Maxtor 120GB PATA (not bought by me); the second one a Seagate 320GB PATA (bought by me, although I wanted a Western Digital, but the shop had no more and I needed a drive quickly). The Seagate was then replaced by a WD 320GB SATA2 (which forced me to buy another PCI SATA controller), which works fine. Now don't start a brand war.
/me is pretty happy all data is back, after all... Now on to reinstalling some webservices that were running on the crashed server.