Hi everyone, today I will tell you how I restored a defunct LVM thinpool. Unfortunately I could not find any howtos or manuals on the internet, so maybe this one will help someone in a similar situation.
So what do we have:
Baremetal server with a hardware RAID controller.
10 TB RAID 6.
LVM thin pool on top of that RAID 6.
8 TB thin LV in the pool, with around 4 TB of data in it.
Around 14 thin LV snapshots in the thinpool itself as backups.
Now for the real fun part: no external backup of the data (the customer was aware that this is unsafe but was willing to take the risk).
No physical access to the baremetal server, only iDRAC.
The server was running just fine until, during a regular maintenance window, it got rebooted to run a newer kernel. When it still wasn't back online after 5 minutes, the first alarm bells started to ring. I got shell access through iDRAC and found out that the server was not booting because the thin LV could not be mounted. So the first thing I did was comment out the thin LV mount in fstab (sketched a bit further below) to get the server to boot. After the server booted, I stopped and disabled all services, then I ran lvs -a
and saw the following:
thinpool vg0 twi--otz-- <10.40t
[thinpool_tdata] vg0 Twi--o---- <10.40t
[thinpool_tmeta] vg0 ewi--o---- 12.40g
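A quick side note on the fstab step: disabling the mount was just commenting out one line, roughly like this (the device name and mount point are placeholders, not the real entry):
# /etc/fstab: thin LV mount disabled until the pool is repaired
#/dev/vg0/data   /data   ext4   defaults   0 0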
Next I tried to activate the thinpool with lvchange -ay /dev/vg0/thinpool
with no luck. It just threw: Check of pool vg0/thinpool failed (status:1). Manual repair required!
The next thing I tried was to repair the thinpool with lvconvert --repair vg0/thinpool. Unfortunately that command also failed, with transaction_manager::new_block() couldn't allocate new block.
That error message was the last clue that the metadata pool was, to put it mildly, completely broken. So the next thing we did was dump the whole disk with dd to another server, to run experiments on the dump.
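The dump itself was nothing fancy; roughly the following, where the source device and the rescue host are placeholders (the target path matches the /mnt/dump.img used later):
# stream the whole RAID device to an image file on another server
dd if=/dev/sda bs=64M | ssh root@rescue-host 'dd of=/mnt/dump.img bs=64M'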
We created a new KVM VM with libvirt and attached the disk image to it as an HDD.
yum install -y qemu-kvm libvirt libvirt-python libguestfs-tools virt-install cloud-utils
wget https://cloud.centos.org/altarch/7/images/CentOS-7-x86_64-GenericCloud-2111.qcow2
# quote EOF so the shell does not expand the $ characters in the password hash
cat << 'EOF' > config.yaml
#cloud-config
users:
  - name: test
    groups: wheel
    lock_passwd: false
    # password: test
    passwd: $6$aTZ.WMuCtdIWDORe$nsSOunFW07/JrJ/SjFY9jZcQRbI5m.7woazjUFtLhmUr1AZxfnUloDyl.EjEOEPp2cqn.jiGBt88iN3EdTZvh/
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
EOF
cloud-localds config.iso config.yaml
virt-install --name CentOS7 --memory 2048 --vcpus 1 --disk CentOS-7-x86_64-GenericCloud-2111.qcow2,bus=sata --disk config.iso,device=cdrom --import --os-variant centos7.0 --network default --virt-type kvm --graphics none
virsh attach-disk CentOS7 /mnt/dump.img sdb
virsh console CentOS7
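Inside the VM the first check is that the attached dump is visible and that LVM picks up the volume group from it; something along these lines, assuming the copy shows up as the sdb target from the attach-disk command above:
# verify the attached dump and let LVM scan it
lsblk
pvscan
vgs
lvs -a vg0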
On the new VM I tried various approaches to dump and restore the pool metadata, with commands like thin_dump --repair, thin_restore or lvextend --poolmetadatasize +1G vg0/thinpool. But none of these approaches worked.
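For context, the attempts looked roughly like the sketch below; vg0/thinpool_meta0 is a standalone copy of the broken pool metadata that was present in the VG (the same LV is used further down), and the exact sequence here is an approximation rather than a transcript:
# activate the standalone copy of the broken metadata
lvchange -ay vg0/thinpool_meta0
# let thin_dump try to fix the mappings while dumping (failed)
thin_dump --repair /dev/vg0/thinpool_meta0 > full_dump.xml
# give the pool more metadata space (also did not help)
lvextend --poolmetadatasize +1G vg0/thinpool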
Then I started looking into the structure of the thinpool metadata dump file. Apparently it is just a bunch of disk block mappings:
<superblock uuid="" time="26120" transaction="52225" flags="0" version="2" data_block_size="128" nr_data_blocks="174406720">
<device dev_id="1" mapped_blocks="76890251" transaction="0" creation_time="0" snap_time="26120">
<range_mapping origin_begin="0" data_begin="1568050" length="7" time="26120"/>
<single_mapping origin_block="7" data_block="2627015" time="25868"/>
<single_mapping origin_block="13" data_block="102196995" time="25719"/>
...
After looking at the thin_dump --help output I realized the mappings can be skipped:
Usage: thin_dump [options] {device|file}
Options:
{-h|--help}
{-f|--format} {xml|human_readable|custom}
{-r|--repair}
{-m|--metadata-snap} [block#]
{-o <xml file>}
{--dev-id} <dev-id>
{--skip-mappings}
{-V|--version}
After running thin_dump --skip-mappings I got the following structure:
<superblock uuid="" time="26120" transaction="52225" flags="0" version="2" data_block_size="128" nr_data_blocks="174406720">
<device dev_id="1" mapped_blocks="76890251" transaction="0" creation_time="0" snap_time="25951">
</device>
<device dev_id="15" mapped_blocks="76283161" transaction="51865" creation_time="25951" snap_time="25951">
</device>
...
But the only interesting device for us is the one with dev_id=1, so I tried to dump just that one:
lvchange -ay vg0/thinpool_meta0
thin_dump --dev-id 1 /dev/vg0/thinpool_meta0 > metadata.xml
Next I created an empty LV, restored the metadata.xml into it as a test, and it worked!
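For reference, the test restore was roughly the following; the scratch LV name and size are assumptions (the pool's metadata LV was 12.40g, so anything of that order is enough):
# create a scratch LV and restore the dump into it as a dry run
lvcreate -L13G -n metatest vg0
thin_restore -i metadata.xml -o /dev/vg0/metatest
thin_check /dev/vg0/metatest
# remove the scratch LV afterwards
lvremove -y vg0/metatest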
Now we need to restore the metadata.xml into the thinpool_tmeta LV, but if you activate that LV, it only comes up read-only. I found a workaround: apparently you can swap the metadata LV with a normal LV, activate that one read-write and manipulate it, which is what I did:
# create some tmp LV
lvcreate -L2 -n tmp vg0
# swap tmp LV with tmeta of inactive thin-pool
lvconvert --thinpool vg0/thinpool --poolmetadata tmp
# activate the 'tmp' LV, which now holds the content of _tmeta
lvchange -ay vg0/tmp
# restore metadata
thin_restore -i metadata.xml -o /dev/vg0/tmp
# thin check it
thin_check /dev/vg0/tmp
# deactivate
lvchange -an vg0/tmp
Now that we have healthy thinpool metadata, we can swap the LVs back and try to activate the thinpool. The swap-back is a bit confusing, because the command is exactly the same as the one used for the first swap:
lvconvert --thinpool vg0/thinpool --poolmetadata tmp
lvchange -ay vg0/thinpool
Now we have an activated thinpool, so we can activate the thin data LV, run fsck on it, and start the rest of the services again. PROFIT.
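For completeness, the final steps spelled out; the thin LV name and mount point are again placeholders rather than the real names:
# activate the thin data LV, check the filesystem and mount it again
lvchange -ay vg0/data
fsck -y /dev/vg0/data
mount /dev/vg0/data /data
# then re-enable the fstab entry commented out at the beginning
# and start the services that were stopped and disabled earlier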
PS: always use external backups, even if the customer says it's fine not to have one. In the end they will come to you asking to repair the broken server, and you might not be as lucky as to still have a healthy metadata dump of at least the data LV.