Home Fileserver: ZFS boot pool recovery

If you are ever unlucky enough to find your OpenSolaris NAS unable to boot one day, these notes, taken from a real restoration test, might help you get back up and running again quickly.

After setting up a supposedly robust mirrored ZFS root boot pool here using two SSD drives, I decided to give it a system test by completely destroying the boot pool. I did this to understand and verify the process of restoring a boot environment from scratch, and I have documented the process here in case I ever need it; you’re welcome to use it too if you need it. 🙂

Backup

Before I deliberately zap my OS boot environment, I’m going to back it up to a remote machine by following the steps detailed here under the ‘Archiving streams of the file systems in the existing root pool’ section.
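
As a rough illustration of what that archiving step produces, here is a minimal sketch, assuming a recursive snapshot of rpool and using the same NFS share and stream file name that appear in the restore step later on (the linked section has the authoritative procedure):

# zfs snapshot -r rpool@20090827
# mount -F nfs 192.168.0.45:/backup/snaps /mnt
# zfs send -R rpool@20090827 > /mnt/rpool.recursive.20090827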

Destruction

Once the boot environment was backed up, it was time to create the disaster. I don’t take any responsibility for any loss you may incur by doing this, but all I can say is that it worked for me.

Boot the OpenSolaris 2009.06 installation CD, select keyboard and language, get to the desktop, and then su to root using ‘opensolaris’ as the password. Then destroy the root boot pool and remove the partitions from the drives it used:

# zpool import -f rpool
# zpool destroy rpool
# format

In the format command, I selected the disk I wanted to zap, selected ‘fdisk’, deleted the single partition, and exited after saving the partition changes.

As I had a redundant mirror-based boot pool, I repeated this for both drives, just to make sure the OS had gone. I could have formatted them too if I was being really professional. 🙂

Construction

Boot the OpenSolaris 2009.06 Live CD.

When you reach the desktop, double-click the Install icon to kick off the installation process.

In the installation program, select the disk to install onto, and in my case this was the first of the two disks I would use in the mirror. Select the whole disk as the installation target. Reboot when finished, and leave the OpenSolaris Live CD in the drive.

After rebooting back into the Live CD, su to root and restore the backed-up root boot pool file system stream to the local root boot pool:

# zpool import -f rpool
# mount -F nfs 192.168.0.45:/backup/snaps /mnt
# cat /mnt/rpool.recursive.20090827 | zfs receive -Fdu rpool

Next, create a single 100% Solaris partition on drive 2, which will be used for the other half of the mirror: run the format command, select the second drive to be used in the boot mirror, select fdisk, create a default 100% Solaris system partition, and quit out of format back to the command line.
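
For reference, the interactive session looks roughly like this; it is an outline rather than verbatim output, and the device name is from my setup, so substitute your own:

# format
  (select the second drive, c11t7d0 in my case)
  format> fdisk
  (accept the default 100% SOLARIS System partition by answering 'y')
  format> quit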

Now we need to copy the slice geometry of the formatted first drive onto the second drive. Note the device IDs of your drives and substitute them for the ones shown in the following line:

# /usr/sbin/prtvtoc /dev/rdsk/c11t6d0s2 | /usr/sbin/fmthard -s - /dev/rdsk/c11t7d0s2
fmthard:  New volume table of contents now in place.

Now, very importantly, set the pool’s boot file system to the ‘be2’ boot environment, or whichever boot environment you restored:

# zpool set bootfs=rpool/ROOT/be2 rpool
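
If you’re not sure which boot environments the restored stream contained, a quick check like the following (not part of my original notes) will list them and confirm the property took effect:

# zfs list -r rpool/ROOT
# zpool get bootfs rpool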

Now let’s attach the second drive to the first drive to form the boot pool mirror:

# zpool attach -f rpool c11t6d0s0 c11t7d0s0
Please be sure to invoke installgrub(1M) to make 'c11t7d0s0' bootable.
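
The attach kicks off a resilver of the new half of the mirror, and you can watch its progress with a standard status check before rebooting:

# zpool status rpool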

Now install the GRUB bootloader to the second SSD so that the BIOS can boot the second drive in the event that the first becomes unreadable (don’t forget to enable the two SSDs in the BIOS as bootable drives afterwards!):

# installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c11t7d0s0
Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 271 sectors starting at 50 (abs 16115)
stage1 written to master boot sector

Now inspect the results:

# zpool list rpool
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool  29.8G  15.1G  14.7G    50%  ONLINE  -
#
# zfs list -r rpool
NAME                     USED  AVAIL  REFER  MOUNTPOINT
rpool                   17.1G  12.2G  81.5K  /rpool
rpool/ROOT              12.4G  12.2G    19K  legacy
rpool/ROOT/be2          30.6M  12.2G  6.95G  /
rpool/ROOT/be3          9.55G  12.2G  7.02G  /
rpool/ROOT/opensolaris  2.82G  12.2G  2.82G  /
rpool/dump              2.00G  12.2G  2.00G  -
rpool/export             578M  12.2G    21K  /export
rpool/export/home        578M  12.2G   527M  /export/home
rpool/swap              2.10G  14.2G   101M  -

Check the snapshots restored from the stream:

# zfs list -t snapshot
NAME                                  USED  AVAIL  REFER  MOUNTPOINT
rpool@20090829-migrated              23.5K      -  81.5K  -
rpool@20090830                           0      -  81.5K  -
rpool/ROOT@20090829-migrated             0      -    19K  -
rpool/ROOT@20090830                      0      -    19K  -
rpool/ROOT/be2@20090830                  0      -  6.95G  -
rpool/ROOT/be3@20090829-migrated     58.9M      -  6.14G  -
rpool/ROOT/be3@2009-08-29-23:45:50   44.9M      -  6.95G  -
rpool/ROOT/be3@20090830                  0      -  7.02G  -
rpool/ROOT/opensolaris@install       3.10M      -  2.82G  -
rpool/dump@20090829-migrated           16K      -  2.00G  -
rpool/dump@20090830                      0      -  2.00G  -
rpool/export@20090829-migrated         16K      -    21K  -
rpool/export@20090830                  16K      -    21K  -
rpool/export/home@20090829-migrated  51.1M      -   485M  -
rpool/export/home@20090830               0      -   527M  -
rpool/swap@20090829-migrated             0      -   101M  -
rpool/swap@20090830                      0      -   101M  -

Now you should be able to reboot and successfully boot your previously backed-up OS boot environment. The key, then, is to keep regular backups of your boot environments so that you can restore them whenever you need to.

This post was written from notes taken while backing up, destroying and restoring my NAS’s root boot pool file systems. Let me know with a comment below if you find any errors.

For more ZFS Home Fileserver articles see here: A Home Fileserver using ZFS. Alternatively, see related articles in the following categories: ZFS, Storage, Fileservers, NAS.

Comments

  1. Just wanted to point out that instead of reinstalling and then recovering over the top of the new rpool, you can prepare the root disk using the Live CD and restore directly (see the sketch after this list). I assume the whole disk being prepared will be allocated to the rpool.

    * It may be necessary to remove traces of the existing rpool. I found that ‘zpool import’ was discovering phantom rpools. To get around this, I created a new pool from the vdevs ‘zpool import’ claimed were members of these phantom rpools, then destroyed the newly created pool.

    * Partition the disk. Restore the partition table from backup, if you have it. Otherwise, use format/fdisk to create a whole-disk Solaris partition. Make sure the partition table is type SMI, not EFI, for the boot disk (format -e). Make sure this partition is active.

    * Use format/partition to set up the slices. Make sure the first slice (0) is set to type “root” and starts at cylinder 1, to the maximum cylinder. Slice 2 (whole-disk) and 8 (boot) should already exist by default.

    * Create a new rpool with zpool, as per your instructions.

    * Restore zfs backup stream, as per your instructions.

    * Set the pool root, as per your instructions.

    * Reinstall grub, as per your instructions.

    * Update the boot archive (may not be necessary):

      # zfs set mountpoint=/tmp/a rpool/ROOT/opensolaris
      # mkdir /tmp/a
      # zfs mount rpool/ROOT/opensolaris
      # bootadm update-archive -R /tmp/a
      # umount /tmp/a
      # zfs set mountpoint=/ rpool/ROOT/opensolaris
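
    A minimal sketch of the direct-restore path described in this comment, pieced together from the steps above; the zpool create invocation is an assumption (the commenter does not give the exact command), and the device ID, NFS share and stream file name are simply the ones used in the article:

      # zpool create -f rpool c11t6d0s0
      # mount -F nfs 192.168.0.45:/backup/snaps /mnt
      # cat /mnt/rpool.recursive.20090827 | zfs receive -Fdu rpool
      # zpool set bootfs=rpool/ROOT/be2 rpool
      # installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c11t6d0s0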

  2. I am hoping you or one of your readers might be able to help with a non-booting OpenIndiana server. In fact, I will offer to pay for some basic support on a server that won’t boot after a power failure, if anyone who can help with this issue is listening.

    I have a storage server that has
    1 boot drive
    a zpool with some cache drives
    some extra drives — 15 total

    After the power failure I get to the server and the OpenIndiana logo is in the bottom right of the screen, the keyboard is hard-locked and the machine does not ping. I power cycle, and immediately after the GRUB option is chosen the machine locks up in the same fashion, with the OpenIndiana logo in the bottom right. I tried modifying the GRUB options with a -s after the kernel line, but this did not help.

    It is late Thursday night, Central US time. If someone could help me within 24 hours I would be happy to pay at least $100 or donate to their favorite project, as well as post the solution on this page.

    I booted a new OpenIndiana CD and ran ‘zpool import -f rpool’, and it imported, verified by zpool status.
