When I started using this ZFS NAS for additional tasks, I realised that my boot environment was becoming more important, and was a weak point in this system.
Initially I wanted to run a master copy of this website locally on this ZFS server, and that entailed installing WordPress and the OpenSolaris AMP package comprising the Apache HTTP server, MySQL and PHP. Also, I enabled and configured the VNC server. All of this required a fair amount of configuration and I didn’t want to have to do it again in the event of drive failure. Once is enough, so I wanted to discover how to back up and restore my boot environment to protect my investment in time and painstaking configuration work.
Also, there were the users and groups. In addition, I intend to install a version control system like git or equivalent. And there will also be development tools including C, Java, Perl, PHP, Python, Ruby, SQLite etc.
When I damaged my previous Solaris boot drive here and rendered it unbootable, I had the system up and running again in around 3 hours, but that was a very simple system setup. Once I get all the software I want up and running on this system, it will take much longer to restore the boot environment. However, with snapshots and zfs send / receive, we’ll see how to make archive backups of the boot environment.
Now that the boot environment is becoming more complex and valuable, it’s time to consider setting up a redundant mirrored root boot environment using two similarly sized drives.
There were a couple of other considerations: (1) I had run out of existing SATA ports on this motherboard, and (2) I didn’t want to use mechanical, vibrating, power-consuming, noisy and heat-producing drives for a boot pool that is mostly idle.
SATA controller card
Lack of free SATA ports meant acquiring a new SATA controller card, and there were only two real candidates, and both from a great company called SuperMicro. These have the advantages of supporting 8 SATA drives and being well supported in Solaris.
The first possibility was the 8-port AOC-SAT2-MV8 card, which has a good Solaris driver and is well regarded. The problem with it, from my perspective, is that it uses the PCI-X interface to the motherboard, which is only available on a small number of server motherboards, and PCI-X seems to be old technology now. It will run on my motherboard (Asus M2N-SLI Deluxe), but only in 32-bit mode using a standard PCI slot, which is slower and doesn’t use the card to its potential. This was a possibility but not optimal, and so not my first choice.
The second possibility was the 8-port AOC-USAS-L8i card based on the well-supported LSISAS1068E controller chip. At first glance this AOC-USAS-L8i card appeared to be exotic and unfamiliar hardware, so I dismissed it initially. The card has only two physical ports on it, but each of these two mini-SAS ports feeds 4 SATA ports, so the card supports 8 SATA connectors, and each SATA lane runs at the full SATA II spec of 3 Gbits/sec. The other great advantage of this card is that it uses the much more modern, faster and commonly found PCIe interface (PCI Express), and it can auto-negotiate the operating speed according to the slot it finds itself in. Luckily I had a spare PCIe x16 slot free so this was the card for me. You need to get some special cables for it too, like these. These are about $10 or so, and have the disadvantage that they are harder to find on sale than standard SATA cables, but they have the big advantage that they lock into place on the card, and so don’t come loose. After reading of other Solaris users having success with this card I bought one and it seems to be working well now.
New mirrored SSD boot drives
I thought I’d try out a couple of the smallest and cheapest MLC-based Solid State Drives that have started to become more affordable for non-enterprise users. OCZ was a name that started appearing everywhere, and I saw they made a good range of different budget-priced SSDs. After reading a few articles like this excellent one from AnandTech: The SSD Anthology: Understanding SSDs and New Drives from OCZ, I decided to avoid the absolute cheapest OCZ drives based on the JMicron JMF602B controller as this controller has well-documented patchy performance problems. However for a small additional premium, I found the OCZ Vertex series of SSDs, based on the far superior Indilinx Barefoot controller, and the 30GB model looked like it should offer sufficient capacity for a ZFS root boot pool supporting a number of Boot Environment versions: snapshotted GRUB-bootable versions of the OS code, that allow you to boot different versions of the OS, and roll back if an upgrade fails, back to a known working version. As they are snapshots, you may also use ‘zfs send’ to send a stream of the OS boot environment to archive storage for restoration in the event of complete loss of your boot drives. Very cool, as you will see later!
I saw that OCZ had just recently announced the OCZ Vertex Turbo range of SSDs which gave maximum read speeds of around 240 MBytes/sec and maximum write speeds of around 145 MBytes/sec, or 100 MBytes/sec sustained, which is pretty good. The firmware is currently at level 1.0, which should be slight cause for concern, but these devices are an evolutionary update from the original Vertex series which has good firmware, and they have added a 64 MByte cache. I believe version 1.0 of the Vertex Turbo series of SSDs provides TRIM support. TRIM is a command with which the OS tells the SSD which blocks no longer hold live data, so that the controller can erase them ahead of time and restore write speed to previously written-to MLC memory cells. It therefore needs cooperation between the storage host’s (1) OS, (2) file system, and (3) the SSD’s controller firmware, although I’m new to this so I may have made mistakes here. Even if the OS and file system are not TRIM-aware, there is a utility from OCZ which can be run manually if required. As these SSDs are mostly read-only for the purposes of booting the OS and loading supporting files, write-speed degradation associated with MLC SSDs shouldn’t be too much of an issue.
So I bought two 30GB OCZ Vertex Turbo SSDs, and a Scythe 2.5″ Twin Mounter to fit the two SSDs into a standard 3.5″ drive bay. Actually I had run out of 3.5″ drive slots in this case, so I screwed this twin mounter onto an existing 3.5″ to 5.25″ drive adapter to enable these two 2.5″ SSDs to be mounted in a 5.25″ drive bay. It looks a bit silly, but it does the job. One problem I encountered was that because the SSDs are so close to each other, the power and SATA connectors put a lot of strain on the PCBs in the SSDs by bending them apart. Must find long-term solution to this later…
Preparing for migration of existing root pool
I had previously installed my OpenSolaris 2009.06 onto a single 160GB 3.5″ IDE drive. I wanted to create an exact copy of this boot configuration, including the user accounts, AMP configuration, WordPress and VNC server setup.
My initial idea was simply to attach one of the SSDs to the existing IDE drive to form a mirror using the ‘zpool attach’ command, but this was not possible due to the new drive being smaller than the existing one! So it had to be done the hard way — always the way!
The process would entail making snapshots of the file systems within the existing boot pool and then sending streams of the files referenced by these snapshots to my main storage pool. The idea would then be to disconnect the IDE drive, plug in the two SSDs, install a fresh copy of OpenSolaris 2009.06 onto one of the SSDs, then create the mirror by use of the ‘zpool attach’ command, and finally restore the contents of the previously archived streams from the main data pool onto the boot pool mirror. As usual, there were complications along the way, just to make things more interesting…
Archiving streams of the file systems in the existing root pool
The first step was to create snapshots of the file systems within the existing root boot pool. I have filtered out the swap and dump file systems from the list below, as we don’t need to archive these. In the list below, the ‘rpool/ROOT/be2’ file system is my boot environment ‘be2’, which is the one I want to boot when the SSDs are set up later.
# zfs snapshot -r rpool@20090827
# zfs list -t snapshot | grep 20090827 | grep -v dump | grep -v swap
NAME                         USED  AVAIL  REFER  MOUNTPOINT
rpool@20090827                18K      -  81.5K  -
rpool/ROOT@20090827             0      -    19K  -
rpool/ROOT/be2@20090827     11.2M      -  6.14G  -
rpool/export@20090827         16K      -    21K  -
rpool/export/home@20090827  38.3M      -   481M  -
Now that we have the snapshots available, the next step is to send streams of the files referenced by these snapshots to an archive file system for later retrieval, so next I will create an archive file system. For this archive, I used my Solaris backup server, but I suppose you could use another pool on your NAS, provided you can import it with ‘zpool import’ when you later boot from the OpenSolaris 2009.06 CD.
Here I’ll show you how to archive onto a separate Solaris box using NFS. Fire up the backup server and note its IP address. Then do the following on the backup server (‘zfsnas’ is the host name of the NAS whose root boot pool we are archiving):
# zfs create backup/snaps
# zfs set sharenfs='rw=zfsnas,root=zfsnas' backup/snaps
# share
-@backup/snaps  /backup/snaps   sec=sys,rw=zfsnas,root=zfsnas   ""
Send a single stream for all the files referenced by the recursive snapshot of all the root boot pool file systems. Replace the IP address 192.168.0.45 with your own. This is less typing than sending each file system stream individually and creates a single file, although it includes the dump and swap file systems, which are not required:
# zfs send -Rv rpool@20090827 > /net/192.168.0.45/backup/snaps/rpool.recursive.20090827
Alternatively, send individual streams of the files referenced by each of the root boot file system snapshots to the archive file system, excluding the swap and dump file systems:
# zfs send -v rpool@20090827 > /net/192.168.0.45/backup/snaps/rpool.20090827
# zfs send -Rv rpool/ROOT@20090827 > /net/192.168.0.45/backup/snaps/ROOT.20090827
# zfs send -Rv rpool/ROOT/be2@20090827 > /net/192.168.0.45/backup/snaps/be2.20090827
# zfs send -Rv rpool/export@20090827 > /net/192.168.0.45/backup/snaps/export.20090827
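Since the old IDE drive is about to be removed, it may be worth recording checksums of the archived stream files so they can be re-checked just before restoring. The following is only a sketch and not part of the procedure described here: the function names make_manifest and verify_manifest and the idea of a manifest file are assumptions for illustration, and it relies only on the standard cksum and cmp utilities. It detects a file that has changed since the manifest was written; it does not prove that the ZFS stream inside the file is valid.

```shell
#!/bin/sh
# Sketch: record and later re-check checksums of archived stream files.
# The function names and the manifest file are hypothetical helpers.

# Print one "checksum size filename" line per file in directory $1.
make_manifest() {
    ( cd "$1" && cksum * )
}

# Succeed only if the files in directory $1 still match the manifest
# file $2 written earlier. Keep the manifest outside the directory so
# that it is not checksummed along with the stream files.
verify_manifest() {
    make_manifest "$1" | cmp -s - "$2"
}
```

On the backup server this could be used as `make_manifest /backup/snaps > /var/tmp/snaps.cksum` straight after the sends, and `verify_manifest /backup/snaps /var/tmp/snaps.cksum && echo OK` before restoring; the manifest path here is just an example.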
Now that the OS and related file systems have been archived, it’s time to shut down the machine:
# shutdown -y -g0 -i5
Install OpenSolaris 2009.06 onto one of the SSD drives
Next I removed the old IDE hard drive, and plugged in the two SSD drives plus an IDE DVD ROM drive.
Then reboot and place the OpenSolaris 2009.06 install CD-ROM in the drive. Select keyboard and language and let the OS boot up to the desktop. Then open the terminal, and su to root with password ‘opensolaris’.
First step is to get the list of drives:
# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c8t0d0
          /pci@0,0/pci1043,8239@5/disk@0,0
       1. c8t1d0
          /pci@0,0/pci1043,8239@5/disk@1,0
       2. c9t0d0
          /pci@0,0/pci1043,8239@5,1/disk@0,0
       3. c9t1d0
          /pci@0,0/pci1043,8239@5,1/disk@1,0
       4. c10t0d0
          /pci@0,0/pci1043,8239@5,2/disk@0,0
       5. c10t1d0
          /pci@0,0/pci1043,8239@5,2/disk@1,0
       6. c11t6d0
          /pci@0,0/pci10de,376@a/pci15d9,a380@0/sd@6,0
       7. c11t7d0
          /pci@0,0/pci10de,376@a/pci15d9,a380@0/sd@7,0
Specify disk (enter its number): ^C
#
Drives 0 to 5 inclusive are the six drives that form the main storage pool, and drives 6 and 7 are the two 30 GB SSD drives that we will use to install the OS, make a mirror and then restore our previous boot pool to.
I will use drive 6 to install the OS to — i.e. c11t6d0. Double-click the Install icon on the desktop, select the drive to install to, and select “use whole drive” option. Installation to the SSD was amazingly fast — it only took 11 minutes!
After installation, click the reboot button in the installer, but keep the install CD in the drive.
Restoring the archived streams of the previous root boot pool
Boot the system with the OpenSolaris 2009.06 installation CD-ROM in the drive. The idea of this is that the zfs pools will be unmounted and then you can restore the archived streams to the root boot pool which will not be in use, as the OS boots from the CD.
Once the installer has booted, you’ve selected keyboard and language, and reached the desktop again, open a terminal and su to root, using password ‘opensolaris’.
Then follow the steps below to restore the archived OS file system stream(s). Now we have the root command line we can connect to the remote box, from where we’ll restore the file system streams. By the way, who is ‘jack’ ?
jack@opensolaris:~$ su
Password:    ('opensolaris' is the default root password on the installation CD)
jack@opensolaris:~# zpool list
no pools available
jack@opensolaris:~# zpool import -f rpool
jack@opensolaris:~# mount -F nfs 192.168.0.45:/backup/snaps /mnt
jack@opensolaris:~# cat /mnt/rpool.recursive.20090827 | zfs receive -Fdu rpool    (full recursive stream)

(or for individual streams):
jack@opensolaris:~# cat /mnt/rpool.20090827 | zfs receive -Fd rpool
jack@opensolaris:~# cat /mnt/ROOT.20090827 | zfs receive -Fd rpool
jack@opensolaris:~# cat /mnt/export.20090827 | zfs receive -Fd rpool
jack@opensolaris:~# cat /mnt/be2.20090827 | zfs receive -Fd rpool
cannot mount '/': directory is not empty

jack@opensolaris:~# zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
rpool              20.7G  8.57G  81.5K  /rpool
rpool/ROOT         16.2G  8.57G    19K  legacy
rpool/ROOT/be2     9.00G  8.57G  6.14G  /
rpool/dump         2.00G  8.57G  2.00G  -
rpool/export        506M  8.57G    21K  /export
rpool/export/home   506M  8.57G   481M  /export/home
rpool/swap         2.00G  10.5G   101M  -

jack@opensolaris:~# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
rpool@20090827                  0      -  81.5K  -
rpool/ROOT@20090827             0      -    19K  -
rpool/ROOT/be2@20090827         0      -  6.14G  -
rpool/export@20090827         16K      -    21K  -
rpool/export/home@20090827      0      -   481M  -
Now, very importantly, set the boot file system to use the ‘be2’ boot environment, or whichever one you used:
jack@opensolaris:~# zpool set bootfs=rpool/ROOT/be2 rpool
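Because a wrong bootfs setting leaves the pool pointing at the wrong boot environment, it seems worth reading the property back before rebooting. The sketch below assumes the usual four-column output of `zpool get bootfs rpool` (NAME, PROPERTY, VALUE, SOURCE); the helper name bootfs_value is hypothetical and only parses text from standard input.

```shell
#!/bin/sh
# Sketch: extract the bootfs value from `zpool get bootfs <pool>` output.
# bootfs_value is a hypothetical helper; it reads the command output on
# stdin and prints the VALUE column of the bootfs row.
bootfs_value() {
    while read -r name property value source; do
        if [ "$property" = "bootfs" ]; then
            echo "$value"
        fi
    done
}
```

From the live CD root shell it could be used as `zpool get bootfs rpool | bootfs_value`, which should print rpool/ROOT/be2 if the previous command took effect.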
Set up the SSD mirror
Open a terminal and su to root and take a look at the root pool:
# zpool status rpool
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     0
          c11t6d0s0  ONLINE       0     0     0

errors: No known data errors
Now let’s zap disk 7 (c11t7d0) and create a single 100% Solaris partition. Run the format command, select drive 7 (c11t7d0), select the fdisk option and delete any existing partitions, then select the create option and select a 100% Solaris partition for the whole drive. Then select the option to save changes and quit from the format command.
Now transfer the volume table of contents from the first SSD to the second SSD:
# /usr/sbin/prtvtoc /dev/rdsk/c11t6d0s2 | /usr/sbin/fmthard -s - /dev/rdsk/c11t7d0s2
fmthard:  New volume table of contents now in place.
And finally, form the mirror by attaching the second SSD to the root pool:
# zpool attach -f rpool c11t6d0s0 c11t7d0s0
Please be sure to invoke installgrub(1M) to make 'c11t7d0s0' bootable.
Now install the GRUB bootloader to the second SSD so that the BIOS can boot the second drive in the event that the first becomes unreadable (don’t forget to enable the two SSDs in the BIOS as bootable drives afterwards!):
# installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c11t7d0s0
Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 271 sectors starting at 50 (abs 16115)
stage1 written to master boot sector
#
Good, now let’s check out the mirrored root boot pool. As you see, attaching the second SSD to the pool triggered an automatic resilver, which copied the data from the first SSD to the second SSD in the mirror (the status output reports it on the ‘scrub:’ line):
# zpool status -v rpool
  pool: rpool
 state: ONLINE
 scrub: resilver completed after 0h2m with 0 errors on Fri Aug 28 20:40:36 2009
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c11t6d0s0  ONLINE       0     0     0
            c11t7d0s0  ONLINE       0     0     0  5.46G resilvered

errors: No known data errors
#
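It is probably safest not to reboot until the resilver has finished. As a rough sketch, the ‘scrub:’ line of the status output can be classified with a small shell function. The name resilver_state is hypothetical, and the ‘resilver in progress’ wording is an assumption about what this release prints while a resilver is running; only the ‘resilver completed’ wording is taken from the output above.

```shell
#!/bin/sh
# Sketch: classify `zpool status` output read on stdin.
# Prints "resilvering", "resilvered" or "none".
# The "resilver in progress" text is an assumed wording; adjust it to
# whatever your release actually prints.
resilver_state() {
    status=$(cat)
    case $status in
        *"resilver in progress"*) echo resilvering ;;
        *"resilver completed"*)   echo resilvered ;;
        *)                        echo none ;;
    esac
}
```

Used as `zpool status rpool | resilver_state`, this could drive a simple polling loop that sleeps while the answer is resilvering.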
Next I rebooted and checked if the boot environment was 100% as it was before. It was!
I now have a mirrored root boot pool on two SSD drives. Of course, other drives would work too: for home / SOHO environments on a budget, two small 2.5″ SATA HDDs would achieve similar results at a much lower cost than SSDs.
I must say that when I’m working on this box now, the speed with these SSDs is phenomenal, and well worth the money. Also, the boot and shutdown processes are lightning fast! With a single SSD read speed of around 240 Mbytes/sec, when using a mirror of two of these speedy drives on this new SATA controller card, it must be loading stuff incredibly quickly. During boot, I only see the HDD light on for about 2 or 3 seconds in total, and the rest of the time must be being spent starting services! I’ll maybe do a speed test one day…
The learning curve has enabled me to create archives of my boot environment from time to time, further protecting an already solid mirrored system boot environment, with multiple rollback boot environments to choose from should a package upgrade go wrong, coupled to a double-parity RAID-Z2 protected main storage pool. It doesn’t get much better than this!
Next I’ll have to chuck one of these SSDs into the Mac Pro case for the OS and apps. Anand Lal Shimpi of AnandTech highly recommends speeding up your ‘old’ Mac Pro with an SSD!
As these notes were taken in a fairly chaotic manner during attempts to get this to work, please notify me of any mistakes you find using a comment below. Thanks and I hope this post helps others wishing to do the same.
It’s a great feeling to know you have a robust OS boot environment, having a combination of a mirror plus multiple boot environment rollback capability to guard against upgrades that break stuff using the ‘update all’ feature of Package Manager.
I found the following URLs helpful when trying to get this to work, and would like to thank their authors for spending the time to document their efforts, which have benefited me here: