Home Fileserver: RAIDZ expansion

Recently I decided to expand the storage capacity of my home fileserver’s ZFS storage pool, which was configured as a single RAIDZ vdev consisting of three 750 GB drives.

I wanted to expand the RAIDZ vdev by adding one extra 750 GB drive.

As you may or may not be aware, it is currently not possible to expand a RAIDZ vdev in ZFS, which is a great pity.

However, it is still possible to achieve this expansion goal; you just have to perform a few tricks. Here is how I did it.

The way to achieve RAIDZ vdev expansion is to destroy the pool and re-create it, using the additional drive(s).

Of course, this means you need to back up all the data first, so you’ll need an additional storage location with enough capacity to hold it all.

I decided to do two full backups of all my data.

For the first backup, I attached one additional SATA drive and ran ‘format’ to find the id of the new drive. Once I had the id, I created a second ZFS storage pool that used only this new drive:

# zpool create backuppool c3t0d0

Once the backup pool had been created, it was simply a matter of copying the file systems from the main storage pool to the new backup pool. Of course, if the backup pool requires more capacity than a single drive provides, two or more drives can be used.
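
One way to do this copy (a sketch only, not necessarily the exact commands I used, and the file system names are purely illustrative) is to snapshot each file system and send it to the backup pool with zfs send / receive:

# zfs snapshot tank/photos@backup1
# zfs send tank/photos@backup1 | zfs receive backuppool/photos

This can be repeated for each file system, or a recursive snapshot can be sent with zfs send -R to transfer a whole hierarchy in one go.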

Once the transfer of all the file systems had completed, I made a second full backup by copying the file systems to an iSCSI-connected storage pool on a separate backup server. For setup info see: Home Fileserver: Backups.

After both full backups had been made, I verified that the backups had been successful. All seemed well.
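
One simple way to check a backup pool’s health (again, just a sketch rather than my exact verification steps) is to scrub it and then inspect the result:

# zpool scrub backuppool
# zpool status -v backuppool

The scrub reads back every block and verifies its checksum, so any corruption on the backup drive shows up as errors in the status output.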

The next step was to destroy the main storage pool consisting of the three-drive RAIDZ1 vdev (single-parity RAIDZ, as opposed to RAIDZ2, which uses double parity). This moment is not a pleasant one, but with the backups done, nothing could go wrong, could it?

So the storage pool ‘terminator’ command I executed was:

# zpool destroy tank

Gulp.

The next step was to re-create the ZFS storage pool using the additional drive, which I had already plugged in. First, I needed the ids of all the drives with which to create the new pool. For this, I used the good old format command:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0d0 
          /pci@0,0/pci-ide@4/ide@0/cmdk@0,0
       1. c1t0d0 
          /pci@0,0/pci1043,8239@5/disk@0,0
       2. c1t1d0 
          /pci@0,0/pci1043,8239@5/disk@1,0
       3. c2t0d0 
          /pci@0,0/pci1043,8239@5,1/disk@0,0
       4. c2t1d0 
          /pci@0,0/pci1043,8239@5,1/disk@1,0
Specify disk (enter its number): ^C
# 

Then, using the drive ids from above, I created the new storage pool comprising a single RAIDZ1 vdev using the four drives:

# zpool create tank raidz1 c1t0d0 c1t1d0 c2t0d0 c2t1d0
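
A quick status check at this point confirms that the pool was created with the expected four-drive RAIDZ1 layout (output omitted here):

# zpool status tank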

After creating the new storage pool, I took the opportunity to review the file system layout and make a few changes, creating a new file system hierarchy that better suited my needs.
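
As a sketch, creating such a hierarchy is just a series of zfs create commands; the file system names below are purely illustrative, not my actual layout:

# zfs create tank/home
# zfs create tank/media
# zfs create tank/media/photos
# zfs create tank/media/music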

Once I had created the file systems using the new layout, the next step was to copy the file system data back from the backup pool to the new main storage pool.
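
The copy back is simply the reverse of the earlier backup step; for example (again with illustrative names), each backed-up snapshot can be sent into its place in the new hierarchy:

# zfs send backuppool/photos@backup1 | zfs receive tank/media/photos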

After I had successfully copied all the data back to its new, larger storage pool, I decided to unplug the backup pool drive and keep it on the shelf in case it was needed again in future. Before removing the drive from the case, I exported the backup pool to flush any pending writes:

# zpool export backuppool
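
Should the backup drive ever be needed again, plugging it back in and importing the pool brings the backup back online:

# zpool import backuppool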

So, the end result of this work is that I now have a storage pool using a single RAIDZ1 vdev, but utilising four drives instead of three. With 750 GB drives, this means that instead of a pool with ~1.4 TB of data capacity and ~0.7 TB of parity, the new four-drive setup gives ~2.1 TB for data and ~0.7 TB for parity. Nice! Grrrrreat.
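
To see the numbers for yourself, bear in mind that zpool list reports the pool’s raw capacity (including the space consumed by parity), while zfs list shows the usable space as seen by the file systems:

# zpool list tank
# zfs list tank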

For more ZFS Home Fileserver articles see here: A Home Fileserver using ZFS. Alternatively, see related articles in the following categories: ZFS, Storage, Fileservers, NAS.

15 Comments

  1. Hi Simon,

    great blog entry, as always! I like the mix between casual style and solidly founded tech content.
    Is there any particular reason why you prefer rsync over zfs send/receive for performing the backups?

    Cheers,
    Constantin

  2. Hi Constantin, thanks a lot for the compliments and glad you like the writing style!

    In the linked-to backups article, I used rsync simply because I was not familiar at that time with zfs send / receive. Later, I wrote about using zfs send / receive for performing backups in this article: Home Fileserver: Backups from ZFS snapshots.

    Thanks for writing all the great posts about ZFS on your blog, and I look forward to reading more in future. It was your blog and Tim Foster’s blog that inspired me when I first started looking into ZFS, so thanks again!

    Cheers,
    Simon

  3. I have really enjoyed reading this series of posts, as I am in the planning stages of my home file server.

    Unfortunately, I just can’t justify jumping into ZFS when you cannot expand a raidz vdev. I know Drobo and similar units are expensive and proprietary. But the functionality of being able to expand an array by adding additional disks is just too valuable to me, and I would think it is similar for most other home users.

    Call me strange, but I don’t see much point in growing a raidz pool if you need enough temporary storage to store all of your data for backup anyway. At that point you could just use those drives to form a second raidz vdev and add it to your existing pool.

    Is there any hope that ZFS will add this functionality? Don’t worry, I’m not holding my breath.

  4. Thanks Eric.

    As you say, not being able to expand a RAIDZ vdev needn’t be a problem if you are happy with simply adding an additional RAIDZ vdev when expanding your pool. I wanted to add just one drive though, so adding a new vdev wasn’t an option. The backup storage was no problem as I had sufficient spare capacity available.

    I have noted my comments on Drobo here: http://breden.org.uk/2008/03/02/home-fileserver-existing-products/

    The points I mentioned about Drobo were:

    Drobo initially looked quite interesting, especially in its flexibility for handling different sized drives. However, it seems to suffer from pitifully slow transfer speeds due to using a USB2 interface, and its data format seems undocumented and, therefore, proprietary. I didn’t need to look any further. Also the price was around $500 without drives, so it’s not cheap either. I believe there is some kind of bolt-on box called DroboShare for giving ethernet access too, available for another $200. So this gizmo with ethernet access will cost $700 without any drives. No thanks.

    Yes, I think there will be a solution available for expanding RAIDZ vdevs sometime. The last time I looked, there was some info on how this might work here: http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

  5. ERIC,

    You cannot easily expand an existing vdev as of now, but you can create a new vdev and add it to the ZFS raid.

    Say you have 3 drives in a ZFS raid. Then you can add 3 more drives to that raid, so now you have a ZFS raid with 3 drives + 3 drives. In each set, one disk is used for parity, so you have 4 drives storing data and 2 drives storing parity. All data is spread out evenly across the disks in the ZFS raid.

    Alternatively, you could have created one ZFS raid with 6 drives. Then one drive would be used for parity and data would be stored on 5 drives. This is the approach used in this article.

    (When using many drives, 10 or more, the preferred approach is to have several smaller sets combined into one large ZFS raid. This is the approach used on the Sun Thumper machine, which has 48 drives: many smaller sets combined into one big ZFS raid. Just buy several SATA I/O cards with 8 connections each, and you can have a Thumper at home. Remember, ZFS prefers no hardware RAID card; ZFS likes to have control of everything by itself. No hardware RAID cards, then.)

    Of course, you could add only one drive to your first set of 3 drives. Then you would have a ZFS raid with 3 drives + 1 drive. This would be bad, because what happens if that 1 drive fails? In the set of 3 drives, one drive can fail because that set has redundancy, but the second set has no redundancy. Therefore, add a group of drives to an existing ZFS raid. Within a group, the drives should have equal storage capacity.
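
    To illustrate the vdev-addition approach (device ids here are hypothetical), adding a second three-drive RAIDZ1 set to an existing pool is a single command:

    # zpool add tank raidz1 c3t0d0 c3t1d0 c3t2d0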

  6. Simon, another way to grow a raidz pool is to swap each drive with a larger one. Swap one drive at a time and resilver; then, after the last drive, export and re-import the pool. I just finished upgrading my 4x500GB to 4x1TB. My next step is to migrate to OpenSolaris 2008.11 to take advantage of some of the newer features (like Time Slider) and drivers.
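
    In sketch form (device ids hypothetical), that means replacing one drive at a time, waiting for each resilver to finish before moving on, and only exporting/importing once the last drive has been replaced:

    # zpool replace tank c1t0d0 c3t0d0
    # zpool status tank
    # zpool export tank
    # zpool import tank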

  7. Hi Rob,

    Yes, I knew that method was a possibility, but it would have required a full resilver after each of the three disks was replaced, which (a) would have taken a long time, and (b) could have toasted my data if an existing drive had failed during any of the three required resilver operations.

    There are other issues with the ‘replace one disk at a time’ method:
    1. I would have needed to buy three more larger disks (e.g. 1TB) to replace the existing operational 750GB drives.
    2. There is no backup available in the event of the operation failing.
    3. I would be left with three 750GB drives with no use after replacing them all with 1TB drives (I currently prefer to have one vdev, due to the limited number of available SATA connectors and drive cage space).
    4. I would have only gained ~750GB by replacing the three existing 750GB drives with 1TB replacements, which is the same as I gained by re-creating the array with one additional 750GB drive.

    Hopefully this makes sense 🙂

    Cheers,
    Simon

  8. Hmmm, I was about to ditch my ReadyNAS for this ZFS solution, but I’m shocked to find that it can’t be expanded or added to. I can replace any drive in the ReadyNAS and it rebuilds the proprietary RAID5. Is it true that I cannot replace just one drive? Say I have 1.5TB + 1TB + 1TB, and in 6 months I want to replace a 1TB with a 1.5TB. Is this possible? Also, is it just like RAID5, i.e. will I only see 2TB free (in a 1.5TB + 1TB + 1TB setup)? It’s March 2009; does anyone know if these capabilities are on the roadmap for this year??? THANK YOU FOR A GREAT blog with lots of information!

  9. Simon, I’m in the process of planning a home-backup solution, and I’m very interested to hear how other people are managing expansion challenges with ZFS raid-z implementations. Keep the posts coming. Great article!

  10. Well, this is nothing but a workaround. It still requires you to have a shitload of free HDD capacity (where you store all your data while creating the new pool).

    So you have to waste a lot of space.
    It’s a shame that RAIDZ still can’t be expanded!

  11. Hi Arne,

    You’re right, you do need a lot of space available. A backup system should help with this, as all your data should be backed up anyway, difficult as that can be.

    RAID is not backup, and this is often forgotten by people thinking of implementing a RAID setup.

    vdev expansion is not a priority for Sun (Oracle soon?), as enterprise users simply buy large arrays to start with, and then upgrade them by adding another 6 or 12 drives etc. as a new vdev. Adding vdevs to an existing pool is possible, but expanding an existing vdev is not.

    But you’re right, having the ability to expand a vdev would be a useful feature for home users. Then again, Sun won’t get any money from us home users.

    There is some recent work by Matt Ahrens which may prepare the way for vdev expansion one day… soon?
    He says: ‘This work lays a bunch of infrastructure that will be used by the upcoming device removal feature. Stay tuned!’
    See: http://blogs.sun.com/ahrens/entry/new_scrub_code
    It might look unrelated to vdev expansion, but if he says the on-disk changes should allow upcoming device removal, then it seems likely that it might also facilitate device addition.

  12. Hi,

    I’m pretty new to OpenSolaris. I had a Debian NAS, but the performance of Samba was not really good, so I tried other systems. OpenSolaris is really fast, but I’d like to have a setup like my old Debian installation. I have a mainboard with 6x S-ATA connectors and six S-ATA hard disks. On Debian I had the following setup, with all hard disks partitioned identically:
    1GB /boot, RAID1 using all hard disks
    5GB / (root), RAID5 using all hard disks (25GB for root)
    about 700GB /srv, RAID5 using all hard disks (the RAID for CIFS / Samba)

    Is it possible using ZFS to create two or three pools on the hard disks so I can install OpenSolaris like my old Debian installation?

    At the moment I’m trying OS on my USB stick, but this is a slow solution and doesn’t seem to work very well…

    Regards, Nils.

  13. Hi Nils,

    The configuration I would strongly recommend you use is the following, comprised of two pools:

    1. ‘rpool’ : one pool containing / and /boot. This contains the boot environment and root file system. Mirror this using two drives to give protection. You’ll also benefit from GRUB bootable ‘Boot Environments’ allowing you to recover from failed OS upgrades (like Debian’s apt-get -u dist-upgrade). This has saved my boot system on one occasion already.

    2. ‘tank’ : one pool containing all your non-OS/boot data. Put all your user data here. This corresponds to your previous ‘/srv’.

    ‘rpool’ can be mirrored using a ZFS ‘mirror’ vdev.

    ‘tank’ can be configured to use any single vdev or combination of vdevs you like. My recommendation, assuming you have the cash to invest, is to create a single RAID-Z2 vdev for this, for simplicity of administration and strong protection of data. As the capacity of two drives is used for parity data with RAID-Z2 vdevs, you will probably want something like the capacity of 4 drives for data and 2 drives for parity, i.e. 6 drives in total for user data.
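
    As a sketch, creating such a six-drive RAID-Z2 data pool would look like this (device ids are hypothetical):

    # zpool create tank raidz2 c1t0d0 c1t1d0 c2t0d0 c2t1d0 c3t0d0 c3t1d0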

    However, as you only have 6 SATA ports on your mobo, and you will need to reserve 2 for your root/boot pool, you might have to reconsider this and either economise to a 4-drive RAID-Z1 vdev for user data, or upgrade to using a PCIe SATA HBA like an AOC-USAS-L8i or an LSI LSISAS3081E-R, both of which give you a further 8 SATA ports to use as you choose.

    I am using the AOC-USAS-L8i, after having gone through two upgrade processes. If cash is not a problem for you then I would recommend investing in one of these cards to give a strongly protected boot/OS + data storage environment.

    Cheers,
    Simon

  14. Hi,

    I have stumbled across this long after it was written.

    I have a 7-drive RAID6 mdadm software RAID system. It is dying. I am looking at ZFS this time as I like the checksumming and encryption. BUT I have only just read about not being able to grow the ‘pools’.

    Now, a lot of time has passed. Does anyone know if this has been resolved? I extend the life of my server by adding drives, and that is a core part of the new system’s design. I do not have enough space to scrap and re-create…

  15. Hi Jon,

    Sorry to hear your mdadm system is dying.

    You can grow the pool by adding new vdev(s) (a bunch of drives with a specified parity level) to it.

    A pool consists of one or more vdevs.

    Cheers,
    Simon
