Home Fileserver: Backups

Now that your ZFS Home Fileserver is up and running, with file systems created and shared to the other machines on your home network, it’s time to consider putting a backup policy in place.

I’ll show a few different possibilities open to you.

It’s all very well having a RAIDZ setup on your fileserver with single-parity redundancy and block checksums in place to protect you from a single failed drive, and snapshots in place to guard against accidental deletions, but you still need backups, just in case something really awful happens.

I built myself a backup machine for around 300 euros, using similar hardware to that described in the Home Fileserver: ZFS hardware article, but reduced the cost by choosing cheaper components and reusing old SATA drives that were lying unused on the shelf. I will describe the components for this backup machine in more detail elsewhere.

For the purposes of this article, we’ll perform a backup from the fileserver to the backup machine.

The backup machine has Solaris installed on an old Hitachi DeathStar IDE drive I had lying around. These drives don’t have a particularly stellar reliability record, but I don’t care too much as nothing apart from the OS will be installed on this boot drive. All ZFS-related stuff is stored on the SATA drives that form the storage pool and this will survive even if the boot drive performs its ‘click of death’ party trick 🙂

The SATA drives I had lying around were: a 160GB Maxtor, a 250GB Western Digital, a 320GB Seagate and a 500GB Samsung. Together they yielded about 1.2TB of storage space when combined into a single non-redundant pool. I chose to have no redundancy on this backup machine in order to squeeze as much capacity as possible out of the drives; after all, the primary copy of the data lives on the fileserver. In a perfect world the backup machine would have redundancy too, but never mind, we already have pretty good defences against data loss with this setup.
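
If you did want some redundancy on the backup box despite the mismatched drive sizes, one option would have been a pool made of two mirrored pairs. This is only a sketch of that alternative, with an arbitrary pairing of the device names, not something I actually did:

# zpool create backup mirror c2t0d0 c2t1d0 mirror c1t0d0 c1t1d0

With these particular drives even the best pairing would yield only around 480GB (320GB + 160GB, since each mirror is limited by its smaller disk), which is why I didn’t bother.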

So let’s create the ZFS storage pool now from these disks. First let’s get the ids of the disks we’ll use:

# format < /dev/null
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0d0 
          /pci@0,0/pci-ide@4/ide@0/cmdk@0,0
       1. c1t0d0 
          /pci@0,0/pci1043,8239@5/disk@0,0
       2. c1t1d0 
          /pci@0,0/pci1043,8239@5/disk@1,0
       3. c2t0d0 
          /pci@0,0/pci1043,8239@5,1/disk@0,0
       4. c2t1d0 
          /pci@0,0/pci1043,8239@5,1/disk@1,0
Specify disk (enter its number): 
# 

Disk id 0 is the boot drive — the IDE disk. For our non-redundant storage pool, we’ll use disks 1 to 4:

# zpool create backup c2t0d0 c2t1d0 c1t1d0 c1t0d0
#
# zpool status backup
  pool: backup
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        backup      ONLINE       0     0     0
          c2t0d0    ONLINE       0     0     0
          c2t1d0    ONLINE       0     0     0
          c1t1d0    ONLINE       0     0     0
          c1t0d0    ONLINE       0     0     0

errors: No known data errors
# 
# zpool list     
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
backup  1.12T   643G   503G    56%  ONLINE  -   <-- here's one I already used a bit
# 
# zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
backup                 1.07T  28.1G    19K  /backup
#

This created a storage pool with around 1.12TB of capacity. The listing above is actually from a pool I created a little while ago, which is why it already shows 56% of the capacity in use.

Let's try out iSCSI

As I'd heard that iSCSI performs well, I thought it would make a good choice for fast backups across the Gigabit switch on my home network.

Quoting from the Wikipedia article on iSCSI:

iSCSI is a protocol that allows clients (called initiators) to send SCSI commands (CDBs) to SCSI storage devices (targets) on remote servers. It is a popular Storage Area Network (SAN) protocol, allowing organizations to consolidate storage into data center storage arrays while providing hosts (such as database and web servers) with the illusion of locally-attached disks. Unlike Fibre Channel, which requires special-purpose cabling, iSCSI can be run over long distances using existing network infrastructure.

Sounds good, let's try it between these two Solaris boxes: (1) the fileserver, and (2) the backup machine.

At this point I was digging through the notes I'd kept from my various iSCSI experiments, trying to work out exactly which commands I had run. But here's another nice feature of ZFS: it keeps a record of all major actions performed on a storage pool. So I'll simply ask ZFS which incantations I performed on this pool previously:

# zpool history backup
History for 'backup':
2008-02-26.19:40:29 zpool create backup c2t0d0 c2t1d0 c1t1d0 c1t0d0
2008-02-26.19:43:24 zfs create backup/volumes
2008-02-26.20:07:24 zfs create -V 1100g backup/volumes/backup
2008-02-26.20:09:16 zfs set shareiscsi=on backup/volumes/backup
# 

So we can see that I created the ‘backup’ pool without redundancy, then created a file system called ‘backup/volumes’, and then a 1100GB (1.1TB) ZFS volume (zvol) called ‘backup’ inside it. Finally, I set the volume’s ‘shareiscsi’ property to ‘on’, which means the volume becomes an iSCSI target that other interested machines on the network can access.

Let’s take a look at the properties for this volume.

# zfs get all backup/volumes/backup
NAME                   PROPERTY         VALUE                  SOURCE
backup/volumes/backup  type             volume                 -
backup/volumes/backup  creation         Tue Feb 26 20:07 2008  -
backup/volumes/backup  used             1.07T                  -
backup/volumes/backup  available        485G                   -
backup/volumes/backup  referenced       643G                   -
backup/volumes/backup  compressratio    1.00x                  -
backup/volumes/backup  reservation      none                   default
backup/volumes/backup  volsize          1.07T                  -
backup/volumes/backup  volblocksize     8K                     -
backup/volumes/backup  checksum         on                     default
backup/volumes/backup  compression      off                    default
backup/volumes/backup  readonly         off                    default
backup/volumes/backup  shareiscsi       on                     local    
backup/volumes/backup  copies           1                      default
backup/volumes/backup  refreservation   1.07T                  local
# 

Sure enough, you can see that it’s shared using the iSCSI protocol and that this volume uses the whole storage pool.
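
One detail worth noting in that output: ‘used’ and ‘refreservation’ both show 1.07T even though only 643G is actually referenced, because a volume created with a plain ‘-V’ reserves its full size up front. If you preferred a thin volume that only consumes pool space as it fills, ZFS can also create the volume sparse. A sketch of that alternative only, not what I did here:

# zfs create -s -V 1100g backup/volumes/backup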

This iSCSI-shared volume is known as an ‘iSCSI target’. In iSCSI parlance there are iSCSI targets (the server side) and iSCSI initiators (the client side).

Now let’s enable the Solaris iSCSI Target service:

# svcadm enable system/iscsitgt
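
As a quick sanity check, you can ask SMF whether the target service really came online; svcs should report a STATE of ‘online’ for it:

# svcs iscsitgt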

Now let’s verify that the system indeed thinks that this volume is an iSCSI target before we proceed further:

# iscsitadm list target -v
Target: backup/volumes/backup
    iSCSI Name: iqn.xxxx-xx.com.sun:xx:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    Alias: backup/volumes/backup
    Connections: 1
        Initiator:
            iSCSI Name: iqn.xxxx-xx.com.sun:0x:x00000000000.xxxxxxxx
            Alias: fileserver
    ACL list:
    TPGT list:
    LUN information:
        LUN: 0
            GUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size: 1.1T
            Backing store: /dev/zvol/rdsk/backup/volumes/backup
            Status: online
# 

This listing was taken after the iSCSI initiator on the fileserver had been configured and had connected, which is why you can see ‘Connections: 1’ and the initiator’s details.

Now we’re done with the setup on the backup server. We’ve created a backup volume with 1.1TB of storage capacity from a mixture of disparate old drives that were lying around, and we’ve made it available to machines on the network as an iSCSI target, which is exactly what we need: the fileserver must be able to write to it in order to perform a backup.

Time to move on now to the client machine — the fileserver, which is known as the iSCSI initiator.

Let’s do the backup

Back on the fileserver now, we need to configure it so that it can access the iSCSI target we just created. Luckily, with Solaris that’s simple.

iSCSI target discovery is possible in Solaris via three mechanisms: iSNS, static and dynamic discovery. For simplicity, I will only describe static discovery — i.e. where you specify the iSCSI target’s id and the IP address of the machine hosting the iSCSI target explicitly:

# iscsiadm modify discovery --static enable
# iscsiadm add static-config iqn.xx-xx.com.sun:xx:xx-xx-xx-xxxx-xxxx,192.168.xx.xx
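
If the new ‘disk’ doesn’t show up on the fileserver straight away, two things that can help are rebuilding the iSCSI device nodes and asking the initiator what it has discovered. Treat this as an aside, as I didn’t note whether I actually needed either step here:

# devfsadm -i iscsi
# iscsiadm list target -S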

Now that we’ve enabled the fileserver to discover the iSCSI target volume called ‘backup’ on the backup machine, we’ll get hold of its ‘disk’ id so that we can create a local ZFS pool on it. After all, it’s a block device just like any other disk, so ZFS can use it exactly as it would a local, directly-attached physical disk:

# format < /dev/null
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0d0 
          /pci@0,0/pci-ide@4/ide@0/cmdk@0,0
       1. c1t0d0 
          /pci@0,0/pci1043,8239@5/disk@0,0
       2. c1t1d0 
          /pci@0,0/pci1043,8239@5/disk@1,0
       3. c2t0d0 
          /pci@0,0/pci1043,8239@5,1/disk@0,0
       4. c3t0100001E8C38A43E00002A0047C465C5d0 
          /scsi_vhci/disk@g0100001e8c38a43e00002a0047c465c5
Specify disk (enter its number): 
# 

The disk id of this backup volume is the one at item number 4 — the one with the really long id.

Now let’s create the storage pool that will use this volume:

# zpool create backup c3t0100001E8C38A43E00002A0047C465C5d0
#
# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
backup  1.07T   623G   473G    56%  ONLINE  -
tank    2.03T  1002G  1.05T    48%  ONLINE  -
test    3.81G   188K  3.81G     0%  ONLINE  -
# 

Voilà, the pool ‘backup’, backed by the iSCSI target volume of the same name hosted on the backup machine, is now usable. So now let’s do the backup, finally! 🙂

For demo purposes I created a 4GB folder of video content to back up. We’ll time it being sent over the Gigabit network to see how fast it gets transferred; gotta have some fun after all this aggro, haven’t you? 🙂

# du -hs ./test_data
 4.0G   ./test_data
# 
# date ; rsync -a ./test_data /backup ; date
Thursday, 13 March 2008 00:20:55 CET
Thursday, 13 March 2008 00:21:50 CET
# 

OK, so 4GB was copied from the fileserver to the backup machine in 55 seconds, which is a sustained 73MBytes/second, not bad at all! 🙂
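
For the record, the arithmetic behind that figure: du reported 4.0G, so taking that as roughly 4000MB, 4000MB / 55s ≈ 73MB/s sustained over the Gigabit link.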

That’s all folks!

I’ll tackle other subjects soon like incremental backups using ZFS commands and also using good old ‘rsync’.
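
As a taster of the ZFS route, here is roughly what it looks like using snapshots with ‘zfs send’ and ‘zfs receive’. This is only a sketch with made-up dataset and snapshot names; because the ‘backup’ pool is imported locally on the fileserver over iSCSI, the send stream can simply be piped straight into a local receive:

# zfs snapshot tank/data@monday
# zfs send tank/data@monday | zfs receive backup/data                        <-- initial full copy
# zfs snapshot tank/data@tuesday
# zfs send -i tank/data@monday tank/data@tuesday | zfs receive backup/data   <-- just the changes since monday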

For more ZFS Home Fileserver articles see here: A Home Fileserver using ZFS. Alternatively, see related articles in the following categories: ZFS, Storage, Fileservers, NAS.

Comments

  1. So you’re creating a zvol on top of a zvol? How is the performance? I wonder if making the back end storage available as plain iscsi targets will make much of a difference…

    Thanks for the overview!

    Wout.

  2. Hi Wout, I would say it’s more like a zvol (block device) that has been exported from the backup machine in the form of an iSCSI target, and mounted via the iSCSI initiator in the form of a local ZFS pool — local to the fileserver, that is.

    Simon

  3. Oops, forgot to say: I got 73MBytes/sec sustained copying 4GB of video data. On a much larger transfer (650GB) I got a sustained 48MBytes/sec, and this included a mixture of different file sizes. Then again, I don’t know how the iSCSI target zvol block device is written to: stripe across all disks or simply sequential write across a series of disks as each becomes full. Perhaps the latter, which would mean if you were writing to a really slow disk, you might get lousy slow writes. Maybe it’s described somewhere?

  4. Hi Simon,

    I meant, you have your zvols exported as iSCSI targets, and then you give those as raw disks to ZFS which implements another layer of ZFS on it. So you basically are storing ZFS blocks inside ZFS blocks on the disks…

    So I was wondering if it would be faster if you made the individual disks available as iSCSI targets instead of the ZVOL. The iSCSI server would have to do slightly less processing and block sizes would perhaps be better aligned? Plus the client would have more insight into broken disks, which might or might not make a difference.

    Although I must say I like those transfer rates 🙂

    ZFS will stripe at all times, so don’t worry about that.

  5. Ah yes, I see what you mean now. Yes, it may well perform better using it directly. If you try it let me know if you see any difference, as I still have a load of other stuff to learn like snapshots, clones, zfs send/recv for incremental backups etc.

    If I wanted to try using the imported iSCSI disk directly, rather than attaching it to a pool, how would I write to it with ZFS? There seems to be relatively little documentation about using iSCSI on Solaris with fully explained examples; either that, or perhaps I haven’t looked hard enough.

    Yes, those transfer speeds are quite decent, I thought. And I haven’t yet looked into (1) setting the MTU size to 9000 for jumbo frames, or (2) using the 2 onboard GbE ports simultaneously to get a 2Gbps full-duplex pipe using trunking/bonding — something else to try out one day 🙂 Although I’ll probably use the 2 built-in GbE’s on the Mac Pro and the 2 built-in GbE’s on the ZFS fileserver as this is probably where I’d get the most benefit, as the backup machine will only be taking incremental backups once I get that going properly using snapshot ranges combined with use of zfs send/recv.

    Good to know about the striping — thanks a lot!

  6. Hi Simon,

    I’m fairly new to ZFS, but as I’m in charge of setting up my lab’s fileserver, I figured I’d give it a shot! I think it’s great but always had trouble with the incremental backups until I found this page – iSCSI is amazing! Looking forward to the incremental backups post. Keep up the good work!

    -Justin

  7. Hi Justin,

    Thanks for the compliment! Yes, iSCSI seems quite magical so far. It makes it easy to see how streaming backup data to a remote offsite location could work, since all iSCSI needs is an IP connection. You’ve also reminded me that I need to tackle that article on incremental backups. Have fun!

    Simon

  8. Thanks a lot!
    I am planning to set up a file server to replace my slow, slow NAS; your blog helps me a lot.
    Really, thank you very much!
    A post from Hong Kong.
    🙂

    P.S. Wishing you a merry Christmas

  9. Hi Simon

    Using the configuration described above, the “backup” filesystem is created on the fileserver so we can use it as if it were local… but can it be mounted or accessed from the backup machine itself? I’d still like to have the option of reading and writing to it from the backup machine.

    Also, can I make the backup filesystem available via SMB? Would that just be a ZFS change on the fileserver? And isn’t that less efficient (connecting over SMB via the fileserver, which then reaches the backup machine through iSCSI)? I’d rather be able to use the same filesystem directly on the backup machine and share it directly via SMB, while still using it via iSCSI from the fileserver. Hope that makes sense.

    PS: thanks for the articles — inspired me to build these two boxes!

  10. Hi Heath,

    Like yourself, I also wondered whether I could mount and read/write the backup pool from the backup server itself, where the disk space is set up as a volume. I didn’t spend much time trying to find out how to do it, though, so I can’t give you a definitive answer. I expect it is possible, but you will probably need to search the internet for answers. Try posting to the ZFS discussion forum on the OpenSolaris site (http://www.opensolaris.org/jive/forum.jspa?forumID=80) and point to this article for the details of the setup you’re referring to.

    For the answer to your second question, I think if you find the answer to the first question above, then it will be a simple matter to share it via SMB directly from the backup server. If you don’t manage this though, like you said, you can simply do an SMB share of the backup file system via the fileserver. I would have thought you wouldn’t lose too much in terms of speed.

    Glad you were inspired by these articles to build a couple of boxes.

    Merry Christmas,
    Simon

  11. I am still in the research phase of putting together a similar system for a web hosting cluster. I am leaning toward agreeing that you should not mount your backup as an iSCSI target on the device being backed up, but it should be OK. I am still trying to decide how to arrange a nearline SAN: should I back up to the primary SAN and replicate, or back up to the replication SAN and replicate both ways? My setup is going to be a little more complicated; here goes.

    VMWare Servers (Linux/Windows Web/DB servers) ----iSCSI----+
                                                                +--> ZFS iSCSI target ----WHAT PROTOCOL?---->> Storage Replication server
    VMWare Server (Backup, log, monitoring servers) --iSCSI----+

    OR

    VMWare Servers (Linux/Windows Web/DB servers) ----iSCSI----+
                                                                +--> ZFS iSCSI target <<----WHAT PROTOCOL?---->> Storage Replication server
    VMWare Server (Backup, log, mon, etc) ------------iSCSI----+

  12. Hello Simon,

    Another interesting and well-written article! I am very impressed by the thoroughness and completeness of your write-ups, and I hope you will continue to explore this subject in your writings, as I have not found anyone who has carried the “ZFS/Solaris for beginners” torch with equal depth and breadth. If you don’t mind the inquiry, I am curious about your background… Are you a system admin or network engineer, or do you do something else to afford your toys? 🙂

    Regarding Wout’s earlier question about performance of running ZFS on top of ZFS:

    I believe you have the superior implementation as you did it. The reason is that you want as little administrative overhead traversing the network as possible. Having a single iSCSI volume for the initiator to handle translates into less client-side overhead, and thus less network overhead, more available bandwidth for the data itself, and higher transfer rates.

    Regards,

    James

    -U.S.
