The next step in setting up your own ZFS home fileserver is to set up your ZFS storage pool and file systems and then share them with other machines. The ZFS commands should work from any operating system where ZFS is available. I have used two machines in this example: a machine running Sun Solaris for the fileserver, and a Macintosh client machine.
Update 2009-05-10: Please see the post Home Fileserver: ZFS File Systems for more details on setting up a practical file system hierarchy.
Choose your operating system
The first step is to choose which operating system to install on your machine. I believe ZFS runs on FreeBSD 7.0, on Linux under FUSE, and on Mac OS X with the relevant download from developer.apple.com or here. If you are able to choose, and want the most stable and reliable version of ZFS, I would personally recommend Sun Solaris: I have it running well here, and it is the operating system that ZFS was originally designed for.
If you choose to install Sun Solaris, then the next hurdle is to find out which version. When I was looking to install Solaris I had to choose among the following:
- Solaris: a major release (version 10 is current, with 11, 12 etc. to come) appears once every year or two, and is solid and heavily tested.
- Solaris Express Developer Edition (SXDE): released once every 2 months or so, and quite well tested and stable.
- Solaris Express Community Edition (SXCE): released once every week or two, with some testing, so it may or may not be stable in all areas.
- OpenSolaris Developer Preview: a preview of the next-generation development of Solaris, and part of a project called Indiana. It includes new technologies like IPS (the Image Packaging System), a network-based packaging system in the style of APT from Debian Linux, plus many other new features which you can look up if you are interested.
Solaris Express Developer Edition and Solaris Express Community Edition are part of the project called Nevada.
My comments above on each version of Solaris reflect a brief, but hopefully correct, understanding. Even some Sun insiders seem to think that having all these choices is a bit confusing.
Anyway, for running Solaris and ZFS on a home fileserver, my opinion is that your best bet is to choose from the SXDE or SXCE editions.
For quicker bug fixes and new features, I have chosen to use SXCE.
Getting and installing Solaris
To get SXCE, go to http://www.opensolaris.org, click on the Download icon at the top right of the page, then select the DVD link under the text ‘Solaris Express CE’.
You’ll have to register, but it’s free. Then you can download the Solaris image.
You can then burn the ISO file using whatever DVD burning software you have, e.g. Disk Utility on the Mac, or Nero on Windows.
Then you can install Solaris by booting the burned DVD. Installation is simple to do, but fairly slow. After selecting your region and language etc and letting the installation get under way, it’s a good time to go and get a coffee and something to eat 🙂
During installation or after booting the new Solaris installation, you can create your user account. You may wish to use the same user and group ids as you use on your other UNIX box, e.g. your Mac. This will simplify permissions hassles later if you are copying over data from your other machine.
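As a small sketch of how you might compare ids between machines, the helper below prints the numeric user and group ids of an account. The name show_ids is purely illustrative, and on Solaris of this era you may need /usr/xpg4/bin/id rather than /usr/bin/id for the -u and -g options, so treat that as an assumption to check on your build.

```shell
#!/bin/sh
# Print the numeric user id and group id of an account.
# With no argument, the current user is used.
# show_ids is an illustrative helper, not a standard command.
show_ids() {
    printf 'uid=%s gid=%s\n' "$(id -u ${1:+"$1"})" "$(id -g ${1:+"$1"})"
}

# Run this on each machine and compare the two lines of output.
show_ids
```

If the numbers differ, it is usually easiest to pick matching ids when you first create the account on the fileserver, rather than changing them later.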
Once you have a system where ZFS is available, you can get to work setting up your ZFS storage pool.
You’ll need to decide what kind of setup you want (redundancy or no redundancy), and which disks you will use. I will assume here that you have multiple disks of the same size and that you wish to set up a large storage pool with built-in redundancy. For redundancy, I will assume you want single parity. Then our choice is simple: we’ll set up a RAIDZ array, which is roughly equivalent to the old RAID level 5 setup, but with extra ZFS features not available with RAID level 5.
Once your disks are connected within the case, you need to get the ids of your disks, because you’ll need to specify these disk ids when creating the storage pool.
In Solaris, as root user, you can run the ‘format’ command, which gives the following output:

    # format
    Searching for disks...done

    AVAILABLE DISK SELECTIONS:
           0. c0d0 (DEFAULT cyl 20007 alt 2 hd 255 sec 63)
              /pci@0,0/pci-ide@4/ide@0/cmdk@0,0
           1. c1t0d0 (ATA-WDC WD7500AAKS-0-4G30-698.64GB)
              /pci@0,0/pci1043,8239@5/disk@0,0
           2. c1t1d0 (ATA-WDC WD7500AAKS-0-4G30-698.64GB)
              /pci@0,0/pci1043,8239@5/disk@1,0
           3. c2t0d0 (ATA-WDC WD7500AAKS-0-4G30-698.64GB)
              /pci@0,0/pci1043,8239@5,1/disk@0,0
    Specify disk (enter its number): ^C
    #
Now hit CTRL-C to break out of it.
I have installed Solaris on disk 0, and we will now use the 750GB SATA disks numbered 1 to 3 in this list. They are sold as 750GB disks, but only about 692GB is actually usable, thanks to the marketing con used by the disk industry to make disk sizes appear bigger than they actually are: manufacturers count a gigabyte as 10^9 bytes, whereas the operating system reports sizes in units of 2^30 bytes.
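To see where the ‘missing’ space goes, here is a rough sketch of the unit conversion. It assumes the nominal size is exactly 750 x 10^9 bytes, and the function name gb_to_gib is just for illustration. The result is close to the 698.64GB that format reports for these drives; presumably the real drive holds slightly more than the nominal figure.

```shell
#!/bin/sh
# Convert a size in decimal gigabytes (10^9 bytes, as printed on the box)
# to binary gigabytes (2^30 bytes, as reported by the operating system).
# gb_to_gib is an illustrative name, not a standard tool.
gb_to_gib() {
    awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 1000000000 / (1024 * 1024 * 1024) }'
}

gb_to_gib 750    # prints 698.49
```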
Now we’ll issue the ZFS command to create a RAIDZ array from these three 750GB drives, which should give around 1.4TB for data (2 x 692GB), with the other 692GB used by the RAIDZ array for parity data. This parity data gives the array the ability to (1) remain operational in the event that one of the three hard drives fails, and (2) self-heal any ‘latent defects’ (aka silent errors or bit rot), so the space is not being wasted 🙂
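The arithmetic above can be sketched as a one-line rule of thumb: usable space is roughly the number of data disks (total disks minus parity disks) times the size of one disk. This is a simplification that assumes identical disks and ignores any file system overhead, and raidz_usable is a made-up helper name.

```shell
#!/bin/sh
# Rough usable capacity of a RAIDZ vdev built from identical disks:
#   usable = (disks - parity) * size_of_one_disk
# Arguments: number of disks, number of parity disks, size of one disk in GB.
# raidz_usable is an illustrative name; overhead is ignored.
raidz_usable() {
    disks=$1
    parity=$2
    size_gb=$3
    echo $(( (disks - parity) * size_gb ))
}

raidz_usable 3 1 692    # three disks, single parity: prints 1384
```

With the same rule, adding a fourth 692GB disk to a single-parity layout would give about 2076GB for data.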
We’re going to call this data storage pool ‘tank’, as used in all the Sun ZFS demos and examples. Presumably tank refers to a large storage container like a pool:
    # zpool create tank raidz1 c1t0d0 c1t1d0 c2t0d0
Now let’s check its status:
    # zpool status tank
      pool: tank
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            tank        ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                c1t0d0  ONLINE       0     0     0
                c1t1d0  ONLINE       0     0     0
                c2t0d0  ONLINE       0     0     0

    errors: No known data errors
    #
You can see that this pool is called ‘tank’, that it is online, and that no scrub has been requested. The configuration of the ‘tank’ pool is a single RAIDZ1 vdev (virtual device), i.e. single-parity RAIDZ. You can also see the id of each disk in the vdev and, for each disk, that no read, write or checksum errors have been found so far.
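If you want to check those error counters without reading the whole listing, something like the following sketch could help. It assumes the column layout shown above (NAME, STATE, READ, WRITE, CKSUM), the helper name devices_with_errors is invented, and the sample input is made up so that one disk shows checksum errors.

```shell
#!/bin/sh
# List devices whose READ, WRITE or CKSUM counters are not all zero.
# Reads 'zpool status' text on standard input and assumes the
# five-column layout NAME STATE READ WRITE CKSUM.
# devices_with_errors is an illustrative helper name.
devices_with_errors() {
    awk 'NF == 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
         ($3 + $4 + $5) > 0 { print $1 }'
}

# Made-up sample in which one disk has logged 3 checksum errors.
devices_with_errors <<'EOF'
  pool: tank
 state: ONLINE
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     3
            c2t0d0  ONLINE       0     0     0
errors: No known data errors
EOF
```

On the fileserver itself this would be used as ‘zpool status tank | devices_with_errors’. For a quick health summary, ‘zpool status -x’ reports only pools that have problems.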
Now let’s see how much space we have (these figures are from an in-use pool, not a newly created one):
    # zpool list tank
    NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
    tank  2.03T   996G  1.06T    47%  ONLINE  -
    #
    # zfs list tank
    NAME   USED  AVAIL  REFER  MOUNTPOINT
    tank   663G   701G  25.3K  /tank
    #
The first command ‘zpool list tank’ gives raw storage capacity data for the storage pool that includes capacity used to store parity data.
The second command ‘zfs list tank’ gives storage capacity data for the file systems created within the storage pool, that excludes capacity used for parity data — i.e. the figures only consider user data.
So we can see that the pool has around 2TB of capacity (including parity data), and that in the file systems created under the ‘/tank’ mountpoint we have used 663GB and have 701GB available. 663GB + 701GB = 1364GB, or around 1.3TB, so this seems about right, considering that an additional 692GB is used for parity data (~1.3TB + ~0.7TB = ~2TB).
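As a quick sanity check on the relationship between the two listings, here is a sketch that scales a raw figure from ‘zpool list’ down to its data-only share. It assumes the data share is simply (disks - parity) / disks, which ignores metadata and other overhead, and raidz_data_share is a hypothetical helper name.

```shell
#!/bin/sh
# Estimate the data-only share of a raw RAIDZ figure:
#   data = raw * (disks - parity) / disks
# Arguments: raw figure, number of disks, number of parity disks.
# raidz_data_share is an illustrative name; overhead is ignored.
raidz_data_share() {
    awk -v raw="$1" -v disks="$2" -v parity="$3" \
        'BEGIN { printf "%.0f\n", raw * (disks - parity) / disks }'
}

raidz_data_share 996 3 1    # raw USED in GB from 'zpool list': prints 664
```

That is within a gigabyte of the 663G that ‘zfs list’ reports as used, which fits the picture of one third of the raw space going to parity.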
So it’s looking good.
Setting up your file systems
Now we’ll move on to exploring what we can do with storage pools. So that I don’t risk messing up my existing storage pool, I’m going to create a new pool which will use a 4GB USB memory stick (thumbdrive for U.S. readers?).
Once the pool is created, all the ZFS commands will be identical to the ones I would use if I was working with standard hard drives. There will be no redundancy with the USB stick as I will only use one of them here, but that doesn’t matter for this example.
First I’ll plug the 4GB stick into the USB slot and then see what its ‘disk’ device id is:
    # format -e < /dev/null
    Searching for disks...
    The device does not support mode page 3 or page 4, or the reported geometry info is invalid.
    WARNING: Disk geometry is based on capacity data.

    The current rpm value 0 is invalid, adjusting it to 3600
    done

    c4t0d0: configured with capacity of 3.84GB

    AVAILABLE DISK SELECTIONS:
           0. c0d0
              /pci@0,0/pci-ide@4/ide@0/cmdk@0,0
           1. c1t0d0
              /pci@0,0/pci1043,8239@5/disk@0,0
           2. c1t1d0
              /pci@0,0/pci1043,8239@5/disk@1,0
           3. c2t0d0
              /pci@0,0/pci1043,8239@5,1/disk@0,0
           4. c3t0100001E8C38A43E00002A0047C465C5d0
              /scsi_vhci/disk@g0100001e8c38a43e00002a0047c465c5
           5. c4t0d0 < -USBFLASHDRIVE-34CE cyl 1965 alt 2 hd 128 sec 32>
              /pci@0,0/pci1043,8239@2,1/storage@a/disk@0,0
    Specify disk (enter its number):
    #
Now we’ll create a ZFS storage pool called ‘test’ that will use the USB stick to store its data. I had to use the ‘-f’ flag here because the USB stick previously had a UFS file system stored on it, and when I inserted it, Solaris mounted it as ‘/media/USB FLASH DRIVE’, so we’re forcing it to ignore errors here, and do what we want anyway:
    # zpool create -f test c4t0d0
    # zpool status test
      pool: test
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            test        ONLINE       0     0     0
              c4t0d0    ONLINE       0     0     0

    errors: No known data errors
    #
    # zpool list test
    NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
    test  3.81G   584K  3.81G     0%  ONLINE  -
    #
    # zfs list test
    NAME   USED  AVAIL  REFER  MOUNTPOINT
    test   106K  3.75G    18K  /test
    #
In this case, as there is no redundancy used (i.e. we’re not using RAIDZ or MIRROR), the figures shown in ‘zpool list test’ and ‘zfs list test’ match, more or less.
Now let’s create a file system for a user called simon:
    # zfs create test/home
    # zfs create test/home/simon
    #
    # zfs list
    NAME              USED  AVAIL  REFER  MOUNTPOINT
    test              186K  3.75G    19K  /test
    test/home          57K  3.75G    21K  /test/home
    test/home/simon    18K  3.75G    18K  /test/home/simon
    #
As these were created as root, now let’s change the owner and group ids for the test/home/simon file system to simon (owner id) and simon (group id):
    # cd /test/home
    # ls -l
    total 3
    drwxr-xr-x   2 root     root           2 Mar  8 20:18 simon
    #
    # chown simon simon
    # chgrp simon simon
    # ls -l
    total 3
    drwxr-xr-x   2 simon    simon          2 Mar  8 20:18 simon
    #
Now let’s create a file in /test/home/simon called ‘readme.txt’ and fill it with some text. As I’m still root user, I’ll change the owner and group ids to simon again:
    # cd simon
    # echo 'This is a test.' > readme.txt
    # ls -l
    total 2
    -rw-r--r--   1 root     root          16 Mar  8 20:24 readme.txt
    # chown simon readme.txt
    # chgrp simon readme.txt
    # ls -l
    total 2
    -rw-r--r--   1 simon    simon         16 Mar  8 20:24 readme.txt
    #
Making the storage pool accessible from other machines
The next step is to make the file system at /test/home/simon accessible to another machine. With ZFS we have three possibilities: sharing over SMB/CIFS, sharing over NFS, or sharing as an iSCSI target. Only volumes can be shared as iSCSI targets, and our file system is not a volume, so we can only share it with CIFS or NFS. For this example, I will share the file system using CIFS:
    # zfs set sharesmb=on test/home/simon
    # zfs get all test/home/simon
    NAME             PROPERTY         VALUE                  SOURCE
    test/home/simon  type             filesystem             -
    test/home/simon  creation         Sat Mar  8 20:18 2008  -
    test/home/simon  used             19.5K                  -
    test/home/simon  available        3.75G                  -
    test/home/simon  referenced       19.5K                  -
    test/home/simon  compressratio    1.00x                  -
    test/home/simon  mounted          yes                    -
    test/home/simon  quota            none                   default
    test/home/simon  reservation      none                   default
    test/home/simon  recordsize       128K                   default
    test/home/simon  mountpoint       /test/home/simon       default
    test/home/simon  sharenfs         off                    default
    test/home/simon  checksum         on                     default
    test/home/simon  compression      off                    default
    test/home/simon  atime            on                     default
    test/home/simon  devices          on                     default
    test/home/simon  exec             on                     default
    test/home/simon  setuid           on                     default
    test/home/simon  readonly         off                    default
    test/home/simon  zoned            off                    default
    test/home/simon  snapdir          hidden                 default
    test/home/simon  aclmode          groupmask              default
    test/home/simon  aclinherit       secure                 default
    test/home/simon  canmount         on                     default
    test/home/simon  shareiscsi       off                    default
    test/home/simon  xattr            on                     default
    test/home/simon  copies           1                      default
    test/home/simon  version          3                      -
    test/home/simon  utf8only         off                    -
    test/home/simon  normalization    none                   -
    test/home/simon  casesensitivity  sensitive              -
    test/home/simon  vscan            off                    default
    test/home/simon  nbmand           off                    default
    test/home/simon  sharesmb         on                     local
    test/home/simon  refquota         none                   default
    test/home/simon  refreservation   none                   default
    #
Note that the ‘sharesmb’ property has the value ‘on’ now, so it is now shared as a CIFS share. We didn’t specify a name to use as a share, so let’s see which default share name ZFS has assigned for us:
    # sharemgr show -vp
    default smb=() nfs=()
    zfs
        zfs/test/home/simon smb=()
              test_home_simon=/test/home/simon
    #
So we can see here that the share name assigned is ‘test_home_simon’. We could easily have specified our own preferred share name when we set the ‘sharesmb’ property to ‘on’ earlier, if we had wanted to.
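Judging by the output above, the default share name looks like the dataset name with each ‘/’ replaced by ‘_’. The sketch below reproduces that rule so you can predict the name for other file systems; default_share_name is an invented helper, and the rule is inferred from this one example rather than tested against every possible dataset name.

```shell
#!/bin/sh
# Predict the default SMB share name for a ZFS dataset by replacing
# each '/' in the dataset name with '_', as seen in the sharemgr output.
# default_share_name is an illustrative helper, not part of ZFS.
default_share_name() {
    printf '%s\n' "$1" | tr '/' '_'
}

default_share_name test/home/simon    # prints test_home_simon
```

To choose the name yourself, the zfs man page describes a name= form of the property, along the lines of ‘zfs set sharesmb=name=simon test/home/simon’. That form is not exercised in this walkthrough, so check it against the documentation for your build.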
Now let’s ensure that the Solaris SMB service is running, so that this share will be visible from any connected client machines:
    # svcadm enable -r smb/server
    svcadm: svc:/milestone/network depends on svc:/network/physical, which has multiple instances.
    # svcs | grep smb
    online         15:49:20 svc:/network/smb/server:default
I used CIFS to share the file system in this example. In previous experiments with NFS, I noticed disappointing write speeds from the Mac to the Solaris fileserver and, from what I could find out, there seem to be some issues with NFS shares of ZFS file systems that result in fairly slow write speeds. I think the problem is related to NFS requiring some kind of acknowledgement during write operations, which slows things down. But don’t quote me on this, as I was unable to satisfy myself that I had found the definitive answer, and it may just be that Mac OS X’s NFS implementation is flawed.
Update 30/03/2008: extra steps required for CIFS setup
I noticed some differences between OpenSolaris Nevada build 82 (SXCE) and build 85 (clean install) regarding CIFS shares.
Existing working CIFS shares from build 82 didn’t work after moving to build 85.
To try to see why, I looked up the Solaris CIFS guide, where it says:
“The Samba and CIFS services cannot be used simultaneously on a single Solaris system. The Samba service must be disabled in order to run the Solaris CIFS service. For more information, see How to Disable the Samba Service.”
So, disable samba:
    # svcs | grep samba
    maintenance    20:35:42 svc:/network/samba:default
    # svcadm disable svc:/network/samba
    # svcs | grep samba
    #
When I try to access the share from the Mac using autofs (smbfs:), I see the following message in /var/adm/messages:
    Mar 27 20:38:48 solarisbox smbd: [ID 653746 daemon.notice] SmbLogon[WORKGROUP\simon]: NO_SUCH_USER
So, something changed? First, join the workgroup:
    # smbadm join -w WORKGROUP
    Successfully joined workgroup 'WORKGROUP'
    #
Edit the /etc/pam.conf file to support creation of an encrypted version of the user’s password for CIFS.
Add the following line to the end of the file:
    # vi /etc/pam.conf
    other   password required       pam_smb_passwd.so.1     nowarn
Specify the password for existing local users.
The Solaris CIFS service cannot use the Solaris encrypted version of the local user’s password for authentication. Therefore, you must generate an encrypted version of the local user’s password for the Solaris CIFS service to use. When the SMB PAM module is installed, the passwd command generates such an encrypted version of the password.
    # passwd simon
Now it works again, after reinitialising the client’s autofs (on the Mac for me).
Configuring client machine access to the file system share
Now I’m going to configure my Mac so that it can access the CIFS share and read and write to it. As I’m using Mac OS X 10.5 (Leopard), autofs is available to automatically mount specified file systems.
We’ll configure the autofs configuration files in /etc to use the share we created on the ZFS fileserver. Perform the following steps as root user on the Mac, or other OS/machine supporting autofs.
First create a mountpoint directory where our share will reside:
    sh-3.2# mkdir /shares
    sh-3.2# cd /etc
    sh-3.2# ls -l auto*
    -rw-r--r--  1 root  wheel    67 Oct 10 06:53 auto_home
    -rw-r--r--  1 root  wheel   236 Feb 24 15:00 auto_master
    -rw-r--r--  1 root  wheel   164 Oct 10 06:53 auto_master.org
    -rw-r--r--  1 root  wheel   319 Mar  1 14:49 auto_smb
    -rw-r--r--  1 root  wheel    89 Feb 19 15:36 auto_zfs
    -rw-r--r--  1 root  wheel  1755 Feb 24 14:59 autofs.conf
    -rw-r--r--  1 root  wheel  1759 Oct 10 06:53 autofs.conf.org
    sh-3.2#
    sh-3.2# vi auto_master
Now add the following line to the end of the ‘auto_master’ file:
    # simon's additions
    /shares         auto_smb        -nobrowse
This specifies that all paths for shares specified in the ‘auto_smb’ file will be relative to the /shares directory. Now let’s create the ‘auto_smb’ file to specify the relative mountpoint for accessing the file system we shared from the fileserver:
    # vi auto_smb
    test_home_simon -fstype=smbfs ://simon:password@fileserver_ip_address/test_home_simon
Save the file.
Because this file contains a password in plain text, be sure to check that the permissions on the ‘auto_smb’ file are correct (ideally readable only by root), or you may leave passwords visible to other users!
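One way to tighten the permissions is sketched below. It assumes that restricting the file to its owner (mode 600) is acceptable for your autofs setup, and it is demonstrated on a scratch file rather than the real /etc/auto_smb; lock_down is just an illustrative helper name.

```shell
#!/bin/sh
# Restrict a file to owner read/write only, then print the resulting
# permission string so the change can be confirmed by eye.
# lock_down is an illustrative helper name.
lock_down() {
    chmod 600 "$1" && ls -l "$1" | cut -c1-10
}

# Demonstrated on a scratch file; on the Mac the real target would be
# /etc/auto_smb, and the command would be run as root.
scratch=$(mktemp /tmp/auto_smb_demo.XXXXXX)
lock_down "$scratch"    # prints -rw-------
rm -f "$scratch"
```

After changing the real file, it is worth re-running ‘automount -vc’ and checking that the share still mounts.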
Now that autofs has been configured to access the shared file system from the fileserver using CIFS, we can cause the Mac to remount any specified file system shares:
    sh-3.2# automount -vc
    automount: /net updated
    automount: /home updated
    automount: /shares mounted
    sh-3.2#
Within the Solaris file manager UI, you may need to set the attributes within the ‘Permissions’ and ‘Access List’ tabs of the properties for the /test/home/simon directory. After that, you may need to restart the Solaris machine (or probably just restart relevant services), and possibly the client machine to ensure it gets the new properties for the share.
Now let’s see if we can read the file ‘readme.txt’ that we created on the fileserver:
    Macintosh:~ simon$ cd /shares
    Macintosh:shares simon$ ls -l
    drwx------  1 simon  wheel  16384 Mar  8 20:24 test_home_simon
    Macintosh:shares simon$ cd test_home_simon
    Macintosh:test_home_simon simon$ ls -l
    total 1
    -rwx------  1 simon  wheel  16 Mar  8 20:24 readme.txt
    Macintosh:test_home_simon simon$
    Macintosh:test_home_simon simon$ cat readme.txt
    This is a test.
    Macintosh:test_home_simon simon$
Voila, it’s worked. We successfully read the file hosted on the fileserver.
A little tip if you’re using the Mac’s Finder application to view your shares graphically: you may need to restart it, as it seems to give a special ‘share’ icon to CIFS and NFS shares, and the icon does not seem to be displayed correctly until the Finder is restarted. Restart the Finder by holding down the ‘alt’ key and right-clicking on the Finder icon in the Dock, then clicking the ‘Relaunch’ item in the popup menu. Perhaps Apple needs to sync ‘automount -vc’ with a repaint of the Finder app?
That has given you a simple overview of how to create a ZFS storage pool, how to create a file system within the pool, and how to share the file system with another machine across the network using CIFS. Time for a beer to celebrate! 😉
You may find the following links interesting:
- What is ZFS?
- ZFS: Getting started
- ZFS: Should You?
- ZFS Adds Exciting Twist to Mundane World
- ZFS best practices from the Solaris Internals authors
- Solaris ZFS Administration Guide (HTML)
- ZFS Administration Guide (PDF: updated monthly)
- ‘zfs: discuss’ forum
- ZFS community
- Rajeev’s autofs link
- Greenfly.org autofs link
And these inspiring blogs from some great Sun guys, which I learnt a lot from:
- Tim Foster’s ZFS articles including his ZFS Automatic Snapshot SMF Service
- Constantin Gonzalez of CSI:Munich – How to save the world with ZFS and 12 USB sticks fame — you have to watch this video! 🙂
- Tim Thomas’ ZFS, CIFS & NFS sharing articles
And last, but certainly not least, the two Sun guys who created the amazing ZFS, Jeff Bonwick and Bill Moore.
The more you learn about ZFS, the more you appreciate what a true engineering marvel it is, so I have deep respect for Bill and Jeff, and all the other people that helped make ZFS a reality — congratulations to you all!
And I applaud Sun for encouraging their staff to create their own blogs, as it helps spread the good word about their products and projects, and it is done in a personal style that shows the blog authors’ enthusiasm for the subject.
And having started to know a bit about Solaris via ZFS, I am beginning to see what a great operating system it really is.
Now I’m off to look for a good hosting service that gives me a full root access Solaris account so I can snapshot my running system regularly and zfs send / recv snapshots of it to another geographical location for safety. I’ve seen Joyent.com, and if anyone knows of others, feel free to comment below.