A Home Fileserver using ZFS

For many people who use a computer, knowing where to store growing amounts of data can become tricky.

You start off with one disk, run out of space, buy a bigger one, and so on. If you have a camcorder, you’ll be generating gigabytes of data for every Mini DV tape you record. You may also have a digital video recorder attached to your TV and wish to keep some of the programmes and films you’ve recorded permanently. Now you’re talking about hundreds of gigabytes, if not terabytes, of storage to handle all this data.

And then there’s the problem of backups… oh boy, this will be a fun project :)

Here’s a series of articles that tackle this tricky subject, where I describe the choices I made, the problems encountered and the solutions found during my quest to build my own ZFS home fileserver, or ZFS home NAS box (network attached storage).

  1. Home Fileserver: What do I need?
  2. Home Fileserver: Existing products
  3. Home Fileserver: I’ll use ZFS
  4. Home Fileserver: ZFS hardware
  5. Home Fileserver: ZFS setup
  6. Home Fileserver: Backups
  7. Home Fileserver: Suspend
  8. Home Fileserver: Trunking
  9. Home Fileserver: ZFS snapshots
  10. Home Fileserver: Backups from ZFS snapshots
  11. Home Fileserver: Drive temps
  12. Home Fileserver: RAIDZ expansion
  13. Home Fileserver: Active Directory Integration
  14. Home Fileserver: A Year in ZFS
  15. Home Fileserver: ZFS File Systems
  16. Home Fileserver: OpenSolaris 2009.06
  17. Home Fileserver: Media Center
  18. Home Fileserver: Mirrored SSD ZFS root boot
  19. Home Fileserver: ZFS boot pool recovery
  20. Home Fileserver: Handling pool errors

A quote from Paul Venezia, InfoWorld:

It’s not every day that the computer industry delivers the level of innovation found in Sun’s ZFS. The fluidity, the malleability, and the scalability of ZFS far surpass any file system available now on any platform. More and more advances in the science of IT are based on simply multiplying the status quo. ZFS breaks all the rules here, and it arrives in an amazingly well-thought-out and nicely implemented solution.

We’re talking about a file system that can address 256 quadrillion zettabytes of storage, and that can handle a maximum file size of 16 exabytes. For reference, a zettabyte is equal to one billion terabytes. In order to bend your mind around what ZFS is and what it can do, you need to toss out just about everything you know about file systems and start over.



ZFS: Best Filesystem, InfoWorld 2008 Technology of the Year Awards: Storage

For more ZFS Home Fileserver articles see the following categories: ZFS, Storage, Fileservers, NAS.

96 Responses to “A Home Fileserver using ZFS”

  1. Hi,
    very interesting project. Can you tell me which version of Solaris you used? There seem to be Solaris 10, OpenSolaris, Solaris Developer Edition and more. Which one is the most suitable for a stable, reliable fileserver? I’m after data integrity (ZFS), redundancy (RAID-Z) and something simple to admin that “just works!”

    thanks
    CW

  2. @Chn Wng: I used Solaris Express Community Edition. Get it here:

    http://opensolaris.org/os/downloads/sol_ex_dvd_1/

  3. Thank you for this. Your walkthrough persuaded me to give ZFS a try over unRAID and (heaven forbid) WHS.

    One of the few things that originally had me leaning towards a WHS machine was the availability of a simple plug-in for implementing off-site backups using Amazon’s S3 service.

    I found this and plan to give it a go:
    http://developers.sun.com/solaris/articles/storage_utils.html

    Have you tried something similar? I appreciate the information on incremental snapshots.

    ZFS seems to be far and away the most powerful means for implementing a home NAS.

  4. Hi Sal, thanks for the feedback — I’m a sucker for compliments :)

    Amazon’s S3 could be interesting. Do you know the current prices they charge for storage? I didn’t try this yet.

    Yes, the snapshots for incremental backups seem a great idea.

    And I agree, ZFS does seem to be the most powerful thing right now for a NAS, and I don’t see anything replacing it any time soon.

    Good luck with ZFS!

  5. From what I’ve seen of people using Amazon S3 for off-site Windows Home Server backups, they pay only a few US Dollars a month for the service.

    This page has the info:
    http://www.amazon.com/gp/browse.html?node=16427261

    They charge for the amount stored and the amount transferred. I’d have to imagine that snapshots are generally very small for a system used this way (very few deletions and/or changes) so the transfer costs would be minimal.

    I don’t know how Amazon S3 compares to other services, but it seems to be the go-to service for WHS off-site backups. Good luck!

  6. Thanks Sal, I’ll check that link out.

  7. I’m surprised no one considers either NexentaOS or NexentaStor, a prebuilt NAS using ZFS.

  8. Hi Joe, I’m aware of NexentaOS/NexentaStor but have not used them so far. I’ve heard some good things about them, so it might be good to take a look one day.

    However, for this project, I wanted to try and use a standard OpenSolaris installation and build something completely from scratch, so that I could learn and understand how it all works.

    Also, I didn’t want to use a free version of a commercial product having a 2TB size limitation, if I recall correctly, in case I needed more space one day.

  9. Hey Simon, I’m still playing with ZFS on VMware while my hardware is on order. I like to familiarise myself with it as much as possible. There’s just one thing that I don’t seem to be able to find an answer to anywhere, and I wonder if you can help.

    I’ve got a ZFS file system shared over SMB, using (as I understand it) not Samba but CIFS. SMB authentication through PAM is working fine and I can create files on the share perfectly. The only problem is that after a file is created, it is assigned no permissions whatsoever. I know that with Samba I would use the “create mask” option, but seeing as this is CIFS I’m a little stuck. Google hasn’t brought me much success and the osol-discuss mailing list hasn’t thrown up any answers either.

    Thanks

  10. Hi Matt, I think what you’re looking for is ACLs, and there is a complete chapter about how they work in the ZFS Administration Guide: http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf (see the chapter entitled ‘Using ACLs to Protect ZFS Files’). Hope this helps.

  11. I am going to try setting this up but don’t have much experience. My question is: can I use a bunch of different-sized drives for ZFS, or do they all have to be the same? I will be using this to serve media throughout the home as well as to store work docs so that I can access them from any computer, i.e. a pool for each of these: movies, music, pictures, home videos, work.
    My primary goal is to be able to serve these up quickly over my gigabit LAN, but I also don’t want to lose the data. I have three 750 GB disks and then four that range from 160 to 300 GB. Would I just set up two different pools, i.e. use the three 750 GB disks just like you set it up and then create another pool using the four disks? Sorry if this is really basic, but I am very new to this and just don’t understand.

    Thanks

  12. Simon, very nice writeup! I’ve just finished a similar project for my home file server. I wish that smartmontools would support SATA drives so that I can monitor HDD temps as you do.

    http://home.comcast.net/~robhensel/NAS/Solaris_10_NAS.htm

  13. Hi JV,

    the short answer is it depends what kind of pool configuration you want. If you want to create a simple RAIDZ1 configuration then it’s best to have the same size drives, probably the same make and model too — i.e. I create a pool comprised of a single RAIDZ1 vdev that uses three 750 GB drives.

    Or, if you had four drives, say two 500 GB drives and two 750 GB drives then you could create a pool composed of two mirrors, one mirror using the two 500 GB drives, and the other mirror composed of two 750 GB drives.

    The problem with trying to make vdevs from multiple different-sized drives is that each disk will have the capacity of the smallest drive. For example, if you had three drives (a 160 GB, a 320 GB and a 750 GB drive) and you formed a RAIDZ1 vdev from these, ZFS will treat all the drives as 160 GB in size, so the vdev would have a raw capacity of only 480 GB (3 x 160 GB), of which about 320 GB is usable once one drive’s worth of space goes to parity.

    If you have a bunch of odd-sized drives, then perhaps you can use them to make a backup pool, and use no redundancy, thereby getting the full capacity of each drive. This is exactly what I did to make use of a bunch of old SATA drives I had — and I made a backup server from all these old drives.

    Hopefully that answers your question.

    Sorry, been a long day, and re-reading your question I might make the following suggestions:

    1. Make one big pool using the three 750 GB drives, probably best as a single RAIDZ1 vdev, i.e. zpool create mypool raidz1 disk1 disk2 disk3
    2. Once the pool is created, I would create 5 file systems within this pool, one for each of: movies, music, pictures, home videos, work. You can then assign properties to each file system independently. For example, you might decide to have only one copy of each file within the movies and music file systems, but create 2 copies of each file within the pictures, home videos and work file systems, as they hold content created by you which cannot be recovered if data is lost. For this, see the ‘copies’ property of the file system (set with ‘zfs set copies=2’). Some people prefer to create 2-disk mirrors for personally created content, as each disk in the mirror is an exact copy of the other one.
    3. See my comment above regarding using 4 different-sized disks. Basically, you have 2 choices if you include a vdev in your pool comprising these 4 different-sized disks: (1) create a RAIDZ1 vdev, but then ZFS will use the smallest disk size for all 4, or (2) create a vdev with no redundancy and have the full capacity of all the drives added together. The problem with (2) is that if there’s an error you will probably lose the affected data permanently! For this reason, like I did, it may be best to do this (no redundancy) only for a backup server/pool, where you have the original data in the main server’s pool. It’s your choice.
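To put rough numbers on the two choices above, here is a minimal Python sketch. It assumes the simple rule described in this comment (a RAID-Z vdev treats every disk as the size of its smallest member, and each parity level costs one disk's worth of space) and ignores metadata overhead, so real figures will be a little lower. The function names and the example drive sizes are made up for illustration.

```python
def raidz_usable_gb(drive_sizes_gb, parity=1):
    """Approximate usable space of one RAID-Z vdev.

    Every drive counts as the smallest drive in the vdev, and
    `parity` drives' worth of space is spent on redundancy.
    """
    if len(drive_sizes_gb) <= parity:
        raise ValueError("a RAID-Z vdev needs more drives than its parity level")
    smallest = min(drive_sizes_gb)
    return (len(drive_sizes_gb) - parity) * smallest


def striped_usable_gb(drive_sizes_gb):
    """Approximate usable space with no redundancy: the drives simply add up."""
    return sum(drive_sizes_gb)


if __name__ == "__main__":
    matched = [750, 750, 750]   # three identical drives
    mixed = [160, 320, 750]     # the odd-sized example from this comment
    print("RAIDZ1, matched drives:", raidz_usable_gb(matched), "GB")
    print("RAIDZ1, mixed drives:  ", raidz_usable_gb(mixed), "GB")
    print("No redundancy, mixed:  ", striped_usable_gb(mixed), "GB")
```

The gap between the last two figures is the price of redundancy with odd-sized drives, which is why a no-redundancy layout is only worth considering for a pool that holds copies of data kept elsewhere.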

  14. Hi Rob,

    Thanks! smartmontools does support SATA drives, but for some reason, the notation specifies SCSI. I know it works because I am using smartmontools with my SATA drives and the script here: http://breden.org.uk/2008/05/16/home-fileserver-drive-temps/

    I’ll take a look at your writeup once I’ve finished replying to this stack of comments… (returned from holiday) :)

  15. Hi Simon,

    you posted a problem concerning using “cp” to copy files at http://www.opensolaris.org/jive/thread.jspa?threadID=59201

    Is there any new information about that problem? Did you try the latest snv_91?

    Best regards
    Chris

  16. Simon, as I’ve said previously I’ve been following your setup almost exactly and so far I’m extremely happy with how things are going. The main way my deployment differs from yours is that I’m using a Windows 2003 Active Directory domain and I’ve linked my Solaris box into that, using centralised permissions etc.

    I’ve got some information on my experiences with CIFS and Active Directory which you’re welcome to post here to complete your excellent docs on the subject :)

  17. Hi Chris, no I have no more news on that bug as I’m still on snv_87. However it looks like bug 6669134 was fixed in snv_90, so it’s possible that the bug has been fixed now. However, there were a number of similar-looking bugs that I listed in that post, and some of them have not been fixed yet, so it’s quite possible that the bug still remains in snv_91.

    If you try snv_91, it would be great if you can confirm whether you see the bug. Likewise, when I upgrade, I will note here if I find the bug has been fixed.

    Cheers,
    Simon

  18. Hi Matt, glad you’ve got it working nicely. And it’s good you’ve managed to make it work with Windows 2003 Active Directory Domain.

    Personally, I try to avoid using Microsoft products as much as possible, so this may not be useful for my setup, but I could link to your content if you’d like, once you’ve got it ready.

  19. Great write-up. I appreciate it. I’ve been planning a new NAS setup and this really looks good.

    Al

  20. Thanks a lot, Al! Believe me, probably like yourself, I spent a lot of time hunting down the best storage solution, and ZFS is it. Enjoy!

  21. Hi Simon,

    I made an upgrade on my OpenSolaris 2008.05 machine. The kernel is now snv_91 and the problem is still not solved. Too bad!

    Best Regards,
    Chris

  22. Hi Chris, thanks a lot for your feedback regarding this bug. That’s a real pity that they haven’t fixed it yet. I have been too busy to learn how to use DTrace to debug this problem as the code in the nv_sata() driver is called by the ‘cp’ command for this bug.

    If you have the time and the knowledge to look at the possibility to run DTrace to see if anything can be discovered about this bug, see the last post from ‘bhorn’ here: http://www.opensolaris.org/jive/thread.jspa?threadID=59201#233503

    Otherwise, I will try to debug it, but it won’t be very soon, as I have a whole load of stuff to keep me busy right now. Keep in touch!

    Cheers,
    Simon

  23. I opened a bug: http://defect.opensolaris.org/bz/show_bug.cgi?id=2366
    I already got a hint how to debug this behaviour. I hope to get it done later today.

    Chris

  24. Good news Chris! I look forward to seeing where it hangs.

    Simon

  25. My question is: are you using a full Solaris install, or something that only includes the ZFS, NFS and CIFS functionality, and maybe some iSCSI? It seems like all the home file servers are using the bloat approach. I’m curious if anyone has gotten it down into the 64MB to 128MB range for a true appliance style.

  26. Hi Simon,

    it’s so weird. For the past few days the bug has not appeared anymore. We tried to reproduce the bug for the bug report, but it just works. 5GB files. 16GB files. No problem at all. But I’m distrustful. Bugs like this one don’t just disappear! ;-)

    Would be great if you give snv_91 a try and maybe report your experience with the release. Maybe it’s fixed already.

    Chris

  27. Hi Dave, I’m using a full, standard OpenSolaris install — Nevada, build 87 currently. I believe Nexenta may be what you are looking for, but be aware that you only get a free version capable of running a 2TB array, last time I looked. If you will ever need to store more than 2TB then you will be paying to use this software. That’s one of the main reasons I chose to use the standard version of OpenSolaris.

  28. Hi Chris, that’s very strange, but I can believe it. I had the same thing happening when I was trying to reproduce the bug, and then the bug would reappear. This is not the kind of bug that you want to have in a system that is meant to be virtually bullet-proof. When I get some time I will reinstall with the latest Nevada install (b93 currently) and give it another try. I don’t think it is fixed in snv_91 as I seem to recall that I reproduced the bug with that version, and then went back to snv_87, but found the bug also in snv_87. I will let you know what I discover in the coming weeks after I upgrade.

    Simon

  29. Okay Simon. Looking forward to hearing from you.
    A note on snv_93: I barely use it, but the ZFS web administration interface seems to be broken (again!). But ZFS boot in the installer is very nice! ;-)

    Have a nice weekend!

    Take care,
    Chris

  30. Hi Chris. Likewise, I don’t use the ZFS web admin interface — I just use the command line and scripts. I will be expanding the array this weekend, and if I get time after that, I might install snv_93. Have a great weekend too!

    Cheers,
    Simon

  31. Joel Simpson on July 20th, 2008 at 09:08

    Simon on June 26th, 2008 at 18:40 wrote:

    > Hi JV,
    > the short answer is it depends what kind of pool configuration you want. If you want
    > to create a simple RAIDZ1 configuration then it’s best to have the same size drives,
    > probably the same make and model too — i.e. I create a pool comprised of a single
    > RAIDZ1 vdev that uses three 750 GB drives.

    True.

    > Or, if you had four drives, say two 500 GB drives and two 750 GB drives then you could
    > create a pool composed of two mirrors, one mirror using the two 500 GB drives, and the
    > other mirror composed of two 750 GB drives.

    Using OpenSolaris / Solaris / SXCE / etc. you could partition the drives into 250GB chunks
    and then RAID-Z2 / Mirror the pieces back together. If you do it correctly then the different
    chunks on different drives / controllers will each protect one another.

    Depending on your configuration you can add the 500GB drives to the 750GB drives and then
    mirror / raid them and end up with 1250GB of drive space with twice the speed (IOPS) or
    take the total of (2 * (500GB + 750GB)) 2500GB and use a RAID-Z2 9+2 to get a huge MTTDL
    and a lot of drive space.

    Lots of ways to slice it :) . Partitioning improves defrag times, it is not evil.

    Joel

  32. Hi Joel,

    Although ZFS pool creation using slices is possible, it appears that it’s not regarded as a best practice, from what I have seen. The best practice seems to be to use complete drives. See: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pools

    There it says:

    Set up one storage pool using whole disks per system, if possible.
    For production systems, consider using whole disks for storage pools rather than slices for the following reasons:

    * The recovery process of replacing a failed disk is more complex when disks contain both ZFS and UFS file systems on slices.
    * In general, maintaining slices increases administration time and cost. Lower your administration costs by simplifying your storage pool configuration model.

    A pool created with multiple slices across disks is harder to manage than a pool created with whole disks.

  33. Hi Simon,

    if you use CIFS as a service don’t update to snv_93!
    http://www.opensolaris.org/jive/thread.jspa?threadID=65996

    I haven’t tested CIFS that much, but it seems to be a big problem.
    So if you have not already updated to snv_93, wait for snv_94! ;-)

    Regards,
    Chris

  34. Hi Chris, thanks a lot for the warning about snv_93 — I will not try it :)

    Cheers,
    Simon

  35. snv_94 was released today! Hopefully the bug is fixed! ;-)
    I will install snv_94 on my filer, too.

    Best regards,
    Chris

  36. Thanks Chris, I hope the bug has gone this time — let’s hope.

  37. The bug is gone. :-)
    http://www.opensolaris.org/jive/thread.jspa?threadID=65996

    Let’s try snv_94! ;-)

    Regards,
    Chris

  38. I’ve been reading this and it’s one of the reasons I’ve chosen to build a computer from scratch and use OpenSolaris for its ZFS capabilities as my file/backup server. Does anyone know if the web utility for configuring ZFS is working or not? I saw it in a write-up about the OpenSolaris developer express edition, but I think now that 2008.05 is out, it is not available. Are there any good books out pertaining to ZFS or OpenSolaris?

    thanks,

    brian

  39. Thanks Brian. Not sure about the web interface. Honestly, ZFS is so easy to set up with the command line that I didn’t consider the web interface necessary. Good luck with your new system!

  40. Hi Simon. As we spoke about a while ago, I have written a small guide to integrating your Solaris ZFS/CIFS fileserver with Active Directory for centralised permissions. I would be happy for you to link it somewhere in your guide in case others are interested in this combination.

    It is the first document on my new wiki, you can see it at http://www.genestate.com/OpenSolaris:ActiveDirectory_Integration

    I would be happy to receive any feedback people have on the guide and I will be expanding it as soon as I have some free time. It’s not really a unique guide, but it is a compilation of all the things I have found while working it out for myself :)

  41. Hi Matt,

    Fantastic news. If you could send me some summary blog text, I will use it to create a new post, and will link to your article from there to give the full details that you have written.

    Thanks a lot.

    Simon

  42. >I believe Nexenta may be what you are looking for, but be aware that you only get a free version capable of running a 2TB array, last time I looked. <

    Actually, you should be aware that the 2TB limitation is only on NexentaStor. NexentaOS and NexentaCore have no limitation on the size of any pool – they are completely open source, built on Debian and OpenSolaris. Get them from nexenta.org. The administration and reporting tools they’ve added to NexentaStor are closed source, and that’s where the 2TB limitation comes. If you already use just the command line and scripts, there would be no penalty to using NexentaOS or NexentaCore. (NexentaOS is the desktop version, NexentaCore the basic server version.) It’s supposed to run on lower-spec and a broader range of hardware.

  43. Hi Bryan,

    In fact it’s worse now for NexentaStor: they’ve reduced the limit to 1TB now for the Developer Edition, or instead you can choose the Free Trial Edition with no capacity limits stated but you have a 30 day time-limit. As the scope of this post is for a home fileserver/NAS, NexentaStor is not suitable, as the annual ongoing costs are prohibitive — it’s meant for businesses.

    Personally, I had no use for NexentaOS/NexentaCore as all that I needed was already in OpenSolaris, which is also free. Also, the APT features have now been included in OpenSolaris 2008.05, known as IPS (Image Packaging System), which you can use to download and install free security fixes. However, I believe that you have to pay for the ability to have full use of IPS for updating other packages (http://www.sun.com/service/opensolaris/index.jsp), and this appears to be an ongoing $324 payment. Again, as the scope of this post is for a home fileserver/NAS, this is unacceptable. For businesses, that may be another matter.

    However, if you’re saying that NexentaCore/NexentaOS are free, have no restrictions of any sort, and allow easy updating via APT then it could be interesting.

    But I do like the fact that by using the basic vanilla OpenSolaris, I know exactly what I’m using (which version etc), and can fire off questions to the great enthusiasts in Sun’s OpenSolaris ZFS forums, for example, or general driver questions, and there’s fewer external things to depend on. Anyway, for what I’ve needed here, this approach has worked fine so far.

    Simon

  44. Hi Simon,

    Fantastic job you are doing promoting ZFS! No one will want anything less secure than ZFS for their data after reading you!
    A few things you may want to talk about or give pointers to, for us Solaris beginners:
    - how to name shares
    - how to set up access rights to shares (I’d like read as default, write with p/w prompt)
    - how to access the server interface from remote boxes (à la x11/RDC/VNC)
    Thank you,
    Chris

  45. Have you heard about the SUN Enterprise storage server 7000?

    An ordinary disk has like 200-300 IOPS. There is an extremely fast Savvio disk that exceeds 400 IOPS. Traditionally, you buy lots of 15000RPM disks to reach high IOPS. The SUN 7000 storage server can reach 250,000 IOPS (in extreme cases). Watch the extremely cool video on the post marked “31 dec 2008”:
    http://blogs.sun.com/brendan/

    The thing is, the 7000 reaches sick IOPS because it uses many SATA 7200 RPM disks + ZFS + SSD drives as a cache. It is the SSD drives acting as a cache that do it.

    Here is the point: you can add an SSD drive to your existing ZFS raid as a cache and reach high IOPS. ZFS will administer everything for you, you just type in:
    # zpool add pool_name cache device_name
    or something like that.

    Imagine 10 SATA discs in raidz2 (similar to RAID-6) and one or two SSD drives as a cache. Each Vista client reaches ~90MB/sec to the server, using Solaris CIFS and iSCSI. So you want to use iSCSI with this. (iSCSI allows ZFS to export a volume as a native SCSI disc to a desktop PC. The desktop PC can mount this iSCSI disk as a native SCSI disk and format it with NTFS, on top of ZFS with snapshots, etc. This is set up from the desktop PC’s BIOS.)

    Now, you install WinXP on the iSCSI ZFS volume, and clone it with a snapshot. Then you can boot a desktop PC from the clone, using the iSCSI volume. Thus, your desktop PC doesn’t need any hard drive at all. It uses the iSCSI volume on the ZFS server as a native SCSI disk, which has WinXP installed.

    This way, you can deploy lots of desktop PCs in an instant, using the cloned WinXP snapshot. And if there are problems, e.g. a virus, just destroy the clone and create a new one in one second.

    With 10 discs in a ZFS raid, you maybe reach ~600 MB/sec. And then you add an SSD disk as a cache, and the IOPS ramp up very much. This solution is very very secure (ZFS + raidz2) and very very fast (the SSD disk provides lots of IOPS) and very very cheap (7200 RPM SATA discs, no expensive HW raid) and very very easy to manage (ZFS + iSCSI + CIFS is extremely easy). What do you think about this solution? Quite extreme?

    A SUN Thumper with 48 discs reaches 2GB/sec write speed and 3GB/sec read speed. You would surely need one or several 10Gbps NICs on your server.
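As a back-of-the-envelope check on the network side of the figures in this comment, here is a small Python sketch. It only converts link speed to bytes per second (1 Gbit/s is 125 MB/s) and divides; it ignores protocol overhead, so a real setup would need some headroom on top. The function names are made up for illustration, and the throughput figures are the ones quoted in the comment, not measurements of mine.

```python
import math


def link_capacity_mb_s(link_gbps):
    """Raw capacity of a network link in MB/s (1 Gbit/s = 125 MB/s)."""
    return link_gbps * 1000 / 8


def nics_needed(array_mb_s, link_gbps):
    """Smallest number of links whose combined raw capacity covers the array."""
    return math.ceil(array_mb_s / link_capacity_mb_s(link_gbps))


if __name__ == "__main__":
    # Throughput figures quoted in the comment above.
    print("10-disc array (~600 MB/s) over 1 Gbps links: ", nics_needed(600, 1))
    print("Thumper writes (2 GB/s) over 10 Gbps links:  ", nics_needed(2000, 10))
    print("Thumper reads (3 GB/s) over 10 Gbps links:   ", nics_needed(3000, 10))
```

On a single gigabit link, which is what most home networks have, the network rather than the array is the limit.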

  46. Hi Chris,

    Thanks for the compliment!
    And you’re right: one needs to have trust in the integrity of one’s data, and ZFS seems to give that.
    I will consider your points for future posts.

    Cheers,
    Simon

  47. Hi Kebabbert,

    Yeah, that SUN Enterprise storage server 7000 gear looks like pretty nice stuff for a business.
    Do you mean the funny video in this post, where he shouts at the array?: http://blogs.sun.com/brendan/entry/unusual_disk_latency
    Did you try building a setup where the Windows boot image is stored on the ZFS fileserver?

    Cheers,
    Simon

  48. Can you provide it as a single, printable page?

  49. Cheers Simon,
    Another thing you may want to touch on: what if the mobo dies? Can one just connect the SATA disks to another board, install Solaris, and Solaris will recognise the ZFS structure on the disks? That would be the ultimate security.
    My worry with RAID is: the RAID controller dies and the new one knows nothing about the RAIDed disks and just prompts to partition the “empty” array.

    Thanks again for your insight :)
    Chris

  50. Simon,

    Thanks for a very interesting series of articles, I wandered over after seeing your link in the comments on the reg last Friday and ended up looking into it over the weekend :) I threw together some old hardware just to see if I could get it working and I can’t believe how easy it was, especially since I’ve never touched Solaris before.

    I’ll be very interested if you add any more to the series. Unfortunately the site linked in the AD Integration article seems to be broken.

    regards
    Phil

  51. Hi Chris,

    No problem if the mobo dies: just connect up the drives that form your ZFS pool to a new compatible system, and the ZFS pool drives will automatically be recognised on next boot. ZFS records a special signature on the drives so the OS can recognise them. If you plan to do a graceful transfer of pool drives to another system/OS install, then you can type ‘# zpool export pool_name’, shut down, disconnect the drives from the old system, install them into the new system which has the OS installed on another drive, and then just type ‘# zpool import pool_name’.

    In fact it gets even better: as ZFS is endian-neutral, you can transfer ZFS pool drives between little-endian and big-endian systems. When I first experimented with ZFS it was in a Mac Pro. Later, I moved the drives into a new home-built system running Solaris and used the same pool without any changes, and all the data remained available :)

    So far, I have my OS installed on an old IDE drive outside the pool, and so OS and data are completely separate. If the boot drive dies, I’ll just chuck another one in the case and install Solaris again and be up and running very quickly.

    As Solaris is open source, and there is no hardware RAID controller to worry about, your data is about as safe as it can ever be :)

    Cheers,
    Simon

  52. Hi Phil,

    Yep, it’s pretty easy to set something up, like you say!
    I notified the author of the AD article linked to, so hopefully it’ll get fixed soon — thanks for pointing it out.

    Cheers,
    Simon

  53. Simon: I have to add my voice to the chorus of thank yous. OpenSolaris intrigued me for a home file server system and your articles do a good job of explaining the possibilities and what to do.

    While I realize you did not write the article as an invitation for support questions, it would be appreciated if you or someone could clarify some ZFS points for me.

    Is ZFS only meant for high end use cases? I thought one advantage to using it was that you can add drives to a storage pool and it all becomes transparent to the user, just more storage space. So for example, four or five small hard drives could be connected together and it would act like one large drive. (One advantage of this would be that a very large file might be bigger than one of the smallest drives, but under ZFS that would be OK.)

    Now I wonder if all my assumptions were wrong. It seems that one can use ZFS in a way that is great for data recovery but drastically reduces the storage space (RAID-Z, mirroring, etc), can use it to pool together many disks (as mentioned above) giving you great storage abilities, but the failure of one drive ruins all the data, or you can use it in a “normal” way (one disk=one pool) giving no advantages at all.

    Is there something I am missing about ZFS? It seems that there are no advantages unless one cares solely about redundancy and has unlimited resources for disk drives. That is to say, not really a home file server. (Which is not to say that the data is unimportant just because the server is in the home.)

  54. Hi Davros,

    Thanks a lot!

    You can use ZFS with or without redundancy, the choice is yours. However, as ZFS is all about data integrity, it makes sense to use redundancy for helping to prevent data loss.

    I have a fileserver that uses redundancy in the form of a single RAID-Z1 vdev, so that if one drive out of the four fails, the data will still be accessible, allowing me to replace the failed drive and rebuild it from the existing drives. In my case, the capacity of three drives is available for data, and the capacity of one drive is used for parity data (redundancy).

    Redundancy is not wasting drives, although it may seem that way at first. The more redundancy (insurance) you build in to your data array, the more you are likely to avoid data loss. The amount of redundancy you choose to use depends on your budget or the importance of your data — i.e. how keen you are to prevent its loss.

    I also have a backup server that uses no redundancy. It uses a collection of old, different-sized drives to form a large backup pool.

    ZFS has many other advantages, including easy administration and providing seamless access to vast amounts of data without having to use clumsy volume managers. Also, snapshots are extremely powerful, allowing you to snapshot your data to perform full & incremental backups easily, roll back the file system to before your failed OS upgrade, recover previous edit states of any file, or even clone whole file systems.

    ZFS also allows you to specify that a file system automatically and transparently creates 2 or 3 copies of every file within it. This may be useful for any data that is critical, such as user-generated content (photography, video, source code repositories, documents, spreadsheets etc). This means that ZFS can read the same file from multiple distinct locations off the drive surface in the event that, for example, bit rot has destroyed data within any file. This feature is called ditto blocks.

    ZFS also allows hot spares to be included within your data array.

    Using the above info, you have the ability to build a pretty much bullet-proof system. For example, you could build a storage pool consisting of a RAID-Z2 vdev, which uses the capacity of two drives for parity. You could also include a hot spare which ZFS will use automatically in the event of a drive failure. This means that your ZFS array can now rebuild itself automatically when a drive dies. Combining this array configuration with selected use of ditto blocks for critical file systems should give a system that is extremely unlikely to ever lose any data. Then, to keep this system protected, you would set up automatic and frequent snapshots, send the differences between the snapshots regularly to a backup pool/server, and be sure to scrub the array with a frequency to suit the usage pattern of the system.

    To see all of ZFS’ significant advantages, see here:
    http://opensolaris.org/os/community/zfs/docs/zfs_last.pdf

    I hope this answers your question.

    Cheers,
    Simon

  55. Simon:

    Thank you muchly for responding. To read “ZFS is all about data integrity” puts more of what I learned in context. It is all very impressive and probably a no-brainer for use in large data centres. I think I still have to think about it for home use though. This is a home backup server. Going to it to retrieve files means something terrible has already happened. While something terrible could happen twice simultaneously, it is very unlikely, and stuff so important that one must take such a catastrophe into consideration would be archived onto optical discs and stored away from the computer. However, the fact that one can spend the extra money on hard disks and get such a great level of protection is appealing, just to say I did it.

    Thank you again for responding to my post. I do have a question still though. If I have “a collection of old, different-sized drives to form a large backup pool” that uses no redundancy, as you do in one case and as I am contemplating, and one of those drives dies, do I lose all the data in the pool or just some of it?

    (Also, I assume redundancy could not be added to that pool later, as the disk sizes affect the set-up of RAID-Z and other redundancy schemes. I suppose when I can obtain more disks I could set up a new pool and copy the data over.)

  56. Hi,
    A very interesting blog.
    I’m trying to do something with ZFS but I’m now struggling.
    Can someone please put me on the straight and narrow with regard to how ZFS over iSCSI can/should work, as I can’t find the answer I need anywhere.

    I’m trying to build a ZFS target on OpenSolaris essentially as a vault for my HTPC media collection. Also, although the (multiboot) HTPC primarily runs m$ Vista (spit!), I really want to get off m$ and onto Linux so my experimental iSCSI initiator is an Ubuntu 9.04 (Mint actually) boot.

    Anyway, I think I’ve managed to create an iSCSI relationship between the two (after a battle to figure out the different syntax between Open-iscsi on Linux and on OpenSolaris), but the initiator seems to want me to fdisk the iSCSI volume next, because it apparently doesn’t recognise what ZFS is, whereas what I was hoping to be able to do is to just mount the ZFS volume and rsync the disk(s) on my HTPC to it.

    Can someone please tell me:
    1. The log /var/log/messages has some errors regarding IPv6 resolution and block size (10240) exceeding 1024 but I took these to be irrelevant as the iscsi target exists in the initiator when queried with ‘iscsiadm -m node’
    2. Does the disk need to be specified as a raw device or something? I couldn’t figure out what circumstances would need me to specify a raw device rather than a disk.
    3. Should I be able to do what I’m trying to do, or do I have to wait for something like Openfiler/FreeNAS to add support for ZFS before this is achievable?
    4. What about NexentaOS? I have steered away from this because as far as I can see, like WHS, it wants to have control of all my disks.
    5. Is my goal achievable and what are my options?
    6. Who, what, where should I be asking this if not here?

    Many thanks
    Duncan

  57. Thanks Duncan.

    My first question would be to see why you want to use iSCSI, when you probably just need a simple CIFS share, or Samba if you use a Linux client and it doesn’t have true CIFS client support.

    If you think you really need iSCSI, then bear in mind that an iSCSI volume is a raw block device, so once the initiator connects to the target, you’ll need to format it using the file system you wish to use.

    Hope this helps and is correct, as I had to cast my mind back a year to my experiences with iSCSI.

    For my media centre/HTPC I am using a CIFS share of a standard ZFS file system, as described here:
    http://breden.org.uk/2009/05/10/home-fileserver-zfs-file-systems/

    Cheers,
    Simon

  58. Hello ,
    I had a Solaris 10 installation on a NAS server running a RAIDZ of four 1TB disks.
    The OS was installed separately on a 40GB HDD.
    After a power failure the motherboard and the system disk died.
    Now I have a new motherboard + CPU and a new HDD for the OS.
    I would really like to know how I can rebuild the ZFS pool that I had on the previous system.
    Is there a chance to ‘mount’ the 4-disk array to a directory?

    Thank you,
    Regards,
    Adrian

  59. Hi Adrian,

    As your data was not on the boot drive but in a separate storage pool, it should be easy to regain access to it. Assuming your storage pool was called ‘tank’, type:

    # zpool import -f tank
    

    You will probably need the -f option as the pool was never exported before the system motherboard and boot drive died.

    If you can’t remember the name of your storage pool, no worries, just type:

    # zpool import
    

    and Solaris will list all available pools.

    When ZFS imports the pool, it should automatically remount all ZFS file systems that the pool contains. It’s possible you might get some warnings/errors if you changed mountpoints and the target directories don’t exist on the new system, but you’ll easily be able to fix those problems if you see them.

    Also, as you will probably need to re-create your Solaris user(s), you’ll need to see which user ids and group ids you used within your file systems and recreate the users and groups with the same ids. Something like this:

    # groupadd -g 501 fred
    # useradd -g fred -u 501 -s /bin/bash -d /export/home/fred -c fred-flintstone -m fred
    # passwd fred
    

    Explanation of useradd parameters used above:
    -g fred: adds user to primary group ‘fred’ (which has groupid 501)
    -u 501: creates the userid 501 for this user
    -s /bin/bash: assigns the default shell to be bash for this user
    -d /export/home/fred: defines the home directory
    -c fred-flintstone: creates the comments/notes to describe this user as required
    -m: creates the home directory for the user
    fred: this is the login name for this user
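
    To find out which ids are actually in use before recreating the users, one option is to walk the imported file systems and count the owners. This is only a sketch: `owner_ids` is a hypothetical helper, and `/tank/home` in the usage comment is an assumed mountpoint, not one taken from your system.

```python
import os
from collections import Counter


def owner_ids(root):
    """Count (uid, gid) pairs of the directories and files under `root`.

    Uses lstat so that symlinks are reported as themselves, not followed.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            try:
                st = os.lstat(path)
            except OSError:
                continue  # unreadable or vanished entry: skip it
            counts[(st.st_uid, st.st_gid)] += 1
    return counts


# Usage (assumed mountpoint):
#   for (uid, gid), n in owner_ids("/tank/home").most_common():
#       print(f"uid={uid} gid={gid}: {n} entries")
```

    Each uid/gid pair it reports is a candidate for the -u and -g values passed to useradd and groupadd above.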

    Hope it helps.

    Cheers,
    Simon

  60. hey Simon, don’t use the AOC-SAT2-MV8. At first the driver crashed all the time, couldn’t handle hotplug, and didn’t work with NCQ. supposedly the driver is better now than it used to be, but it’s still the less popular card, and a few times they said the driver was all fixed and perfect when it wasn’t so i don’t know what to believe anymore. Instead of this card, if you need PCIX, use lsi3080x-r:

    http://www.provantage.com/lsi-logic-lsi00165~7LSIG06Q.htm

    but most freshly-bought systems will find a PCIe motherboard cheaper, in which case you can use AOC-USAS-L8i, an 8-lane PCIe card. though you’ll have to remove the bracket because supermicro’s put it on backwards.

    I think you can also use Dell PERC cards with the mega_sas driver, which unlike the other two cards above this one has an open-source driver (however, it still has big closed-source RAID-on-a-card firmware burned onto the card, and a closed-source MegaCli tool for configuring the RAID). Dell PERC is cheap on eBay.

    finally, if you can get rid of all your expee crap and use Macs, NFS works much better on Mac OS X than cifs, especially if you set it up like this:

    http://web.ivy.net/~carton/rant/macos-automounter.html#9050149

    it’s faster, more fault tolerant, case-sensitive, and more.

  61. Thanks Miles, so you’d say there may still be problems associated with the AOC-SAT2-MV8. That’s a pity, so if I choose this card I’d better be sure to check the current situation from the solaris forums to see what people are finding.

    When you say ‘at first the driver crashed all the time’, was this recently or a long time ago? I believe there were many problems with the standard ‘SAT’ card (i.e. AOC-SAT-MV8), but that many of these problems were fixed in the later ‘SAT2’ card. I saw that info here, and he also says that the AOC-USAS-L8i (LSI megaraid chips?) would be his next choice; see here:
    http://zpool.org/2008/12/16/my-zfs-media-server

    He says:

    This card [AOC-SAT-MV8] has had a lot of problems with ZFS and has been discontinued by Supermicro in favor of the AOC-SAT2-MV8. Whenever I expand beyond six disks, I’m going to skip straight to the AOC-USAS-L8i using mini-SAS to SATA cables.

    I will take a look at your Automounter notes next. If I remember, the reason I used CIFS sharing instead of NFS was because of lousy write speeds (5MB/s with NFS instead of about 40MB/s with CIFS (50MB/s now with OpenSolaris 2009.06 and OS X 10.5.7 using 1 x GbE)), but maybe this has been fixed now. I think it was because of NFS using synchronous writes if I remember correctly. These quoted speeds were achieved writing from a Mac Pro to the ZFS file server.

    Thanks a lot for the info. I have created a new thread on the Solaris forum with more questions/info — see here:
    http://www.opensolaris.org/jive/thread.jspa?threadID=106210&tstart=0

  62. I am very interested in trying this compared to my Win2k3 server, as I want to expand now and want an easily expandable solution, but I really can’t figure out how ZFS (in particular RAID-Z) works when you want to increase the volume size, other than the option to replace a current drive with a bigger drive.

    I want to be able to add many more drives to my system, and I think I read in this article that you cannot add a single drive at a time. How many do you have to add then, and will you utilize 100% of their disk space?

    I am also trying to figure out if I should run FreeNAS or Solaris.

  63. Hi Mads,

    This sounds like a good idea for a new blog post — thanks for the idea :)
    Quick rundown: with ZFS you have the concept of a storage pool. The pool is comprised of one or more vdevs. A vdev is a collection of drives in a particular configuration — e.g. mirror, RAID-Z1 or RAID-Z2 etc. Currently a vdev of type RAID-Z1 / RAID-Z2 cannot be expanded, but Sun say they plan this functionality for the future (see http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z). However, you can expand your pool as much as you like, by simply adding new vdev(s) when you need to, without any problem.
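
    A small model of that pool/vdev relationship, assuming equal-sized drives within each vdev and ignoring metadata overhead. The drive counts and sizes are invented for illustration, and the helper names are hypothetical. On a real system the second step would correspond to a `zpool add` of a new RAID-Z vdev.

```python
# Parity drives per RAID-Z level.
PARITY = {"raidz1": 1, "raidz2": 2}


def vdev_usable_tb(kind, drive_count, drive_tb):
    """Usable space of one RAID-Z vdev built from equal-sized drives."""
    return (drive_count - PARITY[kind]) * drive_tb


def pool_usable_tb(pool):
    """A pool's usable space is the sum over all of its vdevs."""
    return sum(vdev_usable_tb(*vdev) for vdev in pool)


# Start with a single 5-drive RAID-Z1 vdev of 1 TB drives.
pool = [("raidz1", 5, 1.0)]
print(pool_usable_tb(pool))   # 4.0

# Later, grow the pool by adding a second vdev; the first vdev is untouched.
pool.append(("raidz1", 5, 2.0))
print(pool_usable_tb(pool))   # 12.0
```

    The existing vdev keeps its five drives throughout; only the list of vdevs grows.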

    Knowing the above, for now, it’s best to decide how much space you’ll need and build in plenty of storage up-front. It is possible, though, to break a 3-drive RAID-Z1 and recreate it as a 4-drive RAID-Z1, as you can see here:
    http://breden.org.uk/2008/09/01/home-fileserver-raidz-expansion/

    Cheers,
    Simon

  64. Well, my issue is money, which is why I wanted a cheap way to add drives (cheap RAID cards, cheap discs and an easy system (ZFS)). But I guess I can be forced into making RAID-Z1 arrays of 5 drives for now, and then when ZFS is upgraded to support expanding I can make one big RAID-Z2 array.

    Btw, would you say going RAID 5 would be a bad idea compared to ZFS when I don’t have 24/7 rated hard drives?

  65. How much do you want to spend on your storage in total? By this I mean just the controller and the drives. And how much usable storage capacity do you want? And what level of redundancy do you want, i.e. how many drive failures do you want your system to survive?

    You don’t need RAID cards for ZFS, quite the opposite. ZFS controls everything, and so you should use a JBOD setup. But you might want to buy a SATA controller to control your array of cheap SATA drives.

    If I were you, I would assume that RAID-Z expandability is not high on the ZFS developers’ priority list, as they have stated this openly, and so I would just assume that you need to create an array with sufficient capacity for your foreseeable needs.

    Personally I would not trust RAID level 5 for anything. AFAIK, RAID 5 does not use block checksums and so it has no way of knowing if there is ‘bit rot’ when it reads data back. RAID level 5 writes parity data to allow failed drives to be rebuilt, but if it can’t tell you that some bits in a file have been flipped then what good is it to anyone? Check out various CERN and Google reports on parity errors if you need convincing further.

    ZFS was designed with the idea that you should never trust drive hardware, and so it has end-to-end data integrity built in from scratch. This helps in two ways: (1) you can be sure that what gets read back is actually what you told it to write, and (2) ‘bit rot’ is detected immediately, either when a file is read back or during regular weekly/monthly ‘scrub’ operations. During a scrub, ZFS reads back ALL the blocks of every file in the storage pool and checks them against their block checksums. If a difference is detected, ZFS will fix the corruption immediately from the available redundant data, assuming the vdev containing the corrupted file has redundancy (i.e. mirror, RAID-Z1 or RAID-Z2).
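
    The idea can be illustrated with a toy two-way mirror that keeps a checksum for every block and repairs a bad copy from a good one. This is not how ZFS is implemented (ZFS keeps checksums in its own block pointers and supports several checksum algorithms); the class, the in-memory "drives" and the choice of SHA-256 are all assumptions made for the sketch.

```python
import hashlib


class MirroredStore:
    """Toy two-way mirror with a checksum kept separately from each block."""

    def __init__(self):
        self.copies = [{}, {}]   # two simulated drives: block id -> bytes
        self.checksums = {}      # block id -> SHA-256 hex digest

    def write(self, block_id, data):
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()
        for drive in self.copies:
            drive[block_id] = data

    def read(self, block_id):
        expected = self.checksums[block_id]
        for drive in self.copies:
            data = drive[block_id]
            if hashlib.sha256(data).hexdigest() == expected:
                # Verified copy found: overwrite any copy that differs from it.
                for other in self.copies:
                    if other[block_id] != data:
                        other[block_id] = data
                return data
        raise OSError(f"block {block_id}: every copy fails its checksum")

    def scrub(self):
        """Read every block so that latent corruption is found and repaired."""
        for block_id in self.checksums:
            self.read(block_id)


store = MirroredStore()
store.write("b1", b"holiday video frame")
store.copies[0]["b1"] = b"holiday vidfo frame"   # simulate bit rot on one drive
store.scrub()
print(store.copies[0]["b1"] == b"holiday video frame")   # True: copy was healed
```

    A plain RAID 5 array in the same situation has no per-block checksum to tell it which of the copies is the damaged one.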

    Hope this helps.

  66. I currently have a cheap motherboard with a Phenom X3, 3 WD GP 1TB drives and a 500GB WD GP RE3 for the system. I don’t want to lose the data I already have on my 3TB storage (which is what runs Win2k3).

    But at the moment I am ordering 2 x LSI http://www.lsi.com/obsolete/megaraid_sas_8308elp.html and 2 x Compucase 5in3 hot-swap bays. Then I will have 22 SATA ports in total, which should last me a good while as I won’t buy drives for all of them. I will probably buy 3-5 drives now and then upgrade along the way. I just found a good deal on that hardware.

    But I am going to use ZFS; I just need to buy drives in big chunks (4-5), since I can’t add a single drive at a time.

    As I haven’t used Solaris before, I am tempted to just use FreeNAS as that also supports ZFS now. I’m just not sure these RAID controllers support FreeBSD, but they do have support for Solaris.

  67. IMHO OpenSolaris 2009.06 using ZFS is very user-friendly, and you’ll know you’re using the standard ZFS distribution. Good luck.

  68. Hi Simon,

    I have a Home Server with Windows 2003 Server (Domain Controller) with Hardware-RAID5. There are about 4 Windows Clients connected to this server.
    Primary use of this server is file storage (huge collection of DVDs, images, mp3 collection, pdf scans and documents), some applications and web server.

    I’m a windows guy and have only minor knowledge in linux/OS, but I REALLY WANT the superior ZFS. :-)

    Is it possible to run OpenSolaris with ZFS and, within this system, a virtualized Windows 2008 Server (with Active Directory)? I’d like to set up OS and ZFS as a (hopefully) stable “install-and-forget” system. Windows Server would see the ZFS pool as a huge virtual drive and share this drive with its Windows clients. So all my Windows clients would be connected to the Windows Server only. They (and Windows Server) wouldn’t know of any OpenSolaris server.
    With this scenario I could continue to work with Windows Server, but could trust all my data to ZFS.

    Is this possible? If yes, is there any performance bottleneck when transferring huge files from the clients to the server (or vice versa)?

    Most of the time, my home server is idle and not very busy (2 concurrent users).

    For my home environment, I don’t want to run two dedicated servers (OS for storage and Win2008 for the domain), so I really would like to virtualize it.

    What’s your opinion?

    And thanks for your great site. Lots of very useful information!

  69. Hi Rick,

    If you have basic Linux knowledge then you should find it relatively easy to use OpenSolaris too. You can find the answer to any admin problem online: Google, OpenSolaris forums and blogs etc.

    For using Active Directory and Solaris, see the AD link at the top of this page.

    Regarding bottlenecks, the network will be the bottleneck when transferring big files, but even with commodity Gigabit ethernet you should get speeds of around 50 MBytes/sec, which should be sufficient. If you need serious speed, look at 10GbE or Direct Attached Storage (DAS) using tech like InfiniBand, which is used with an HBA, and is often used by video editing software to connect to moderately fast storage devices like this, which easily reach speeds of around 500+ MBytes/sec sustained using 8 drives in a redundant array.
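
    To put those figures in perspective, here is the arithmetic for a single large file. The 10 GB file size is an arbitrary example and `transfer_seconds` is a hypothetical helper; the two rates are the approximate ones quoted above.

```python
def transfer_seconds(size_gb, rate_mb_per_s):
    """Seconds needed to move size_gb gigabytes at rate_mb_per_s.

    Decimal units are used throughout: 1 GB = 1000 MB.
    """
    return size_gb * 1000 / rate_mb_per_s


for label, rate in [("Gigabit ethernet at ~50 MB/s", 50),
                    ("Fast attached array at ~500 MB/s", 500)]:
    print(f"{label}: {transfer_seconds(10, rate):.0f} s for a 10 GB file")
```

    So a 10 GB file takes a little over three minutes across the network at 50 MB/s, which is usually acceptable for a home server with a couple of users.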

    Regarding running 2 boxes at home, you can probably ditch Windows and use some Solaris directory service, if you need it, but I didn’t look at this.

    Good luck,
    Simon

  70. Hi Simon!
    I have also built a file server using OpenSolaris myself, but I have some problems with implementing features I would like to use.
    First, I would like to find a good failure notification script which sends me an email when a drive in my pool fails. I have found some scripts but I have problems with the mail-sending part.
    I want to use the mail account provided by my ISP to send mail, but I haven’t found a guide on how to configure this.

    Second, I would like to configure an FTP server so I can access and transfer files from outside my home network. I really haven’t found a good guide to do this.
    The NAS is connected to a router with the firewall enabled; is this enough?
    Should I configure “ipfilter” in OpenSolaris too? Is there a guide somewhere for this?
    I have tried to look at man pages but I haven’t found anything useful.
    As you can see I’m a Unix beginner. I think OpenSolaris is a great OS, but it’s not very user-friendly and information on how to set up certain bits is not really abundant.
    Maybe I’m looking in the wrong places?

    I’m now configuring/administering/checking health on this machine via PuTTY SSH (which is pretty slow) and sometimes via VNC.
    Isn’t there a faster and easier way to do this?
    What web interface alternatives are there?
    I have found something called Webmin; is this the best one, with the most features that you may need?

    Your blog has helped me a lot already, but maybe you can point me in the right direction regarding my questions?

    Thanks a lot!!
    Gurkman

  71. > Simon on September 2nd, 2009 at 15:44
    > IMHO OpenSolaris 2009.06 using ZFS is very user-friendly, and you’ll
    > know you’re using the standard ZFS distribution. Good luck.

    Simon, this unfortunately was very bad advice and I lost several days by installing and wrestling with Open Solaris 2009.06 for use as a file server.

    Both smb/server and network/samba are _badly_ broken and as I later learned the hard way Sun admins are fully aware of the situation but did not publish any warnings on the download page.

    With all the problem reports Google brought up, IMHO Open Solaris is the testbed for Sun, and success with such distributions depends heavily on luck and local weather conditions.

    My conclusion: Unless you are prepared for a long and ugly struggle: Avoid Open Solaris!

    someone

  72. I presume you tried one of the two-weekly bleeding-edge builds? Which version? Sounds like you got a bad one.
    It’s true there can be (serious) bugs with these bleeding-edge builds, so it pays to check known bugs and read the forums.
    I use the Boot Environments feature of OpenSolaris to rollback from a buggy update if I find one. I found build 124 to be OK for me, and recently I updated to build 129, and now Apache doesn’t work, so I’ll probably zap the build 129 and re-instate build 124 using the Boot Environment panel on the desktop. One click easy.

    If you’re more comfortable with stable, infrequent releases, try Solaris 10:
    http://www.sun.com/software/solaris/get.jsp
    http://www.sun.com/software/solaris/

    Choosing which version of Solaris to use is always a tradeoff: (1) OpenSolaris=frequent releases containing new features under development or (2) Solaris=infrequent releases, but heavily tested and rock solid.

    Cheers,
    Simon

  73. May I ask you for a pdf with your complete “tutorial”. Thanks :)

  74. Hi Simon,

    thanks for your information on OpenSolaris and your Home Server Setup and also for answering all the questions from the comments. So I hope you can also answer my little ones.

    I can think of a scenario with one system drive and 4 data drives bound to a raidz1. Now my two questions:

    a) How should one back up the system drive? By doing snapshots to the raidz1 pool and, in case something bad happens to the motherboard or the system drive, putting everything back onto a new system drive? How would this scenario work in real life?
    Is it possible to snapshot the whole system drive and then clone it back to another disk so that it is ready to boot? What needs to be done to achieve this?

    b) Suppose one part of the raidz1 pool is dedicated as a Time Machine backup location for a Mac and thus exported via iSCSI as a block device to the Mac system. Also, deduplication is switched on. What needs to be done to recover the Mac if its disk fails? How do I get the backup back onto the new Mac disk? I think I can’t clone it directly back to the new Mac disk?

    Thanks for your help and keep up the good work,

    Jan.

  75. Brand Howard on May 18th, 2010 at 21:11

    I also would like a PDF copy of this OpenSolaris home server build if you don’t mind. I find myself referring to this site and using it as documentation quite often.

  76. Hi Brand,

    Thanks for the request.

    What would you like to see in the PDF? All of the posts, with comments too?

    Cheers,
    Simon

  77. Hi Simon,

    I would like to thank you for making this tutorial. It has been very very good and has helped me many times in my setup of ZFS and Opensolaris.

    Cheers!

    ps. Can you also send me a PDF of all posts and comments thus far?

  78. @Rick, I am running a virtualized Win2008r2 on ESXi 4.1 with the VM files sitting on the NAS. They are made available to ESXi via iSCSI. The NAS is Nexenta Community Edition but this shouldn’t make any difference for your question. Works perfect!

  79. Hey Simon,

    Thanks for sharing your work and effort to date.
    With the info, I’ve been able to build my own – much appreciated.

    I am wondering (with the demise of Opensolaris) if you are considering migrating to another platform with a bit more life in it?

    I’ve installed 2009.06, which apparently has known issues with CIFS (it just dies and requires a power off/on to fix), and for me personally NFS isn’t great for any sustained copies over 500MB.
    Looking into the CIFS (most used by me) issue, the suggestion is to “upgrade” to the Dev build…

    I am considering it (it looks pretty easy), and/or migrating the OS to a FreeBSD-based distribution instead.

    thanks,
    Matt

  80. Hi Matt,

    Thanks and glad you managed to make your own NAS!

    If I were you, I would upgrade to build 134 of OpenSolaris, until the independent OpenSolaris forks are properly under development — they’re still in setup stage at the moment from what I’ve seen. Build 134 was the last 2009.06 version released and was to become the OpenSolaris 2010.xx that never got released. I use build 134 and it has worked fine for me. I use CIFS sharing to Mac OS X and have not noticed any problems.

    Once OpenIndiana has a few releases that work, I will probably try upgrading to it — see here:
    http://openindiana.org/download/

    I would stick with OpenSolaris or its forks (OpenIndiana/Illumos), as you have:
    1. The reference implementation of ZFS (less risk to your data in theory)
    2. OpenSolaris has BE’s – boot environments – that easily let you create multiple bootable working versions of OpenSolaris – take a look here for an idea of what I mean: Home Fileserver: Mirrored SSD ZFS root boot.

    Cheers,
    Simon

  81. too bad oracle had to get in on the picture.

    ellison = nazi

  82. Hi Jonny, according to this, it would appear that Ellison’s family were the ones running away from the Nazis.

    However, I agree it’s a pity that he has axed OpenSolaris. Luckily, the community will keep the OpenSolaris code going with OpenIndiana.

  83. Hi, your site inspired me to set up my home ZFS server. I have limited experience with OpenSolaris but I have been able to muddle through and it’s worked very well for many months. I ran into a problem where a couple of drives were having issues at the same time and I was wondering if you might offer some of your expertise. I was able to put new drives in and get things working, except now I have a checksum error in one snapshot file. I have tried to clear the pool and resilver. I have tried to destroy the snapshot which contains the error. I cannot resilver both the drives without getting this error in this one file. I was wondering if there is a simple way to solve this problem since I really don’t want to get rid of the pool and start over again. I am using the snv_134 build of OpenSolaris. Here is the output from zpool status -v that shows the error. I would like to just get rid of the file and the error with it but I don’t know how to do that in a snapshot.

      pool: tank
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption. Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
     scrub: resilver in progress for 12h0m, 53.91% done, 10h15m to go
    config:

            NAME          STATE     READ WRITE CKSUM
            tank          ONLINE       0     0     2
              raidz2-0    ONLINE       0     0     2
                c7t0d0    ONLINE       0     0     0  98.3G resilvered
                c7t1d0    ONLINE       0     0     0  98.2G resilvered
                c7t2d0    ONLINE       0     0     0
                c7t3d0    ONLINE       0     0     0
                c9d0      ONLINE       0     0     0
                c9d1      ONLINE       0     0     0

    errors: Permanent errors have been detected in the following files:

    tank/itunes@zfs-auto-snap:daily-2010-10-11-00:00:/iTunes/Album Artwork/Cache/7C7DE397DD748486/07/09/00/7C7DE397DD748486-872B56E2544E1097.itc

  84. Hi Steve,

    From the message it appears that the file shown has unrecoverable errors. The remedy is to recover the file from backups, if you have them.

    If you don’t have a backup then you could try to locate an earlier uncorrupted version of the file, using the snapshots. Look in the .zfs directory to find the file. See the ‘Checking the snapshots’ section of this post to see how to reference individual files within various snapshots of a given file system.
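
    A sketch of that search, for anyone who prefers to script it: it lists every snapshot under the file system's .zfs/snapshot directory that still contains a given file. The function name is hypothetical, and the mountpoint in the usage comment assumes that tank/itunes is mounted at its default location of /tank/itunes.

```python
import os


def snapshot_versions(fs_mountpoint, relative_path):
    """Yield (snapshot_name, size_bytes, mtime) for each snapshot holding the file.

    Snapshots appear as directories under <mountpoint>/.zfs/snapshot/.
    """
    snap_root = os.path.join(fs_mountpoint, ".zfs", "snapshot")
    for snap in sorted(os.listdir(snap_root)):
        candidate = os.path.join(snap_root, snap, relative_path)
        if os.path.isfile(candidate):
            st = os.stat(candidate)
            yield snap, st.st_size, st.st_mtime


# Usage (assumed mountpoint and an illustrative relative path):
#   for name, size, mtime in snapshot_versions("/tank/itunes", "iTunes/some-file"):
#       print(name, size, mtime)
```

    A version whose size or modification time differs from the damaged one is a good candidate to copy back, after checking that it reads without errors.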

    Hope it helps.

    Cheers,
    Simon

  85. That’s kind of what I had suspected was the case. I keep getting more and more errors and I think the pool is beyond hope. Looks like I’m going to have to destroy the pool and start over. I was backing up the essential files on other drives, but there were large amounts of data that are not critical but are a pain to replace. This has really shown me the importance of backing up the whole pool.

    I bought a 2 TB USB external hard drive and I am going to start doing regular backups of the entire filesystem in case this ever happens again. I guess I got a little overconfident figuring with raidz2 that I would have to lose 3 drives before data corruption.

    Thanks!

  86. Hi Steve,

    Without knowing the exact circumstances & reasons for the errors you are seeing, it is difficult to make any useful suggestions. A hardware fault sounds quite likely. How long did you have the pool in operation, and how often did you perform a scrub on the pool? With more frequent scrubs you will likely see signs of read/write/checksum errors occurring long before serious failures, allowing you to swap drive(s) showing signs of trouble. For home systems, a monthly scrub should be sufficient.
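
    Checking for those early signs can be automated by scanning the device table of `zpool status` output for non-zero counters. The sketch below assumes the NAME / STATE / READ / WRITE / CKSUM column layout shown in your output; `devices_with_errors` is a hypothetical helper, and the sample text is an abbreviated copy of your listing.

```python
def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with any non-zero counter.

    Counters are kept as strings, so abbreviated values are still reported.
    """
    bad = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True      # header row of a pool's device table
            continue
        if fields and fields[0] == "errors:":
            in_config = False     # the device table has ended
            continue
        if not in_config or len(fields) < 5:
            continue
        counters = tuple(fields[2:5])
        if any(c != "0" for c in counters):
            bad[fields[0]] = counters
    return bad


SAMPLE = """\
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     2
          raidz2-0    ONLINE       0     0     2
            c7t0d0    ONLINE       0     0     0  98.3G resilvered
            c9d1      ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
"""

print(devices_with_errors(SAMPLE))
# {'tank': ('0', '0', '2'), 'raidz2-0': ('0', '0', '2')}
```

    Run after each scrub, a check like this would flag a pool whose counters have started to climb, which is the cue to investigate the drives involved.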

    Personally, having run two ZFS NAS systems for the last 3 years, I have never lost any data, and never saw even one read/write or checksum error within the main data storage pools, although I did see some errors occurring in my mirrored boot SSDs: http://breden.org.uk/2009/09/02/home-fileserver-handling-pool-errors/

    Indeed, backups are highly recommended.

    It’s quite possible that your system can be recovered, so before doing anything with your system, I would strongly suggest creating a post describing what happened to your system on the OpenSolaris ZFS forum here: http://opensolaris.org/jive/forum.jspa?forumID=80

    Cheers,
    Simon

  87. Simon,

    Of all the guides and sites I’ve looked at over the past few weeks, yours appears the easiest to understand :) I have a scenario for you (and anyone else who’d care to reply):

    After 16 years of being an IT Engineer in a Microsoft-based environment, I bought an iPhone 3G a couple years ago and have given up on Microsoft for personal use. My home is 90% Apple with a sprinkle of ‘other’ here and there for budgetary purposes. I’ve run out of storage space on a couple of external USB drives and am about to build some type of file storage system. Here is my current setup at home:

    Main System – iMac 21.5″, 2.8GHz C2Duo, 4GB, 500GB.
    – Running iTunes, Plex Media Server, Air Video Server

    Secondary Systems – Apple TV2, LG Blu-Ray (DLNA), iPhone 4, iPad
    – Running Plex Client on all but LG Blu-Ray for media streaming in-house and over internet via wi-fi/3G.

    Backup/File Storage – (2) 1TB External USB HDDs.
    – Uses: Time Machine Backup of iTunes, iPhoto, Documents and File Storage for Media Sharing.

    Here is what I would like to do:

    - Add an 8-10TB File Server/NAS to my network.
    - Compatibility with both OS X and Windows for streaming.
    - RAID 5 or 6 Redundancy (know RAID5, learning about 6).
    - Ability to expand storage later if necessary without complete rebuild.
    - Move remaining media off iMac and external drives to run off server, create iTunes Server.
    - Use USB externals for Time Machine backups of documents, photos and home movies.

    Here is the equipment I’ve chosen based on existing and budget:

    - Existing Intel Xeon E3110 Dual Core @ 3.0GHz, LGA775
    - New Gigabyte GA-G31M-ES2L Mobo
    - New Crucial 4GB DDR2 800 Memory
    - New (2) Promise SATA300 TX4 4-Port PCI SATA II Controller
    - New (6) WD Caviar Green 2TB 3.5″ HDD’s
    - Existing Full ATX Tower Case with 12 total drive bays
    - Existing Antec 500W Phantom PwrSupply

    Based on the information above, is your guide up to date, and do you have any recommendations/changes or additional advice for a novice Unix/Linux/Solaris user? Oh, and I have to stay under an $800 budget, with the Tower, PS and CPU already purchased/not included.

    Please advise…this looks to be the best option and your guide to be the easiest to follow.

    Brad

  88. Hello BD,

    I have an OpenSolaris 2009.06 machine running, and I am currently planning to upgrade both the hardware and the software (the new Oracle Solaris Express? something else?). The old machine will serve as a backup machine for my data.

    Here are the specs I selected for the various components.

    [*] Intel Xeon L3406 (Just seen you were stuck with the E3110, but this one is 32 nm and max 30 Watts compared to 65 W …) http://ark.intel.com/Product.aspx?id=47555
    [*] SUPERMICRO X8SIL-F – micro ATX – Intel 3420 – LGA1156 Socket (micro ATX, good motherboard brand, ECC memory, 6 SATA and 2 Intel GB Ethernet ports) http://www.supermicro.com/xeon_3400/Motherboard/X8SIL.cfm
    [*] Crucial 8 GB (Kit 2x 4 GB) DDR3-SDRAM PC8500 ECC CL7 – CT2KIT51272BA1067 (Crucial, ECC …)
    [*] Lian Li PC-V354B – mini desktop (very small factor, but seven 3.5″ HDD drive bays inside + one 5″1/4) http://www.lian-li.com/v2/tw/product/upload/image/v354/flyer.html
    [*] TWO * Corsair Force Series F60 – SSD 60 GB 2.5″ Serial ATA II (mirrored rpool, to be inserted in a 5″1/4 rack and connected to the on-board SATA controller)
    [*] Supermicro AOC-USAS-L8i 8 Port SAS RAID Card (for managing the tank pool)
    [*] SEVEN * Western Digital Caviar Green 2 TB 64 MB Serial ATA II – WD20EARS (9 TB tank in RAIDZ2 mode, no spare for the moment)
    [*] SAS to SATA cables, Seasonic fanless 400W PSU, Noctua fans etc…
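
    The "9 TB tank" figure for seven 2 TB drives in RAIDZ2 can be checked with some quick arithmetic. The sketch below is only an approximation: it assumes usable space is simply (disks minus parity) times disk size, ignores ZFS metadata and padding overhead, and the function names are made up for illustration.

```python
def raidz_usable_tb(disks: int, disk_tb: float, parity: int) -> float:
    """Approximate usable capacity of one RAIDZ vdev in decimal terabytes.

    Simplifying assumption: each parity level costs one whole disk and
    filesystem overhead is ignored.
    """
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks - parity) * disk_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 1e12 / 2**40


# Seven 2 TB drives in RAIDZ2 (two parity disks).
usable = raidz_usable_tb(disks=7, disk_tb=2.0, parity=2)
print(f"{usable:.1f} TB = {tb_to_tib(usable):.2f} TiB")
```

    So roughly 10 TB as drive vendors count it, which is about 9.09 TiB as the operating system is likely to report it.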

    Looking at your proposal, what I find really lacking is ECC capability, IMHO. I mean, you want to install a fireproof FS, but without ECC RAM, which is mandatory for most serious server systems. And you already have a Xeon-type processor anyway.

    I am currently buying components, planning to be done by mid March with a running machine.

    Cheers

    Julien

  89. I just saw tons of warnings about the new EARS drives, but I will stick with them (only 6, on the other hand).
    Problems:

    - 4k drives are emulated as 512-byte-sector drives for the sake of s****y Windows XP. This has a huge, catastrophic impact on performance under ZFS, all because of WD’s concern for customers on legacy systems -> I will use the gnop trick to create a virtual disk with 4k sectors as the first device of my pool. I could also use a modified zpool that lets me specify the ashift, changing it from 9 (bad) to 12 (good).
    So there are at least two solutions for this issue. But I recognize we should not be constrained to using work-arounds, and WD should definitely release non-emulated firmware to its customers. BTW, tons of new HDDs are also affected and use this new sector format, so it is not a Caviar Green-only issue.

    - TLER setting hardwired to OFF for a year now. Apparently, there is no real impact under RAIDZ. In fact, TLER should be DEACTIVATED in a RAIDZ configuration! So this is no issue at all, just a lot of confusion.

    - Finally, the excessive head-parking issue (load/unload cycle counts that rise dramatically). Because it is an energy-efficient drive, the heads on the Caviar Green park too often in a RAID configuration, slowing down the whole array and dramatically shortening the drives’ lifespan. Luckily, there is a utility still available on the WD website (WDIDLE3) that changes the delay from a couple of seconds to 5 minutes, which is way better. Problem solved.
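
    To get a feel for why the idle timer matters, here is a rough upper bound on how often the heads could park per day. This is only a sketch: the 8-second figure is an assumed stand-in for "a couple of seconds", and the worst case supposes the drive is woken immediately after every park.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def worst_case_parks_per_day(idle_timeout_s: float) -> float:
    """Upper bound on head parks per day.

    The heads can only park after idle_timeout_s seconds of inactivity,
    so they cannot park more often than once per timeout period.
    """
    if idle_timeout_s <= 0:
        raise ValueError("idle timeout must be positive")
    return SECONDS_PER_DAY / idle_timeout_s


short_timer = worst_case_parks_per_day(8)       # assumed short default, hypothetical
long_timer = worst_case_parks_per_day(5 * 60)   # the 5-minute setting

print(f"short timer: up to {short_timer:.0f} parks/day")
print(f"5 min timer: up to {long_timer:.0f} parks/day")
```

    Real workloads will park far less often than the worst case, but the ratio between the two settings is what counts.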

    So I will stick with the WD 2 TB EARS. It is a pity that the only real issue (the 4k advanced format), which is not restricted to the Caviar Green series, is still present and needs a work-around, but I don’t like the other offerings.
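
    The ashift values mentioned above are just the base-2 logarithm of the sector size the pool assumes. The sketch below shows that relationship and why a 512-byte allocation unit can straddle a 4k physical sector. The helper names are illustrative and not part of any ZFS tool.

```python
def ashift_for(sector_bytes: int) -> int:
    """Return log2 of the sector size, which is what ashift represents."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


def is_aligned(offset_bytes: int, physical_sector: int = 4096) -> bool:
    """True if a write starting at offset_bytes begins on a physical sector boundary."""
    return offset_bytes % physical_sector == 0


print(ashift_for(512))    # the emulated sector size
print(ashift_for(4096))   # the real physical sector size

# With 512-byte units, a block may start at any multiple of 512, for
# example the third unit, which falls in the middle of a 4k sector and
# forces the drive to read, modify and rewrite the whole sector.
print(is_aligned(3 * 512))
# With 4096-byte units, every block starts on a physical boundary.
print(is_aligned(3 * 4096))
```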

  90. Julien,

    Have you bought all of your hardware yet? If not, I’d recommend a couple of changes:

    1) Motherboard: X8SI6-F. It’s like the X8SIL-F, but it has an onboard LSI SAS2008 (6Gbps) controller with 2 ports so you can skip the extra SAS controller card. It is a bit more expensive than the X8SIL-F, but should be less than the X8SIL-F + the AOC-USAS-L8i. It does only have 1 x8 PCI-e slot on it, tho.
    2) Memory: Super Talent DDR3-1333 4GB/256×8 ECC Micron Server Memory – W1333EB4GM. It is cheaper and faster (you can generally find it for ~50 a stick on SuperBiiz and they frequently have discount coupons).

    I have both of those in my home system and they appear to be working just fine.

    -Colin

  91. Even though this series is based on OpenSolaris, much of it applies to FreeBSD as well. I have been using a FreeBSD installation, booting from Compact Flash, with a vdev of 3 disks, later expanded with a second vdev of another 3 disks, for quite a while now, much to my satisfaction.

    fd0

  92. I’m currently running OpenSolaris 2009.06, installed using this how-to. :) So, thank you for the work you put into this.

    I’m looking at upgrading the OS on this machine. I was saddened to see that OpenSolaris no longer exists. My question is what OS would you recommend? I’ve been looking at Solaris Express, OpenIndiana, and SmartOS.
    I’m currently leaning towards SmartOS, it looks like a good implementation.

    Any comments or suggestions would be greatly appreciated.

  93. Thanks David.

    Yes, I was also saddened to see what Oracle did with OpenSolaris.
    However, illumos appears to be the new code base and, like you pointed out, there are a few derivatives to choose from.

    IIRC, SmartOS is developed by Joyent and is optimised for data centres.
    OpenIndiana is probably the closest to the original OpenSolaris system, so for that reason I chose OpenIndiana, as it most closely matched my requirements for a general-purpose drop-in replacement for OpenSolaris.

    I upgraded from build 134 of OpenSolaris to build 151a4 of OpenIndiana. There were one or two issues due to a problem in pkg, but thanks to the very helpful #oi-dev guys, I managed to work around the problems at the time. I was thinking of publishing a guide showing how to update the code to OpenIndiana, as it might be of interest to others too.

  94. This site deserves all the praise it receives. I’m hoping you can assist with the setup I’m trying to create.

    I want to protect my children from digital dyslexia (accidental deletion of files) and the potential for viruses to update files. Previously I was using a QNAP NAS and had the following setup (converting it into ZFS ‘speak’).

    There are 2 user accounts, one for the children and one for me.

    Disk structure (old) – numbers are for reference
    01 /tank/fileshare/Contribution
    02 /tank/fileshare/Music
    03 /tank/fileshare/Video
    04 /tank/fileshare/Company

    Shares were set up as follows:
    fileshare=/tank/fileshare
    fileshare$=/tank/fileshare
    Each level 1 folder (Music, Video) is also shared as a standard share

    The ‘fileshare$’ is (nominally) hidden and allows r/w access to all folders (01, 02, 03, 04)
    The ‘fileshare’ allows r/w access to the Contribution folder (01), and r/o to all others (02, 03, 04)
    The Level 1 shares have the same security as ‘fileshare’ and the children usually access via the Level 1 shares

    ZFS seems to only allow ONE share for a filesystem and nothing for the folders. I want to be able to access the folders/files via 2 methods. Method 1 – the day-to-day one via the standard share, where I map drives etc. Method 2 – via the hidden share when I need to update something. This approach means that I only require one UID/PWD combination. It helps protect the NAS from being hit by viruses etc., and means that the children cannot accidentally delete the Music / Videos etc. I move the files from Contribution to the appropriate location. It worked effectively on the QNAP, but I’m stumped with OpenIndiana and ZFS.

    Can someone suggest how I can go about this?

    I don’t want to have to create separate ‘filesystems’ for each ‘share’; this would be too much hard work, and does not meet the requirement of having locations both r/w and r/o depending on the access path. Also, I take off-site backups by copying everything below the fileshare level.

    Any ideas / suggestions?

    pcd

  95. Hi Simon,

    I must tell you that your blog led me down the path of using Solaris as my media server backup machine, and I have been very pleased.

    I did upgrade to Oracle Solaris Express 11 before I fully realized the consequences. I did not upgrade the ZFS pools beyond what is compatible with OpenIndiana.

    I would very much enjoy seeing a write up from you on how you did the upgrade, and if you have any tips on how I may get my machine over there to OpenIndiana, I would be very grateful.

    I would be very interested in how you upgraded to 151a4 of OpenIndiana. I rather hastily upgraded to Solaris 11 Express, which I now regret. The pools on the system are still at version 28, so they should be compatible if I try to move to OpenIndiana. I am also thinking of moving my data to FreeBSD.

    I used much of your guide in creating my OpenSolaris system in the first place, and I am very interested in your experiences with the upgrade.
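
    The reasoning about pool version 28 boils down to one comparison: a pool can be imported only if the target system supports that on-disk version or a newer one. Here is a minimal sketch of that check. The maximum versions below are assumptions for illustration and should be verified with `zpool upgrade -v` on each system.

```python
# Assumed maximum supported pool versions (verify on the actual systems).
OPENINDIANA_151A_MAX = 28
SOLARIS_11_EXPRESS_MAX = 31


def can_import(pool_version: int, target_max_version: int) -> bool:
    """A pool is importable only if the target supports its on-disk version."""
    return pool_version <= target_max_version


# A pool left at version 28 can still move to OpenIndiana.
print(can_import(28, OPENINDIANA_151A_MAX))
# A pool upgraded to the newer Solaris version could not move back.
print(can_import(SOLARIS_11_EXPRESS_MAX, OPENINDIANA_151A_MAX))
```

    This is why not running `zpool upgrade` after moving to Solaris 11 Express keeps the door open.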
