Home Fileserver: Existing products

When I began my search for a suitable fileserver, I found the choice of available technologies and products quite overwhelming, each with its own pros and cons.

Here is a list of things I considered:

RAID
Mirror

A mirror was attractive because, for critical data, you have two identical disks, each containing a copy of the data you wish to protect. Write speed is about the same as writing to a single disk, but read speed can potentially be up to double that of a single disk, because reads can be spread across both disks. However, the problem for me was that a simple mirror is limited in capacity to that of a single disk, i.e. around 1TB. I considered this to be insufficient. Also, 1TB disks are currently very expensive, i.e. they have a high cost per GB. Mirrors are sometimes referred to as RAID 1.
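
To make the read/write trade-off concrete, here is a toy two-way mirror. It is a minimal sketch, assuming in-memory lists stand in for the two disks; the class and method names are made up for illustration. Every write lands on both disks, while reads alternate between them, which is why writes cost about the same as on a single disk and reads can, in the best case, be spread over two.

```python
from collections import Counter


class Mirror:
    """Toy two-way mirror (RAID 1) using in-memory lists as 'disks'."""

    def __init__(self, num_blocks: int) -> None:
        self.disks = [[b""] * num_blocks for _ in range(2)]
        self.reads = Counter()  # how many reads each disk has served
        self._next = 0          # which disk serves the next read

    def write(self, block: int, data: bytes) -> None:
        # Every write must go to both disks, so writes are no faster
        # than writing to a single disk.
        for disk in self.disks:
            disk[block] = data

    def read(self, block: int) -> bytes:
        # Round-robin reads: either copy is valid, so two independent
        # disks can share the read load.
        index = self._next
        self._next = 1 - self._next
        self.reads[index] += 1
        return self.disks[index][block]


m = Mirror(num_blocks=4)
m.write(0, b"photos")
for _ in range(10):
    m.read(0)
print(dict(m.reads))  # {0: 5, 1: 5}
```

In a real array the gain depends on the controller or OS actually issuing reads to both disks in parallel; this sketch only shows how the work is divided.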

So I moved on to the idea of a multi-drive RAID array, perhaps using 3 or 4 disks to get a fairly substantial capacity, whilst offering redundancy in the event of disk failure. RAID 5 seemed OK: it uses the capacity of all but one of the disks in the array for data, and the equivalent of one disk’s capacity for redundant data in the form of parity (the parity blocks are spread across all the disks). Parity data enables the array to keep serving the requested data after a drive fails, and to rebuild the missing data onto a replacement disk.
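
The parity idea is simpler than it sounds: the parity block is the byte-wise XOR of the data blocks in a stripe, so any single missing block can be recomputed by XORing everything that survives. The sketch below shows one stripe of a hypothetical 4-disk array; it ignores how real RAID 5 rotates parity across disks and is not any vendor’s implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-disk RAID 5: three data blocks + one parity block.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing the disk that holds data[1], then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'EFGH'
```

The same recomputation is what happens during a rebuild onto a new disk, just repeated for every stripe, which is also why a second failure during the rebuild loses data.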

I briefly, very briefly, considered RAID 0, which stripes data across the disks with no redundancy, but with blinding read and write performance. But this would have been a stupid choice: what’s the point of being able to read/write at 400MB/s if you lose all your data when any one of the disks fails? RAID 0 seems to be great for editing uncompressed video, where you need amazing data transfer speeds, but it’s not useful for a general-purpose fileserver. There are solutions which combine RAID 0 with RAID 1 (mirroring), but I didn’t want this kind of solution because speed is of secondary importance to me. Of primary importance to me is data integrity.
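
The capacity and redundancy trade-offs between the three levels discussed so far can be summarised with a little arithmetic. This is a rough sketch, assuming identical disks and ignoring filesystem overhead; with 4 disks, “raid1” here means a 4-way mirror, whereas the simple mirror described earlier is the 2-disk case.

```python
def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity of an array of identical disks, before filesystem overhead."""
    if level == "raid0":
        return disks * disk_tb            # striped, no redundancy
    if level == "raid1":
        return disk_tb                    # every disk holds the same copy
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb      # one disk's worth of parity
    raise ValueError(f"unknown RAID level: {level}")


def failures_tolerated(level: str, disks: int) -> int:
    """How many disks can fail before data is lost."""
    return {"raid0": 0, "raid1": disks - 1, "raid5": 1}[level]


for level in ("raid0", "raid1", "raid5"):
    cap = usable_capacity_tb(level, disks=4, disk_tb=1.0)
    print(f"{level}: {cap:.1f} TB usable, survives "
          f"{failures_tolerated(level, 4)} disk failure(s)")
# raid0: 4.0 TB usable, survives 0 disk failure(s)
# raid1: 1.0 TB usable, survives 3 disk failure(s)
# raid5: 3.0 TB usable, survives 1 disk failure(s)
```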

So, having decided on something like RAID 5, I began to look around for products with this kind of capability. Some people had advised me against RAID 5 for various reasons, including the well-known RAID 5 write hole, also described here. Um yes, I can see that losing all your data due to the RAID 5 write hole is not too appealing 😉
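
Why the write hole is nasty can be shown with the same XOR parity as before. Updating a block means writing both the new data and the new parity; if power fails between the two writes, the stripe is left inconsistent without anyone noticing, and a later rebuild silently produces wrong data. This is a simplified single-stripe simulation, not a model of any particular controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# A consistent stripe: three data blocks plus their parity.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Update block 0. RAID 5 must write the new data AND the new parity;
# suppose power fails after the data write but before the parity write.
stripe[0] = b"ZZZZ"
# parity = xor_blocks(stripe)  # never happens because of the crash

# Later the disk holding block 1 dies, and the array rebuilds it from
# the survivors using the stale parity.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
print(rebuilt == b"BBBB")  # False: block 1 is silently wrong
```

Nothing in the array reports an error here: the rebuilt block simply contains garbage, which is the part that makes the write hole so unappealing.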

There are solutions to the RAID 5 write hole problem, which basically involve using non-volatile RAM (NVRAM), but these are fairly expensive. However, the main problem I see with RAID 5 is that each RAID 5 implementation is vendor-specific and therefore proprietary. This is the biggest turn-off of all: what happens when your proprietary RAID 5 ‘solution’ has a catastrophic failure? Will the company even be around to offer advice? And what advice will they give you to attempt to bring your data back from the dead, like Lazarus? I don’t think I want to find out 🙂

But if I wanted to get a large-capacity system up and running, what should I choose? Fortunately, around the time I reached this point in the journey, a colleague produced a copy of the September/October issue of ACM Queue magazine, which contained the following article: Hard Disk Drives: The Good, The Bad and The Ugly.

That article is enough to make you wince when you think that all your valuable data is at the mercy of these inherently fallible devices. The most worrying type of failure is not simple mechanical drive failure, as this is easy to spot, and you can then rebuild the lost data from the redundant copy, assuming your system has built-in redundancy.

No, in fact the most worrying form of disk failure is known as ‘latent failure’. This is where data silently gets corrupted on the disk without any warning. The problem with this kind of failure is that unless you regularly verify all the data on the disk against stored checksums, how will you even know that your data is being silently corrupted?
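
Here is a minimal sketch of what “checking all data against stored checksums” means in practice: record a checksum for each block when it is written, then periodically re-read every block and compare (a “scrub”). The in-memory dictionary standing in for a disk, and the function names, are hypothetical; this illustrates the principle only and is not how ZFS lays out its checksums on disk.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


# Hypothetical in-memory "disk": block number -> contents.
disk = {0: b"holiday photos", 1: b"tax return", 2: b"music"}
# Checksums recorded at write time, kept separately from the data itself.
stored = {n: checksum(b) for n, b in disk.items()}


def scrub(disk, stored):
    """Re-read every block and return those whose checksum no longer matches."""
    return [n for n, block in disk.items() if checksum(block) != stored[n]]


# Simulate a latent error: one bit flips, and no read error is ever reported.
corrupted = bytearray(disk[1])
corrupted[0] ^= 0x01
disk[1] = bytes(corrupted)

print(scrub(disk, stored))  # [1]
```

Detection is only half the story: once a bad block is found, a redundant copy or parity is needed to repair it.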

A few pages further on in the magazine, I found an article that turned out to be pure gold: A Conversation with Jeff Bonwick and Bill Moore – The future of file systems.

After reading that article, I felt a warm glow inside and realised that ZFS seemed to be the best solution currently available to a number of these problems, and I wanted to use it.

But what about Windows Home Server?

Are you kidding? What are you smoking? 😉 Just take a look at these links and then reconsider your question. It sucks!

From Wikipedia, look here and search for the word ‘corruption’ under the ‘Issues’ section 🙂
Look here: Microsoft’s Windows Home Server corrupts files

According to Microsoft’s KnowledgeBase article KB946676, they’ve been working on the bug/design flaw for more than 2 months now, so you’d expect they’d have fixed such a serious flaw by now, wouldn’t you? You’d be wrong 🙂 In fact, according to Wikipedia, M$ first acknowledged the bug in October 2007, and now we’re in March, 5 months later. Microsoft very kindly suggest that, to avoid data corruption, you copy the file you wish to edit from the fileserver onto your local disk, edit it there, and then copy it back to the fileserver. Woo hoo, that’s cool, let’s see if any other systems can do that kind of fancy stuff 😉

So I think you know by now that I consider Windows Home Server fit for Room 101 🙂

But what about Drobo?

Drobo initially looked quite interesting, especially its flexibility in handling different-sized drives. However, it seems to suffer from pitifully slow transfer speeds due to its USB2 interface, and its data format seems undocumented and therefore proprietary. I didn’t need to look any further. Also, the price was around $500 without drives, so it’s not cheap either. I believe there is a bolt-on box called DroboShare that adds ethernet access, for another $200. So this gizmo with ethernet access will cost $700 without any drives. No thanks.

I wondered whether many people had lost data using a Drobo, and what the customer service was like. This person describes his experience of losing data and, to add insult to injury, being told he couldn’t have his money back because there is no guarantee, only a 30-day money-back offer. Read about it here: ‘DO NOT BUY A DROBO!!’.

It seems he’s not alone, as a quick search in Google reveals much more data loss misery using a Drobo. Caveat Emptor!

Other existing off the shelf drive enclosures

There are many, many existing drive enclosures out there in the marketplace. Like many other people, I too was looking for suitable drive enclosures for running ZFS, and connecting to it from a Mac and other PCs. The problem you will find with almost all of these boxes is that they assume you want to run their proprietary built-in RAID. If you’re using ZFS, you’ll almost certainly be better off getting ‘as close to the metal’ as possible, i.e. JBOD (‘just a bunch of disks’) mode, where each disk is under the full control of ZFS, without some RAID layer in the way. ZFS likes to have full control of the disks so that no ‘funny business’ is going on between ZFS and the box’s RAID firmware, i.e. no contention. The last thing you want is the proprietary RAID firmware trying to do something fancy while ZFS is doing its own thing with the disks too. The result might not be pretty. So if you’re after a general-purpose fileserver, you have 2 options as far as I can see:

Buy a pre-built box/enclosure that allows pure JBOD mode.
Build your own machine and make sure to disable any onboard RAID within the motherboard’s BIOS so you get JBOD mode.

Personally, to get flexibility and full control, I chose option 2: I built my own box, disabled the onboard RAID in the BIOS to ensure JBOD mode, and put Solaris on it. You can see my setup here:
http://breden.org.uk/2008/03/02/home-fileserver-zfs-hardware/

And a series of other articles on setting up a Home ZFS Fileserver here:
http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/

Importantly, you will also have to decide how you want to connect to this fileserver. Will it be DAS or NAS? Direct-attached storage (DAS) connects the fileserver directly to a single computer, but if you connect the fileserver to your network (network-attached storage, NAS) using something like Gigabit ethernet, any computer or network-enabled device on your home network will be able to connect to it. I chose the NAS approach for mine. A 10/100 Mbit/sec router will not give fast transfer speeds, and Gigabit (1 Gbit/sec) wired switches are so cheap these days that you can create a speedy setup for very little money.

With Gigabit ethernet, the bottleneck is not the network but almost certainly the write speed of the disks you are writing the data to. Gigabit ethernet has a theoretical maximum speed of 1 Gigabit per second; dividing by 8 bits per byte gives 125 Megabytes per second raw, or very approximately 100 Megabytes per second once protocol overhead is taken into account. Only the fastest latest-generation drives are able to achieve that write speed, and even then only as a peak on an empty disk; it is not sustainable. So right now, if you use Gigabit ethernet with a general-purpose fileserver using some form of redundant RAID, the bottleneck is normally the disk read/write speed.
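
The arithmetic behind that bottleneck claim can be sketched as follows. It converts a link’s line rate in Mbit/s to raw MB/s (decimal units) and compares it with a disk write speed. The 70 MB/s disk figure is purely an illustrative assumption, not a measurement from my setup, and real network throughput will always be somewhat below the raw figure.

```python
def link_mb_per_s(link_mbit_per_s: float) -> float:
    """Raw line rate in MB/s (8 bits per byte, decimal megabytes).
    Actual throughput is lower once Ethernet/TCP/protocol overhead is paid."""
    return link_mbit_per_s / 8


def bottleneck(link_mbit_per_s: float, disk_write_mb_s: float) -> str:
    """Whichever is slower, the network link or the disks, limits transfers."""
    return "network" if link_mb_per_s(link_mbit_per_s) < disk_write_mb_s else "disks"


# Illustrative assumption only: a sustained disk write speed of 70 MB/s.
assumed_disk_mb_s = 70.0

for name, mbit in [("Fast Ethernet", 100), ("Gigabit Ethernet", 1000),
                   ("10 Gigabit Ethernet", 10000)]:
    print(f"{name}: {link_mb_per_s(mbit):.1f} MB/s raw, bottleneck = "
          f"{bottleneck(mbit, assumed_disk_mb_s)}")
# Fast Ethernet: 12.5 MB/s raw, bottleneck = network
# Gigabit Ethernet: 125.0 MB/s raw, bottleneck = disks
# 10 Gigabit Ethernet: 1250.0 MB/s raw, bottleneck = disks
```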

However, currently, for specialist applications like uncompressed video editing, you will get the fastest sustained data transfer speeds using enclosures that employ RAID level 0, directly attached to the video editing computer (often a Mac), like these devices:

Proavio E4ML, using a 4 channel multi-lane/Infiniband Host Bus Adapter, with sustained transfer speeds of ~217 MBytes/sec
Proavio E8ML, using an 8 channel multi-lane/Infiniband Host Bus Adapter, with sustained data transfer speeds of ~550 MBytes/sec. See a review of this bad boy here

Of course, looking to the future, 10 Gigabit ethernet is already available, but it’s not mainstream yet. Network speeds can increase at a much faster rate than the speed of current rotating disk technology. With SSDs emerging, faster speeds will be achievable, with no moving parts too. However, SSD technology is much too expensive for home use right now, but it will be interesting to see how it develops, and how quickly it supersedes current disk technology.

For more ZFS Home Fileserver articles see here: A Home Fileserver using ZFS. Alternatively, see related articles in the following categories: ZFS, Storage, Fileservers, NAS.


4 Comments

Tom says:

14th March 2008 at 3:41 pm

Gigabit Ethernet theoretically should give ~100 MBytes/s, but real-world experience shows 60 MBytes/s to be the typical upper limit. It’s certainly better than the 10 MBytes/s I’ve seen with 100Mbit Ethernet.

Another way to boost the speed is with trunking (802.???ad). You can get gigabit switches that support it for under $400 and Solaris 10 supports it with up to 4 (more?) Gigabit ethernet ports.

Simon says:

17th March 2008 at 11:27 pm

@Tom:

That’s interesting to know. So it sounds like, when I got around 73MBytes/second on my iSCSI backup test, I was getting about the best speed possible. However, this was done on a small (4GB) group of fairly large video files. When doing a 650GB+ backup using iSCSI, I got around 48MBytes/second sustained transfer speed to a non-redundant pool over the gigabit switch, and this included a lot of smaller files as well as large files.

Is the ‘trunking’ similar to ethernet channel bonding? I have 2 gigabit ports on this motherboard (Asus M2N-SLI Deluxe), so I considered looking into using both to give potentially higher throughput for video editing applications. See here: http://www.scl.ameslab.gov/Projects/MP_Lite/dox_channel_bonding.html

Ducky says:

2nd October 2009 at 4:16 am

The server I built using similar methods (OpenSolaris with 4x1TB drives) gets near line speed for ethernet; I can get about 95MiB/s transfer.

ZFS is pretty quick: I lost my main system drive, installed a new drive, and with one command (zpool import caprica, caprica being the name of my pool) it was back up and running again (even all the shares set themselves back up).

Here’s the blog post I wrote about the hardware: http://blog.duklabs.com/?p=44

I don’t go too far into details about the setup though.

Ducky

Simon says:

14th October 2009 at 1:00 pm

Hi Ducky, that’s a great speed! Which NICs do you use? Are you measuring network traffic transmitted across the network, or amount of data written to disk after being transferred?
