Life on a hard drive

With the ubiquity of digital cameras and camcorders, it is becoming ever easier to record and organise your family memories. Ironically, the safety of these recorded memories is also ever more at risk, due to the flawed nature of recording media such as CD-ROMs, DVDs and hard drives.

There are two extremes and a middle ground. At one end are people who have no wish to record any memories, such as photos or camcorder footage; at the other are those who have gone to mind-boggling lengths to record literally everything in their lives! Most people probably fall somewhere in between.

Like many people, I have a fair number of digital photos and a few tapes of family and holiday video clips taken with a Mini DV camcorder, and I really don’t see any need to lose any of it through accident or carelessness. Which brings us to two big questions: (1) what to do with all this stuff, and (2) how to ensure it stays around for a while, the longer the better.

I was reading someone’s blog on this subject the other day; it discussed many of these issues and was very interesting.

In summary, it seems that any storage medium is fatally flawed and will fail eventually; it’s just a matter of when. I have had a few hard drives self-destruct over the last 15 years or so, and can confirm this from personal experience.

The solution, of course, is to make backups. You do make backups, don’t you? 🙂 It seems the most foolproof way to ensure data longevity is to make multiple copies of the data and to keep them in different physical locations: for example, give a backup to a family member to store, just in case your house has a fire.

Backups can take two main forms:

  1. a complete file-level or bit-level bootable clone of a hard drive, which lets you be up and running again immediately if the drive fails.
  2. a daily backup that copies any files that have been modified.

Ideally, you should both make a bootable clone and run an automated daily backup of any modified files. Software for making automatic backups is widely available: for Macintosh computers, see SuperDuper, Carbon Copy Cloner and EMC Retrospect, and there are doubtless many others available for Windows and Linux.
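Dedicated backup software does far more than this, but the core idea of a daily backup of modified files can be sketched in a few lines of Python. This is a minimal sketch, assuming that “modified” simply means the source file’s modification time is newer than the copy already in the backup folder; the function name `incremental_backup` is hypothetical. It never deletes anything from the backup and keeps no older versions of a file, which real backup tools typically handle.

```python
import os
import shutil


def incremental_backup(source_dir: str, backup_dir: str) -> list[str]:
    """Copy files from source_dir into backup_dir if they are new or have
    been modified since the last run. Returns the relative paths copied."""
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        rel_root = os.path.relpath(root, source_dir)
        dest_root = os.path.normpath(os.path.join(backup_dir, rel_root))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest_root, name)
            # Copy when the backup copy is missing or older than the source.
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                # copy2 also copies the modification time, so an unchanged
                # file is skipped on the next run.
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return copied
```

A call such as `incremental_backup("/Users/me/Pictures", "/Volumes/Backup/Pictures")` (hypothetical paths) could then be run once a day by a scheduler such as cron or launchd.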

Most people’s data will consist of documents/spreadsheets, emails, digital photos, music, movies and possibly video clips. Given the massive hard drive capacities now cheaply available (500GB for 100 euros), none of these presents any real problem for most people, except possibly movies and video clips taken with a camcorder, especially a new HD camcorder, because of the huge amount of storage space video can consume. So video is really the problem case.
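To put a number on “huge”, here is a back-of-the-envelope estimate in Python of how much Mini DV footage fits on the 500GB drive mentioned above. It assumes a DV data rate of about 3.6MB/s including audio and overhead (roughly 13GB per hour of tape); HD and uncompressed capture formats need considerably more, so treat the figure as a lower bound.

```python
# Rough storage estimate for Mini DV footage.
# Assumption: DV runs at about 3.6 MB/s once audio and overhead are included.
DV_MB_PER_SECOND = 3.6


def gigabytes_per_hour(mb_per_second: float) -> float:
    """Convert a data rate in MB/s into GB per hour (decimal units, 1 GB = 1000 MB)."""
    return mb_per_second * 3600 / 1000


def hours_that_fit(drive_gb: float, mb_per_second: float) -> float:
    """How many hours of footage fit on a drive of the given size in GB."""
    return drive_gb / gigabytes_per_hour(mb_per_second)


per_hour = gigabytes_per_hour(DV_MB_PER_SECOND)
print(f"DV: about {per_hour:.1f} GB per hour")
print(f"A 500GB drive holds about {hours_that_fit(500, DV_MB_PER_SECOND):.0f} hours")
```

Under these assumptions, a 500GB drive holds only a few dozen hours of DV, and fewer once you keep edited versions alongside the raw clips.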

Solutions exist for coping with the huge amounts of storage that camcorder clips require; uncompressed, they can run into the hundreds of gigabytes. RAID (Redundant Array of Independent Disks) is the most commonly used tool for dealing with large storage requirements, but RAID has its own set of problems too. There are many different RAID levels, each with its own characteristics, and some of them can even be combined. RAID can be controlled by either software or hardware and, again, each approach has its own advantages and disadvantages.

  • RAID 0 is called ‘striping’ and is the fastest of all the RAID levels. When data is read from or written to the array, the controller sends commands to every disk at once, each reading or writing a small piece of the data, so the work is shared and the throughput multiplies. So if each disk can write at 50MB/s, an 8-disk RAID 0 array may in theory achieve 400MB/s, which is pretty fast and may be needed for capturing uncompressed video. The huge disadvantage of RAID 0 is that if any disk in the array fails, all the data is lost. That’s serious! So to get round this problem, RAID 0 is often combined with RAID 1 (mirroring).
  • RAID 1 is called ‘mirroring’ and it is the safe RAID level. In a 2-disk RAID 1 setup, all data written is duplicated on both drives. That way, if either disk fails, the RAID controller simply ignores the failed disk and carries on reading and writing using the remaining one. So RAID 1 is safe, but writes are no faster than with a single disk. RAID 0 and RAID 1 are often combined (RAID 0+1): an 8-disk array, for example, is split into two groups of four disks, the first group is configured as a RAID 0 array for speed, and it is then mirrored onto the second group as a safeguard in case a disk fails.
  • RAID 5 is quite popular because it gives an increase in speed like RAID 0, and also some safety like RAID 1. It stores parity information spread across all the disks and is designed to remain operational if one disk fails; if two or more disks fail, though, you lose all the data. So when one disk fails you need to replace it immediately, before a second one fails. The disks are often ‘hot swappable’, meaning that you can replace a drive whilst the system is still running. Once the new drive is installed, the RAID 5 controller rebuilds its contents from the data on the remaining disks. Clever stuff, but it seems that once one disk fails, the second can follow within days, so it’s still fairly risky in my opinion. One way to mitigate this risk is to buy the disks in the array from different manufacturing batches, or from different model ranges.
  • RAID 6 goes one step further than RAID 5 and is designed to remain operational even if two drives fail. Neat stuff! If I ever get a RAID setup, I want RAID 6: gimme, gimme! 🙂
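The trick that lets RAID 5 survive a failed disk is parity: for each stripe, the array also stores the XOR of the data chunks, and XOR-ing the surviving chunks recreates whatever was on the dead disk. The toy Python sketch below shows striping plus parity. For simplicity it assumes a fixed parity disk (closer to RAID 4), whereas real RAID 5 rotates the parity chunk across all the disks, and it does not attempt RAID 6’s second, differently calculated parity block. The function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, data_disks: int, chunk: int) -> list[list[bytes]]:
    """Split data into chunks, deal them across data_disks (striping, as in
    RAID 0), and add one parity chunk per stripe on an extra disk.
    Returns one list of chunks per disk; the last list is the parity disk."""
    stripe_size = data_disks * chunk
    # Pad with zero bytes so that every stripe is full.
    padded = data + b"\0" * (-len(data) % stripe_size)
    disks: list[list[bytes]] = [[] for _ in range(data_disks + 1)]
    for start in range(0, len(padded), stripe_size):
        pieces = [padded[start + i * chunk:start + (i + 1) * chunk]
                  for i in range(data_disks)]
        for i, piece in enumerate(pieces):
            disks[i].append(piece)
        disks[data_disks].append(xor_blocks(pieces))
    return disks


def rebuild_disk(disks: list[Optional[list[bytes]]]) -> list[bytes]:
    """Recreate the single missing disk (marked None) from the survivors:
    XOR-ing every surviving chunk in a stripe gives back the lost chunk."""
    survivors = [d for d in disks if d is not None]
    return [xor_blocks(list(stripe)) for stripe in zip(*survivors)]
```

For example, striping `b"Holiday video, 2007"` across three data disks in 4-byte chunks, setting one disk to `None` to simulate a failure, and calling `rebuild_disk` returns exactly the chunks that disk held. Lose a second disk, though, and the XOR no longer has enough information, which is why a second failure is fatal for RAID 5.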

There are other RAID levels, but the ones above (0, 1, 5, and sometimes 6) are the ones most commonly used.
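The trade-offs between these levels come down to simple arithmetic on usable capacity and on how many failed disks each layout can tolerate. The Python sketch below assumes identical disks and counts only the failures each level is guaranteed to survive (RAID 0+1 can sometimes survive two, if both failures land in the same half); the 500GB disk size echoes the drive mentioned earlier, and the function name is hypothetical.

```python
def raid_summary(level: str, disks: int, disk_gb: int) -> tuple[int, int]:
    """Return (usable capacity in GB, disk failures the array is guaranteed
    to survive) for an array of identical disks."""
    if level == "0":
        return disks * disk_gb, 0  # striping only: no redundancy at all
    if level == "1":
        # Every disk holds the same data, so n-way mirroring survives n-1 failures.
        return disk_gb, disks - 1
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb, 1  # one disk's worth of parity
    if level == "6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_gb, 2  # two disks' worth of parity
    if level == "0+1":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 0+1 needs an even number of disks, at least 4")
        # One half mirrors the other: any single failure is survivable.
        return disks // 2 * disk_gb, 1
    raise ValueError(f"unsupported RAID level: {level}")


for level, n in [("0", 8), ("1", 2), ("5", 8), ("6", 8), ("0+1", 8)]:
    usable, survives = raid_summary(level, n, 500)
    print(f"RAID {level:>3} with {n} x 500GB: {usable}GB usable, "
          f"survives {survives} failure(s)")
```

With eight 500GB disks, RAID 6 gives up one more disk’s worth of capacity than RAID 5 in exchange for surviving a second failure, which is the trade-off behind wanting RAID 6 above.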

RAID gives you enormous storage capacity, fast reads and writes, and some protection against a failed disk, but it still needs to be backed up: if you accidentally delete a file, the array deletes it from every disk. So that gives one something to think about too 🙂

Other backup options exist, such as tape and CD/DVD, but these have issues too. Tape is too expensive for mainstream use and offers sequential rather than random access, so recovering files can take a long time. CD and DVD media often fail after only a few years, and creating the copies is time-consuming and impractical for large amounts of data.

So RAID is probably useful if you need enormous amounts of storage because you record a lot of video clips or shoot loads of RAW photos. But consider the power needed to spin 4 or 8 disks, the noise they make in operation (moving heads etc.), the vibration all those disks will almost certainly create, the heat they generate, and the cooling fans required (which make yet more noise). There is also the cost of the array itself, plus the cost of more gear to back it up. Food for thought.
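To put a rough figure on the power point, here is a Python sketch of the yearly electricity used by the disks alone in an always-on array. The wattage and electricity price are hypothetical round numbers, not measurements, so substitute your own drives’ specifications and tariff; fans, the controller and the rest of the machine are not counted.

```python
# Back-of-the-envelope running cost of an always-on disk array.
# Hypothetical figures: check your own drives' specifications and tariff.
WATTS_PER_DISK = 8.0      # assumed average draw of one spinning 3.5" drive
PRICE_PER_KWH = 0.20      # assumed electricity price in euros
HOURS_PER_YEAR = 24 * 365


def yearly_energy_kwh(disks: int, watts_per_disk: float = WATTS_PER_DISK) -> float:
    """Energy used by the disks alone if they spin 24 hours a day for a year."""
    return disks * watts_per_disk * HOURS_PER_YEAR / 1000


def yearly_cost(disks: int) -> float:
    """Yearly electricity cost in euros at the assumed price."""
    return yearly_energy_kwh(disks) * PRICE_PER_KWH


for n in (1, 4, 8):
    print(f"{n} disk(s): {yearly_energy_kwh(n):.0f} kWh, "
          f"about {yearly_cost(n):.0f} euros a year")
```

Under these assumptions the cost scales linearly with the number of disks, so an 8-disk array costs eight times as much to run as a single drive before any extra cooling is added.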

For now, I will try to get by with just one high-capacity main drive, and (1) periodically create an identical clone of this disk, (2) run an automatic daily backup, and (3) keep a copy at another location.

Leopard, the new version of Mac OS X, has a built-in backup utility called Time Machine, which should help in this respect.
