Wanted: Cheap giant bit barrel
Originally published 2004 in Atomic: Maximum Power Computing. Last modified 03-Dec-2011.
Parkinson's Law of Data states that data expands to fill the space available for storage. It was true in the days of 300 baud cassette storage, and it's still true in the days of 300 gigabyte hard drives.
(Just so you know - it'd take about 250 years to transfer the full formatted capacity of a modern 300Gb drive at 300 baud speed.)
Most people who're filling hundreds of gigabytes of disk space with data are storing video files - for their Home Theatre PC or their Complete Works of Cara Lott Upsampled to 1080i archive, or what have you. But given the power of current consumer PCs, maybe you need a terabyte of storage for your industrial database experiments, Pixar@Home film studio or new fully GPLed record label. Whatever it is, it needs more capacity than any one drive can give you.
Whatever you're putting on your collection of drives, they're probably not pretty to look at or easy to manage. A couple of 250Gb drives in your main PC, an old Socket 7 box in the corner with four questionably reliable 120s straining its dusty PSU, bare drives sitting loose on the bottom of cases that've run out of bays, CD-Rs and DVD-Rs filed in shoeboxes. Maybe you've got something like Linksys' neat little NSLU2 USB network storage widget; that's definitely a step in the right direction, but it only accepts two drives.
What we need is a handy-dandy storage appliance into which you just plug drives, and which takes care of the details itself. Start out with a couple of drives, add more as you need them, and watch the Network Attached Storage (NAS) shared "drive" on your LAN get bigger as a result. Integrated backup system essential, integrated data redundancy a big plus, reasonable ease of use and low purchase price essential.
Gadgets already exist that have everything but the last two features. Enterprise storage giant EMC's Clariion (from the people that brought you Siimon Reynolds...) AX100, for instance, will set you back the thick end of five thousand US dollars even when it's only got three miserable 160Gb SATA drives in it, of its 12 maximum. By EMC's standards, that's very cheap.
The AX100's got lots of features that home and small office users don't need, though. Not just full Storage Area Network (SAN) operation as well as NAS, but a fibre channel interface and dual power supplies. And the whole thing is, or at least bleeding well ought to be, built like a tank.
Restrict the external interface to ordinary Ethernet, settle for one non-redundant power supply, don't worry about SAN frills (because you don't have a thousand users each with 50 files open...), and the hardware side can be an ordinary PC. Start with the four or six drives you can plug into most motherboards, and add on more cheap ATA controller cards as necessary. With a reasonably beefy power supply, decent ventilation and enough 3.5 inch bays, it's no big deal to run more than 20 hard drives from an ordinary consumer motherboard. Four-drive PCI ATA controller cards cost less than $50 each.
Then, of course, you have to tie the drives together into one volume - a few, at most. The obvious way to do that is with RAID (Redundant Array of Independent, or Inexpensive, Disks) 5, which lets you make an array of as many identical drives (or at least identical-sized partitions) as you like, gives you the capacity of all but one of the component drives put together (five 200Gb drives gives 800Gb capacity), and can survive the death of any one drive.
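The RAID 5 capacity rule is simple enough to write down as a sketch. The function name is made up for illustration. The three-drive minimum is the conventional one, and the "smallest member" rule covers the case where the drives (or partitions) aren't all the same size.

```python
def raid5_capacity(drive_sizes_gb):
    """Usable RAID 5 capacity: (number of drives - 1) times the smallest drive."""
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 conventionally needs at least three drives")
    # One drive's worth of space goes to parity, and every member is
    # treated as if it were only as big as the smallest one.
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)


print(raid5_capacity([200] * 5))        # five 200Gb drives: 800
print(raid5_capacity([250, 250, 120]))  # mismatched drives waste space: 240
```

Note how badly mismatched drives fare, which is why everyone tells you to use identical ones.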
There are some vaguely affordable products out there that're starting to approach this kind of functionality. Buffalo's TeraStation is an option, for instance. It packs four drives in a neat NAS box and can run them in RAID 5 mode.
The four-by-400Gb 1.6Tb TeraStation (real un-RAIDed formatted capacity probably something like 1490Gb; 1120Gb, at most, in RAID 5) is going for at least $US1900 at the moment; maybe $US2000 from a five-star dealer. That's about $US800 more than the drives by themselves are worth, though 400Gb drives aren't good value for money yet. The more modest four-by-250Gb TeraStation is going for around $US1000; maybe $US550 on top of the current retail price of the drives. Obviously, an empty TeraStation into which you could put your own drives would be a nice thing to have, but unless you find one on eBay that someone stabbed, that ain't gonna happen.
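Those capacity guesses come from the usual mismatch between drive-box gigabytes (a billion bytes) and the two-to-the-thirtieth-byte gigabytes most operating systems report. Here's a minimal sketch of the sum. It ignores filesystem overhead, which is why the results are upper limits, and the function name is just for illustration.

```python
BINARY_GB = 2 ** 30  # bytes in a gigabyte as most OSes count it


def reported_gb(drive_count, marketing_gb):
    """Capacity an OS would report for drives sold as marketing_gb each."""
    return drive_count * marketing_gb * 1e9 / BINARY_GB


unraided = reported_gb(4, 400)
raid5 = unraided * 3 / 4  # one of the four drives' worth goes to parity

print(f"un-RAIDed: {unraided:.0f}, RAID 5: {raid5:.0f}")
```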
Lots of regular PC users have experimented with RAID, thanks to the proliferation of cheap ATA RAID controllers on motherboards. These aren't really RAID controllers, since they depend on their driver software to do the RAID gruntwork, but they're close enough for government work.
If you want RAID 5, though, then you either have to buy yourself a more expensive controller card (a 3ware Escalade, for instance; a lot of rappers seem to think those cards are very cool), or go for software RAID. Software RAID is a perfectly sensible option for many purposes these days, now that even budget processors can handle a good-sized array at very respectable speed.
Some people do software RAID with one of the Server versions of Windows - non-Server Win2000 and WinXP flavours can't do RAID 5.
(A few readers have now pointed out to me that you can registry-tweak XP Pro into being RAID-5 capable, though these instructions are, apparently, for XP versions before SP2. These ones allegedly work with SP2.)
Windows makes it easy to create and maintain RAID arrays, and the enormous purchase price doesn't seem to be an obstacle at all, if the amount of mail I get from teenagers requesting help with Windows 2000 Datacenter Server is anything to go by.
One thing you can't do with Windows software RAID 5, though, is "expand" an array - add drives. You can make an array with three disks, or a more efficient array with four, but you can't turn your three disk array into a four disk one without backing up your data, zapping the array entirely and creating a new one.
Linux's software RAID can, at least in theory, handle array expansion, and it's not terribly difficult to set up. But strong men have wept over what you have to do if a drive fails. And you should take careful note of the "at least in theory" part, up there.
You can try to minimise the chance of a screwup by using a hardware RAID controller card that supports array expansion, like Broadcom's under-$US400 eight channel SATA BC4852 (which used to be the RAIDCore RC4852...). But you'll still find yourself doing more sysadmin-y things than you probably want to.
Even when it works, array expansion can take a long time. If it fails - which it may well do if you're asking consumer hardware to flog six drives hard for eight straight hours - then the whole array's contents will very probably be toast.
Accordingly, I think we may have to forgo RAID 5 for our cheap 'n' cheerful expandable storage system, and go for concatenation, or JBOD.
JBOD stands for "Just a Bunch Of Disks". It's sometimes called "Linear RAID", though it's not actually redundant at all. It just stacks disks of all shapes and sizes one after the other, making them all look like one volume. It's a handy technology, it makes array expansion fast and easy, it lets you chain drives of any capacity, but it's unreliable. If one disk fails, the whole JBOD volume becomes invalid - although you can, of course, in theory recover what's on the other drives.
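To see why concatenation makes expansion so painless, here's a toy sketch of how a spanned volume might turn a block number on the big volume into a particular disk and a block on that disk. This is illustrative only, not how any particular driver does it. The function name and the tiny disk sizes are invented.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(block, disk_sizes):
    """Map a logical block on the spanned volume to (disk index, block on that disk)."""
    ends = list(accumulate(disk_sizes))  # where each disk's range stops
    if not ends or not 0 <= block < ends[-1]:
        raise IndexError("block is beyond the end of the volume")
    disk = bisect_right(ends, block)     # first disk whose range contains block
    start = ends[disk - 1] if disk else 0
    return disk, block - start


sizes = [250, 120, 120]          # mismatched disks, stacked end to end
print(locate(0, sizes))          # (0, 0)
print(locate(250, sizes))        # (1, 0)
print(locate(489, sizes))        # (2, 119)

sizes.append(400)                # "expansion" is just tacking on another disk
print(locate(490, sizes))        # (3, 0); every older block stays where it was
```

Nothing already on the volume has to move when you add a disk, which is the whole appeal. It also shows the catch: lose one disk and a whole slab of the address range goes with it.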
So what we need is a JBOD system with automatic backup, to tape if you've got it but probably to DVD-R. That may require a whole bunch of DVDs for the first backup, of course, but incremental backups from then on will ease the pain, and the result will be a really resilient storage silo. If and when a drive dies, you can just swap in a fresh one, and the software will tell you which backup disks it needs to recover the lost data. The software can also direct you to the failed physical disk, as long as you wrote the appropriate letter on the back of the drive when you installed it.
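For a feel of what "a whole bunch of DVDs" means, here's a quick sketch. It assumes single-layer DVD-Rs at their nominal 4.7 billion bytes each, with no allowance for compression or disc filesystem overhead, and the terabyte volume is just an example size.

```python
import math

DVD_R_BYTES = 4_700_000_000  # nominal single-layer DVD-R capacity


def discs_needed(volume_bytes, disc_bytes=DVD_R_BYTES):
    """Number of discs needed to hold volume_bytes, rounding up."""
    return math.ceil(volume_bytes / disc_bytes)


full_backup = discs_needed(10 ** 12)  # one terabyte of data
print(full_backup)                    # 213
```

That's a couple of spindles for the first backup, which is why the incremental backups matter so much.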
Mix ingredients into a nice compact custom Linux distro or large cumbersome Windows app, ice with big-happy-buttons GUI interface (with HTTP option, please!), bake for 50 minutes at 180 degrees.
Who's with me?
In the meantime
As of early 2006, when I'm writing this update, the cheap bit barrel still hasn't turned up.
But there's finally a good stopgap.
Behold: The Netgear SC101 Storage Central. It accepts one or two Parallel ATA hard drives and turns them into Network-Attached Storage, with the usual easy Web browser setup interface. Most importantly, and pretty much uniquely, you buy it empty. So you don't have to pay through the nose for pre-installed drives. Hurrah.
The Storage Central can run its drives separately or mirrored, or in a volume spanned across multiple SC101s, if you want to get clever. I strongly recommend mirrored mode in that case.
The SC101 isn't a new product, but I ignored it until recently, because it didn't spin its drives down, ever. That's a recipe for early disk death, and made this particular beige toaster absolutely unacceptable. I don't expect consumer drives to last forever, and neither should you, but if you never spin them down they're quite likely to be dead inside two years.
A recent firmware update, however, has solved that problem. So now it's safe to buy an SC101. Well, probably.
You only get a 100BaseT Ethernet connection, and the thing uses a proprietary filesystem (make backups, you idiots) and, even, a proprietary protocol. That means it only works on Windows networks (until someone hacks up a driver for its non-standard protocol for other OSes). It also requires a DHCP server on your network (any Net sharer box will do), and it has no fans, so it may run a teeny bit warm unless you h4xx0r some holes in it. But all this can be forgiven, for Windows users at least, because the Storage Central's list price in the States is only $US99.
Here in Australia, Netgear still say the SC101 lists for $229, which is rather higher than is fair, if you ask me. But Aus PC Market stock it for only $AU198 including delivery in the Sydney metropolitan area; delivery elsewhere in the country will cost more, but not a lot more. Not a huge bargain compared with the US price, but still a very good deal for what you get. Australian buyers can click here to order one.
If I could just yank drives out of my old watt-sucking file server and drop them into SC101s, I would, right now. But the proprietary filesystem makes that impossible. Future drives are definitely going to be living in SC101s until something better comes along, though.
Recommended.
April 2007 update:
As of now, the closest thing to the bit-barrel ideal for home and small office users appears to be the Drobo storage appliance. I write about it here.