RAID arrays in Linux
Linux has a lovely software RAID feature set, with a ton of options and RAID levels for just about any situation. What most people use it for, though, is keeping their data when a hard disk dies (not if, when). With the newer tools around these days, a lot of the documentation on how to check RAID arrays is out of date, and one of the worst things in the world is figuring "it doesn't matter that drive died", whacking in a clean replacement disk and then, SURPRISE! You have another faulty disk! A rebuild has to read every block on the surviving disks, so that is exactly when hidden bad sectors show up.
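A quick sketch for spotting an array that is already running degraded. It assumes the usual /proc/mdstat layout, where the second line of each array ends in a member status such as [UU] and an underscore marks a missing or failed member; `mdstat_degraded` is a hypothetical helper name. For a fuller per-array view, `mdadm --detail /dev/md0` is the standard tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list md arrays that /proc/mdstat reports as degraded.
# Reads mdstat-formatted text on stdin (e.g. /proc/mdstat or a saved copy),
# prints "<array> <status>" such as "md1 [U_]" for each degraded array,
# and returns 1 if any array is degraded, 0 otherwise.
mdstat_degraded() {
    awk '
        # An array header looks like: md0 : active raid1 sdb1[1] sda1[0]
        /^md[0-9]+ :/ { dev = $1; next }
        # The next line ends with the member status, e.g. [2/2] [UU];
        # an underscore marks a missing or failed member.
        dev != "" && match($0, /\[[U_]+]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_")) { print dev, status; bad = 1 }
            dev = ""
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

From cron, `mdstat_degraded < /proc/mdstat || echo "degraded RAID array"` stays silent while every array is complete.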
So, how do you minimise the impact of failures?
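1. Use the SMART tools (smartctl, from smartmontools). Take note of the attribute values and get the drives to self-test on a regular basis. To turn on SMART, automatic offline data collection and attribute autosave for a drive: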
smartctl --smart=on --offlineauto=on --saveauto=on /dev/hda
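A minimal sketch of turning those SMART values into something a cron job can check. The function names `smart_raw_value` and `smart_health_ok` are hypothetical; they assume the usual `smartctl -A` attribute table (RAW_VALUE in the tenth column) and the `smartctl -H` wording for ATA ("PASSED") and SCSI ("OK") disks, which may vary between smartmontools versions. Both read smartctl output on stdin so they can be tried without a real disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the RAW_VALUE column of one attribute from
# `smartctl -A` output on stdin. Returns 1 if the attribute is not listed.
smart_raw_value() {
    awk -v name="$1" '
        # Rows: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        $2 == name { print $10; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Hypothetical helper: succeed if `smartctl -H` output on stdin reports a
# passing overall health check (ATA disks say PASSED, SCSI disks say OK).
smart_health_ok() {
    grep -Eq 'self-assessment test result: PASSED|SMART Health Status: OK'
}

# Example use (as root):
#   smartctl -t short /dev/hda    # start a short self-test
#   smartctl -H /dev/hda | smart_health_ok || echo "hda: health check failed"
#   smartctl -A /dev/hda | smart_raw_value Reallocated_Sector_Ct
```

Reallocated_Sector_Ct and Current_Pending_Sector are good ones to note down; a raw value that keeps climbing is usually a sign the disk is on its way out.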
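Also get md to read-check the whole array on a regular basis, so that bad sectors turn up while the other disks are still healthy. As root:
echo check > /sys/block/md0/md/sync_action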
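As a sketch, the check can be wrapped in a small bash function that starts the scrub, waits for sync_action to read "idle" again and then prints mismatch_cnt, the number of sectors whose copies (or parity) did not match. The function name `md_check_and_wait`, its md_dir argument and the 60-second default poll interval are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start an md "check" and wait for it to finish, then
# print mismatch_cnt. md_dir defaults to md0's sysfs directory (assumption:
# pass /sys/block/md1/md etc. for other arrays). Needs root on a real system.
md_check_and_wait() {
    local md_dir="${1:-/sys/block/md0/md}"
    local interval="${2:-60}"   # seconds between polls (arbitrary default)
    local action

    # Don't start a check on top of a resync, recovery or another check.
    action=$(<"$md_dir/sync_action")
    if [[ "$action" != "idle" ]]; then
        echo "md busy ($action); not starting a check" >&2
        return 2
    fi

    # Same as: echo check > /sys/block/md0/md/sync_action
    echo check > "$md_dir/sync_action"

    # sync_action reads "check" while the scrub runs and "idle" once done.
    while [[ "$(<"$md_dir/sync_action")" != "idle" ]]; do
        sleep "$interval"
    done

    # Sectors whose mirror copies or parity did not match during the check.
    cat "$md_dir/mismatch_cnt"
}
```

On a large array a check can run for hours, so something like `md_check_and_wait /sys/block/md0/md 300` from a weekly cron job is a gentler choice than a tight poll.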
Nothing is 100% foolproof, but a bit of thought before a failure can save you hours, sometimes days, of stress and headaches. The server that hosts this site recently had a RAID1 array fail. Most of the data was recoverable, but the system needed 2 new HDDs. Thanks to a nightly rsync from this machine to another, offsite system, recovery took 2 hours plus data copying time. Very little was lost (I think we lost maybe 5 mailing list messages from the archives).
Oh, and if you ever need to repair your RAID array (for example, after a check has reported mismatches), try:
echo repair > /sys/block/md0/md/sync_action
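A hedged sketch of wrapping that command so it only runs when it makes sense: the array is idle and the last check counted mismatches. `md_repair_if_needed` is a hypothetical helper. Keep in mind that repair can only make the copies (or parity) consistent again; md has no way to tell which version of a mismatched block held the good data, and on RAID1 a small non-zero mismatch_cnt is not always a sign of trouble.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask md to "repair" only when it is idle and the last
# check found mismatches. md_dir defaults to md0's sysfs directory
# (assumption: pass /sys/block/md1/md etc. for other arrays). Needs root.
md_repair_if_needed() {
    local md_dir="${1:-/sys/block/md0/md}"
    local action mismatches

    action=$(<"$md_dir/sync_action")
    if [[ "$action" != "idle" ]]; then
        echo "md busy ($action); not starting a repair" >&2
        return 2
    fi

    # Count left over from the most recent check (or repair/resync).
    mismatches=$(<"$md_dir/mismatch_cnt")
    if (( mismatches == 0 )); then
        echo "no mismatches"
        return 0
    fi

    # Same as: echo repair > /sys/block/md0/md/sync_action
    echo repair > "$md_dir/sync_action"
    echo "repair started ($mismatches mismatched sectors)"
}
```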