* RAID Motivation
1. Disks have a fixed size. If you want more room, you need a larger disk.
2. Stored data can be lost or corrupted: if you put all your data on a single drive and it dies, you lose everything ("data loss").
3. Disks are slow!
Solution: Redundant Array of Independent/Inexpensive Disks (RAID).
Note: anything that "speaks" an HDD protocol (e.g., SCSI) can be used anywhere a block device is used. This can be implemented:
- inside an actual physical device
- as a RAM disk (w/ a suitable driver)
- as a virtual disk: the hypervisor takes a file on the underlying host and "exposes" it to the guest OS as if it were an actual disk
- by taking a few such drives, combining them together, and exposing that as an alternate drive
RAID levels:
- RAID 0: striping (concatenation)
- RAID 1: mirroring
- RAID 5: one parity
- RAID 6: dual parity

* RAID 0: striping/concatenation
- combine N drives (same size)
- create "stripes" of a certain size across all disks:
    Stripe 0: LBA 0 of disk 1, LBA 0 of disk 2, ..., LBA 0 of disk N
    Stripe 1: LBA 1 of disks 1..N
The RAID s/w combines the drives, stripes across all of them, and creates the illusion of a new drive whose size is num_LBAs x N drives. The RAID s/w exposes a new device path (e.g., /dev/md0, /dev/raida) that you can treat like any other disk: format, mount, fsck, partition, etc.
Pros:
- parallelizes reads/writes across devices, benefiting from multiple heads or "spindles"
- better performance/throughput
- you get N times the space by combining drives
Cons:
- if one drive fails, all data is lost
- vendors publish Mean Time Between Failures (MTBF) figures; the probability that at least one of N drives fails is higher than for a single drive
As the number of drives in RAID 0 grows: bigger size, better throughput, higher risk of data loss.
All disks have to be the same size: if not, you can only use the portion common to all drives, i.e., each drive contributes no more than the capacity of the smallest one.

* RAID 1: mirroring
- combine N drives (same size)
- writes to LBA x (at the RAID level) are copied to LBA x on all drives
Pros:
- improved redundancy: the more drives, the more copies. You can lose N-1 drives and still not lose data.
Cons:
- "wasted" space: you pay for N drives' worth but get the capacity of only one.
Performance:
- reads: can be faster, b/c you only need the first drive to respond
- writes: slower, b/c you have to write to all drives before responding "done"
As #drives grows: reads get faster, writes get slower, wasted space grows, reliability increases.

* Parity
A parity bit counts how many 0s or 1s you have and normalizes that count to one bit - e.g., parity == 0 if the number of 1s is odd, else parity == 1. When you read the data back (from storage, RAM, CPU, network, anything), you recount the bits and compare against the stored parity: that tells you whether a bit somewhere got flipped. You can design more complex parity schemes, with more bits of parity, that can detect more complex multi-bit corruptions.

* XOR
    X Y | X XOR Y
    0 0 |    0
    0 1 |    1
    1 0 |    1
    1 1 |    0
XOR properties:
- if the inputs differ, you get a 1; if they're the same, you get a 0
- XOR-ing with a '0' preserves the input
- XOR-ing with a '1' flips the input
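To make the parity bit concrete: a quick sketch (illustration only; the function name, data byte, and flipped bit are made up), using the convention above where parity == 0 when the count of 1s is odd:

    # Parity-bit sketch, using the notes' convention:
    # parity == 0 if the number of 1s is odd, else parity == 1.
    def parity_of(byte):
        return 0 if bin(byte).count("1") % 2 == 1 else 1

    stored = 0b1011_0010                 # data we "wrote" (4 ones -> parity 1)
    stored_parity = parity_of(stored)

    read_back = stored ^ 0b0000_1000     # simulate one bit flipping in flight
    if parity_of(read_back) != stored_parity:
        print("parity mismatch: a bit got flipped somewhere")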
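The same XOR trick works a whole block at a time, which is exactly what the parity drive below stores. A sketch (block contents are made-up values):

    from functools import reduce

    def xor_blocks(*blocks):
        # Byte-wise XOR of equal-length blocks.
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    d1 = bytes([0x12, 0x00, 0xFF])   # LBA x of data drive D1
    d2 = bytes([0x34, 0x42, 0x0F])   # LBA x of data drive D2
    p  = xor_blocks(d1, d2)          # LBA x of the parity drive

    # D1 dies: rebuild its block by XOR-ing everything that survived.
    assert xor_blocks(d2, p) == d1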
* RAID 4: one parity
- N drives: N-1 used for data, 1 drive used for parity
Assume we have 3 drives D1, D2, and D3: D1/D2 for data, D3 for parity. Use XOR for the parity:
    LBA(0) of D3 = LBA(0) of D1 XOR LBA(0) of D2
    LBA(x) of D3 = LBA(x) of D1 XOR LBA(x) of D2
The RAID s/w knows which drives are data vs. parity. If one drive dies (data or parity), the RAID array is said to be in "degraded mode", but no data is lost. Get a new drive, replace the bad one, then tell the RAID s/w to rebuild the array: it will recalc the parity (or the lost data, by XOR-ing the remaining data w/ the parity) and write it out to the new drive. Rebuilding a degraded RAID array is also called "resilvering". RAID 4 gives you 1 level of redundancy.

* RAID 5: one parity, distributed
Distribute the parity across all drives: then you benefit from all drive heads reading/writing concurrently. RAID 5 gives you 1 level of redundancy; if you lose a 2nd drive while rebuilding, you lose all data. Often, vendors will install extra unused drives, called "hot spares" (HS), so that when a failure takes place, the RAID s/w automatically removes the bad drive, uses one of the HS drives, and rebuilds the array.
Pros:
- more capacity than mirroring: N-1 drives' worth of data
- more reliable than striping: can afford to lose one drive
Cons:
- more complex s/w
- you still lose one drive's capacity (to parity)
Features:
- to speed things up, you don't have to read the parity on every read
- if you spend more I/Os, you can read the parity and the data blocks, recalc the parity from the data blocks, and compare it to the parity you read. This lets you detect data corruption (also called an integrity violation) - but it doesn't tell you how to restore the data, only that there was a corruption.
- optimization: you can read any N-1 blocks of a stripe and calc the Nth block using XOR
As the size of a RAID 5 array grows, you get more usable capacity, and the fraction lost to parity shrinks. Minimum RAID 5 array size: 3 drives.

* RAID 6: dual parity
Two levels of redundancy: minimum 4 drives; can lose 2 drives and still not lose data. Also called double parity.

* ZFS (Solaris file system)
- Raidz1: like RAID 5 (1 level of redundancy)
- Raidz2: like RAID 6 (2 levels of redundancy)
- Raidz3: 3 levels of redundancy
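To tie the levels together, a small sketch (the drive count, sizes, and function name are made-up examples) of usable capacity vs. how many drive failures each level survives, per the rules above:

    def raid_summary(level, n, drive_size):
        # (usable capacity, drive failures survivable) for n equal drives.
        rules = {
            "raid0":  (n * drive_size,       0),      # striping: no redundancy
            "raid1":  (drive_size,           n - 1),  # mirroring
            "raid5":  ((n - 1) * drive_size, 1),      # one parity (raidz1)
            "raid6":  ((n - 2) * drive_size, 2),      # dual parity (raidz2)
            "raidz3": ((n - 3) * drive_size, 3),      # triple parity
        }
        return rules[level]

    for lvl in ("raid0", "raid1", "raid5", "raid6", "raidz3"):
        cap, lose = raid_summary(lvl, n=6, drive_size=4)  # six 4 TB drives
        print(f"{lvl}: {cap} TB usable, survives {lose} drive failure(s)")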