* RAID Motivation
1. Disks have a fixed size. If you want more room, you need a larger disk.
2. Stored data can be lost or corrupted: if you put all your data on a single drive and it dies, you lose everything ("data loss").
3. Disks are slow!
Solution: Redundant Array of Independent/Inexpensive Disks (RAID).
Note: anything that "speaks" an HDD protocol (e.g., SCSI) can be used anywhere a block device is used. This can be implemented:
- inside an actual physical device
- as a RAM disk (w/ a suitable driver)
- as a virtual disk: the hypervisor takes a file on the underlying host and "exposes" it to the guest OS as if it were an actual disk
- by taking a few such drives, combining them together, and exposing that as an alternate drive
RAID levels:
- RAID 0: striping (concatenation)
- RAID 1: mirroring
- RAID 5: one parity
- RAID 6: dual parity

* RAID 0: striping/concatenation
- combine N drives (same size)
- create "stripes" of a certain size across all disks:
    Stripe 0: LBA 0 of disk 1, LBA 0 of disk 2, ..., LBA 0 of disk N
    Stripe 1: LBA 1 of disks 1..N
The RAID s/w combines the drives, stripes across all of them, and creates the illusion of a new drive whose size is num_LBAs x N drives. The RAID s/w exposes a new device path (e.g., /dev/md0, /dev/raida) that you can treat like any other disk: format, mount, fsck, partition, etc.
Pros:
- parallelizes reads/writes across devices, benefiting from multiple heads or "spindles"
- better performance/throughput
- you get N times the space by combining drives
Cons:
- if one drive fails, all data is lost
- vendors publish Mean Time Between Failures (MTBF) figures; the probability that at least one of N drives fails is higher than for a single drive
As the number of drives in RAID 0 grows: bigger size, better throughput, higher risk of data loss.
All disks have to be the same size: if not, you can only use the portion common to all drives, i.e., each drive contributes no more than the capacity of the smallest one.

* RAID 1: mirroring
- combine N drives (same size)
- writes to LBA x (at the RAID level) are copied to LBA x on all drives
Pros:
- improved redundancy: the more drives, the more copies. You can lose N-1 drives and still not lose data.
Cons:
- "wasted" space: you pay for N drives' worth but get the capacity of only one.
Performance:
- reads: can be faster, b/c you only need the first drive to respond
- writes: slower, b/c you have to write to all drives before responding "done"
As #drives grows: reads get faster, writes get slower, wasted space grows, reliability increases.

* Parity
A parity bit counts how many 0s or 1s you have and normalizes that count to one bit - e.g., parity == 0 if the number of 1s is odd, else parity == 1. When you read the data back (from storage, RAM, CPU, network, anything), you recount the bits and compare against the stored parity: that tells you whether a bit somewhere got flipped. You can design more complex parity schemes, with more bits of parity, that can detect more complex multi-bit corruptions.

* XOR
    X Y | X XOR Y
    0 0 |    0
    0 1 |    1
    1 0 |    1
    1 1 |    0
XOR properties:
- if the inputs differ, you get a 1; if they're the same, you get a 0
- XOR-ing with a '0' preserves the input
- XOR-ing with a '1' flips the input
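To make the parity bit concrete: a quick sketch (illustration only; the function name, data byte, and flipped bit are made up), using the convention above where parity == 0 when the count of 1s is odd:

    # Parity-bit sketch, using the notes' convention:
    # parity == 0 if the number of 1s is odd, else parity == 1.
    def parity_of(byte):
        return 0 if bin(byte).count("1") % 2 == 1 else 1

    stored = 0b1011_0010                 # data we "wrote" (4 ones -> parity 1)
    stored_parity = parity_of(stored)

    read_back = stored ^ 0b0000_1000     # simulate one bit flipping in flight
    if parity_of(read_back) != stored_parity:
        print("parity mismatch: a bit got flipped somewhere")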
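The same XOR trick works a whole block at a time, which is exactly what the parity drive below stores. A sketch (block contents are made-up values):

    from functools import reduce

    def xor_blocks(*blocks):
        # Byte-wise XOR of equal-length blocks.
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    d1 = bytes([0x12, 0x00, 0xFF])   # LBA x of data drive D1
    d2 = bytes([0x34, 0x42, 0x0F])   # LBA x of data drive D2
    p  = xor_blocks(d1, d2)          # LBA x of the parity drive

    # D1 dies: rebuild its block by XOR-ing everything that survived.
    assert xor_blocks(d2, p) == d1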
* RAID 4: one parity
- N drives: N-1 used for data, 1 drive used for parity
Assume we have 3 drives D1, D2, and D3: D1/D2 for data, D3 for parity. Use XOR for the parity:
    LBA(0) of D3 = LBA(0) of D1 XOR LBA(0) of D2
    LBA(x) of D3 = LBA(x) of D1 XOR LBA(x) of D2
The RAID s/w knows which drives are data vs. parity. If one drive dies (data or parity), the RAID array is said to be in "degraded mode", but no data is lost. Get a new drive, replace the bad one, then tell the RAID s/w to rebuild the array: it will recalc the parity (or the lost data, by XOR-ing the remaining data w/ the parity) and write it out to the new drive. Rebuilding a degraded RAID array is also called "resilvering". RAID 4 gives you 1 level of redundancy.

* RAID 5: one parity, distributed
Distribute the parity across all drives: then you benefit from all drive heads reading/writing concurrently. RAID 5 gives you 1 level of redundancy; if you lose a 2nd drive while rebuilding, you lose all data. Often, vendors will install extra unused drives, called "hot spares" (HS), so that when a failure takes place, the RAID s/w automatically removes the bad drive, uses one of the HS drives, and rebuilds the array.
Pros:
- more capacity than mirroring: N-1 drives' worth of data
- more reliable than striping: can afford to lose one drive
Cons:
- more complex s/w
- you still lose one drive's capacity (to parity)
Features:
- to speed things up, you don't have to read the parity on every read
- if you spend more I/Os, you can read the parity and the data blocks, recalc the parity from the data blocks, and compare it to the parity you read. This lets you detect data corruption (also called an integrity violation) - but it doesn't tell you how to restore the data, only that there was a corruption.
- optimization: you can read any N-1 blocks of a stripe and calc the Nth block using XOR
As the size of a RAID 5 array grows, you get more usable capacity, and the fraction lost to parity shrinks. Minimum RAID 5 array size: 3 drives.

* RAID 6: dual parity
Two levels of redundancy: minimum 4 drives; can lose 2 drives and still not lose data. Also called double parity.

* ZFS (Solaris file system)
- Raidz1: like RAID 5 (1 level of redundancy)
- Raidz2: like RAID 6 (2 levels of redundancy)
- Raidz3: 3 levels of redundancy
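To tie the levels together, a small sketch (the drive count, sizes, and function name are made-up examples) of usable capacity vs. how many drive failures each level survives, per the rules above:

    def raid_summary(level, n, drive_size):
        # (usable capacity, drive failures survivable) for n equal drives.
        rules = {
            "raid0":  (n * drive_size,       0),      # striping: no redundancy
            "raid1":  (drive_size,           n - 1),  # mirroring
            "raid5":  ((n - 1) * drive_size, 1),      # one parity (raidz1)
            "raid6":  ((n - 2) * drive_size, 2),      # dual parity (raidz2)
            "raidz3": ((n - 3) * drive_size, 3),      # triple parity
        }
        return rules[level]

    for lvl in ("raid0", "raid1", "raid5", "raid6", "raidz3"):
        cap, lose = raid_summary(lvl, n=6, drive_size=4)  # six 4 TB drives
        print(f"{lvl}: {cap} TB usable, survives {lose} drive failure(s)")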