*** DISTRIBUTED AND COMPLEX SYSTEMS (base principles)

* Producer-Consumer Asynchronous Queues

P: producer -- one that wants some work to be performed
C: consumer -- one that will perform the work
T(P): time it takes the producer to give its work to someone else
T(C): time it takes the consumer to process the work given
Q: a queue of length MAX
R(P): rate of incoming producers (e.g., items/requests per second)
R(C): rate of consumers draining queue (e.g., items/jobs per second)

E.g., a process issuing an I/O.  I/O is slow, so there is no need to wait
for it; instead, let someone else do the work, and the process can go do
something else useful (e.g., CPU computation, another I/O).  Sometimes
called "interleaving I/O and CPU".

Async queues are mostly useful when T(P) << T(C); if T(C) is very short,
it is better to do the work yourself, synchronously.

1. Design a queue Q.
2. Producer adds some work W to Q, then returns immediately.
3. Consumer picks up work from Q, processes it, then returns results in
   some way (callback, signal, etc.).

Need to limit the size of the work queue Q, so as not to consume too many
resources:
- if Q is "full", put new producers to wait, or return a special error
  like EAGAIN
- once Q's size drops below a threshold, wake up some producers
- if Q is empty, put consumers to wait
- once Q has at least one item, wake up some consumers

If Q isn't full, producers can add work quickly to Q, then go back and do
something else.  That "something else" can be CPU activity, or producing
MORE work items for Q.

If you merely return an error like EAGAIN to a producer, some producers
might just retry again and again, even in an infinite loop.  That also
consumes CPU cycles.  Producers who produce a lot of work for the queue
are called "heavy writers".  Rather than allow heavy writers to cycle
very fast and produce more work, you put them to sleep in a "wait" state
(or send a SIGSTOP to suspend them temporarily).  By putting those
producers to sleep when Q is full, we hold those heavy writers back and
force them to stop making more work.  This is called "throttling the
heavy writers".  This in turn frees up the system so that consumers can
get their work done and "drain" the queue.

How to inform producers of the status of the async work?
- signals, callback functions, message passing, etc.
- note: the producer may no longer be running (so there is no one to
  inform)

Often, APIs that submit jobs asynchronously take an argument or two for
filling in a callback function and/or a callback structure (void*).  If
you set them to NULL, it means you don't want to be informed.  Otherwise
the consumer, when done, will call your callback fxn w/ data like
success/fail return codes, etc.  If so, then the producer/consumer queue
also has to record these callback values, so it knows who/what to
inform.  See for example Linux's AIO (Async I/O) calls for reads/writes.

The number of producers that can add work before Q fills up is bounded
by the size (or depth) of Q (MAX).  In modern systems, even the max size
of Q can grow/shrink, within some limits or ranges, to accommodate needs
vs. system resources -- a form of load balancing among queues.

The number of consumers can also be tunable: too few and the system is
underutilized; too many and you waste memory and other resources when
there isn't much work (which might cause thrashing of memory or swap
space).  A rule of thumb is one consumer per CPU core (a reasonably
optimal number of consumers).  A minimal sketch of such a bounded queue
follows.
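
Below is a minimal sketch in C, assuming POSIX threads; every name in it
(struct queue, q_put, do_work, ...) is hypothetical, for illustration
only, not a real kernel or library API.  Producers sleep when Q is full
(throttling heavy writers), consumers sleep when Q is empty, and each
work item records an optional callback (NULL meaning "don't inform me")
so the consumer knows whom to inform:

    /*
     * A toy bounded producer-consumer queue (all names hypothetical).
     */
    #include <pthread.h>

    #define MAX 64                               /* fixed queue depth */

    struct work_item {
        void (*callback)(int status, void *arg); /* NULL = don't inform me */
        void *arg;                               /* opaque callback data */
        int payload;                             /* the work itself */
    };

    struct queue {
        struct work_item items[MAX];
        int head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t not_full;                 /* producers sleep here */
        pthread_cond_t not_empty;                /* consumers sleep here */
    };

    void q_init(struct queue *q)
    {
        q->head = q->tail = q->count = 0;
        pthread_mutex_init(&q->lock, NULL);
        pthread_cond_init(&q->not_full, NULL);
        pthread_cond_init(&q->not_empty, NULL);
    }

    /* Producer side: hand off W and return immediately (T(P) is small).
     * If Q is full, the producer sleeps -- throttling heavy writers. */
    void q_put(struct queue *q, struct work_item w)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count == MAX)                  /* Q full: throttle */
            pthread_cond_wait(&q->not_full, &q->lock);
        q->items[q->tail] = w;
        q->tail = (q->tail + 1) % MAX;
        q->count++;
        pthread_cond_signal(&q->not_empty);      /* wake a sleeping consumer */
        pthread_mutex_unlock(&q->lock);
    }

    static int do_work(int payload)              /* stand-in for the real,
                                                    slow work (T(C) large) */
    {
        return payload >= 0 ? 0 : -1;            /* pretend success/fail */
    }

    /* Consumer side: pick up work, process it, then inform the producer. */
    void *consumer(void *qp)
    {
        struct queue *q = qp;
        for (;;) {
            pthread_mutex_lock(&q->lock);
            while (q->count == 0)                /* Q empty: consumer sleeps */
                pthread_cond_wait(&q->not_empty, &q->lock);
            struct work_item w = q->items[q->head];
            q->head = (q->head + 1) % MAX;
            q->count--;
            pthread_cond_signal(&q->not_full);   /* Q has room: wake producer */
            pthread_mutex_unlock(&q->lock);

            int status = do_work(w.payload);
            if (w.callback)                      /* producer asked to be told */
                w.callback(status, w.arg);
        }
        return NULL;
    }

Per the rule of thumb above, one would typically spawn one consumer
thread per CPU core.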
Scenarios (assume Q has some fixed size MAX):

1. R(P) >> R(C): in steady state, Q is full, most producers waiting.
2. R(P) << R(C): in steady state, Q is empty on avg, most consumers
   sleeping.

Assume epsilon (e) is a small number:

3. R(P) = R(C) + e: R(P) is just slightly faster than R(C).  Q is still
   full in steady state, it just takes longer to fill.  E.g., with
   MAX=1000, R(P)=101 items/s, and R(C)=100 items/s, Q gains ~1 item/s
   and fills in ~1000 s.
4. R(P) + e = R(C): R(P) is just slightly slower than R(C).  Q is still
   empty in steady state, it just takes longer to drain.
5. R(P) == R(C): perfectly balanced system; Q's size is on avg a fixed
   number b/t 1..MAX.

* Locking issues and types

Concurrency is important for improved performance and throughput.
Synchronization is important whenever 2 or more entities want to access
a shared resource concurrently.  Synchronization often requires some
sort of a lock L held around a critical section (CS).

Lock types (from Linux):

1. Spinlock: the lock requester literally "spins" on the CPU, waiting
   for the lock owner to release the lock L.  Spinlocks chew up lots of
   CPU if held for too long, so use them when the CS is relatively
   short.  On the other hand, spinlocks are very easy to implement, and
   grabbing the lock requires only a few assembly instructions.
   Spinlocks are "exclusive" locks: only one owner can hold the lock L
   at a time (meaning only one thread can be inside the CS; all others
   have to wait).

2. Mutex: also an exclusive lock, with only one lock owner at a time.
   Mutex locks are useful when the CS is relatively long (e.g., I/O has
   to be performed).  The mutex implementation is more complex, because
   if the lock is held, those who try to grab the lock L automatically
   go to sleep (added to a wait-queue, and woken up later when the
   owner releases the lock).

3. Read-write semaphore (rwsem): built on top of a mutex, with counters
   for the number of readers and writers.  An rwsem allows one writer
   at a time, OR multiple concurrent readers.  Readers are not allowed
   to change the shared resource, only "read" it; writers can modify
   it.  This improves throughput when there are multiple readers, as
   they can all enter the CS at the same time.  An rwsem, like a mutex,
   is useful when the CS is long enough.  (See the first sketch after
   this list.)

4. Read-Copy-Update (RCU).  It has 3 phases:
   (a) Grab a quick spinlock, make a COPY of a shared resource R, say
       R'.  R' is your private copy of the resource, to do with as you
       please.  These are the READ+COPY phases.
       - if the user of R' doesn't make any changes to R', it can just
         free R', and it is done.
   (b) If the user of R' makes a change to R', then it's the user's job
       to incorporate any changes in R' back into the original shared
       resource R.  This is the UPDATE phase.
       - Updating can be complex.  Assume R is a sorted list of names:
         you'd have to merge-sort R' into R, avoiding duplicate items.
       - Complexity: if R *itself* had changed since you made a copy of
         it, you'd have to do a "3-way merge" (like a conflicting git
         commit).
       - Updating has to be done quickly, usually also under a
         spinlock.
   Benefits:
   - the READ+COPY phase is quick (spinlock)
   - the UPDATE phase is optional, needed only if R' changed
   - the user of R' can "sit" and work on R' for as long as it wants
   There is a built-in incentive mechanism: if the user modifies R' and
   wants to make its life easier in the UPDATE phase (i.e., merging R'
   into R), then it is best for the user NOT to "sit" and work on R'
   for too long.  The longer you wait, the higher the chance that R
   itself will have changed.  (A toy sketch follows after this list.)
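
To make the rwsem behavior concrete, here is a short sketch using the
POSIX pthread_rwlock, a userspace analogue of the kernel's rwsem; the
shared_counter variable and the reader/writer functions are hypothetical:

    #include <pthread.h>

    static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
    static long shared_counter;          /* the shared resource R */

    long reader(void)
    {
        pthread_rwlock_rdlock(&rw);      /* many readers may hold this */
        long v = shared_counter;         /* read-only access to R */
        pthread_rwlock_unlock(&rw);
        return v;
    }

    void writer(long v)
    {
        pthread_rwlock_wrlock(&rw);      /* excludes readers AND writers */
        shared_counter = v;              /* only writers modify R */
        pthread_rwlock_unlock(&rw);
    }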
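
And a toy sketch of the copy-update pattern described above.  Note this
is NOT the Linux kernel's actual RCU API, just an illustration of the
phases, assuming POSIX spinlocks; struct rec and the function names are
hypothetical, and the "merge" here is a naive overwrite (a real update
might need the 3-way merge noted above):

    #include <pthread.h>

    struct rec { int a, b; };            /* the shared resource R */
    static struct rec R;
    static pthread_spinlock_t sl;        /* quick lock protecting R */

    void cu_init(void)
    {
        pthread_spin_init(&sl, PTHREAD_PROCESS_PRIVATE);
    }

    /* READ+COPY: quick, under the spinlock. */
    struct rec read_copy(void)
    {
        struct rec copy;                 /* R', the private copy */
        pthread_spin_lock(&sl);
        copy = R;
        pthread_spin_unlock(&sl);
        return copy;                     /* caller may now "sit" on R' */
    }

    /* UPDATE: also quick, under the spinlock; only if R' changed. */
    void update(const struct rec *rp)
    {
        pthread_spin_lock(&sl);
        R = *rp;                         /* naive merge: overwrite R w/ R' */
        pthread_spin_unlock(&sl);
    }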
* Distributed systems

Concurrency and synchronization are always at odds: a careful trade-off.
Asynchrony is critical to improved concurrency and throughput (it
permits better interleaving of threads; no one has to "wait" on long
actions to conclude).

CAP theorem: https://en.wikipedia.org/wiki/CAP_theorem
- Consistency: every read receives the most recent write or an error.
- Availability: every request receives a (non-error) response, without
  the guarantee that it contains the most recent write.
- Partition tolerance: the system continues to operate despite an
  arbitrary number of messages being dropped (or delayed) by the
  network between nodes.

Paxos: a distributed consensus algorithm,
https://en.wikipedia.org/wiki/Paxos_(computer_science)

Example of a system that handles "A" and "P" but not "C": NoSQL's
"eventual consistency".
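
A toy last-writer-wins sketch of that AP behavior; the replica layout
and function names are hypothetical, and real NoSQL stores use far more
machinery (vector clocks, anti-entropy, quorums):

    struct replica { int value; long version; };

    static struct replica r1, r2;        /* two nodes, maybe partitioned */

    /* A write is accepted locally (Availability), even mid-partition. */
    void put(struct replica *r, int value, long version)
    {
        r->value = value;
        r->version = version;
    }

    /* A read returns whatever the local replica has -- possibly stale,
     * so Consistency is NOT guaranteed. */
    int get(const struct replica *r) { return r->value; }

    /* When the partition heals, replicas reconcile: last writer wins,
     * and the system becomes consistent "eventually". */
    void sync_replicas(struct replica *a, struct replica *b)
    {
        if (a->version > b->version) *b = *a;
        else                         *a = *b;
    }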