*** DISTRIBUTED AND COMPLEX SYSTEMS (base principles)

* Producer-Consumer Asynchronous Queues

P: producer -- one that wants some work to be performed
C: consumer -- one that will perform the work
T(P): time it takes the producer to give its work to someone else
T(C): time it takes the consumer to process the work given
Q: a queue of length MAX
R(P): rate of incoming producers (e.g., items/requests per second)
R(C): rate of consumers draining queue (e.g., items/jobs per second)

E.g., a process issuing an I/O.  I/O is slow, so there is no need to wait
for it; instead, let someone else do the work, and the process can go do
something else useful (e.g., CPU computation, another I/O).  Sometimes
called "interleaving I/O and CPU".

Async queues are mostly useful when T(P) << T(C); if T(C) is very short,
it is better to do the work yourself, synchronously.

1. Design a queue Q.
2. Producer adds some work W to Q, then returns immediately.
3. Consumer picks up work from Q, processes it, then returns results in
   some way (callback, signal, etc.).

Need to limit the size of the work queue Q, so as not to consume too many
resources:
- if Q is "full", put new producers to wait, or return a special error
  like EAGAIN
- once Q's size drops below a threshold, wake up some producers
- if Q is empty, put consumers to wait
- once Q has at least one item, wake up some consumers

If Q isn't full, producers can add work quickly to Q, then go back and do
something else.  That "something else" can be CPU activity, or producing
MORE work items for Q.

If you merely return an error like EAGAIN to a producer, some producers
might just retry again and again, even in an infinite loop.  That also
consumes CPU cycles.  Producers who produce a lot of work for the queue
are called "heavy writers".  Rather than allow heavy writers to cycle
very fast and produce more work, you put them to sleep in a "wait" state
(or send a SIGSTOP to suspend them temporarily).  By putting those
producers to sleep when Q is full, we hold those heavy writers back and
force them to stop making more work.  This is called "throttling the
heavy writers".  This in turn frees up the system so that consumers can
get their work done and "drain" the queue.

How to inform producers of the status of the async work?
- signals, callback functions, message passing, etc.
- note: the producer may no longer be running (so there is no one to
  inform)

Often, APIs that submit jobs asynchronously take an argument or two for
filling in a callback function and/or a callback structure (void*).  If
you set them to NULL, it means you don't want to be informed.  Otherwise
the consumer, when done, will call your callback fxn w/ data like
success/fail return codes, etc.  If so, then the producer/consumer queue
also has to record these callback values, so it knows who/what to
inform.  See for example Linux's AIO (Async I/O) calls for reads/writes.

The number of producers that can add work before Q fills up is bounded
by the size (or depth) of Q (MAX).  In modern systems, even the max size
of Q can grow/shrink, within some limits or ranges, to accommodate needs
vs. system resources -- a form of load balancing among queues.

The number of consumers can also be tunable: too few and the system is
underutilized; too many and you waste memory and other resources when
there isn't much work (which might cause thrashing of memory or swap
space).  A rule of thumb is one consumer per CPU core (a reasonably
optimal number of consumers).  A minimal sketch of such a bounded queue
follows.
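
Below is a minimal sketch in C, assuming POSIX threads; every name in it
(struct queue, q_put, do_work, ...) is hypothetical, for illustration
only, not a real kernel or library API.  Producers sleep when Q is full
(throttling heavy writers), consumers sleep when Q is empty, and each
work item records an optional callback (NULL meaning "don't inform me")
so the consumer knows whom to inform:

    /*
     * A toy bounded producer-consumer queue (all names hypothetical).
     */
    #include <pthread.h>

    #define MAX 64                               /* fixed queue depth */

    struct work_item {
        void (*callback)(int status, void *arg); /* NULL = don't inform me */
        void *arg;                               /* opaque callback data */
        int payload;                             /* the work itself */
    };

    struct queue {
        struct work_item items[MAX];
        int head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t not_full;                 /* producers sleep here */
        pthread_cond_t not_empty;                /* consumers sleep here */
    };

    void q_init(struct queue *q)
    {
        q->head = q->tail = q->count = 0;
        pthread_mutex_init(&q->lock, NULL);
        pthread_cond_init(&q->not_full, NULL);
        pthread_cond_init(&q->not_empty, NULL);
    }

    /* Producer side: hand off W and return immediately (T(P) is small).
     * If Q is full, the producer sleeps -- throttling heavy writers. */
    void q_put(struct queue *q, struct work_item w)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count == MAX)                  /* Q full: throttle */
            pthread_cond_wait(&q->not_full, &q->lock);
        q->items[q->tail] = w;
        q->tail = (q->tail + 1) % MAX;
        q->count++;
        pthread_cond_signal(&q->not_empty);      /* wake a sleeping consumer */
        pthread_mutex_unlock(&q->lock);
    }

    static int do_work(int payload)              /* stand-in for the real,
                                                    slow work (T(C) large) */
    {
        return payload >= 0 ? 0 : -1;            /* pretend success/fail */
    }

    /* Consumer side: pick up work, process it, then inform the producer. */
    void *consumer(void *qp)
    {
        struct queue *q = qp;
        for (;;) {
            pthread_mutex_lock(&q->lock);
            while (q->count == 0)                /* Q empty: consumer sleeps */
                pthread_cond_wait(&q->not_empty, &q->lock);
            struct work_item w = q->items[q->head];
            q->head = (q->head + 1) % MAX;
            q->count--;
            pthread_cond_signal(&q->not_full);   /* Q has room: wake producer */
            pthread_mutex_unlock(&q->lock);

            int status = do_work(w.payload);
            if (w.callback)                      /* producer asked to be told */
                w.callback(status, w.arg);
        }
        return NULL;
    }

Per the rule of thumb above, one would typically spawn one consumer
thread per CPU core.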
Scenarios (assume Q has some fixed size MAX):

1. R(P) >> R(C): in steady state, Q is full, most producers waiting.
2. R(P) << R(C): in steady state, Q is empty on avg, most consumers
   sleeping.

Assume epsilon (e) is a small number:

3. R(P) = R(C) + e: R(P) is just slightly faster than R(C).  Q is still
   full in steady state, it just takes longer to fill.  E.g., with
   MAX=1000, R(P)=101 items/s, and R(C)=100 items/s, Q gains ~1 item/s
   and fills in ~1000 s.
4. R(P) + e = R(C): R(P) is just slightly slower than R(C).  Q is still
   empty in steady state, it just takes longer to drain.
5. R(P) == R(C): perfectly balanced system; Q's size is on avg a fixed
   number b/t 1..MAX.

* Locking issues and types

Concurrency is important for improved performance and throughput.
Synchronization is important whenever 2 or more entities want to access
a shared resource concurrently.  Synchronization often requires some
sort of a lock L held around a critical section (CS).

Lock types (from Linux):

1. Spinlock: the lock requester literally "spins" on the CPU, waiting
   for the lock owner to release the lock L.  Spinlocks chew up lots of
   CPU if held for too long, so use them when the CS is relatively
   short.  On the other hand, spinlocks are very easy to implement, and
   grabbing the lock requires only a few assembly instructions.
   Spinlocks are "exclusive" locks: only one owner can hold the lock L
   at a time (meaning only one thread can be inside the CS; all others
   have to wait).

2. Mutex: also an exclusive lock, with only one lock owner at a time.
   Mutex locks are useful when the CS is relatively long (e.g., I/O has
   to be performed).  The mutex implementation is more complex, because
   if the lock is held, those who try to grab the lock L automatically
   go to sleep (added to a wait-queue, and woken up later when the
   owner releases the lock).

3. Read-write semaphore (rwsem): built on top of a mutex, with counters
   for the number of readers and writers.  An rwsem allows one writer
   at a time, OR multiple concurrent readers.  Readers are not allowed
   to change the shared resource, only "read" it; writers can modify
   it.  This improves throughput when there are multiple readers, as
   they can all enter the CS at the same time.  An rwsem, like a mutex,
   is useful when the CS is long enough.  (See the first sketch after
   this list.)

4. Read-Copy-Update (RCU).  It has 3 phases:
   (a) Grab a quick spinlock, make a COPY of a shared resource R, say
       R'.  R' is your private copy of the resource, to do with as you
       please.  These are the READ+COPY phases.
       - if the user of R' doesn't make any changes to R', it can just
         free R', and it is done.
   (b) If the user of R' makes a change to R', then it's the user's job
       to incorporate any changes in R' back into the original shared
       resource R.  This is the UPDATE phase.
       - Updating can be complex.  Assume R is a sorted list of names:
         you'd have to merge-sort R' into R, avoiding duplicate items.
       - Complexity: if R *itself* had changed since you made a copy of
         it, you'd have to do a "3-way merge" (like a conflicting git
         commit).
       - Updating has to be done quickly, usually also under a
         spinlock.
   Benefits:
   - the READ+COPY phase is quick (spinlock)
   - the UPDATE phase is optional, needed only if R' changed
   - the user of R' can "sit" and work on R' for as long as it wants
   There is a built-in incentive mechanism: if the user modifies R' and
   wants to make its life easier in the UPDATE phase (i.e., merging R'
   into R), then it is best for the user NOT to "sit" and work on R'
   for too long.  The longer you wait, the higher the chance that R
   itself will have changed.  (A toy sketch follows after this list.)
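
To make the rwsem behavior concrete, here is a short sketch using the
POSIX pthread_rwlock, a userspace analogue of the kernel's rwsem; the
shared_counter variable and the reader/writer functions are hypothetical:

    #include <pthread.h>

    static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
    static long shared_counter;          /* the shared resource R */

    long reader(void)
    {
        pthread_rwlock_rdlock(&rw);      /* many readers may hold this */
        long v = shared_counter;         /* read-only access to R */
        pthread_rwlock_unlock(&rw);
        return v;
    }

    void writer(long v)
    {
        pthread_rwlock_wrlock(&rw);      /* excludes readers AND writers */
        shared_counter = v;              /* only writers modify R */
        pthread_rwlock_unlock(&rw);
    }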
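
And a toy sketch of the copy-update pattern described above.  Note this
is NOT the Linux kernel's actual RCU API, just an illustration of the
phases, assuming POSIX spinlocks; struct rec and the function names are
hypothetical, and the "merge" here is a naive overwrite (a real update
might need the 3-way merge noted above):

    #include <pthread.h>

    struct rec { int a, b; };            /* the shared resource R */
    static struct rec R;
    static pthread_spinlock_t sl;        /* quick lock protecting R */

    void cu_init(void)
    {
        pthread_spin_init(&sl, PTHREAD_PROCESS_PRIVATE);
    }

    /* READ+COPY: quick, under the spinlock. */
    struct rec read_copy(void)
    {
        struct rec copy;                 /* R', the private copy */
        pthread_spin_lock(&sl);
        copy = R;
        pthread_spin_unlock(&sl);
        return copy;                     /* caller may now "sit" on R' */
    }

    /* UPDATE: also quick, under the spinlock; only if R' changed. */
    void update(const struct rec *rp)
    {
        pthread_spin_lock(&sl);
        R = *rp;                         /* naive merge: overwrite R w/ R' */
        pthread_spin_unlock(&sl);
    }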
* Distributed systems

Concurrency and synchronization are always at odds: a careful trade-off.
Asynchrony is critical to improved concurrency and throughput (it
permits better interleaving of threads; no one has to "wait" on long
actions to conclude).

CAP theorem: https://en.wikipedia.org/wiki/CAP_theorem
- Consistency: every read receives the most recent write or an error.
- Availability: every request receives a (non-error) response, without
  the guarantee that it contains the most recent write.
- Partition tolerance: the system continues to operate despite an
  arbitrary number of messages being dropped (or delayed) by the
  network between nodes.

Paxos: a distributed consensus algorithm,
https://en.wikipedia.org/wiki/Paxos_(computer_science)

Example of a system that handles "A" and "P" but not "C": NoSQL's
"eventual consistency".
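
A toy last-writer-wins sketch of that AP behavior; the replica layout
and function names are hypothetical, and real NoSQL stores use far more
machinery (vector clocks, anti-entropy, quorums):

    struct replica { int value; long version; };

    static struct replica r1, r2;        /* two nodes, maybe partitioned */

    /* A write is accepted locally (Availability), even mid-partition. */
    void put(struct replica *r, int value, long version)
    {
        r->value = value;
        r->version = version;
    }

    /* A read returns whatever the local replica has -- possibly stale,
     * so Consistency is NOT guaranteed. */
    int get(const struct replica *r) { return r->value; }

    /* When the partition heals, replicas reconcile: last writer wins,
     * and the system becomes consistent "eventually". */
    void sync_replicas(struct replica *a, struct replica *b)
    {
        if (a->version > b->version) *b = *a;
        else                         *a = *b;
    }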