* process/OS memory management

The OS manages memory for all processes. A computer has a certain amount of
physical memory, say 1GB. Assume we're on a 32-bit CPU, which can address at
most 2^32 = 4GB of RAM. The OS provides an illusion to all running processes,
as if each process can access the maximum possible amount of RAM -- namely
4GB. This is called "virtual memory". "Virtual" in CS can be more easily
understood as "fake" or "not really there".

The OS has to "map" physical memory pages to virtual ones for a given
process. This happens on the fly, on demand, and many times per second. A
user process's memory is virtual: the process doesn't know what physical mem
is used by the OS, when, how, or what the actual phys addrs are. The OS keeps
a table per process, mapping physical memory pages to that process's virtual
address space.

All mappings are done in units of PAGE_SIZE == 4KB. The OS doesn't handle
actual mem addrs but "page numbers". Page number 0 is for mem addrs 0..4095;
page 1 is for 4096..8191; etc.

Mapping table example: there is a unique table per process. Assume just 16KB
of physical mem (very small), so the OS has only 4 physical pages: 0, 1, 2,
and 3. And let's assume the process has a virtual mem footprint of
10*4KB == 40,960 bytes.

    VIRT PAGE#   PHYS PAGE#
    ----------   ----------
        0
        1
        2
        3            2     <- this virt page has a "backing store" of physical mem
        4
        5            0
        6            3
        7
        8            1
        9

Note: the OS ensures that a process has a contiguous virt addr space, and it
will map phys pages on demand as needed ("demand paging"), often based on
which pages were accessed most recently. The OS doesn't have to guarantee
contiguous allocation of phys mem, only virt mem.

Let's say that now, your program touches a mem addr in virt page number 4
(addrs 4*4K .. 5*4K-1 == 16,384..20,479). The OS will "trap" the access to a
virt mem addr in a page that doesn't have a phys page mapped to it, will
suspend the program temporarily, and go look for a "free" phys mem page to
map to virt page #4.
Problem: what if the OS doesn't have a spare phys page to map?! The OS will
now have to "reclaim" at least one page from the existing mappings, and
re-assign it to virt page #4. The most common alg for page reclamation is
called "least recently used" (LRU): any page that's not been used recently
(perhaps the oldest page) can be reclaimed first.

So let's say that virt page 8 wasn't used in a while: reclaim phys page 1
(which was mapped to virt page 8), and reassign it to virt page 4. Now the
table will look like this:

    VIRT PAGE#   PHYS PAGE#
    ----------   ----------
        0
        1
        2
        3            2
        4            1     <- reassigned from virt page 8
        5            0
        6            3
        7
        8                  <- swapped out to swap location X; phys page 1
                              reclaimed, given to virt page 4
        9

In the above case, virt page 8 was assigned to phys page 1. Phys page 1 may
have important content that needs to be preserved for the program to continue
to run well (esp. if the program will resume using virt page 8 at a future
point in time). Before taking away a phys page (e.g., 1), the OS will store
the page's content in a special on-disk location, called the "swap
partition/file/disk". This act of writing out a page's content to swap is
called "swap out". At this point, the OS will mark virt page 8 as "swapped
out". Now the OS can really reassign phys page 1 to another virt addr in the
same or another process (if another process, the page must be zeroed out
first, to preserve privacy).

Next time the user process tries to access any mem addr in virt page 8, the
OS will (1) find a free phys page to assign to virt page 8; and (2) reload
the swapped-out content (called "swap in") from swap location X into the new
phys page that's now assigned to virt page 8.

Because non-volatile storage media (e.g., hard disk, SSD) are ~1000x slower
than RAM, swapping in/out slows down a program. Thrashing: when a system runs
many programs that need lots of RAM and there isn't enough, lots of swap
activity happens and the entire system becomes "I/O bound" (slowed by the I/O
system, not RAM).
Some OSs are proactive, and reclaim pages after some period of inactivity.
You notice this when you resume using a computer -- things seem slow for a
few seconds while the OS reloads many pages back from swap.

A process doesn't get all 4GB of its virtual addr space mapped just b/c it
can address it. Most programs only get a small part of their addr space
mapped, as needed. But programs can easily dereference any mem addr in the
0..4GB-1 range. So what happens if the program above tries to touch a mem
address in, say, page 10 (when only pages 0..9 are mapped)? The program fails
with a "segmentation violation": it gets a SIGSEGV signal, and core is
dumped!

Each virtual-to-physical mapping has "protection flags" that say HOW this
memory can be accessed:

- PROT_READ: bytes in that memory page can be read by a process.
- PROT_WRITE: bytes in that memory page can be written/modified by a process.
- PROT_EXEC: bytes in that memory page can be executed by the processor. The
  program counter (PC) register is allowed to point to a mem addr in this
  page.

You can combine these 3 bits together, or even use none of them (called
PROT_NONE). Protection bits are per page, and cannot be set with any finer
granularity (dictated by the CPU manufacturer). Any user process can set prot
bits for its own pages, but the OS can also set prot bits for a running
process automatically. The OS may prevent some process pages' prot bits from
being changed, to prevent bugs and hijacking of processes.

Who enforces these protection bits? A special piece of h/w called the Memory
Management Unit (MMU), which sits b/t the CPU and the main memory. The MMU
also translates b/t virt and phys addresses on the fly.

The MMU works as follows. Assume virt page number 4 covers addrs
16,384-20,479, and phys page number 1 covers addrs 4,096-8,191.

1. When a process is started or reloaded, its page mapping table is loaded
   (copied) into the MMU by the OS.
2. Each time a user process tries to access virtual addr V, the instruction
   and addr are intercepted by the MMU.
   Assume V=16,484 (the 100th byte into virt page 4).
3. The MMU will convert V into its page number: take V and shift it right by
   12 bits (2^12 = 4096), which gives virtual page 4.
4. The MMU will consult its page tables looking for an entry matching virtual
   page 4.
   4a. If an entry is NOT found: interrupt the CPU and ask the processor+OS
       to send SEGV to the user process. The OS will kill the process, and
       dump core. Else, the page was found.
5. The MMU will check the type of memory request: is it a read, write, or
   execute? (looking at the actual instruction and the PC reg).
   5a. If the page protection flags do NOT permit the access (R, W, or X):
       generate a SEGV, core dump, etc. Else, the page was found AND the
       protection allows the access.
6. The MMU will translate the virt page to the phys page, adding the offset
   inside the page:
   - orig addr requested by the user process was V=16,484 -- the 100th byte
     into virt page 4
   - the MMU finds phys page 1 mapped to virt page 4
   - phys page 1 starts at addr 4,096
   - add 4,096+100 = 4,196
   The MMU will now modify the request for the CPU to access phys addr 4,196
   (not virt addr 16,484).

* Process segments setup by OS

The OS sets up a process to run and controls several areas of memory
automatically (easier for processes not to have to deal with these directly).

1. "TEXT" segment. Not ascii text, but an old term meaning "the executable
   binary instructions of the program". E.g., /bin/ls is 38,704 bytes long,
   so we need at least 10 x 4K pages to hold it. The OS will alloc 10 pages,
   usually starting at page 0, to hold this "TEXT" segment. The OS has to set
   prot flags for the TEXT segment: at least eXecute, probably Read too
   (e.g., for hard-coded values embedded in the code). Not Write (else SEGV).

2. After the TEXT segment we often see a "statics" segment, for any program
   constants:

       const int val = 10;         // a constant
       char *str = "hello world";  // all double-quoted strings are by def'n constants!

   A statics segment is set up with Read-only protection (no X or W).

3. Often (but depends on the OS), the next segment is the "HEAP": dynamically
   allocated memory.
   Every time you use malloc(), you use mem from the HEAP segment. Usually
   when we malloc, we want mem buffers that we can Read and Write: so HEAP
   segments get R+W protection, but not 'X' protection. Note: TEXT and
   statics are fixed size. The HEAP can grow and shrink from its starting
   point, often towards higher memory addresses.

4. Often (but depends on the OS), the last segment is the STACK. This is for
   "automatic" variables and stack frames, used when code calls other
   functions. Each time you call a function, a stack frame is created w/ the
   args to pass, the return value, and more. That takes up mem. Also, each
   var you declare in a function is by def'n "automatic":

       int foo(int arg)
       {
           u_int counter; // automatic var
           void *ptr;     // automatic var
           float f;       // automatic var
           return 0;      // all mem of auto vars is reclaimed
       }

   The STACK segment also grows and shrinks as needed; the OS manages mem for
   it. Often, the stack starts at the top of your virt addr space and grows
   towards lower mem addresses.

Process segments:

- TEXT pages (starts at addr 0)
- Statics (1-2 pages max, usually)
- HEAP: starts here, grows towards larger addrs, based on malloc/free
- What's in between? Often a large gap or "hole" in your addr space, where
  nothing is mapped. If these segments ever get close to each other, the OS
  will terminate the process with ENOMEM.
- STACK: starts at a large (4GB) addr and grows downwards towards smaller
  addrs

Most programs don't need all of the addr space, and there's a huge gap in the
program's addr space b/t the HEAP and STACK. If a prog tries to access any
mem in this gap, you get a SEGV (and core dump). This large gap is often
where memory-mapped files and libraries are located, using mmap(2).

Most OSs want to ensure that the HEAP and STACK don't grow too much. See
getrlimit(2) and the shell's ulimit. Assume an OS wants to restrict a HEAP or
STACK so it doesn't grow by more than a certain amount, say 1GB. The OS knows
how much you use and can prevent more from being allocated (slow the program
down, or even kill it).
Alternatively, you can set up a special page right after the 1GB range with
PROT_NONE: if any instruction tries to access that page, you get a SEGV. Note
that a PROT_NONE page will never need a phys page mapped to it, so it doesn't
waste actual memory (other than a small entry in the process's page table).

* Using PROT_NONE for other protections: redzones

See 11.c code sample