* process/OS memory management

The OS manages memory for all processes. A computer has a certain amount of
physical memory, say 1GB. Assume we're on a 32-bit CPU, which can address at
most 2^32 = 4GB of RAM. The OS provides an illusion to all running processes,
as if each process can access the maximum possible amount of RAM -- namely
4GB. This is called "virtual memory". "Virtual" in CS can be more easily
understood as "fake" or "not really there".

The OS has to "map" physical memory pages to virtual ones for a given
process. This happens on the fly, on demand, and many times per second. A
user process's memory is virtual: the process doesn't know what physical mem
is used by the OS, when, how, or what the actual phys addrs are. The OS keeps
a table per process, mapping physical memory pages to that process's virtual
address space.

All mappings are done in units of PAGE_SIZE == 4KB. The OS doesn't handle
actual mem addrs but "page numbers". Page number 0 is for mem addrs 0..4095;
page 1 is for 4096..8191; etc.

Mapping table example: there is a unique table per process. Assume just 16KB
of physical mem (very small), so the OS has only 4 physical pages: 0, 1, 2,
and 3. And let's assume the process has a virtual mem footprint of
10*4KB == 40,960 bytes.

    VIRT PAGE#   PHYS PAGE#
    ----------   ----------
        0
        1
        2
        3            2     <- this virt page has a "backing store" of physical mem
        4
        5            0
        6            3
        7
        8            1
        9

Note: the OS ensures that a process has a contiguous virt addr space, and it
will map phys pages on demand as needed ("demand paging"), often based on
which pages were accessed most recently. The OS doesn't have to guarantee
contiguous allocation of phys mem, only virt mem.

Let's say that now, your program touches a mem addr in virt page number 4
(addrs 4*4K .. 5*4K-1 == 16,384..20,479). The OS will "trap" the access to a
virt mem addr in a page that doesn't have a phys page mapped to it, will
suspend the program temporarily, and go look for a "free" phys mem page to
map to virt page #4.
Problem: what if the OS doesn't have a spare phys page to map?! The OS will
now have to "reclaim" at least one page from the existing mappings, and
re-assign it to virt page #4. The most common alg for page reclamation is
called "least recently used" (LRU): any page that's not been used recently
(perhaps the oldest page) can be reclaimed first.

So let's say that virt page 8 wasn't used in a while: reclaim phys page 1
(which was mapped to virt page 8), and reassign it to virt page 4. Now the
table will look like this:

    VIRT PAGE#   PHYS PAGE#
    ----------   ----------
        0
        1
        2
        3            2
        4            1     <- reassigned from virt page 8
        5            0
        6            3
        7
        8                  <- swapped out to swap location X; phys page 1
                              reclaimed, given to virt page 4
        9

In the above case, virt page 8 was assigned to phys page 1. Phys page 1 may
have important content that needs to be preserved for the program to continue
to run well (esp. if the program will resume using virt page 8 at a future
point in time). Before taking away a phys page (e.g., 1), the OS will store
the page's content in a special on-disk location, called the "swap
partition/file/disk". This act of writing out a page's content to swap is
called "swap out". At this point, the OS will mark virt page 8 as "swapped
out". Now the OS can really reassign phys page 1 to another virt addr in the
same or another process (if another process, the page must be zeroed out
first, to preserve privacy).

Next time the user process tries to access any mem addr in virt page 8, the
OS will (1) find a free phys page to assign to virt page 8; and (2) reload
the swapped-out content (called "swap in") from swap location X into the new
phys page that's now assigned to virt page 8.

Because non-volatile storage media (e.g., hard disk, SSD) are ~1000x slower
than RAM, swapping in/out slows down a program. Thrashing: when a system runs
many programs that need lots of RAM and there isn't enough, lots of swap
activity happens and the entire system becomes "I/O bound" (slowed by the I/O
system, not RAM).
Some OSs are proactive, and reclaim pages after some period of inactivity.
You notice this when you resume using a computer -- things seem slow for a
few seconds while the OS reloads many pages back from swap.

A process doesn't get all 4GB of its virtual addr space mapped just b/c it
can address it. Most programs only get a small part of their addr space
mapped, as needed. But programs can easily dereference any mem addr in the
0..4GB-1 range. So what happens if the program above tries to touch a mem
address in, say, page 10 (when only pages 0..9 are mapped)? The program fails
with a "segmentation violation": it gets a SIGSEGV signal, and core is
dumped!

Each virtual-to-physical mapping has "protection flags" that say HOW this
memory can be accessed:

- PROT_READ: bytes in that memory page can be read by a process.
- PROT_WRITE: bytes in that memory page can be written/modified by a process.
- PROT_EXEC: bytes in that memory page can be executed by the processor. The
  program counter (PC) register is allowed to point to a mem addr in this
  page.

You can combine these 3 bits together, or even use none of them (called
PROT_NONE). Protection bits are per page, and cannot be set with any finer
granularity (dictated by the CPU manufacturer). Any user process can set prot
bits for its own pages, but the OS can also set prot bits for a running
process automatically. The OS may prevent some process pages' prot bits from
being changed, to prevent bugs and hijacking of processes.

Who enforces these protection bits? A special piece of h/w called the Memory
Management Unit (MMU), which sits b/t the CPU and the main memory. The MMU
also translates b/t virt and phys addresses on the fly.

The MMU works as follows. Assume virt page number 4 covers addrs
16,384-20,479, and phys page number 1 covers addrs 4,096-8,191.

1. When a process is started or reloaded, its page mapping table is loaded
   (copied) into the MMU by the OS.
2. Each time a user process tries to access virtual addr V, the instruction
   and addr are intercepted by the MMU.
   Assume V=16,484 (the 100th byte into virt page 4).
3. The MMU will convert V into its page number: take V and shift it right by
   12 bits (2^12 = 4096), which gives virtual page 4.
4. The MMU will consult its page tables looking for an entry matching virtual
   page 4.
   4a. If an entry is NOT found: interrupt the CPU and ask the processor+OS
       to send SEGV to the user process. The OS will kill the process, and
       dump core. Else, the page was found.
5. The MMU will check the type of memory request: is it a read, write, or
   execute? (looking at the actual instruction and the PC reg).
   5a. If the page protection flags do NOT permit the access (R, W, or X):
       generate a SEGV, core dump, etc. Else, the page was found AND the
       protection allows the access.
6. The MMU will translate the virt page to the phys page, adding the offset
   inside the page:
   - orig addr requested by the user process was V=16,484 -- the 100th byte
     into virt page 4
   - the MMU finds phys page 1 mapped to virt page 4
   - phys page 1 starts at addr 4,096
   - add 4,096+100 = 4,196
   The MMU will now modify the request for the CPU to access phys addr 4,196
   (not virt addr 16,484).

* Process segments setup by OS

The OS sets up a process to run and controls several areas of memory
automatically (easier for processes not to have to deal with these directly).

1. "TEXT" segment. Not ascii text, but an old term meaning "the executable
   binary instructions of the program". E.g., /bin/ls is 38,704 bytes long,
   so we need at least 10 x 4K pages to hold it. The OS will alloc 10 pages,
   usually starting at page 0, to hold this "TEXT" segment. The OS has to set
   prot flags for the TEXT segment: at least eXecute, probably Read too
   (e.g., for hard-coded values embedded in the code). Not Write (else SEGV).

2. After the TEXT segment we often see a "statics" segment, for any program
   constants:

       const int val = 10;         // a constant
       char *str = "hello world";  // all double-quoted strings are by def'n constants!

   A statics segment is set up with Read-only protection (no X or W).

3. Often (but depends on the OS), the next segment is the "HEAP": dynamically
   allocated memory.
   Every time you use malloc(), you use mem from the HEAP segment. Usually
   when we malloc, we want mem buffers that we can Read and Write: so HEAP
   segments get R+W protection, but not 'X' protection. Note: TEXT and
   statics are fixed size. The HEAP can grow and shrink from its starting
   point, often towards higher memory addresses.

4. Often (but depends on the OS), the last segment is the STACK. This is for
   "automatic" variables and stack frames, used when code calls other
   functions. Each time you call a function, a stack frame is created w/ the
   args to pass, the return value, and more. That takes up mem. Also, each
   var you declare in a function is by def'n "automatic":

       int foo(int arg)
       {
           u_int counter; // automatic var
           void *ptr;     // automatic var
           float f;       // automatic var
           return 0;      // all mem of auto vars is reclaimed
       }

   The STACK segment also grows and shrinks as needed; the OS manages mem for
   it. Often, the stack starts at the top of your virt addr space and grows
   towards lower mem addresses.

Process segments:

- TEXT pages (starts at addr 0)
- Statics (1-2 pages max, usually)
- HEAP: starts here, grows towards larger addrs, based on malloc/free
- What's in between? Often a large gap or "hole" in your addr space, where
  nothing is mapped. If these segments ever get close to each other, the OS
  will terminate the process with ENOMEM.
- STACK: starts at a large (4GB) addr and grows downwards towards smaller
  addrs

Most programs don't need all of the addr space, and there's a huge gap in the
program's addr space b/t the HEAP and STACK. If a prog tries to access any
mem in this gap, you get a SEGV (and core dump). This large gap is often
where memory-mapped files and libraries are located, using mmap(2).

Most OSs want to ensure that the HEAP and STACK don't grow too much. See
getrlimit(2) and the shell's ulimit. Assume an OS wants to restrict a HEAP or
STACK so it doesn't grow by more than a certain amount, say 1GB. The OS knows
how much you use and can prevent more from being allocated (slow the program
down, or even kill it).
Alternatively, you can set up a special page right after the 1GB range with
PROT_NONE: if any instruction tries to access that page, you get a SEGV. Note
that a PROT_NONE page will never need a phys page mapped to it, so it doesn't
waste actual memory (other than a small entry in the process's page table).

* Using PROT_NONE for other protections: redzones

See 11.c code sample