* How malloc works, the HEAP segment.

You call malloc(n) to ask for n bytes, and get back the addr of the first
byte of that allocation, or NULL if no memory is available.  When done, you
call free() and pass the ptr you got before, to release the memory back.

malloc(3)/free(3) are library routines, not system calls!  Malloc is more of
a "middleware" that sits between you (the application) and the OS.  It is
convenient to ask for an allocation of any length, small or large.  But the
OS only knows fixed mem units: pages, typically 4KB (the native CPU/mem unit,
and the protection unit).

So the malloc library asks the OS to give the process more virtual memory,
using special system calls called brk(2) and sbrk(2).  These syscalls set the
"break" (the program break), where the HEAP virtual memory of the process
ends.  s/brk will accept any byte count, but memory is mapped in whole pages,
so malloc asks to grow/shrink the HEAP segment by multiples of the page size.
The malloc lib can thus ask the OS for more 4KB pages for the process, and
internally dole out smaller portions as needed.

The malloc library maintains an allocation table internally, storing all
allocations given to the "caller" (your program) and how long they were.

Example: you perform your very first malloc(10) call.  Initially, the HEAP
may be empty.  The malloc library calls sbrk(2) to get a 4KB page.  Assume
the HEAP starts at page 10, so the OS returns address 40,960 (10 * 4096) and
the new page covers addresses 40,960 through 45,055.  The table would look
like this:

    START    LEN
    -------+-----
    40960    10

Note:

    char *p = malloc(10);
    printf("%lu\n", (unsigned long) p); // prints 40960 in this example

Now, the program can use p[0]..p[9], which are actual virtual memory
addresses 40,960 up to 40,969.

Next, assume the code calls

    str = malloc(20);

Malloc will consult its allocation table, and find space in the 4KB the OS
gave your process, right after the first allocation:

    START    LEN
    -------+-----
    40960    10   (for p)
    40970    20   (for str)

So malloc will return the start addr 40970 to "str".  The code can now use
str[0]..str[19] (or virt mem addrs 40970..40989).
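The table bookkeeping in the example above can be sketched in C.  This is
only a simulation under stated assumptions, not how a real malloc library is
implemented: addresses are plain numbers starting at 40960 rather than real
memory, the HEAP is a single 4KB page that never grows (no sbrk(2) call is
made), failure is reported as 0 where real malloc returns NULL, and all names
(toy_malloc, struct entry, HEAP_START, HEAP_SIZE, MAX_ENTRIES) are
hypothetical.

```c
#include <assert.h>

#define HEAP_START  40960u  /* page 10 * 4096, as in the example */
#define HEAP_SIZE   4096u   /* the single 4KB page "obtained" from the OS */
#define MAX_ENTRIES 64      /* arbitrary table capacity (assumption) */

struct entry {
    unsigned start;  /* simulated virtual address of the first byte */
    unsigned len;    /* number of bytes handed to the caller */
};

static struct entry table[MAX_ENTRIES];  /* kept sorted by start address */
static int table_count = 0;

/* Hypothetical toy allocator: returns a simulated address, or 0 on failure. */
unsigned toy_malloc(unsigned n)
{
    unsigned candidate = HEAP_START;  /* lowest address not yet ruled out */
    int i, j;

    if (n == 0 || n > HEAP_SIZE || table_count == MAX_ENTRIES)
        return 0;

    /* First fit: walk the entries in address order and stop at the first
     * gap in front of an entry that is large enough for n bytes. */
    for (i = 0; i < table_count; i++) {
        if (table[i].start - candidate >= n)
            break;
        candidate = table[i].start + table[i].len;
    }

    /* No usable gap between entries: try the space after the last entry.
     * A real malloc would ask the OS for more pages here instead of
     * giving up. */
    if (i == table_count && HEAP_START + HEAP_SIZE - candidate < n)
        return 0;

    /* Sanity check: the new allocation must lie inside the page. */
    assert(candidate + n <= HEAP_START + HEAP_SIZE);

    /* Shift later entries up by one slot and record the new allocation. */
    for (j = table_count; j > i; j--)
        table[j] = table[j - 1];
    table[i].start = candidate;
    table[i].len = n;
    table_count++;
    return candidate;
}
```

On a fresh table, toy_malloc(10) returns 40960 and a following toy_malloc(20)
returns 40970, matching the two tables above.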
Process continues:
  - you call malloc
  - malloc lib looks for space in its table
  - if it finds space, it returns the addr and records the start+len in its
    table
  - if malloc runs out of space, it calls s/brk and asks the OS for more mem
    (growing the HEAP segment by one or more 4KB units).

Next, assume the program calls free(p) -> free(40960), passing a number to
"free".  The malloc lib will look for an entry in its table with '40960' and
will mark it "free" (remove the entry).  So now the table will look like:

    START    LEN
    -------+-----
                  (a free "hole" of 10 bytes starting at 40960)
    40970    20   (for str)

Note, the first entry was removed.  Recall the initial allocation from the OS
spans 40960 up to 45055.  Also: the actual 10 bytes in the heap are untouched
by a free(); and the 'p' var itself is also untouched.  That is, neither the
allocated buf nor the var that held the ptr is reset to zero (or changed in
any way).

Next, assume the program calls:

    p2 = malloc(11);

Problem: we only have 10 bytes free at the start of the allocated region.
Too small, can't use it; malloc has to find 11 bytes elsewhere, meaning after
the allocation for 'str':

    START    LEN
    -------+-----
                  (only 10 bytes avail here, starting at 40960)
    40970    20   (for str)
    40990    11   (for p2)

As your program runs, after a while, you can have many freed zones in your
HEAP that are too small for many new allocations: this is called
FRAGMENTATION.

Note: it's possible to have so many freed fragments that the total unused mem
is large (large enough, in total, for the new allocation you ask for), and
yet there is no SINGLE contiguous space for that allocation.  In the worst
case, wasted mem can well exceed 99%!

Lots of work/research has gone over the years into strategies that predict
sequences of mallocs and frees, so as to minimize fragmentation.  Some
techniques can guarantee not to waste more than 50% of all memory:

A "Buddy" allocator.  This allocator works as follows:
  - all mem is broken into powers of two: 1/2, 1/4, 1/8, 1/16 of memory, etc.
  - when you ask to alloc mem of size N, N is rounded up to the next power of
    two, and you get the next avail. unit of that size

Example: suppose all avail mem is 128 bytes.  So the only allowed allocations
are going to be in units of 128, 64, 32, 16, 8, 4, or 2 bytes.  If you ask to
alloc, say, 13 bytes, round it up to 16, and return the next avail. unit
that's 16 bytes long.  If asked for 9 bytes, you still get 16 bytes back: at
most 50% wasted.

Inside OSs they can't afford to waste much memory, if any at all.  OSs employ
"custom memory allocators".  The same techniques can be applied in user
programs.  The idea is that in many cases, the program doesn't need to
allocate random sizes but the same size over and over.

E.g., in Linux there's a kernel structure called "struct inode" that has an
odd size (e.g., 117 bytes).  And Linux needs to allocate and free "struct
inode" structures many times.  Linux designed a custom allocator as follows:

0. Estimate the max number of "struct inode"s you'll need.  Assume 100 to
   start with.

1. Allocate N pages of 4KB contiguous space to hold 100 "struct inodes"
   - 117*100 = 11,700
   - round that up to the next multiple of 4KB: need at least 3 pages, or
     12,288 bytes
   - 12,288 bytes will fit 105 "struct inodes" (105 * 117 = 12,285)

2. Now, reserve a single BIT for each possible inode allocation (a bitmap)
   - a 0 means the item is free
   - a 1 means the item is in use

3. need room for 105 bits, or 13.125 bytes, round up to 14 bytes
   - total bytes needed is 14 + 12,285 = 12,299
   - Oops: no more room (only 12,288 bytes in 3 pages).  So either get
     another page, or reduce the total no. of avail allocations from 105 to
     104
   - 104 * 117 = 12,168
   - 104 / 8 = 13 bytes exactly
   - need a total of 12,181 bytes (leaving 107 bytes unused in the 3 pages,
     not enough for another 117-byte inode).

4. to use such an allocator, you don't call "malloc" or "free" but you design
   a new wrapper:
   - ptr = malloc_inode();  // the equivalent of malloc(sizeof(struct inode))
   - free_inode(ptr);

5. store the allocation bitmap (13 bytes) at the START of the 3 pages
   - you can check each bit using shift operations, to locate a "free" inode
   - or to mark an inode as used/free

6. you can use fixed math and simple shift ops to alloc/free such fixed size
   inode structures.
   - very useful if you anticipate a known size of allocations
   - also useful to "reserve" space for a certain no. of allocations

7. what happens if you fill up all 104 "struct inodes"?
   - you can allocate another group of pages (a "memory pool")

If you want to avoid fragmentation: use other languages like Java, or call
sbrk yourself and manage mem yourself (e.g., Apache HTTPD).

Why can't C "compact" all memory allocations, so as to free up all unused
memory?  That would require moving memory buffers around inside the HEAP.
The problem is: the program itself (outside of malloc) holds pointers
(starting mem addrs in the heap).  Alas, it's not enough to know that you
stored a malloc'd addr in some variable "ptr".  You could have done
arithmetic on it (e.g., *ptr++) or copied it to another variable (aliases, as
in "ptr2 = ptr1").  While compaction is possible and has been demonstrated in
research projects, the end result was a fairly slow C runtime (might as well
use Java).

Issues with malloc'd memory: see 13.c

* How to detect memory leaks?

1. ensure every malloc (or allocation function) has exactly one well defined
   free().

2. avoid allocating and freeing at different logical layers or in different
   modules.

3. use an assortment of tools (e.g., valgrind, etc.) and libraries (malloc
   debugging libraries).

4. Otherwise, it's hard to catch memleaks

What if it's a big leak (e.g., 4KB each time you process a user request in a
Web server)?  This means you'll be wasting memory rather quickly (imagine
1000 requests per second, so wasting ~4MB/s).  The program will run slowly as
you stress the OS and memory subsystem.  Eventually malloc will start to
return NULL.  Note: some OSs have a self-preservation mechanism called the
Out-of-Memory Killer (OOMK), which kills processes to reclaim memory.

What about a slow memory leak?!  Suppose you leak 1 byte every minute.  This
means it could take weeks and even months before it's noticed.
Besides, we almost always upgrade and reboot our systems every few weeks.
But on production systems that run for years, a slow leak will be noticed.

Idea:

1. record the amount of virt mem used by your program every N minutes.

2. keep such records for a long period of time

3. you'll notice that mem usage goes up and down with spikes of activity

4. plot time (x axis) vs. mem consumption (y axis)

5. run a linear regression through the points
   - if the slope is near 0 -> no leak
   - if the slope is > 0 -> you may have a leak
   - the greater the slope, the faster your leak is

##############################################################################
* NEXT malloc bugs