* What's CPP? Linker "ld"? "ccom"?

GCC is a wrapper around multiple programs in roughly 4 stages:

1. C Pre-Processor (CPP)
2. C Compiler "proper" (CCOM)
3. Assembler (AS)
4. Linker (LD)

Some compilers combine steps 1+2 into one stage; some compilers call it
something different.  CPP+CCOM may be called "cc1" (C Compiler Stage 1).

1. C Pre-Processor (CPP)

- expands macros defined with #define, and handles any other directive that
  starts with '#'
- performs plain text replacements

#define MAX 10

i = MAX; // same as saying i=10

- can also include a whole file

#include "foo.h"

- output of CPP is C code where all the '#' commands are "expanded"

Key: no C syntax checking takes place at this stage, only basic
pre-processor checking.
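
As a rough illustration of the replacement described above: the file below
can be run through the pre-processor alone with "gcc -E macro_demo.c".  The
file name, the SQUARE macro, and the two function names are made up for this
sketch.  The output should show the function bodies with MAX and SQUARE
already replaced by their text.

```c
/* macro_demo.c: what the pre-processor does to a macro (illustrative). */
#define MAX 10
#define SQUARE(x) ((x) * (x))   /* hypothetical macro, for illustration */

/* After pre-processing, the body reads: return 10; */
int get_max(void)
{
    return MAX;
}

/* After pre-processing, the body reads: return ((n) * (n)); */
int square(int n)
{
    return SQUARE(n);
}
```

The extra parentheses in SQUARE matter precisely because CPP only pastes
text: without them, SQUARE(a + 1) would expand to a + 1 * a + 1.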

2. Actual C compiler, checking for syntax errors (CCOM)

- the biggest, most complex stage, takes longest
- output can be raw or high level assembler commands, for example

// reset CPU register R1 to the value 0
MOV R1, 0

// increment register R2 by 1
INC R2

Actual output depends on how the C compiler was built and on the
architecture it is generating code for.

Can also perform all sorts of optimizations: for example, inlining functions
to avoid the cost of creating stack frames and making non-local jumps; or
eliminating unused variables; or reusing registers for the same variable.

Output files are often called *.s.
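
A small sketch of the inlining optimization, assuming GCC (the file and
function names are made up).  Running "gcc -S -O2 inline_demo.c" stops after
the compiler proper and writes the assembly to inline_demo.s.  With
optimization on, the compiler is free to replace both calls to add_one with
a single addition; whether it actually does so is up to the compiler.

```c
/* inline_demo.c: a small function the compiler may choose to inline. */
static inline int add_one(int x)
{
    return x + 1;
}

int bump_twice(int x)
{
    /* With optimization on, these two calls may be replaced by the
       equivalent of "x + 2", so no extra stack frames are created. */
    return add_one(add_one(x));
}
```

Comparing the *.s output with and without -O2 is one way to see what the
optimizer did.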

3. Assembler

Takes high-level assembly code, and produces low-level machine code: actual
binary instructions corresponding to individual assembly commands.

Also performs some assembly level optimizations such as better pipelining
for a given architecture.  E.g., produce Intel assembly code optimized for a
specific model of a processor you may be running now.

Produces object code, often *.o files.

4. Linker

Takes one or more *.o files and combines them into a single executable that
can actually be run.

Also looks for missing symbols and tries to fill in the code for those
symbols from libraries.  For example, if you call "printf", the linker will
look for the object code for the printf function in one or more libraries
(libc by default), extract the instructions for printf out of libc, and add
them to the executable it's producing.
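
One possible experiment to see the missing-symbol step, assuming a typical
Linux toolchain with gcc and nm (the file name and print_sum are made up):
compile the file below with "gcc -c link_demo.c", then run "nm link_demo.o".
printf should be listed as undefined ("U"), because its code is not in this
file; the exact symbol name can vary with compiler flags.

```c
/* link_demo.c: uses a symbol (printf) that this file does not define. */
#include <stdio.h>

/* Prints a + b followed by a newline.
   Returns what printf returns: the number of characters written. */
int print_sum(int a, int b)
{
    return printf("%d\n", a + b);
}
```

The header only gives the compiler the declaration of printf.  The
instructions themselves are supplied at link time.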

Output is an executable that the OS can run, often in a special format that
only the OS understands.  Example formats are called A.OUT, ELF, COFF, etc.
An executable includes all the machine instructions, references to
libraries, placeholders for variables, and any bootstrapping code to get
started.  Usually you can run such an executable by

$ /home/joe/test/myprog

or

$ cd ~/test
$ ./myprog

* System call vs. C library call?

When writing C code (or any prog. lang) you can call functions such as

	foo(arg1, arg2, ...)

A function has a name, 0 or more arguments, and may return a value (or be a
"void" function that doesn't return a value).  The function may also return
data in one of its args (e.g., fill in a buffer with data), and may have
some "side effects" (like creating a file, opening a network socket).
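
The three ways a function can communicate, sketched in C.  The names
fill_greeting and announce are made up for this example.

```c
#include <stdio.h>

/* Returns a value (the length written) AND hands data back through an
   argument: the caller's buffer is filled in.
   Returns -1 on error or if the buffer is too small. */
int fill_greeting(char *buf, size_t len, const char *name)
{
    int n = snprintf(buf, len, "hello, %s", name);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}

/* Returns nothing; its only effect is a side effect
   (a line printed to standard output). */
void announce(const char *msg)
{
    puts(msg);
}
```

A caller would typically declare "char buf[32];", call
fill_greeting(buf, sizeof buf, "joe"), check the return value, and only then
use buf.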

To ease programming, all functions look the same to the programmer.  But
there are two "kinds" of functions:

1. Library call
2. System call

Library calls are just some chunk of machine code that comes from some
library (e.g., libc, libssl, etc.); that code is "copied" or linked into
your executable.  When your binary runs, the instructions for the function in
question are executed in the same virtual "address space" of your running
program.

When calling functions:
- your program will create a stack frame, holding args of the function to
  call, return value, and more
- update the stack pointer register (SP)
- perform a "jump" to the start address where the code of the function was
  included in your executable
- now it runs in a different scope
- when the function calls 'return', your program jumps back to where it came
  from (the calling code)
- the stack frame that was created is removed from your STACK segment

This happens whether you call a function whose code came from a library, or
a function whose code you wrote yourself.

Each call to a new function creates a stack frame, adds it to the top of
your stack, jumps to the function, and upon return, pops one stack frame.
Same if function X calls function Y, then calls function Z; or if some
function calls itself recursively.  Each call -> new stack frame.
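
A sketch of "each call -> new stack frame" using recursion.  The helper
names are made up, and the depth counter is only there for illustration:
each live call to fact_at has its own copy of n and depth in its own frame.

```c
/* Deepest number of simultaneously live fact_at frames seen so far. */
static int deepest = 0;

static unsigned long fact_at(unsigned int n, int depth)
{
    if (depth > deepest)
        deepest = depth;
    if (n <= 1)
        return 1;                       /* base case: frames start popping */
    return n * fact_at(n - 1, depth + 1);   /* pushes one more frame */
}

/* Computes n! and resets the depth counter first. */
unsigned long factorial(unsigned int n)
{
    deepest = 0;
    return fact_at(n, 1);
}

/* Reports how deep the last factorial() call went. */
int deepest_frame(void)
{
    return deepest;
}
```

For factorial(5), five fact_at frames are on the stack at the deepest point,
and they are popped one by one as each call returns.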

System calls are different: they execute in the context of an operating
system (OS).  The code for a system call (syscall) is NOT inside your
program, but rather inside the OS, hidden from view.

Code becomes a system call if it requires privileged access to hardware or
services that only the OS has or can manage.

When your program executes a system call such as "open" to open a file:

A. In your program

- a stub libc function is invoked
- the stub copies the syscall's args to registers R1, R2, R3, ...
- the stub will write a special number for the system call into another
  register, say R6.  Each syscall has a unique number to identify it.
- the stub then calls an interrupt.  E.g., "INT 80h" in Intel/Linux
  systems.  This interrupt is sometimes called a system call "trap".
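
To make the "each syscall has a unique number" point concrete, here is a
minimal sketch that assumes Linux with glibc.  glibc provides a generic
syscall() function that takes the syscall number and its args and performs
the trap.  The name raw_getpid is made up.

```c
#define _GNU_SOURCE         /* asks glibc to declare syscall() */
#include <sys/syscall.h>    /* SYS_getpid: the number of the getpid syscall */
#include <sys/types.h>      /* pid_t */
#include <unistd.h>         /* syscall(), getpid() */

/* Invoke the getpid system call by number, without going through the
   dedicated getpid() stub in libc. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

raw_getpid() and the ordinary getpid() should return the same process ID,
since both end up requesting the same numbered service from the kernel.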

Key: your program interrupts the system hardware.  OSs try very hard to
respond to interrupts as quickly as possible.

When the OS is interrupted, whatever it was doing, it stops, and jumps to a
special "interrupt handler" service routine.  So YOUR program now stops
executing.

Before servicing the interrupt, the OS preserves the state of the running
program: all of its registers, CPU states, memory states, etc.  The OS then
puts your program to sleep (also called a WAIT state in the OS's scheduler).
The OS preserves your program's state, so it can resume it when the time
comes, right after the point where you interrupted it.

First, we have now switched executing from "user mode" to "kernel mode".
User code runs with limited privileges (to ensure safety), whereas
kernel-mode code runs with full privileges (i.e., can access or "mess up"
any program, any device, any piece of memory).

There's a special CPU register whose value denotes if we're running in
kernel or user mode.

Second, we have switched context.  We used to run in the context of your
program, and now we're not: we're running in kernel context, or may even be
executing another program (switching to the context of another program).
This is called a "context switch".

B. Now we're in kernel mode, running OS code

- an interrupt handler is invoked, specialized for handling system calls

- the handler will read register R6 (pre-designated to hold the syscall
  number), to figure out which system call needs to run.

- then it'll take the registers for the syscall args (R1, R2, ...) and hand
  craft a stack frame corresponding to these registers.

- then it'll invoke a function from a system call table, an array of
  functions corresponding to all the system calls.  e.g.,

void *syscall_table[1]; // corresponds to syscall #1
void *syscall_table[2]; // corresponds to syscall #2
etc.
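
The table lookup can be imitated in ordinary user-space C with an array of
function pointers.  This is only a toy model under made-up names (demo_add,
demo_neg, demo_table, demo_dispatch), not kernel code.  Returning -ENOSYS
for an unknown number mirrors what Linux reports for an unimplemented
syscall.

```c
#include <errno.h>
#include <stddef.h>

/* Every entry takes the same fixed set of "register" arguments. */
typedef long (*syscall_fn)(long a1, long a2, long a3);

static long demo_add(long a1, long a2, long a3) { (void)a3; return a1 + a2; }
static long demo_neg(long a1, long a2, long a3) { (void)a2; (void)a3; return -a1; }

enum { NR_DEMO_ADD = 1, NR_DEMO_NEG = 2, NR_DEMO_MAX = 3 };

/* Index = syscall number.  Slot 0 is left empty (NULL) in this toy table. */
static const syscall_fn demo_table[NR_DEMO_MAX] = {
    [NR_DEMO_ADD] = demo_add,
    [NR_DEMO_NEG] = demo_neg,
};

/* What the handler does: validate the number, then call through the table. */
long demo_dispatch(long nr, long a1, long a2, long a3)
{
    if (nr < 0 || nr >= NR_DEMO_MAX || demo_table[nr] == NULL)
        return -ENOSYS;
    return demo_table[nr](a1, a2, a3);
}
```

The bounds check comes first on purpose: the number arrives from untrusted
user code, so it must never be used as an index before being validated.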

For the open syscall, the syscall handler will call the actual syscall code
for open, usually called "sys_open(args...)".  There's code for sys_close,
sys_read, sys_write, etc. (all ~400 or so Linux syscalls).  These are also
called system call entry functions because they're the very first code you
can identify in the kernel as processing your system call request.

The actual code in sys_open can be complex (for another class, like
CSE-506).

Now the OS runs on behalf of a user program, which asked to open a file.  The
syscall can take a relatively long time to run, esp. if it invokes any I/O
operations.  If any I/O is needed, your program will go to sleep until that
I/O is properly serviced (e.g., the data from a slow disk/network arrives at
the main memory).

Once the I/O is complete, the OS will bring the data required into memory
(e.g., some OS cache like the page/buffer cache).

Next the OS will try and figure out WHO was waiting for that data.  Once the
OS finds that process, it'll copy the needed data to that process's memory,
and try to wake up that process.

A process being woken up does NOT run immediately.  It only moves from the
WAITING scheduler state to the "READY" (or "RUNNABLE") scheduler state.  The
process is then put at the end of a run queue of processes, and when your
turn comes, the scheduler will restore your CPU states (that were preserved a
"long time" ago), and then switch to "user mode", and execute the next
instruction in your process.

At that point, the program resumes right after the "syscall interrupt"
instruction that the program itself issued.

C. Back running the program, and we're back into that libc stub function.

Stub looks at yet another special register (say, R7) where the OS stored the
return value of the system call.

If R7 indicates an error, stub will store the error number in a special libc
global variable that holds (only) errors from system calls: errno.  To find
out what the error is, you can call functions such as perror or strerror, or
look it up in <errno.h> or <sys/errno.h>.
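
A minimal sketch of checking errno after a failed open, assuming a POSIX
system.  The function name try_open is made up.  errno is copied right away
because later library calls may overwrite it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Tries to open path read-only.
   Returns 0 on success, or the errno value set by the failed open. */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        int saved = errno;      /* copy before any other call can change it */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```

Calling try_open on a path that does not exist should return ENOENT and
print the matching message ("No such file or directory" on most systems).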

Finally, stub code returns and the very next instruction in your C code
executes.

E.g.,

fd = open("foo.txt", ...);
     ^^^^^^^^^^^^^^^^^^^^^

The above "^^^" is the syscall part through the libc stub.  The very next
instruction you'll execute after returning from the syscall is

fd = <whatever return value the system call returned>.

Finally, we're back in user-land, executing our code.


Lesson: system calls are much heavier on the entire system and your program,
b/c of the complexities of executing them.  They impose a greater burden on
your program and the OS.  Thus, minimize how many syscalls you have to
invoke, and when you do call a syscall, make sure you get the most out of it.
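
One way to see the lesson in numbers, assuming a POSIX system: the sketch
below (write_in_chunks is a made-up name) writes the same data in pieces of
a given size and counts how many write() system calls it took.  Each write()
is one full trip into the kernel and back.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len bytes from buf to fd in pieces of at most chunk bytes.
   Returns the number of write() calls made, or -1 on error. */
long write_in_chunks(int fd, const char *buf, size_t len, size_t chunk)
{
    long calls = 0;
    size_t done = 0;

    if (chunk == 0)
        return -1;
    while (done < len) {
        size_t want = (len - done < chunk) ? len - done : chunk;
        ssize_t n = write(fd, buf + done, want);
        if (n <= 0)
            return -1;          /* error, or no progress */
        done += (size_t)n;      /* a short write just means another round */
        calls++;
    }
    return calls;
}
```

Writing 4096 bytes with chunk = 1 takes 4096 syscalls; with chunk = 4096 it
normally takes one.  The same amount of data moves either way, but the
per-call overhead is paid thousands of times less often.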