OS Organization with Virtualization

Scribes: Peter Chang and Ruolin Fan

Hard Modularity

"Don't trust other modules" because of...

Bugs
Adversaries

There are two techniques for implementing hard modularity: client-service and virtualization.

Client-Service
```
    // Sample code that implements a factorial call.
    // send() and receive() are pseudocode message primitives, not a real API.

    // Client code:
    send(fact_port, {"!", 6});      // example: we want to compute 6!
    receive(fact_port, response);
    if (response.opcode == "ok")
        print(response.val);
    else
        print("error %d", response.errorcode);


    // Server code:
    for (;;) {                      // loop forever, waiting for client requests
        receive(fact_port, request);
        if (request.opcode == "!") {
            n = request.val;
            result = 1;             // accumulate separately so the loop bound n stays fixed
            for (int i = 2; i <= n; i++)
                result *= i;
            response = {"ok", result};
        } else
            response = {"ng", 29};  // error opcode "ng" and error code 29
        send(fact_port, response);
    }
```
Some pluses and minuses for this kind of technique:
+ Limits error propagation
  - No shared state
  - Client loops will not compromise the server, and vice versa
- Uses more resources
  - Requires multiple machines (or virtual machines)
  - Interpreting messages slows down main computations (marshalling)
- Less security
  - Messages can be intercepted, or faulty messages can be sent
  - Example: Kaminsky DNS design flaw
- Harder to deploy
  - More complex
Overall: although this technique solves the problem at hand, it has many drawbacks. For a small task like computing a factorial, its resource consumption and complexity make it impractical.
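
To make the marshalling cost concrete, here is a minimal sketch of how a request such as {"!", 6} might be packed into bytes before sending and unpacked on the other side. The notes do not specify a message format, so the 5-byte big-endian layout and the names fact_request, marshal_request, and unmarshal_request are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request message: a one-character opcode and a 32-bit operand. */
struct fact_request {
    char opcode;      /* e.g. '!' for factorial */
    uint32_t val;     /* operand, e.g. 6 */
};

enum { FACT_WIRE_SIZE = 5 };   /* 1 opcode byte + 4 operand bytes */

/* Pack the request into buf using a fixed big-endian layout.
   Returns the number of bytes written, or 0 if buf is too small. */
size_t marshal_request(const struct fact_request *req, uint8_t *buf, size_t buflen)
{
    if (buflen < FACT_WIRE_SIZE)
        return 0;
    buf[0] = (uint8_t)req->opcode;
    buf[1] = (uint8_t)(req->val >> 24);
    buf[2] = (uint8_t)(req->val >> 16);
    buf[3] = (uint8_t)(req->val >> 8);
    buf[4] = (uint8_t)req->val;
    return FACT_WIRE_SIZE;
}

/* Unpack a request. Returns 1 on success, 0 if the message is too short. */
int unmarshal_request(const uint8_t *buf, size_t len, struct fact_request *out)
{
    if (len < FACT_WIRE_SIZE)
        return 0;
    out->opcode = (char)buf[0];
    out->val = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16)
             | ((uint32_t)buf[3] << 8)  | (uint32_t)buf[4];
    return 1;
}
```

Every request and response pays for this packing and unpacking in addition to the factorial computation itself, which is the overhead the list above refers to.
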
Virtualization:
To implement OS virtualization, the OS gives a "pretend machine" to the application. This way, the application cannot inadvertently modify sensitive system data directly. The computer on which the application runs is virtualized into components such as virtual memory and a virtualized CPU. Any action that requires modifying system data must go through a "middle man" such as the system kernel. One simple implementation is an x86 emulator: the OS runs the application inside the emulator, which checks every memory reference and I/O instruction.
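
The emulator approach can be sketched as an interpreter loop that checks every memory reference before performing it. The tiny instruction set (OP_LOAD, OP_STORE, OP_JUMP, OP_HALT), struct vm, and vm_run below are hypothetical and far simpler than a real x86 emulator; the point is only the check-before-access pattern and the step limit that lets the emulator regain control.

```c
#include <stddef.h>
#include <stdint.h>

enum opcode { OP_LOAD, OP_STORE, OP_JUMP, OP_HALT };

struct insn {
    enum opcode op;
    uint32_t addr;   /* memory address, or jump target for OP_JUMP */
};

enum { VM_MEM_SIZE = 16 };

struct vm {
    uint8_t mem[VM_MEM_SIZE];  /* the only memory the application may touch */
    uint8_t acc;               /* a single accumulator register */
};

enum vm_status { VM_HALTED, VM_BAD_ADDRESS, VM_STEP_LIMIT };

/* Interpret at most max_steps instructions of prog (n instructions long).
   Every memory reference and jump target is checked before it is used. */
enum vm_status vm_run(struct vm *vm, const struct insn *prog, size_t n,
                      size_t max_steps)
{
    size_t pc = 0;
    for (size_t steps = 0; steps < max_steps; steps++) {
        if (pc >= n)
            return VM_BAD_ADDRESS;        /* jumped or ran outside the program */
        const struct insn *i = &prog[pc++];
        if (i->op == OP_HALT)
            return VM_HALTED;
        if (i->op == OP_JUMP) {
            pc = i->addr;                 /* checked at the top of the loop */
            continue;
        }
        if (i->addr >= VM_MEM_SIZE)
            return VM_BAD_ADDRESS;        /* refuse the forbidden reference */
        if (i->op == OP_LOAD)
            vm->acc = vm->mem[i->addr];
        else
            vm->mem[i->addr] = vm->acc;
    }
    return VM_STEP_LIMIT;                 /* application is looping; take control back */
}
```

The per-instruction checks are also why this approach is slow: each emulated instruction costs several real ones.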

Some pluses and minuses:
+ No direct access to I/O or sensitive devices
+ Can catch infinite loops inside the application
  - The virtual machine can identify application loops and switch control to another application.
- Slower
  - Traditionally by a factor of approximately 10
Since virtualization using an emulator is very slow, we try to achieve better performance through hardware-level control structures such as the virtualizable processor.

There are two ways to call the kernel:
1. Ordinary function calls
  - This way is fast, supported, but unsafe
  - Very popular in embedded applications
2. Protected transfer of control
  - When an unsafe instruction is executed, the hardware traps, and the kernel takes control (the kernel can run any instructions)
But what is a kernel? The kernel is the key part of an operating system that can execute any instructions; it is the core of the operating system.

Hardware Trap

Possible causes:

Hardware device interrupt
CPU timer
Invalid instruction

The kernel keeps an interrupt vector like the following, which is made up of 256 words, one per trap number, with each word being a pointer to the privileged kernel code that handles that trap.

The trap executes as follows:

Push the following onto the stack (note: the kernel stack, not the application stack):

ss	Stack Segment (identifies the stack)
esp	Extended Stack Pointer
eflags	Flags register
cs	Code Segment
eip	Instruction Pointer (return address)
error code

Then set eip = iv[trap#], so that execution continues in the kernel's handler for that trap.

More details of trap
A return-from-interrupt instruction (IRET on x86; some other architectures call it RETI) at the end of the kernel's trap handler "returns" to the program that made the syscall.
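
As a rough user-space model of the trap sequence (not real kernel code), the sketch below keeps a table of handler pointers indexed by trap number, saves the interrupted state on a simulated kernel stack, jumps through the table, and restores the saved state on return. All names (struct cpu, iv, install_handler, trap, sys_answer) are hypothetical, and the error code and the privilege change are left out.

```c
#include <stddef.h>
#include <stdint.h>

enum { IV_SIZE = 256, KSTACK_DEPTH = 8 };   /* x86 has 256 trap numbers */

struct cpu {
    uint32_t ss, esp, eflags, cs, eip;  /* the state a trap saves */
    uint32_t eax;                       /* carries the handler's result back */
};

typedef void (*trap_handler)(struct cpu *);

static trap_handler iv[IV_SIZE];              /* the interrupt vector */
static struct cpu kernel_stack[KSTACK_DEPTH]; /* simulated kernel stack */
static size_t kernel_sp;

void install_handler(unsigned trapno, trap_handler h)
{
    if (trapno < IV_SIZE)
        iv[trapno] = h;
}

/* Simulate a trap. Returns 0 on success, -1 if there is no handler
   for trapno or the simulated kernel stack is full. */
int trap(struct cpu *cpu, unsigned trapno)
{
    if (trapno >= IV_SIZE || iv[trapno] == NULL || kernel_sp >= KSTACK_DEPTH)
        return -1;
    kernel_stack[kernel_sp++] = *cpu;   /* push ss, esp, eflags, cs, eip */
    iv[trapno](cpu);                    /* eip = iv[trap#]: run the handler */
    struct cpu saved = kernel_stack[--kernel_sp];
    /* "Return from interrupt": restore the saved state, keep the result in eax. */
    cpu->ss = saved.ss;
    cpu->esp = saved.esp;
    cpu->eflags = saved.eflags;
    cpu->cs = saved.cs;
    cpu->eip = saved.eip;
    return 0;
}

/* Example handler: free to clobber state, since trap() restores it. */
void sys_answer(struct cpu *cpu)
{
    cpu->eflags = 0;
    cpu->eip = 0xdead;
    cpu->eax = 42;
}
```

After install_handler(0x80, sys_answer), calling trap(&c, 0x80) leaves c.eax set to 42 while c.eip and c.eflags keep their pre-trap values, which mirrors what the interrupted program observes.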

Figure 3: the standard protection system, or hierarchy of privileges.

So how do we do syscalls?

One solution: while(1); or for(;;)

The infinite loop will catch the kernel's attention at the next of its regular timer interrupts (around every 10 ms). However, this approach is far too slow.

Another solution: *(char*)0 = 'x';

Referencing invalid addresses such as trying to place something in the forbidden zone will cause a trap. However, it is too likely to be accidental.

The proper way to do syscalls in X86:

Use the software interrupt instruction INT 0X80. This generates a trap of type 128. INT 0X80 is a short (two-byte) instruction whose only purpose is to trap into the kernel. For example, read(a,b,c); will internally execute INT 0X80 to make the syscall.

But how are a, b, and c passed?

  %ecx c    //This is the assembly code for "read"
  %ebx b    // ... read.s
  %eax a
  INT 0X80
  Result %eax

Overall, syscall is like a function call except:

It crosses protection domains
More data must be saved/restored
It's slower
It has hard modularity
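
The register-passing idea and the "hard modularity" point can be modeled in ordinary C. In this sketch the user-side stub toy_read loads a pretend register file using the convention shown above (arguments in eax, ebx, ecx, result back in eax), and the kernel side treats every register value as untrusted. Everything here is hypothetical: the buffer is passed as an offset into a pretend user memory to keep the checks simple, a plain function call stands in for INT 0X80, and real systems differ in detail (Linux on 32-bit x86, for instance, puts the system call number in %eax and the arguments in %ebx, %ecx, %edx).

```c
#include <stddef.h>
#include <string.h>

/* Pretend register file: arguments in eax, ebx, ecx; result in eax. */
struct regs { long eax, ebx, ecx; };

enum { USER_MEM_SIZE = 64 };
static char user_mem[USER_MEM_SIZE];          /* the application's memory */
static const char kernel_file[] = "hello";    /* data only the "kernel" holds */

/* Kernel side: validate the untrusted arguments, then do the work. */
static void kernel_read(struct regs *r)
{
    long fd = r->eax, off = r->ebx, n = r->ecx;
    if (fd != 0 || off < 0 || n < 0 ||
        off > USER_MEM_SIZE || n > USER_MEM_SIZE - off) {
        r->eax = -1;                 /* reject bad arguments instead of crashing */
        return;
    }
    long avail = (long)(sizeof kernel_file - 1);
    if (n > avail)
        n = avail;
    memcpy(user_mem + off, kernel_file, (size_t)n);
    r->eax = n;                      /* result goes back in eax */
}

/* User side: load the registers, "trap", and return the result. */
long toy_read(long fd, long buf_offset, long count)
{
    struct regs r = { fd, buf_offset, count };
    kernel_read(&r);                 /* stands in for INT 0X80 */
    return r.eax;
}
```

A call such as toy_read(0, 60, 10) returns -1 rather than writing past the end of user_mem: the callee does not trust the caller, which is the difference from an ordinary function call.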

Components of the machine that may need virtualization

ALU
- You must be careful about saving/restoring the ALU state when crossing protection domains. (e.g.: V C N Z flags)
- Otherwise, clients should have full access to the ALU.
Registers
- Most registers are full-access.
- Some registers are privileged. (e.g.: Privilege register, Virtual memory control)
Cache
- Virtualization of the cache is not so much a protection issue as a performance issue.
- The cache is meant to be fast and virtualization just slows down what limited cache there is.
Primary Memory (RAM)
- User memory is accessible at full speed.
- However, the system should trap if the user tries to access "forbidden zones".
I/O Devices
- Typically privileged.
- Exceptions: Graphical display output for streaming video and games.

What can go wrong?

Infinite Application Loops: An application can enter an infinite loop for any number of reasons. To handle this, the kernel typically arranges for a timer interrupt every 10 ms or so, at which point it can decide to stop running the application and transfer CPU resources to another one. This prevents infinite loops in user applications from hanging the system.
Illegal Application Memory Access: An application can refer to illegal memory locations. Systems employ memory management and protection mechanisms in the kernel to prevent applications from actually doing damage at illegal memory addresses.
Infinite Kernel Loops: A kernel that enters an infinite loop has no way to resume any other process on the computer. This results in a complete system failure, since there is nothing to "interrupt" the kernel itself.
Illegal Kernel Memory Access: Nothing prevents a kernel from accessing an invalid memory address. This almost always results in a system failure, as errors resulting from the illegal access are propagated throughout the system.
Simultaneous Register Access: Two applications can end up using the same registers. To keep one from clobbering the other, the kernel context switches between applications: it saves an application's registers when the application stops running and restores them before it resumes, so the registers can be reused in the meantime.

Context Switching

Context switching is the act of suspending one process and resuming another. This is what schedule() does in weensyos1. It is done by saving one application's registers, such as eax, ebp, and esp, and restoring another's. The saved registers are kept in the process's "process descriptor". Each process has a process descriptor, which is stored in the OS's "process descriptor table".

However, as applications and hardware get more complex, a system may require more registers to run each application. To solve this problem, we split process registers into "common" (eax, ebp, eip) and "uncommon" (Floating Point Registers) parts, saving only the common registers and specified uncommon registers and letting the other registers be reused.
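
The save-and-restore step can be sketched with a small process descriptor table. This is a simplified model and not the weensyos1 code: the struct layouts, the table size, the round-robin policy, and the name toy_schedule are all assumptions, and a real kernel would save and restore hardware registers in assembly rather than copy a struct.

```c
#include <stdint.h>

enum { NPROCS = 4 };
enum proc_state { P_EMPTY, P_RUNNABLE };

/* The "common" registers saved on every switch. */
struct registers { uint32_t eax, ebp, esp, eip; };

struct process_descriptor {
    enum proc_state state;
    struct registers regs;   /* saved copy while the process is not running */
};

static struct process_descriptor proc_table[NPROCS]; /* process descriptor table */
static struct registers cpu_regs;                     /* the one real register set */
static int current = 0;                               /* index of the running process */

/* Save the running process's registers, pick the next runnable process
   in round-robin order, and restore its registers.
   Returns the index of the process that is now running. */
int toy_schedule(void)
{
    proc_table[current].regs = cpu_regs;          /* suspend: save registers */
    for (int i = 1; i <= NPROCS; i++) {
        int next = (current + i) % NPROCS;
        if (proc_table[next].state == P_RUNNABLE) {
            current = next;
            break;
        }
    }
    cpu_regs = proc_table[current].regs;          /* resume: restore registers */
    return current;
}
```

Because each process's values live in its own descriptor while it is suspended, two processes can both "use eax" without ever seeing each other's contents.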

Figure 4: Process Description Tables.

Virtual Memory Addressing

Virtual memory allocation is implemented with the aid of a Virtual Memory Manager (VMM). The VMM is a kernel process that allows user processes to believe that they have each been given one neatly allocated block of RAM in which to run. In reality, the VMM translates the memory addresses used by a process into actual physical memory addresses that, unlike in figure 5, may be scattered and fragmented throughout the physical RAM. Although translating memory addresses adds another layer of complexity and may slow down the overall system, it resolves the important issue of programs accessing "forbidden" memory addresses. In effect, the VMM can completely hide the memory used by, say, the kernel itself.

Figure 5: Virtual Memory Management.
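
One common way to do this translation is a per-process page table; the notes do not say which scheme the VMM uses, so the single-level table, the 4096-byte page size, and the names page_table and translate below are assumptions chosen to keep the sketch small.

```c
#include <stdint.h>

enum { PAGE_SIZE = 4096, NPAGES = 8 };
#define NO_FRAME UINT32_MAX   /* marks a virtual page with no physical frame */

/* Per-process table: virtual page number -> physical frame number. */
struct page_table { uint32_t frame[NPAGES]; };

/* Translate a virtual address. Returns 0 and stores the physical address
   in *paddr, or returns -1 for a "forbidden" address (where real hardware
   would trap to the kernel). */
int translate(const struct page_table *pt, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr / PAGE_SIZE;      /* which virtual page */
    uint32_t offset = vaddr % PAGE_SIZE;   /* position within the page */
    if (vpn >= NPAGES || pt->frame[vpn] == NO_FRAME)
        return -1;
    *paddr = pt->frame[vpn] * PAGE_SIZE + offset;
    return 0;
}
```

Consecutive virtual pages can map to frames anywhere in physical RAM, and any page the kernel leaves unmapped in a process's table, such as the kernel's own memory, is simply unreachable from that process.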

Device Access

Robustness is a very important issue when dealing with hardware device access. Every system is built from many different devices, with lots of variation and "weird" features among them. This variation requires a robust interface to protect devices from each other and from the user. We do not want situations in which a piece of code may accidentally or intentionally set something on fire. More realistically, we do not want code to modify sensitive data on a hard drive unknowingly.

As programmers, we want a clean interface for interacting with devices that conforms to our standards of abstraction and modularity. We don't want to have to deal with the low-level details of how to read from and write to a device, let alone the differences between devices and how reading and writing apply to each of them.

Two Classes of Devices

Asynchronous (Streaming)
- Network
- Mouse
- Keyboard
Synchronous (Random Access)
- Disk Storage
- Memory

Each class of devices has its own set of device access operations that make sense for the devices in that class. For example, it would not make sense for a program to use lseek() on an asynchronous device such as a mouse. However, a read() operation makes sense for both the asynchronous and the synchronous devices listed above.
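
One way to express "every device can read, but only some can seek" is a per-class table of operations in which an unsupported operation is simply absent. This is a sketch under assumptions, not any real kernel's driver interface: struct device_ops, dev_read, dev_seek, and the in-memory backing data are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct device;

/* Operations table; a NULL entry means the operation makes no sense
   for that class of device. */
struct device_ops {
    long (*read)(struct device *dev, char *buf, size_t n);
    long (*seek)(struct device *dev, long offset);
};

struct device {
    const struct device_ops *ops;
    const char *data;   /* backing bytes standing in for real hardware */
    size_t len;
    size_t pos;
};

/* Copy up to n bytes from the current position and advance it. */
static long generic_read(struct device *d, char *buf, size_t n)
{
    size_t avail = d->len - d->pos;
    if (n > avail)
        n = avail;
    memcpy(buf, d->data + d->pos, n);
    d->pos += n;
    return (long)n;
}

/* Move the current position; only random-access devices provide this. */
static long disk_seek(struct device *d, long offset)
{
    if (offset < 0 || (size_t)offset > d->len)
        return -1;
    d->pos = (size_t)offset;
    return offset;
}

const struct device_ops stream_ops = { generic_read, NULL };      /* mouse, keyboard, network */
const struct device_ops disk_ops   = { generic_read, disk_seek }; /* disk storage */

long dev_read(struct device *d, char *buf, size_t n)
{
    return d->ops->read(d, buf, n);
}

long dev_seek(struct device *d, long offset)
{
    return d->ops->seek ? d->ops->seek(d, offset) : -1;  /* -1: not seekable */
}
```

With this layout, dev_seek on a device built with stream_ops fails cleanly, while dev_read works the same way for both classes.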