@Part( internals, root "manual" )
@Chapter(Kernel Internal Structure)@Label(internals)
 
The kernel is implemented as a simple monitor.
It executes logically in its own address space
in supervisor mode with its own code, data and stack.@Index[supervisor mode]
It is invoked by trap operations and interrupts.
When a process executes a kernel operation or an interrupt trap is taken,
the kernel executes on the kernel stack.@Index[kernel stack]

@Section(Teams)
@Index[Teams]

Each team is represented by a team descriptor record (TD)@Index[team descriptor]
that describes the team space and records the root process
of the team, the user associated with the team, the team priority level, etc.
A machine-dependent portion of the team descriptor describes
the team's memory space.

@Section(Processes)
@Index[processes]

For each process, the kernel maintains a 
process descriptor record (PD)@Index[process descriptor]
that contains the process state and sundry information about the process.
When a process is running,
a variable Active points at the process descriptor of the 
currently@Index[Active] active process.

@Section(Kernel Synchronization)
@index[synchronization]

The kernel is synchronized internally by a combination of
scheduling conventions and interrupt 
masking.@Index[interrupt masking]@Index[scheduling]
The conventions are:
@begin(itemize)
Both trap-invoked and interrupt-invoked kernel operations
only add processes to, or remove them from, the list of ready processes.
They cannot block a process in the middle of a kernel operation.
The clock interrupt routine, which may change the state of a process 
blocked@Index[clock]
on a remote or nonexistent process, does so only if it did not interrupt
a kernel operation.
Similarly, Ethernet interrupts are disabled during the execution of a kernel
operation to prevent remote interkernel packets from interfering with the
execution of the kernel operation.
 
A process switch occurs at the end of a kernel operation
if the active or invoking process is no longer the highest priority
ready process.
The process switch occurs at the point of return from the kernel trap
handler after executing the kernel trap.

A process switch occurs at the end of the execution of an
interrupt service routine if the active process is no longer
the highest priority ready process AND the interrupt servicing
did not interrupt a kernel operation.
That is, a process switch cannot occur in the middle of a kernel
operation due to an interrupt even though the interrupt can
otherwise be serviced.
@end(itemize)
The net result is that a process executes a kernel operation
indivisibly with respect to other processes until it blocks.
However, the highest priority ready process is allocated@Index[priority]
the processor whenever the processor is not in supervisor state.
Masking of interrupts is used at crucial points in manipulation of
the ready queues and process
switching so that interrupt routines do not interfere.
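
The conventions above can be illustrated with a small C model.
It is only a sketch under stated assumptions: all names are hypothetical
and not taken from the kernel source, a smaller number is assumed to mean
a higher priority, and only the priority of the best ready process is
tracked rather than a real ready queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names come from the kernel. */
static bool in_kernel_operation = false; /* a trap-invoked operation is running */
static int  active_priority     = 5;     /* assumption: smaller = higher priority */
static int  head_ready_priority = 9;     /* priority of the best ready process */
static int  switch_count        = 0;     /* number of process switches so far */

/* Switch only if a strictly higher priority process is ready. */
static void maybe_switch(void)
{
    if (head_ready_priority < active_priority) {
        active_priority = head_ready_priority; /* ready queue itself not modelled */
        switch_count++;
    }
}

void kernel_trap_enter(void)
{
    assert(!in_kernel_operation);  /* kernel operations do not nest */
    in_kernel_operation = true;
}

/* A switch may occur at the point of return from the trap handler. */
void kernel_trap_exit(void)
{
    in_kernel_operation = false;
    maybe_switch();
}

/* An interrupt may ready a process, but it switches only if it did
   not interrupt a kernel operation. */
void interrupt_service(int readied_priority)
{
    bool interrupted_kernel = in_kernel_operation;

    if (readied_priority < head_ready_priority)
        head_ready_priority = readied_priority;
    if (!interrupted_kernel)
        maybe_switch();
}
```

In this model, an interrupt taken between kernel@us()trap@us()enter and
kernel@us()trap@us()exit readies a process but leaves the switch to the
trap exit, which is the indivisibility property described above.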

@Section(Interrupt Routines)

Interrupts are handled by first invoking a simple assembly
language routine that saves some registers and then calls
a C procedure associated with that interrupt level,@Index[interrupts]
possibly passing some arguments.
A macro ``Call@us()inthandler'' generates the required 
assembly@Index[Call@us()inthandler]
language routines that call the C procedure it is passed
as an argument.
Interrupt-invoked routines are assumed to be short and
to interact with processes only minimally,
at most readying a process.

@Section(Kernel Traps)
@Index[kernel traps]

An assembly-language module handles trap instructions,
invoking the specified kernel operation and handling the return.
On a trap, it moves the arguments onto the kernel stack
and calls the specified kernel operation as a C function.
On return, it moves the return value back to the process's stack if necessary
and checks for a higher priority ready process.
If there is one, it switches to the highest priority process.
If the active process is still the highest priority ready process,
the active process is allowed to continue execution
at the instruction after the trap instruction in its code segment.

@Section(Kernel Process)
If the specified pid fails to validate on a Send, the Send routine
checks whether it is the pid of the kernel process or of the device
server process.  If the kernel process pid was specified,
Send calls the SendKernel routine to perform the requested operation.
Thus, the ``kernel process'' code is executed by the process invoking
the operation, not a separate process running in the kernel.
The message format and the request codes the kernel process supports
can be found in <Venviron.h>.
The kernel process identifier is a global variable,
Kernel@us()Process@us()Pid, set at the beginning
of each team's execution.

@Section(Device Server Process)
A Send to the device server process results in
Send calling the SendDevice routine to perform the requested operation.
Thus, the device server code is executed by the process invoking
the operation, not a separate process running in the kernel.
A process that is forwarded to the device server has its finish-up
function (see below) set to SendDevice, and is readied, so that
it will begin executing in SendDevice as soon as it reaches the head
of the ready queue.

@Section(Process Switching)
@Index[process switching]

All process switches are performed by the macro Switch,@Index[Switch]
which switches from the currently active process
to the process at the head of the ready queue.
Each process is created with its state initialized to start
it at the initial program counter in its team space when it is readied.
Switch relies on there always being a ready process to execute
(i.e., the ready queue is never empty).
This is guaranteed by the presence of an ``idle'' process@Index[idle process]
that is always ready and executes the processor stop or idle instruction.

Interrupt-invoked routines execute as ``involuntary'' asynchronous
function calls made by the currently active process and thus
can also use Switch.

Process switches always occur upon exit from the kernel, never
in the middle of a kernel routine.  Thus, the kernel only requires
one stack, not a separate kernel stack for each process.  If
work remains to be done on a kernel operation when
a process is unblocked, the kernel routine that blocks it sets
the @i[finish-up] function field in the process's state record.
If this field is non-zero when a process is unblocked, the specified
function is called before the process exits
the kernel.  A finish-up function can block the process again and
set another finish-up function if necessary.
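
The finish-up mechanism might be sketched in C as follows.
This is an illustration only: the structure layout and all names
(Pd, RunFinishUp, FirstStage, SecondStage) are assumptions made for the
example, not the kernel's actual declarations.

```c
#include <assert.h>
#include <stddef.h>

struct Pd;
typedef void (*FinishUpFn)(struct Pd *);

/* Hypothetical fragment of a process state record. */
struct Pd {
    int        blocked;      /* nonzero while the process is blocked */
    FinishUpFn finish_up;    /* work left to do once unblocked, or NULL */
    int        stages_done;  /* bookkeeping for this example only */
};

/* Called when pd has been unblocked and is about to exit the kernel. */
void RunFinishUp(struct Pd *pd)
{
    assert(pd != NULL && !pd->blocked);
    if (pd->finish_up != NULL) {
        FinishUpFn fn = pd->finish_up;
        pd->finish_up = NULL;   /* fn may install another finish-up */
        fn(pd);
    }
}

/* Second half of a two-stage operation: nothing left to do afterwards. */
void SecondStage(struct Pd *pd)
{
    pd->stages_done++;
}

/* First half: does some work, then blocks again with more work pending. */
void FirstStage(struct Pd *pd)
{
    pd->stages_done++;
    pd->blocked = 1;
    pd->finish_up = SecondStage;
}
```

Because the pending work is recorded as a function pointer in the
process's own state record, nothing needs to be kept on the kernel stack
while the process is blocked.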

@b[Note:] The mechanisms described so far are general enough to serve
as a basis for a number of different types of kernels.
Also, trap and interrupt handling and process switching,
together with device management and memory management,
account for most of the machine-dependent code in the kernel.

@Section(Processor Allocation)

The strict priority-based processor allocation is 
implemented@Index[processor allocation]
efficiently by maintaining a queue of ready processes in
order of priority, highest priority first.@Index[priority]
A state field in the process descriptor indicates either that the process
is @i[ready] (and thus in this queue) or the state in which
it is blocked.
Process switching incorporating this priority-based allocation
and ready queue management is implemented by two (internal)
primitives.
@begin(Description)
Removeready(pd)@\Remove the specified process from the ready 
queue.@Index[Removeready]
The active process continues to execute until it exits the kernel
even if it has just removed itself from the ready queue.

Addready(pd)@\Add the specified process (descriptor) to the ready 
queue@Index[Addready]
in order of priority, after all processes of the same priority as this
process.
@end(description)
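
A minimal sketch of the two primitives, assuming a singly linked ready
queue, is given below.
The Process structure, the variable Readyq@us()head, and the convention
that a smaller number means a higher priority are assumptions made for
the example. The real primitives also mask interrupts while manipulating
the queue, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment of a process descriptor. */
typedef struct Process {
    struct Process *link;   /* next process in the ready queue */
    int priority;           /* assumption: smaller = higher priority */
    int ready;              /* nonzero while in the ready queue */
} Process;

static Process *Readyq_head = NULL;   /* hypothetical name */

/* Insert pd in priority order, after all processes of equal priority. */
void Addready(Process *pd)
{
    Process **pp = &Readyq_head;

    assert(pd != NULL && !pd->ready);
    while (*pp != NULL && (*pp)->priority <= pd->priority)
        pp = &(*pp)->link;
    pd->link = *pp;
    *pp = pd;
    pd->ready = 1;
}

/* Unlink pd from the ready queue if it is present. */
void Removeready(Process *pd)
{
    Process **pp = &Readyq_head;

    assert(pd != NULL);
    while (*pp != NULL && *pp != pd)
        pp = &(*pp)->link;
    if (*pp == pd) {
        *pp = pd->link;
        pd->link = NULL;
        pd->ready = 0;
    }
}
```

Inserting after processes of equal priority gives first-come,
first-served order within a priority level, and the head of the queue is
always the process that Switch should run next.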

@Section(Process Creation and Destruction)
@Index[process creation]
@Index[process destruction]
Unused process descriptors are maintained in a queue.
When a process is created, a process descriptor is removed
from the queue, assigned a process identifier,
and initialized with the specified priority, the awaiting-reply state,
the creator's team, etc.

When a process is destroyed,
it is removed from any system queues, such as the ready queue
or any message queues (one major use of the PD state field is
to indicate presence in a queue),
its process identifier is invalidated, and all its descendants
are destroyed in the same way.
The resulting free process descriptors are added to the end of
the queue of unused process descriptors.
The clock interrupt routine is responsible for checking for processes
blocked on nonexistent processes (one per clock interrupt),
so the process destruction mechanism need not handle this case.

@Section(Message Primitives)
@Index[message primitives]

While a message implementation normally requires independent kernel
message buffers,
the semantics of the message primitives in this kernel allow
the message buffer to be statically associated with the process descriptor,
so it is included as part of the same C struct.
Thus, a message is queued at a receiver by queuing the process descriptor
of the sender. This saves the space otherwise needed for the sender
identifier, etc., and the time needed to map to the PD of the sender
in order to unblock it.
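
The following C sketch illustrates the idea of queuing the sender's
descriptor itself.
The field names, the message size of 8 longs, and the functions
QueueSender and DequeueSender are assumptions chosen for the example
and do not reproduce the kernel's declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_LONGS 8   /* assumption: some fixed message size */

/* Hypothetical fragment of a process descriptor. */
typedef struct Pd {
    struct Pd *link;         /* next sender queued at the same receiver */
    unsigned long pid;
    long msg[MSG_LONGS];     /* message buffer lives in the descriptor */
    struct Pd *msgq_head;    /* senders waiting for this process */
    struct Pd *msgq_tail;
} Pd;

/* Copy the message into the sender's PD and queue that PD at the receiver. */
void QueueSender(Pd *receiver, Pd *sender, const long *message)
{
    assert(receiver != NULL && sender != NULL && message != NULL);
    memcpy(sender->msg, message, sizeof sender->msg);
    sender->link = NULL;
    if (receiver->msgq_tail != NULL)
        receiver->msgq_tail->link = sender;
    else
        receiver->msgq_head = sender;
    receiver->msgq_tail = sender;
}

/* Remove the first queued sender; its PD carries both pid and message. */
Pd *DequeueSender(Pd *receiver)
{
    Pd *sender;

    assert(receiver != NULL);
    sender = receiver->msgq_head;
    if (sender != NULL) {
        receiver->msgq_head = sender->link;
        if (receiver->msgq_head == NULL)
            receiver->msgq_tail = NULL;
        sender->link = NULL;
    }
    return sender;
}
```

The receiver obtains the sender's identity and the message from the same
record, so no lookup from pid to PD is needed when the sender is later
unblocked.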

Sending to the kernel device server or to the kernel process
is handled by checking the pid passed to Send, when it fails to
validate as a real process, to see whether it specifies the kernel
device server or the kernel process.
The SendDevice or SendKernel routine is then called directly to implement
the kernel device server or kernel process.
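
The order of these checks can be sketched as a small C function.
Only Kernel@us()Process@us()Pid is a name from the kernel; the pid
values, the table of local pids, Device@us()Server@us()Pid,
ValidLocalPid, and ChooseSendPath are hypothetical and serve only to
make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long ProcessId;

enum SendPath { SEND_LOCAL, SEND_KERNEL, SEND_DEVICE, SEND_NONLOCAL };

/* Made-up values for illustration. */
static ProcessId Kernel_Process_Pid = 0x00010001UL;
static ProcessId Device_Server_Pid  = 0x00010002UL;  /* hypothetical name */

/* Stand-in for the kernel's table of real local processes. */
static const ProcessId local_pids[] = { 0x00010010UL, 0x00010011UL };

static int ValidLocalPid(ProcessId pid)
{
    size_t i;

    for (i = 0; i < sizeof local_pids / sizeof local_pids[0]; i++)
        if (local_pids[i] == pid)
            return 1;
    return 0;
}

/* Decide which routine a Send to pid would be handled by. */
enum SendPath ChooseSendPath(ProcessId pid)
{
    assert(Kernel_Process_Pid != Device_Server_Pid);
    if (ValidLocalPid(pid))
        return SEND_LOCAL;       /* ordinary local message */
    if (pid == Kernel_Process_Pid)
        return SEND_KERNEL;      /* would call SendKernel */
    if (pid == Device_Server_Pid)
        return SEND_DEVICE;      /* would call SendDevice */
    return SEND_NONLOCAL;        /* otherwise assumed to be remote */
}
```

Since the two special pids are tested only after local validation fails,
the common case of a Send to a real local process pays nothing for them.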

@Section(Time Primitives)
@Index[time primitives]
Processes delayed by Delay are kept in a queue starting@Index[Delay]
at Delayq@us()head, ordered by increasing time to unblock.
The time before a process unblocks is stored in its blocked@us()on
field as the number of clock interrupts it must wait
after the process before it in the queue is unblocked.
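
A queue of this kind is sometimes called a delta queue.
The C sketch below shows one way it could be maintained.
Delayq@us()head and blocked@us()on are the names used above; the Process
structure and the functions DelayInsert and DelayTick are assumptions
made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment of a process descriptor. */
typedef struct Process {
    struct Process *link;   /* next process in the delay queue */
    unsigned blocked_on;    /* clock interrupts to wait after predecessor */
} Process;

static Process *Delayq_head = NULL;

/* Queue pd so that it unblocks after `clicks` clock interrupts. */
void DelayInsert(Process *pd, unsigned clicks)
{
    Process **pp = &Delayq_head;

    assert(pd != NULL);
    /* Skip processes that unblock no later than pd, consuming their deltas. */
    while (*pp != NULL && (*pp)->blocked_on <= clicks) {
        clicks -= (*pp)->blocked_on;
        pp = &(*pp)->link;
    }
    pd->blocked_on = clicks;
    pd->link = *pp;
    if (*pp != NULL)
        (*pp)->blocked_on -= clicks;  /* successor now waits relative to pd */
    *pp = pd;
}

/* Called once per clock interrupt; returns how many processes unblocked. */
int DelayTick(void)
{
    int unblocked = 0;

    if (Delayq_head == NULL)
        return 0;
    if (Delayq_head->blocked_on > 0)
        Delayq_head->blocked_on--;
    while (Delayq_head != NULL && Delayq_head->blocked_on == 0) {
        Process *pd = Delayq_head;
        Delayq_head = pd->link;
        pd->link = NULL;
        unblocked++;   /* the kernel would ready pd at this point */
    }
    return unblocked;
}
```

With relative times, each clock interrupt touches only the head of the
queue, regardless of how many processes are delaying.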

@Section(Distributed Operation)
@Index[distributed operation]

The process identifier contains an indication of the host in its
16 high-order bits.
When an operation is invoked that specifies a process identifier
that fails to validate locally,
it is assumed to be a remote process.
The operation then invokes a ``nonlocal'' version of the operation
that formats a network message and transmits it to the workstation
host specified by the process identifier.
The primary interface to the network is the WriteKernelPacket routine.
@Index[WriteKernelPacket]

In the case of GetPid,@Index[GetPid]
a message is broadcast requesting the logical id to pid mapping.

When a process is blocked sending to a remote process,
the message is retransmitted periodically by the clock interrupt
routine until a reply is received.
The Send fails after some number of retransmissions if 
no ``breath of life''@Index[Send]
packets have been received from the remote host in that time.

A message received on a workstation from a remote process causes
a process descriptor to be allocated to store the message and
make it appear as a local message to the rest of the kernel.
A process descriptor used in this fashion is called an @i[alien].@Index[alien]
An alien is destroyed once an appropriate time interval has elapsed
after the Reply message is sent. (This interval is 0 for idempotent requests.)

This description is far from complete.
For a fully detailed discussion of the interkernel protocol,
see @i[The Distributed V Kernel and Its Performance for Diskless
Workstations], by David R. Cheriton and Willy Zwaenepoel, in
Proceedings of the 9th Symposium on Operating Systems Principles,
October 1983 (also available as Technical Report STAN-CS-83-973,
Computer Science Department, Stanford University).


