Paper Isolate Virtmach
14 This kind of cloud computing is known as infrastructure as a service (IaaS). With platform
as a service (PaaS), the customer is offered a computer that runs some pre-configured software
stack. And software as a service (SaaS) connects customers with specific applications and/or
databases that run in a datacenter.
Draft of April 28, 2019 Copyright 2019. Fred B Schneider. All rights reserved
282 Chapter 10. Isolation
Memory Isolation. Our VMM has a separate memory map VMap_V for the
memory of each virtual machine V. These memory maps (by design) have
disjoint ranges, thereby relocating the memory of different virtual machines
to non-overlapping memory regions in the underlying processor. In addition,
VMap_V is defined in a way that blocks accesses by V to memory that is
allocated to the VMM.
Software executing in a virtual machine V might itself install a mapping
Mmap by loading Mmap into V’s (virtual) MmapReg register. To ensure
that execution in a virtual machine is like execution on the bare hardware, the
memory map used for a program executing in a virtual machine V relocates
memory accesses according to both VMap_V and Mmap: an access by V to
address n is relocated to memory location VMap_V(Mmap(n)).
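The composition of the two maps can be sketched in Python. This is only an illustration under stated assumptions: each map is modelled as a dictionary from addresses to addresses, a missing key stands in for an address-translation interrupt, and the names translate, relocate, AddressTranslationFault, vmap_v, and mmap are hypothetical.

```python
class AddressTranslationFault(Exception):
    """Stands in for an address-translation interrupt (hypothetical name)."""


def translate(mapping, addr):
    """Apply one memory map; an address outside its domain faults."""
    try:
        return mapping[addr]
    except KeyError:
        raise AddressTranslationFault(addr) from None


def relocate(vmap_v, mmap, n):
    """Return VMap_V(Mmap(n)): Mmap is applied first, then VMap_V."""
    return translate(vmap_v, translate(mmap, n))


# Illustrative values only: guest software maps address 0 to 2, and the
# VMM places the memory of V at real locations 100..103.
mmap = {0: 2, 1: 3}
vmap_v = {0: 100, 1: 101, 2: 102, 3: 103}
real = relocate(vmap_v, mmap, 0)
```

Because the real location is always an output of vmap_v, giving different virtual machines maps with disjoint ranges keeps their accesses apart no matter what Mmap the guest installs.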
Two functions will be convenient in connection15 with defining the combined
memory maps that a VMM constructs and manipulates:
10.4. Virtual Machines 283
Time Multiplexing. Our VMM uses entry VMTable[V] (see Figure 10.4) to
store processor state and other information associated with a virtual machine
V. Interrupts and system-mode execution in virtual machines are efficiently
simulated if the VMM is given exclusive control over the subset Reg_S of registers
that determine processor mode, memory mappings, and interrupt handling. For
our hypothetical CPU, Reg_S includes mode, MmapReg, IntVector, Enabled, and
the interval timer.
Exclusive VMM control of registers in Reg_S is easily achieved if (i) any
instruction for accessing these registers is system-mode (on our hypothetical
CPU they are) and (ii) the processor is in user-mode whenever a virtual machine
is being executed (something we already assume). Conditions (i) and (ii) suffice
because they ensure that a virtual machine’s attempt to update any register in
Reg_S will cause a trap (which transfers control to a VMM interrupt handler).
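Conditions (i) and (ii) can be illustrated with a toy register file in Python. This is a sketch under assumptions, not the hypothetical CPU itself: registers are dictionary entries, the names write_register and PrivilegeTrap are hypothetical, and "IntervalTimer" and "r1" are made-up register names.

```python
REG_S = {"mode", "MmapReg", "IntVector", "Enabled", "IntervalTimer"}


class PrivilegeTrap(Exception):
    """Stands in for the trap that transfers control to a VMM handler."""


def write_register(cpu, reg, value):
    """Update a register; writing a Reg_S register in user-mode traps."""
    if reg in REG_S and cpu["mode"] != "system":
        raise PrivilegeTrap(reg)
    cpu[reg] = value


# A virtual machine always runs with the real processor in user-mode.
cpu = {"mode": "user", "r1": 0, "Enabled": {}}
write_register(cpu, "r1", 5)  # ordinary register: allowed
try:
    write_register(cpu, "Enabled", {"IT": True})
    trapped = False
except PrivilegeTrap:
    trapped = True  # the VMM interrupt handler would take over here
```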
The following invariant characterizes where the current processor state of
each virtual machine V can be found:
• While V is not executing instructions on the underlying processor, its
current processor state is stored in VMTable[V].ps.
• While V is executing instructions on the underlying processor:
– VMTable[V].ps.r contains the current value of each register r ∈ Reg_S.
– Register r on the underlying processor contains the current value of
each register r ∉ Reg_S.
Hndlr_i: procedure
  let vm = VMTable[LastRun]
      MReal = vm.VMap
  in
  for r ∉ Reg_S do vm.ps.r := IntVector[i].old.r end
  enq(vm.VIntPend[i], vm.ps)
  if vm.ps.Enabled[i] = true then
    vm.ps := MapApply(MReal, vm.ps.IntVector[i].new)
    MapApply(MReal, vm.ps.IntVector[i].old) := deq(vm.VIntPend[i])
  call Dispatcher
end Hndlr_i
To preserve this invariant, the value in each register r ∉ Reg S must be copied
by the VMM to VMTable[LastRun].ps whenever an interrupt causes virtual
machine LastRun to relinquish control of the underlying processor:
Therefore, this code appears at the start of every VMM interrupt handler. Preserving
the invariant also requires that a VMM’s Dispatcher, which is invoked
at the end of every interrupt handler and resumes some previously executing
virtual machine V, loads registers r ∉ Reg_S using values in VMTable[V].ps.r.
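The save and restore steps that preserve the invariant can be sketched in Python. This is a minimal sketch under assumptions: processor state is a dictionary keyed by register name, the function names save_on_interrupt and load_on_dispatch are hypothetical, and "pc", "r1", and "IntervalTimer" are made-up register names.

```python
REG_S = {"mode", "MmapReg", "IntVector", "Enabled", "IntervalTimer"}


def save_on_interrupt(vm_ps, old_state):
    """Copy every register r not in Reg_S from the saved state into vm_ps."""
    for r, value in old_state.items():
        if r not in REG_S:
            vm_ps[r] = value


def load_on_dispatch(vm_ps, cpu):
    """Load every register r not in Reg_S from vm_ps into the processor."""
    for r, value in vm_ps.items():
        if r not in REG_S:
            cpu[r] = value


# State the VMM holds for the virtual machine; "mode" is its virtual mode.
vm_ps = {"pc": 0, "r1": 0, "mode": "system"}
# State saved by the hardware at the interrupt; the real mode was user.
old_state = {"pc": 40, "r1": 7, "mode": "user"}
save_on_interrupt(vm_ps, old_state)

cpu = {"pc": 0, "r1": 0, "mode": "user"}
load_on_dispatch(vm_ps, cpu)
```

Note that the virtual machine's own value of mode stays in vm_ps throughout, while the real processor remains in user-mode.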
16 Operation enq appends the value specified by its second argument onto the queue spec-
Interval Timers. Each virtual machine has its own interval timer. The
VMM simulates these by using two registers from the underlying processor: its
interval timer and a register TimeNow that maintains the current time.18 The
VMM simulation of a virtual machine V ’s interval timer works as follows.
• VMTable[V ].nxtIT stores the time when the next interval timer interrupt
should be signalled on virtual machine V ; a constant MaxTime, larger
than any value ever found in TimeNow, indicates that no interval timer
interrupt currently is scheduled.
• Dispatcher, prior to resuming execution of any virtual machine, invokes
SimTimers (Figure 10.7). SimTimers simulates the occurrence of an interval
timer interrupt at every virtual machine where sufficient time has
elapsed since that virtual machine’s interval timer was loaded.
18 Most hardware processors have a register like TimeNow. But if such a register is not
available then it can be simulated by a variable TimeOfDay maintained by the VMM. The
VMM records in another variable LastIT the value it last loaded into the interval timer. And
whenever the VMM’s handler for timer interrupts is invoked, LastIT is added to TimeOfDay.
SimTimers: procedure
  for v ∈ 1 .. NumVMs do
    let vm = VMTable[v]
        MReal = vm.VMap
    in
    if vm.nxtIT ≤ TimeNow ∧ vm.ps.Enabled[IT] = true then
      vm.nxtIT := MaxTime
      MapApply(MReal, vm.ps.IntVector[IT].old) := vm.ps
      vm.ps := MapApply(MReal, vm.ps.IntVector[IT].new)
end SimTimers
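One way to read SimTimers is as the following Python sketch. It rests on several assumptions that are not in the figure: real memory is a dictionary from locations to stored processor states, MapApply(MReal, a) is modelled as the lookup vm.vmap[a], MaxTime is represented by infinity, and the class VM and its field names are hypothetical.

```python
from dataclasses import dataclass

MAX_TIME = float("inf")  # plays the role of MaxTime
IT = "IT"                # index of the interval timer interrupt


@dataclass
class VM:
    vmap: dict     # VMap_V: virtual machine address -> real location
    nxt_it: float  # nxtIT: time of the next virtual timer interrupt
    ps: dict       # processor state, including Enabled and IntVector


def sim_timers(vm_table, memory, time_now):
    """Deliver a due, enabled interval timer interrupt to each VM."""
    for vm in vm_table:
        if vm.nxt_it <= time_now and vm.ps["Enabled"][IT]:
            vm.nxt_it = MAX_TIME
            vec = vm.ps["IntVector"][IT]
            old_loc = vm.vmap[vec["old"]]
            new_loc = vm.vmap[vec["new"]]
            memory[old_loc] = vm.ps  # save the interrupted state
            vm.ps = memory[new_loc]  # continue in the VM's own handler


vector = {IT: {"old": 0, "new": 1}}
handler_ps = {"pc": 500, "Enabled": {IT: False}, "IntVector": vector}
running_ps = {"pc": 40, "Enabled": {IT: True}, "IntVector": vector}
memory = {100: None, 101: handler_ps}
vm = VM(vmap={0: 100, 1: 101}, nxt_it=10, ps=running_ps)
sim_timers([vm], memory, time_now=12)
```

After the call, the virtual machine resumes in its own timer handler, and the interrupted state sits where the guest expects to find it.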
interrupt is signalled, so the VMM again gets control. To guarantee that such
interrupts will occur, Dispatcher loads the interval timer on the underlying pro-
cessor just prior to resuming a virtual machine.
To resume execution of a given virtual machine V , Dispatcher loads values,
as follows, into the underlying processor’s registers.
• The values being loaded into a processor register r ∉ Reg S are prescribed
by the invariant given earlier (page 283) for the processor state of a virtual
machine V .
• The values Dispatcher loads into IntVector and Enabled ensure that a
VMM-installed interrupt handler receives control whenever an interrupt
• The value that the VMM loads into the interval timer bounds the elapsed
time until some VMM interrupt handler next executes. That value is
calculated as follows. The length of a time slice for uninterrupted virtual
machine execution is τ; the time until the next interval timer interrupt is
scheduled to occur at a virtual machine V is VMTable[V].nxtIT − TimeNow.
So Dispatcher loads the interval timer with the minimum of τ and these
differences, taken over all virtual machines.
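The calculation of the interval timer value can be written out as a short Python sketch. The function name timer_value and the sample numbers are illustrative assumptions, MaxTime is represented by infinity, and the sketch does not treat a virtual machine whose timer is already overdue.

```python
MAX_TIME = float("inf")  # plays the role of MaxTime


def timer_value(tau, nxt_it_values, time_now):
    """Return the minimum of tau and nxtIT - TimeNow over all VMs."""
    return min([tau] + [nxt - time_now for nxt in nxt_it_values])


# Three virtual machines: one timer due in 30 time units, one with no
# timer interrupt scheduled, and one timer due in 300 time units.
value = timer_value(tau=50, nxt_it_values=[130, MAX_TIME, 400], time_now=100)
```

Here the virtual timer that is due in 30 units is sooner than the time slice of 50, so the timer is loaded with 30.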
19 In fact, it is not unusual to have Mmap(L) = L for all locations used by the system
address translation using Mmap) between that input/output buffer and the
source/destination in virtual memory.
A VMM supports input/output devices by intercepting the input/output instructions
(startIO or memory-mapped I/O) that virtual machines execute.
• Because virtual machines always execute in user-mode, a privilege interrupt
will be signalled whenever a virtual machine executes startIO. So
the VMM’s privilege interrupt handler will be invoked in response to the
startIO.
• By excluding memory-mapped I/O addresses from the domain of the memory
map (VMap_V) of every virtual machine V, any virtual machine’s access
to memory-mapped locations will signal an address-translation interrupt.
So the VMM’s address-translation interrupt handler is invoked.
Once the VMM gets control, it executes code to perform the input/output
operation being initiated by the virtual machine. That VMM code is likely to
perform input/output operations on I/O devices connected to the underlying
processor. Drivers in the VMM initiate these operations.
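The interception of memory-mapped I/O can be sketched in Python. Everything here is an assumption made for illustration: the addresses in MMIO_ADDRESSES, the functions build_vmap and vm_store, and the callback vmm_handler, which stands in for the VMM's address-translation interrupt handler and its drivers.

```python
MMIO_ADDRESSES = {14, 15}  # hypothetical device registers in a VM's address space


def build_vmap(base, size):
    """Build VMap_V, leaving memory-mapped I/O addresses out of its domain."""
    return {a: base + a for a in range(size) if a not in MMIO_ADDRESSES}


def vm_store(vmap, memory, addr, value, vmm_handler):
    """Simulate a store issued by a virtual machine."""
    if addr not in vmap:
        # Address-translation interrupt: the VMM gets control and
        # performs the input/output operation on the VM's behalf.
        vmm_handler(addr, value)
        return
    memory[vmap[addr]] = value


io_log = []
memory = {}
vmap = build_vmap(base=1000, size=16)
vm_store(vmap, memory, 3, 9, lambda a, v: io_log.append((a, v)))   # ordinary store
vm_store(vmap, memory, 14, 1, lambda a, v: io_log.append((a, v)))  # intercepted
```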
(ii) A mechanism to invoke the VMM from within that replacement code.
between machine languages is doing translation from one machine’s binary to another’s, hence
the name “binary translation”.
Thus, when execution of B′ in step (3) reaches the translation of ι, control transfers
to the translator (thereby returning to step (1)), which resumes converting
B, starting with instruction ι.
– Modify Dispatcher for the VMM (page 285) so that its final step
transfers control to the dynamic binary translator, providing as arguments
the values in the registers of the virtual machine. (The
value in the program counter serves as offset d for Translation and
Execution in Alternation, above.)
Dynamic binary translation increases the size of the trusted computing base
(by adding the binary translator) and increases run-time overhead (since per-
forming the translation takes time and likely involves making a context switch).
A larger trusted computing base seems unavoidable. But we can reduce the
run-time overhead by (i) limiting how much of the code is translated during
execution, and (ii) not translating the same block of instructions anew every
time that block is executed. We now turn to implementing these optimizations.
Dynamic binary translation is not necessary when the following holds.
Binary Translation Elimination Condition. When a non-virtualizable
instruction is executed in user-mode, its effect is to advance the program
counter but not to cause other changes to memory or registers.
This condition holds for many commercially available processors. Moreover, the
preponderance of code running on a computer will be user-mode; only operating
system code executes in system-mode. Thus, when a VMM is implemented using
dynamic binary translation on a processor where Binary Translation Elimination
Condition holds, then only the operating system code in a virtual machine must
incur the run-time overhead of dynamic binary translation.
We demonstrate that when Binary Translation Elimination Condition holds,
then executing the input executable is equivalent to executing the output exe-
cutable for a virtual machine executing in user mode. So producing the output
executable is unnecessary. The interesting case is system-mode instructions,
given that Implementing Virtual Machines by using Binary Translation does
not replace user-mode instructions. There are two cases.
Case 1: A system-mode instruction ι that is non-virtualizable. According
to Binary Translation Elimination Condition, execution of ι only advances
the program counter when executed on a processor in user-mode. That
behavior is equivalent to what would be observed if ι were replaced by
a hypervisor call and the hypervisor call interrupt handler simulated the
user-mode execution of ι. Dynamic binary translation does exactly that
replacement, so execution of ι in the input executable exhibits equivalent
behavior to execution of the output executable.
Case 2: Other system-mode instructions. Such an instruction ι will cause
a privilege trap when executed, because virtual machines are executed by
an underlying processor in user-mode. So, when ι is executed, Hndlr_priv
of Figure 10.6 receives control and executes a routine to simulate ι. This
behavior is equivalent to what would be observed if ι were replaced by
a hypervisor call, because the hypervisor call interrupt handler in Implementing
Virtual Machines by using Binary Translation (above) simulates
execution of ι using code copied from Hndlr_priv.
10.4.3 Paravirtualization
Transfers of control between a virtual machine and the VMM slow execution by
disrupting instruction pipelining and by requiring memory caches to be purged.
So performance suffers when a VMM implements system-mode instructions by
emulating them in software. In addition, the transparency that makes VMMs
so attractive leads to performance problems.
• Work done in the VMM can negate work done in the operating system.
Re-ordering of transfer requests that a VMM’s disk driver does to enhance
disk performance is likely to undermine request re-ordering done by the
operating system’s driver to enhance disk performance.
Such performance problems suggest that we favor virtual machines where the
instruction set does not require software-emulation by the VMM very often.
Virtual machines implemented using paravirtualization support (i) the same
user-mode instructions as the underlying processor, (ii) a subset of its system-
mode instructions, and (iii) a hypervisor call. The set of supported system-
mode instructions typically excludes system-mode instructions that are expen-
sive to emulate in software and also excludes all non-virtualizable instructions.25
VMM-provided hypervisor calls replace the system-mode instructions that no
longer are available.
Software comprising user-mode instructions does not have to be changed to
run in a virtual machine implemented by paravirtualization. So paravirtual-
ization is transparent to application software. But operating system routines
invoke system-mode instructions; that code would have to be changed for execu-
tion under paravirtualization. In practice, those changes are typically localized
to a handful of routines.
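The kind of localized change involved can be illustrated with a Python sketch. It is a toy model under assumptions: the class VMM, its hypercall method, the operation name "set_enabled", and both disable_interrupt routines are hypothetical, and only one hypervisor call is modelled.

```python
class VMM:
    """Toy VMM that keeps each virtual machine's Enabled flags."""

    def __init__(self):
        self.enabled = {}

    def hypercall(self, vm_id, op, *args):
        """Entry point for hypervisor calls; only one operation is modelled."""
        if op == "set_enabled":
            interrupt, flag = args
            self.enabled[(vm_id, interrupt)] = flag
            return
        raise ValueError(f"unsupported hypervisor call: {op}")


def disable_interrupt_native(cpu, interrupt):
    """Original OS routine: relies on a system-mode instruction."""
    cpu["Enabled"][interrupt] = False


def disable_interrupt_paravirt(vmm, vm_id, interrupt):
    """Changed OS routine: the instruction becomes a hypervisor call."""
    vmm.hypercall(vm_id, "set_enabled", interrupt, False)


vmm = VMM()
disable_interrupt_paravirt(vmm, vm_id=1, interrupt="IT")
```

Callers of the routine are unaffected by the change, which is one reason such changes can stay confined to a few operating system routines.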