Chap 1
This flexibility is essential for data centers, cloud services, and large-scale
computing systems.
• In a traditional computer system, the host operating system is designed specifically for the underlying
hardware. However, with virtualization, multiple virtual machines (VMs) can run on the same physical
hardware, each with its own guest OS, independent of the host OS.
• This is achieved through a virtualization layer, known as the hypervisor or Virtual Machine Monitor
(VMM). The hypervisor manages the allocation of virtualized CPU, memory, and I/O resources to
different VMs, allowing multiple applications to run simultaneously on the same hardware.
• The virtualization software functions by creating an abstraction of physical hardware, enabling VMs to
use virtual resources efficiently. Virtualization can be implemented at different levels of a computer
system, including:
Levels of Virtualization Implementation
This approach enables efficient resource sharing by virtualizing key components such as
processors, memory, and I/O devices.
This concept was first implemented in IBM VM/370 in the 1960s and has since evolved with
modern hypervisors like Xen, which enables virtualization of x86-based systems to run Linux
and other operating systems efficiently.
OS-level virtualization acts as an abstraction layer between the traditional OS and user
applications, enabling the creation of isolated containers on a single physical server.
These containers function like real servers, allowing OS instances to efficiently utilize
hardware and software in data centers.
This approach is widely used in virtual hosting environments to allocate hardware resources
among multiple mutually distrusting users.
Most applications interact with the system using APIs exported by user-level
libraries rather than relying on lengthy system calls from the OS.
Since many systems provide well-documented APIs, these interfaces become
a viable candidate for virtualization.
This approach works by controlling the communication link between
applications and the system through API hooks.
A notable example is WINE, which enables Windows applications to run on
UNIX hosts.
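As a rough illustration of the API-hook idea (not WINE's actual mechanism), the Python sketch below remaps a call made against a hypothetical foreign API onto a native host service; the API name ForeignWriteText and the host service are invented for the example.

```python
# Conceptual sketch of API-call interception ("API hooks") at the library level.
# This is NOT how WINE is implemented; the API name ForeignWriteText and the
# host service below are hypothetical stand-ins used only to show call remapping.

host_console_log = []

def host_write_console(text):
    """Stand-in for a native service exported by the host's own libraries."""
    host_console_log.append(text)

# A tiny "foreign" API table that applications were written against (hypothetical).
foreign_api = {}

def install_hooks():
    # Hook the foreign entry point and remap it onto the host service,
    # translating the arguments along the way.
    def foreign_write_text(handle, text, length):
        return host_write_console(text[:length])
    foreign_api["ForeignWriteText"] = foreign_write_text

install_hooks()
foreign_api["ForeignWriteText"](1, "hello from a foreign app", 24)
print(host_console_log)   # ['hello from a foreign app']
```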
This layer provides an abstraction of a VM, enabling programs written in an HLL to run on
it. Examples include Microsoft .NET CLR and Java Virtual Machine (JVM).
Other forms of application-level virtualization include application isolation, application
sandboxing, and application streaming.
These techniques wrap the application in a layer isolated from the host OS and other
applications, making it easier to distribute and remove from user workstations.
An example is the LANDesk application virtualization platform, which deploys software
as self-contained, executable files in an isolated environment, eliminating the need for
installation, system modifications, or elevated security privileges.
Performance Considerations
• A VMM should efficiently allocate resources to multiple VMs.
• The total resource demand of the VMs may exceed the capacity of the physical machine.
• A conventional time-sharing OS is not classified as a VMM.
Table 3.2 compares four hypervisors and VMMs that are in use today.
Different VMMs and hypervisors vary in performance, control, and efficiency.
• Hardware-assisted virtualization improves the performance of modern hypervisors such as VMware, Xen, and Hyper-V.
Complete control of resources by a VMM means that: (1) the VMM is responsible for allocating hardware resources for programs; (2) it is not possible for a program to access any resource not explicitly allocated to it; and (3) it is possible under certain circumstances for a VMM to regain control of resources already allocated.
A key limitation of OS-level virtualization is that all containers on a single physical host must
use the same OS family. While different distributions are allowed, mixing OS types
(e.g., running a Windows container on a Linux host) is not possible.
This poses a challenge in cloud computing, where users may have diverse OS
preferences, requiring support for both Windows and Linux environments.
This type of virtualization can create execution environments for running alien programs on a platform
rather than creating a VM to run the entire operating system.
API call interception and remapping are the key functions performed. This section provides an overview of
several library-level virtualization systems, namely the Windows Application Binary Interface (WABI), lxrun,
WINE, Visual MainWin, and vCUDA.
Before virtualization, the operating system manages the hardware. After virtualization,
a virtualization layer is inserted between the hardware and the operating system.
Depending on the position of the virtualization layer, there are several classes of VM
architectures, namely the hypervisor architecture, paravirtualization, and host-based
virtualization.
The hypervisor is also known as the VMM (Virtual Machine Monitor). They both
perform the same virtualization operations.
The hypervisor software sits directly between the physical hardware and its OS. This
virtualization layer is referred to as either the VMM or the hypervisor.
The hypervisor provides hypercalls for the guest OSes and applications.
Example: HYPERVISOR_memory_op() is a hypercall in Xen used for memory management.
Depending on the functionality, a hypervisor can assume a micro-kernel architecture
like the Microsoft Hyper-V. Or it can assume a monolithic hypervisor architecture like
the VMware ESX for server virtualization.
Fig. 3.5 The Xen architecture’s special domain 0 for control and
I/O, and several guest domains for user applications.
Xen does not include any device drivers natively. It just provides a mechanism
by which a guest OS can have direct access to the physical devices.
As a result, the size of the Xen hypervisor is kept rather small. Xen provides a
virtual environment located between the hardware and the OS.
•The core components of a Xen system are the hypervisor, kernel, and
applications.
•However, not all guest OSes are created equal; one in particular controls the others. The guest OS with
control ability is called Domain 0, and the others are called Domain U.
Depending on implementation technologies, hardware virtualization can be classified into two categories:
full virtualization and host-based virtualization.
Full virtualization does not require modifying the guest OS. It relies on binary translation to trap and virtualize
the execution of certain sensitive, nonvirtualizable instructions.
The guest OSes and their applications consist of noncritical and critical instructions. In a host-based system,
both a host OS and a guest OS are used.
A virtualization software layer is built between the host OS and guest OS.
With full virtualization, noncritical instructions run on the hardware directly, while critical instructions are
discovered and replaced with traps into the VMM to be emulated by software. Both the hypervisor and VMM
approaches are considered full virtualization. Because binary translation adds overhead, full virtualization
generally achieves somewhat less than the native machine's performance.
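A minimal Python sketch of the trap-and-emulate idea behind full virtualization; the instruction names, the sensitive-instruction set, and the VM state are invented purely for illustration.

```python
# Toy model of full virtualization: noncritical instructions "run directly",
# while sensitive ones are trapped and emulated by the VMM. Instruction names
# and the emulation logic are illustrative only.

SENSITIVE = {"WRITE_CR3", "HLT", "OUT"}   # stand-ins for sensitive x86 instructions

def vmm_emulate(instr, vm_state):
    # The VMM emulates the effect of the sensitive instruction in software.
    vm_state.setdefault("trapped", []).append(instr)

def run_guest(instruction_stream, vm_state):
    for instr in instruction_stream:
        if instr in SENSITIVE:
            vmm_emulate(instr, vm_state)                  # trap into the VMM
        else:
            vm_state["pc"] = vm_state.get("pc", 0) + 1    # "direct execution"

state = {}
run_guest(["MOV", "ADD", "WRITE_CR3", "MOV", "HLT"], state)
print(state)   # {'pc': 3, 'trapped': ['WRITE_CR3', 'HLT']}
```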
Host-Based Virtualization
An alternative VM architecture is to install a virtualization layer on top of the host OS.
This host OS is still responsible for managing the hardware. The guest OSes are
installed and run on top of the virtualization layer.
The lower the ring number, the higher the privilege of instruction being executed.
The OS is responsible for managing the hardware and executes privileged instructions at Ring 0, while
user-level applications run at Ring 3.
When the x86 processor is virtualized, a virtualization layer is inserted between the
hardware and the OS. According to the x86 ring definition, the virtualization layer
should also be installed at Ring 0.
Different instructions at Ring 0 may cause some problems. In Figure 3.8, we show that
para-virtualization replaces nonvirtualizable instructions with hypercalls that
communicate directly with the hypervisor or VMM.
However, when the guest OS kernel is modified for virtualization, it can no longer run
on the hardware directly.
In this way, the VMM and guest OS run in different modes and all sensitive
instructions of the guest OS and its applications are trapped in the VMM.
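For contrast with the full-virtualization sketch above, here is a minimal sketch of the paravirtualization idea: the guest kernel is modified at the source level so that sensitive operations are issued as explicit hypercalls. The Hypervisor class and method names are hypothetical, loosely modeled on Xen's HYPERVISOR_memory_op().

```python
# Conceptual sketch: in paravirtualization the guest kernel is modified, so
# sensitive operations are already expressed as hypercalls instead of being
# discovered and translated at run time. All names below are hypothetical.

class Hypervisor:
    def memory_op(self, cmd, arg):
        # Stand-in for a memory-management hypercall such as Xen's HYPERVISOR_memory_op().
        return f"hypervisor handled {cmd}({hex(arg)})"

class ParavirtGuestKernel:
    def __init__(self, hypervisor):
        self.hv = hypervisor

    def set_page_table(self, new_root):
        # Instead of executing a privileged page-table switch directly, the
        # modified kernel asks the hypervisor to do it on its behalf.
        return self.hv.memory_op("update_page_table", new_root)

guest = ParavirtGuestKernel(Hypervisor())
print(guest.set_page_table(0x1000))
```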
For the x86 architecture, Intel and AMD have proprietary technologies for
hardware-assisted virtualization.
Hardware Support for Virtualization
Modern operating systems and processors permit multiple processes to
run simultaneously.
If there is no protection mechanism in a processor, all instructions from
different processes will access the hardware directly and cause a system
crash.
Therefore, all processors have at least two modes, user mode and
supervisor mode, to ensure controlled access to critical hardware.
Instructions running in supervisor mode are called privileged instructions.
Other instructions are unprivileged instructions.
One or more guest OSes can run on top of the hypervisor. Example: KVM (Kernel-based
Virtual Machine), a Linux kernel virtualization infrastructure that supports
hardware-assisted virtualization.
Example 3.4 discusses Intel’s hardware support
approach.
Intel provides a hardware-assist technique to make virtualization easy and improve performance.
Figure 3.10 provides an overview of Intel’s full virtualization techniques. For processor virtualization, Intel
offers the VT-x or VT-i technique.
VT-x (Intel Virtualization Technology for x86-based platforms)
VT-x adds a privileged mode (VMX Root Mode) and some instructions to processors. This enhancement traps
all sensitive instructions in the VMM automatically.
Processor Virtualization:
VT-x (Intel Virtualization Technology for x86-based platforms)
Adds VMX Root Mode (privileged mode)
Provides instructions for trapping sensitive instructions in the Virtual Machine
Monitor (VMM)
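On Linux, hardware-assisted virtualization support can be checked from user space: the CPU flag "vmx" indicates Intel VT-x and "svm" indicates AMD-V. A small sketch, assuming a Linux host where /proc/cpuinfo is available:

```python
# Quick check (Linux only) for hardware-assisted virtualization support:
# Intel VT-x advertises the "vmx" CPU flag, AMD-V advertises "svm".

def hw_virt_support(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return "unknown (no /proc/cpuinfo on this platform)"
    flags = set()
    for line in text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    if "vmx" in flags:
        return "Intel VT-x"
    if "svm" in flags:
        return "AMD-V"
    return "no hardware-assisted virtualization flag found"

print(hw_virt_support())
```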
Memory Virtualization:
Supported in hardware by Intel's Extended Page Table (EPT), covered later in this section.
A CPU architecture is considered virtualizable if it allows both privileged and unprivileged instructions of a virtual
machine (VM) to run in user mode, while the Virtual Machine Monitor (VMM) runs in supervisor mode.
Critical instructions, such as control- and behavior-sensitive instructions, are trapped in the VMM to ensure system
stability.
RISC CPUs: Naturally virtualizable because control- and behavior-sensitive instructions are treated as privileged
instructions, making them easier to manage in a virtualized environment.
x86 CPUs: Not designed for virtualization, as some sensitive instructions (e.g., SGDT, SMSW) are not privileged.
These instructions cannot be trapped by the VMM, complicating virtualization.
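The rule above can be stated compactly as the classic trap-and-emulate (Popek–Goldberg style) condition: every sensitive instruction must also be privileged, so it traps when executed in user mode. A toy set-based check, with illustrative instruction lists:

```python
# Toy check of the classic virtualizability condition (Popek–Goldberg style):
# a CPU is trap-and-emulate virtualizable if its sensitive instructions are a
# subset of its privileged instructions. Instruction lists are illustrative.

def virtualizable(sensitive, privileged):
    return sensitive <= privileged    # every sensitive instruction must trap

risc_sensitive  = {"TLB_WRITE", "SET_MODE"}
risc_privileged = {"TLB_WRITE", "SET_MODE", "RETURN_FROM_TRAP"}

x86_sensitive   = {"SGDT", "SMSW", "POPF", "MOV_CR3"}
x86_privileged  = {"MOV_CR3", "LGDT"}   # SGDT, SMSW, POPF do not trap in user mode

print(virtualizable(risc_sensitive, risc_privileged))   # True
print(virtualizable(x86_sensitive, x86_privileged))     # False -> needs binary translation,
                                                        # paravirtualization, or VT-x/AMD-V
```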
System Calls in Virtualization:
In a native UNIX-like system, system calls trigger an interrupt (80h) that passes control to the OS kernel for
processing.
In a paravirtualized system (e.g., Xen), the guest OS triggers the 80h interrupt for a system call, but
simultaneously, the hypervisor triggers the 82h interrupt. This allows the hypervisor to process the system call
and then return control to the guest OS kernel.
Performance Impact: Paravirtualization allows unmodified applications to run in VMs but introduces a small
performance penalty due to the hypervisor’s involvement in handling system calls.
Hardware-Assisted CPU Virtualization
Full and paravirtualization methods are complicated because they require changes to the
operating systems or involve complex techniques like binary translation to make the OS work
inside a virtual machine.
x86 processors (used in Intel and AMD CPUs) have different levels of access called privilege
levels or "rings".
Ring 0: This is the most privileged level. The operating system (OS) runs here and has
direct control over hardware.
Ring 1: Sometimes used to run a paravirtualized guest OS kernel; it is not used in
hardware-assisted virtualization.
Ring 3: This is where user applications run, and they have the least privilege.
Ring -1: In hardware-assisted virtualization, this is a special level for the hypervisor. The
hypervisor operates beneath Ring 0 to manage the virtual machines.
The hypervisor can directly control the hardware without modifying the guest OS,
and privileged instructions from the guest OS are automatically trapped by the
hypervisor.
This removes the complexity of binary translation and
allows unmodified operating systems to run in virtual
machines.
Technologies like Intel VT-x and AMD-V enable these
features in modern processors.
Hardware-Assisted CPU Virtualization
This technique attempts to simplify virtualization because full or paravirtualization is
complicated.
x86 Processors: Not originally designed for virtualization, but extensive efforts have been made to
virtualize them.
RISC Comparison: x86 processors are often compared to RISC processors, which are easier to
virtualize. However, x86-based legacy systems are still widely used and can't be easily discarded.
Memory Virtualization
In traditional execution environments, modern operating systems
manage virtual memory by mapping virtual addresses to physical machine
memory using page tables.
Memory Management Unit (MMU) and Translation Lookaside Buffer
(TLB) are used to optimize memory performance in x86 CPUs.
Virtual Memory Virtualization in Virtualized Environments
In virtualized environments, the system's physical memory (RAM) is
shared and dynamically allocated to virtual machines (VMs).
This involves a two-stage mapping process:
Guest OS manages the mapping of virtual addresses to guest
physical memory.
Virtual Machine Monitor (VMM) maps guest physical memory to
actual machine memory (host memory).
Two-Level Mapping Procedure
Guest OS:
Controls the mapping of virtual memory to guest physical memory within each VM.
Guest OS does not directly access the host machine memory.
VMM:
Responsible for mapping guest physical memory to actual machine memory.
The two-level memory mapping involves:
Virtual memory → Guest physical memory (managed by the guest OS).
Guest physical memory → Host machine memory (managed by the VMM).
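A minimal sketch of this two-stage translation, using flat dictionaries in place of real multi-level page tables; all addresses and mappings are made up for illustration.

```python
# Minimal model of two-stage address translation in a virtualized system.
# Real systems use multi-level hardware page tables; flat dicts are used here
# purely to show the GVA -> GPA -> HPA chain. All addresses are made up.

guest_page_table = {0x1000: 0x8000, 0x2000: 0x9000}    # GVA page -> GPA page (guest OS)
vmm_page_table   = {0x8000: 0x40000, 0x9000: 0x41000}  # GPA page -> HPA page (VMM)

PAGE_MASK = 0xFFF   # 4 KiB pages: low 12 bits are the offset within a page

def translate(gva):
    page, offset = gva & ~PAGE_MASK, gva & PAGE_MASK
    gpa_page = guest_page_table[page]     # stage 1: guest OS mapping
    hpa_page = vmm_page_table[gpa_page]   # stage 2: VMM mapping
    return hpa_page | offset

print(hex(translate(0x1ABC)))   # 0x40abc
```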
MMU Virtualization
MMU virtualization is crucial for efficient virtual memory management.
It should be transparent to the guest OS, meaning the guest OS doesn’t need to be aware
of the underlying virtualization.
The VMM handles all interactions with actual physical memory, ensuring guest OS memory
isolation.
Performance Considerations
Efficient memory virtualization relies on mechanisms like MMU
virtualization to manage the two-stage mapping with minimal overhead.
The VMM ensures that the virtual machines can access memory without
directly interacting with the host machine’s physical memory.
Extended Page Table by Intel for Memory Virtualization
Memory virtualization challenge: Shadow page tables were slow.
Intel’s solution: Introduced Extended Page Table (EPT) for hardware-based memory translation.
Working of EPT (Memory Translation Flow)
The CPU uses four-level guest page tables for GVA → GPA translation.
Then, it accesses four-level EPT tables to obtain the final HPA.
In the worst case, the CPU performs 20 memory accesses (5 EPT lookups × 4 memory accesses per lookup).
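The worst-case figure above follows from simple arithmetic: each of the five guest-physical references produced during the walk (four guest page-table levels plus the final guest physical address) needs its own four-level EPT walk. A worked computation, counting only the EPT accesses as the text does:

```python
# Worked count behind the "20 memory accesses" worst case quoted above.
# Each guest-physical reference produced during the walk (the 4 guest page-table
# levels plus the final guest physical address) needs its own 4-level EPT walk.
# Following the text, only the EPT accesses are tallied here.

guest_levels   = 4                 # 4-level guest page tables (GVA -> GPA)
ept_levels     = 4                 # 4-level EPT tables (GPA -> HPA)
gpa_references = guest_levels + 1  # 4 table entries + the final guest physical address

ept_accesses = gpa_references * ept_levels
print(ept_accesses)   # 20
```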
VIRTUAL CLUSTERS AND RESOURCE MANAGEMENT
A physical cluster is a collection of servers (physical machines) interconnected by a physical network such as a
LAN.
When a traditional VM is initialized, the administrator needs to manually write configuration information or
specify the configuration sources.
When more VMs join a network, an inefficient configuration always causes problems with overloading or
underutilization.
Ex: Amazon’s Elastic Compute Cloud (EC2) is a good example of a web service that provides elastic computing
power in a cloud. EC2 permits customers to create VMs and to manage user accounts over the time of their
use.
Most virtualization platforms, including XenServer and VMware ESX Server, support a bridging mode which allows
all domains to appear on the network as individual hosts.
By using this mode, VMs can communicate with one another freely through the virtual network interface card
and configure the network automatically.
Physical versus Virtual Clusters
Virtual clusters are built with VMs installed at distributed servers from one or more physical clusters.
The VMs in a virtual cluster are interconnected logically by a virtual network across several physical
networks.
FIGURE 3.18 A cloud platform with four virtual clusters over three physical clusters (shaded differently).
The virtual cluster nodes can be either physical or virtual machines. Multiple VMs running
with different OSes can be deployed on the same physical node.
1. Physical Machine (Bare Metal Node): A real, physical server that is dedicated to running
workloads. These provide direct hardware access and higher performance but are less
flexible.
2. Virtual Machine (VM Node): A software-based simulation of a physical machine running on
a hypervisor (e.g., VMware, KVM, Hyper-V). Multiple VMs can run on a single physical
machine, providing better resource utilization and flexibility.
• A VM runs with a guest OS, which is often different from the host OS that manages the
resources of the physical machine on which the VM is implemented.
• The purpose of using VMs is to consolidate multiple functionalities on the same server. This
will greatly enhance server utilization and application flexibility.
VMs can be colonized (replicated) in multiple servers for the purpose of promoting
distributed parallelism, fault tolerance, and disaster recovery.
• The size (number of nodes) of a virtual cluster can grow or shrink dynamically, similar
to the way an overlay network varies in size in a peer-to-peer (P2P) network.
• The failure of any physical nodes may disable some VMs installed on the failing
nodes. But the failure of VMs will not pull down the host system.
Since system virtualization has been widely used, it is necessary to effectively manage VMs
running on a mass of physical computing nodes (also called virtual clusters) and consequently
build a high-performance virtualized computing environment.
Figure 3.19 shows the concept of a virtual cluster based on application partitioning or
customization.
Each VM can be installed on a remote server or replicated on multiple servers
belonging to the same or different physical clusters.
The boundary of a virtual cluster can change as VM nodes are added, removed,
or migrated dynamically over time.
Fast Deployment and Effective Scheduling
▪ The system should have the capability of fast deployment. Here, deployment means two things: to
construct and distribute software stacks (OS, libraries, applications) to a physical node inside
clusters as fast as possible, and to quickly switch runtime environments from one user’s virtual cluster
to another user’s virtual cluster.
▪ If one user finishes using his system, the corresponding virtual cluster should shut down or suspend
quickly to save the resources to run other VMs for other users.
Live VM Migration Steps and Performance Effects
Live migration refers to the process of moving a running virtual machine (VM), container, or process from
one physical machine to another with minimal downtime.
•Pre-copy Migration
•The memory pages of the VM are iteratively copied to the destination while the source VM is still running.
•During the final iteration, only modified pages are transferred before switching execution to the destination.
•Advantage: Minimal downtime.
•Disadvantage: High network overhead due to repeated memory transfers.
•Post-copy Migration
•The execution is immediately switched to the destination after transferring minimal state (CPU, registers).
•The remaining memory pages are fetched on-demand from the source.
•Advantage: Reduces total migration time and network overhead.
•Disadvantage: Possible performance degradation due to memory fetch delays.
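A hedged sketch of the iterative pre-copy loop described above; the thresholds, round limit, and dirty-page model are invented for illustration and are not taken from any real migration implementation.

```python
import random

# Illustrative pre-copy loop: copy all pages once, then keep re-sending pages
# dirtied during the previous round until the dirty set is small enough (or a
# round limit is hit), and finish with a short stop-and-copy phase.

def precopy_migrate(total_pages=1000, stop_threshold=20, max_rounds=8):
    dirty = set(range(total_pages))          # round 0: everything must be sent
    sent_live = 0
    for _ in range(max_rounds):
        if len(dirty) <= stop_threshold:
            break                            # small enough: go to stop-and-copy
        sent_live += len(dirty)              # send while the VM keeps running
        # Pages written by the still-running VM during this round (simulated).
        dirty = {random.randrange(total_pages) for _ in range(len(dirty) // 3)}
    downtime_pages = len(dirty)              # stop-and-copy: VM paused for these
    return sent_live, downtime_pages

sent, final = precopy_migrate()
print(f"pages sent while running: {sent}, pages sent during downtime: {final}")
```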
There are four ways to manage a virtual cluster.
First, you can use a guest-based manager, by which the cluster manager
resides on a guest system, i.e., on the virtual machines (VMs) themselves.
It treats VMs as physical nodes, managing cluster tasks from within the guest
system.
Second, a host-based manager supervises the guest systems; if a guest system
fails, the host-based manager can restart it on another physical machine. A good
example is VMware High Availability (HA), which can restart a guest system after
failure.
A third way to manage a virtual cluster is to use an independent cluster
manager on both the host and guest systems, which improves fault tolerance and
flexibility but makes infrastructure management more complex.
Finally, an integrated cluster manager can be used on the guest and host systems;
such a manager must be designed to distinguish between virtualized and physical
resources.
Various cluster management schemes can be greatly enhanced when VM live
migration is enabled with minimal overhead.
Furthermore, we should ensure that the migration will not disrupt other active
services residing in the same host through resource contention (e.g., CPU,
network bandwidth).
A VM can be in one of the following four states. An inactive state is defined by
the virtualization platform, under which the VM is not enabled.
An active state refers to a VM that has been instantiated at the virtualization
platform to perform a real task.
A paused state corresponds to a VM that has been instantiated but is disabled to
process a task or is paused in a waiting state.
A VM enters the suspended state if its machine file and virtual resources are
stored back to the disk.
FIGURE 3.20 Live migration process of a VM from one host to another.
Steps in Live Migration
Stage 0: Pre-Migration
1. The VM is actively running on Host A.
2. An alternate physical host (Host B) may be selected in advance.
3. Block devices are mirrored, and free resources are maintained.
Stage 1: Reservation
1. A container is initialized on the target host (Host B) to prepare for migration.
Stage 2: Iterative Pre-Copy
1. Shadow paging is enabled to track memory changes.
2. The VM’s memory pages are copied in multiple rounds, sending dirty pages
(pages modified during copying) in each iteration.
3. This step minimizes downtime by reducing the amount of data that needs to be
transferred in the final step.
•Stage 3: Stop and Copy (Downtime begins – VM is out of service)
•The VM is suspended on Host A.
•An ARP (Address Resolution Protocol) update is generated to redirect
network traffic to Host B.
•The remaining VM state (including final memory pages and processor
state) is synchronized to Host B.
•Stage 4: Commitment
•The VM state on Host A is released, ensuring that the VM will now only
run on Host B.
•Stage 5: Activation (VM resumes on Host B)
•The VM starts on Host B.
•It connects to local devices and resumes normal operation.
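The stages above can also be read as a single control flow. The sketch below only records the ordering of the stages in Figure 3.20; the helpers are stubs, not a real migration implementation.

```python
# The stages of Figure 3.20 expressed as plain control flow. The helper below is
# a stub that only records ordering; this is not a real migration implementation.

log = []
def step(name):
    log.append(name)

def live_migrate():
    step("stage 0: pre-migration (select Host B, mirror block devices)")
    step("stage 1: reservation (initialize container on Host B)")
    for round_no in range(3):                              # stage 2: iterative pre-copy
        step(f"stage 2: pre-copy round {round_no} (send dirty pages, VM still running)")
    step("stage 3: stop-and-copy (suspend VM, ARP update, send remaining state)")  # downtime begins
    step("stage 4: commitment (release VM state on Host A)")
    step("stage 5: activation (VM resumes on Host B)")     # downtime ends
    return log

for entry in live_migrate():
    print(entry)
```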
VIRTUALIZATION FOR DATA-CENTER AUTOMATION
Server Consolidation in Data Centers
In data centers, a large number of heterogeneous workloads can run on servers at various times.
These heterogeneous workloads can be roughly divided into two categories: chatty workloads and non
interactive workloads.
Chatty workloads may burst at some point and return to a silent state at some other point.
These workloads involve frequent, small interactions between systems, often requiring low latency and
high responsiveness.
Examples include web video services and database queries.
Non-interactive workloads do not require people’s efforts to make progress after they are submitted.
❑ High-performance computing is a typical example of this. At various stages, the requirements for resources
of these workloads are dramatically different.
❑ Processes large amounts of data in a sequential or parallel manner. Not time-sensitive; can be scheduled to
run during off-peak hours.
❑ Examples include large-scale simulations and machine learning model training.
However, to guarantee that a workload will always be able to cope with all demand levels, the workload is statically
allocated enough resources so that peak demand is satisfied.
Therefore, it is common that most servers in data centers are
underutilized. A large amount of hardware, space, power, and
management cost of these servers is wasted.
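At its simplest, server consolidation is a packing problem: place VM resource demands onto as few physical servers as possible. A toy first-fit sketch follows; the capacities and demands are invented, and real consolidation also weighs memory, I/O, affinity, and migration cost.

```python
# Toy first-fit consolidation: pack VM CPU demands onto as few servers as possible.
# Capacities and demands are invented; this is an illustration, not a production scheduler.

def consolidate(vm_demands, server_capacity):
    servers = []                      # each entry = remaining capacity on that server
    placement = {}
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(servers):
            if demand <= free:
                servers[i] -= demand
                placement[vm] = i
                break
        else:
            servers.append(server_capacity - demand)   # open a new server
            placement[vm] = len(servers) - 1
    return placement, len(servers)

demands = {"web1": 0.30, "web2": 0.25, "db": 0.50, "batch": 0.40, "cache": 0.20}
placement, used = consolidate(demands, server_capacity=1.0)
print(placement, "servers used:", used)
```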
Benefits of server consolidation include:
Agile Provisioning & Deployment – Virtual machine (VM) images can be easily cloned and
reused, speeding up resource deployment.
Cost Reduction – Lowers expenses by reducing the need for new servers, minimizing data
center space, and cutting maintenance, power, and cooling costs.
Improved Availability & Business Continuity – Guest OS failures don’t impact others, and VMs
can be migrated seamlessly across servers without hardware dependency.
To automate data-center operations, one must consider resource scheduling, architectural support,
power management, automatic or autonomic resource management, performance of analytical
models, and so on.
In virtualized data centers, an efficient, on-demand, fine-grained scheduler is one of the key factors to
improve resource utilization. Scheduling and reallocations can be done in a wide range of levels in a
set of data centers.
These levels include at least the VM level, the server level, and the data-center level. Ideally,
scheduling and resource reallocation should be done at all levels. However, due to the complexity
involved, current techniques focus on only one level or, at most, two levels.
Virtual Storage Management Contd.
Storage Management Challenges in Virtualization
• Complex Storage Operations:
• Guest OS behaves as if using a real disk but cannot access it directly.
• VM Image Flooding:
• Thousands of VMs lead to storage overload in data centers.
Solutions for Virtual Storage Management
• Parallax: A distributed storage system designed for virtualization.
• Content Addressable Storage (CAS): Reduces VM image size, supporting
large-scale VM-based systems.
• Parallax itself runs as a user-level application in storage appliance VMs.
• Each physical machine has a storage appliance VM, acting as a block
virtualization layer.
• This provides a virtual disk for each VM on the same physical machine.
Cloud OS for Virtualized Data Centers.
Data centers must be virtualized to serve as cloud providers.
Table 3.6 summarizes four virtual infrastructure (VI) managers and OSes. These VI managers and OSes are
specially tailored for virtualizing data centers which often own a large number of servers in clusters.
Eucalyptus for Virtual Networking of Private Cloud
Eucalyptus is intended mainly for supporting Infrastructure as a Service (IaaS) clouds.
FIGURE 3.27 Eucalyptus for building private clouds by establishing virtual networks over the VMs, linking through Ethernet and the Internet.
The system primarily supports virtual networking and the management of VMs; virtual storage is not supported.
Its purpose is to build private clouds that can interact with end users through Ethernet or the Internet.
The system also supports interaction with other private clouds or public clouds over the Internet.