Empirical Study of Virtual Disks Performance With KVM on DAS
August 01-02, 2014, Dr. Virendra Swarup Group of Institutions, Unnao, India
Abstract- The demand for data generation, storage, access, and communication is increasing exponentially. Cloud computing emerged to meet this demand, and the key concept operating at the base of the cloud computing stack is virtualization. The state of a virtual machine (VM) is represented as a virtual disk file (image) that is created on the hypervisor’s local file system, from which the virtual machine is booted. A virtual machine requires at least one disk to boot and start functioning. With the Kernel-based Virtual Machine (KVM), block devices or files can be used as virtual disks within the guest operating system. To our knowledge, no empirical study has so far been performed on different types of virtual disk image formats to quantify their runtime performance. We have studied a representative application workload, I/O micro-benchmarks, on a local file system, i.e., a direct-attached storage (DAS) environment, in conjunction with RAW, the copy-on-write scheme QCOW2 from QEMU, Microsoft’s VHD, VirtualBox’s VDI, VMware’s VMDK, and Parallels’ HDD. We have also investigated the impact of block size on application runtime performance. This paper seeks to provide a detailed runtime performance analysis of the image formats based on parameters such as latency, bandwidth, and I/O operations per second (IOPS). Today users can choose virtual disks from a pool of virtual disk image formats, but the selection is currently a black box because no comparison or decision model exists for the different formats. This study provides insights into the performance of various virtual disk image formats and offers guidelines to end users for implementing and using them.

Keywords- Virtualization, KVM hypervisor, Virtual machine, Virtual disk, fio, Virtual disk image formats.

I. INTRODUCTION

The performance of the cloud has become important due to increasing workload [1]. The key concept operating at the lower level of the cloud computing stack is virtualization. For the majority of high-performing clouds, the underpinning is a virtualized infrastructure. Virtualization has been used in data centres for several years as a successful IT strategy for consolidating servers. Its main purpose is to pool infrastructure resources; beyond that, it provides agility and flexibility to the cloud environment. “Virtualization, in computing, is the creation of a virtual version of something, such as a hardware platform, operating system, a storage device or network resources” [6].

Basically, virtualization is a technique that divides a physical computer into several isolated machines known as virtual machines (VMs). A client or server operating system is required if VMs are created within a hypervisor or a virtualization platform. The virtual machine has been serving as a crucial component in cloud computing with its rich set of features [15]. Multiple virtual machines can run on a host computer, each possessing its own operating system and applications.

A virtual disk is created on the hypervisor’s local file system, from which the virtual machine is booted. Virtual disks can be created via different processes, for example, as part of the virtual machine creation process or independently inside a storage repository. With KVM, block devices or files can be used as local storage in the guest operating systems. Files are commonly known as virtual disk image files for the following reasons [4]:

• Disk image files are available to the hypervisor as files.

• Similar to block devices, disk image files represent a local mass storage disk.

A disk image file can be considered a local hard disk for the guest operating system. The maximum size of the virtual disk is equal to the size of the disk image file; for example, a 50 GB disk image file provides a 50 GB virtual disk. The virtual disk may be located outside the domain of the guest operating system and the virtual machine. The guest operating system has limited access and rights: it can access only information related to the size of the virtual disk. As shown in Fig. 1, along with DAS, the storage space for virtual machines’ virtual disks can be allocated from multiple sources such as network-attached storage (NAS) or a storage area network (SAN), each offering different performance, reliability, and availability at different prices. DAS is at least several times cheaper than NAS and SAN, but it limits the availability and mobility of VMs [2].

In this paper, we have conducted a performance study using I/O micro-benchmark workloads in a local file system environment. The virtual disks RAW, the copy-on-write scheme QCOW2 from QEMU, Microsoft’s VHD, VirtualBox’s VDI, VMware’s VMDK, and Parallels’ HDD are evaluated against bandwidth, latency, and IOPS. We have also studied the impact of block size and file size at the hypervisor level on application runtime performance.
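As a minimal sketch of how disk images in the six formats under study could be prepared, the snippet below builds `qemu-img create` command lines without running them. The mapping from the paper's format names to QEMU driver names (for example `vpc` for VHD and `parallels` for the Parallels HDD format), the target directory, the file-name pattern, and the 50G size are assumptions made for illustration; they are not the authors' actual setup, and support for creating each format should be checked against the installed QEMU version.

```python
# Hypothetical helper: builds (but does not execute) qemu-img commands.
# The format-to-driver mapping below is an assumption based on QEMU driver names.
QEMU_DRIVERS = {
    "RAW": "raw",
    "QCOW2": "qcow2",
    "VHD": "vpc",
    "VDI": "vdi",
    "VMDK": "vmdk",
    "HDD": "parallels",
}


def create_command(fmt, size="50G", directory="/var/lib/libvirt/images"):
    """Return the argument list for creating one image of the given format."""
    driver = QEMU_DRIVERS[fmt]
    # Illustrative file name; any path on the hypervisor's local file system works.
    path = f"{directory}/disk-{fmt.lower()}.img"
    return ["qemu-img", "create", "-f", driver, path, size]


if __name__ == "__main__":
    for fmt in QEMU_DRIVERS:
        print(" ".join(create_command(fmt)))
```

Returning the argument list rather than running it keeps the sketch safe to execute on a machine without QEMU; the list could be passed to `subprocess.run` on a real hypervisor host.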
Figure 2. VM environment (diagram labels: VM, Guest OS, Application, Disk, VM Image, Data)
Figure 1. Allocation techniques of virtual disks

The remainder of the paper is organized as follows. Section 2 provides background information and related work. Section 3 describes the methodology of our performance study. Section 4 presents the results of the study with a detailed analysis of one of the scenarios. Section 5 presents concluding remarks and directions for future work.

II. BACKGROUND AND RELATED WORK

A virtual disk is a logical disk that a computer uses to perform I/O operations, and it is used to boot the operating system. The VM environment is shown in Fig. 2. A hard disk image is interpreted by a virtual machine monitor as a system hard disk drive.

An Infrastructure as a Service (IaaS) cloud encapsulates user applications into virtual machines. The VMs are distributed over a large number of compute nodes to share the physical infrastructure. Virtualization enables many features, such as consolidation for improving resource efficiency, live migration for easier maintenance, and so forth. The hard disk drive of a virtual machine (i.e., the virtual disk) is typically emulated with a regular file on the hypervisor host (i.e., the VM image file). I/O requests received at virtual disks are translated by the virtualization driver into regular file I/O requests to the image files.

A typical IaaS cloud, such as Amazon Elastic Compute Cloud (EC2), has thousands of VM images. In order to create a new VM instance in an IaaS cloud, a VM image needs to be available at its hypervisor host. As illustrated in Figure 1, one straightforward solution is to pre-copy the entire image to the compute nodes before a new VM is started. If an instance uses an image that the target hypervisor does not have, it may take a long time to start up that instance. A typical VM image file contains multiple gigabytes or even tens of gigabytes of data. Subsequent instances that use the same image on that host can start up faster because the image is locally available. An alternative method to address this issue is to transfer the image data in an on-demand streaming fashion, where parts of an image are copied as needed from the shared storage system to the hypervisor hosts. This scheme is used by cloud operating environments such as IBM SmartCloud Provisioning (SCP) [1].

VM images can be stored in different formats. The most straightforward option is to use the RAW format, where I/O requests to the virtual disk are served via a simple block-to-block address mapping. In order to support multiple VMs running on the same base image, copy-on-write techniques have been widely used, where a local snapshot is created for each VM to store all modified data blocks. The underlying image files remain unchanged until new images are captured. There are different copy-on-write schemes, including QEMU QCOW2, Microsoft’s VHD, VirtualBox’s VDI, VMware’s VMDK, Parallels’ HDD, and so forth. In some schemes, such as QCOW2, a separate file is created to store all data blocks that have been modified by the provisioned VM.

There have been many efforts on benchmarking the application runtime performance of virtual disks on a local file system. Our paper analyzes the impact of different VM image formats on application runtime performance on a local file system. The focus of our effort is on the performance study of virtual disks.

III. METHODOLOGY

To represent a typical virtualization environment, we set up an experimental testbed as shown in Fig. 3. We then configure the hypervisors to use the various virtual disks described in the previous section. Application workloads are executed on this testbed using these virtual disk configurations.
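To make the difference between the RAW and copy-on-write disks described in Section II concrete, the following is a simplified in-memory model, not the on-disk layout of any real format. The class names, the 4096-byte block size, and the dictionary used as the overlay are all assumptions for illustration; real formats such as QCOW2 use more elaborate on-disk lookup structures.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes, for illustration only


class RawImage:
    """RAW model: virtual block n is stored at byte offset n * BLOCK_SIZE."""

    def __init__(self, num_blocks):
        self.data = bytearray(num_blocks * BLOCK_SIZE)

    def read_block(self, n):
        offset = n * BLOCK_SIZE
        return bytes(self.data[offset:offset + BLOCK_SIZE])

    def write_block(self, n, payload):
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        offset = n * BLOCK_SIZE
        self.data[offset:offset + BLOCK_SIZE] = payload


class CowOverlay:
    """Copy-on-write model: modified blocks are kept per VM; the base is never written."""

    def __init__(self, base):
        self.base = base
        self.modified = {}  # block number -> block contents written by this VM

    def read_block(self, n):
        # Serve from the overlay if this VM has modified the block, else from the base.
        if n in self.modified:
            return self.modified[n]
        return self.base.read_block(n)

    def write_block(self, n, payload):
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        self.modified[n] = bytes(payload)
```

In this model a read through an overlay needs one extra lookup before reaching the base image, which hints at why a copy-on-write format may behave differently from RAW under the same workload; the measured differences are what the experiments quantify.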
Analysis is based on bandwidth (KB/s), latency (ms), and IOPS. We observed that results may vary with file size and block size, so we performed a number of experiments to see the variations in application runtime performance on virtual disks. For detailed analysis we selected the following scenario.

File size = 1024 MB

Block size = 4 KB to 128 KB

I/O patterns = Sequential Read, Sequential Write, Random Read, Random Write, Random Read Write (RW)

For Sequential Read, Fig. 4, Fig. 5, and Fig. 6 show bandwidth, latency, and IOPS for the different virtual hard disks. As shown in Fig. 4, RAW and QCOW2 perform well across the different file sizes and block sizes. VDI works well when the block size is small, and VHD works well when the file size is large. As shown in Fig. 5, latency is very low for RAW and QCOW2 and very high for HDD and VHD. As shown in Fig. 6, RAW performs a large number of sequential read I/O operations per second, QCOW2 performs a moderate number, and HDD performs very few. We observed that the IOPS value for VDI increases as file size increases. The preferable sequence of virtual hard disks for Sequential Read is RAW, QCOW2, VDI, VMDK, VHD, HDD.
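The scenario selected for detailed analysis (1024 MB file, block sizes from 4 KB to 128 KB, five I/O patterns) can be expanded into a matrix of fio invocations. The sketch below only builds the command lines. The power-of-two block-size steps, the target file path, the job names, and the use of `--direct=1` to bypass the guest page cache are assumptions, since the paper does not state the exact fio settings used.

```python
from itertools import product

FILE_SIZE = "1024m"
# Assumption: block sizes step in powers of two across the stated 4 KB to 128 KB range.
BLOCK_SIZES_KB = [4, 8, 16, 32, 64, 128]
# fio --rw values corresponding to the five I/O patterns in the scenario.
RW_MODES = {
    "Sequential Read": "read",
    "Sequential Write": "write",
    "Random Read": "randread",
    "Random Write": "randwrite",
    "Random Read Write": "randrw",
}


def fio_command(pattern, block_kb, target="/mnt/test/fio.dat"):
    """Build one fio command line; target is a hypothetical file on the virtual disk."""
    mode = RW_MODES[pattern]
    return [
        "fio",
        f"--name={mode}-{block_kb}k",
        f"--filename={target}",
        f"--size={FILE_SIZE}",
        f"--bs={block_kb}k",
        f"--rw={mode}",
        "--direct=1",  # assumption: bypass the page cache inside the guest
    ]


def job_matrix():
    """All pattern and block-size combinations, in pattern-major order."""
    return [fio_command(p, b) for p, b in product(RW_MODES, BLOCK_SIZES_KB)]


if __name__ == "__main__":
    for cmd in job_matrix():
        print(" ".join(cmd))
```

Running the same matrix once per image format would give one bandwidth, latency, and IOPS measurement per format, pattern, and block size.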
Figure 9. Random Read, IOPS

For Random Read, Fig. 7, Fig. 8, and Fig. 9 show bandwidth, latency, and IOPS for the different virtual hard disks. As shown in Fig. 7, RAW has the highest bandwidth in almost all conditions, and QCOW2 is close to RAW in many configurations. The other virtual disks are less suitable for this operation if bandwidth is the selection criterion. Latency is very low for RAW and QCOW2, as shown in Fig. 8. As shown in Fig. 9, RAW performs a large number of random read I/O operations per second, and QCOW2 also performs well. Considering all the experiments we performed and their results, the preferable sequence of virtual hard disks for Random Read is RAW, QCOW2, VDI, VMDK, VHD, HDD.

For Sequential Write, Fig. 10, Fig. 11, and Fig. 12 show bandwidth, latency, and IOPS for the different virtual hard disks. As shown in Fig. 10, VDI and VMDK are best suited to this operation, and HDD also performs well. RAW and QCOW2 are less preferable, although their performance improves with larger block sizes. VDI takes very little time (latency) to complete the operation in the different situations, whereas RAW and QCOW2 take the longest, as shown in Fig. 11. As shown in Fig. 12, VDI and VMDK perform a large number of sequential write I/O operations per second, VHD and HDD perform a moderate number, and RAW and QCOW2 are not preferable at all. The preferable sequence of virtual hard disks for Sequential Write is VDI, VMDK, VHD, HDD, RAW, QCOW2.
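The preference sequences reported above can be captured in a small lookup, as one possible starting point for the kind of decision aid the paper aims to provide. The table contains only the three orderings stated in this section; the function name and the idea of restricting the choice to the formats a user has available are hypothetical additions.

```python
# Orderings as reported in this section, best first.
PREFERRED_ORDER = {
    "sequential read": ["RAW", "QCOW2", "VDI", "VMDK", "VHD", "HDD"],
    "random read": ["RAW", "QCOW2", "VDI", "VMDK", "VHD", "HDD"],
    "sequential write": ["VDI", "VMDK", "VHD", "HDD", "RAW", "QCOW2"],
}


def best_format(pattern, available):
    """Return the highest-ranked available format for the pattern, or None."""
    allowed = {name.upper() for name in available}
    for fmt in PREFERRED_ORDER[pattern.lower()]:
        if fmt in allowed:
            return fmt
    return None


if __name__ == "__main__":
    print(best_format("Sequential Write", ["raw", "qcow2", "vmdk"]))
```

A lookup like this ignores block size and file size, which the results show can change the ordering, so it should be read as a coarse summary rather than a full decision model.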