Futex VM
Abstract—As discovered in our previous benchmark work, a small number of workloads in the PARSEC benchmark suite suffer from heavy performance loss in a virtual execution environment, and the major loss exhibits a fairly strong connection with thread synchronization operations. This paper examines one workload of this kind that makes heavy use of thread synchronization operations and shows the performance impact of the futex system call on applications in a virtual execution environment. By measuring how often subroutines run at particular time costs, the most timing-decisive parts of the application in a virtual execution environment are located. A suitable thread scheduling measure for a virtual execution environment is also proposed.

Keywords: performance; futex; virtual machine; gang-scheduling

I. INTRODUCTION

A virtual machine is implemented by pretending to be the real hardware underneath it. Applications may be totally unaware that they are running in a virtual execution environment. As virtualization technologies mature, applications behave increasingly as they do in a bare-metal environment. With modern hardware support, instructions in recent virtual machines may be executed directly on the physical hardware under the supervision of the virtual machine monitor (VMM), so most instructions and operations execute at nearly native speed. Only a few operations still suffer from heavy performance loss, and the thread synchronization operation, namely the futex system call, is one of them.

The futex system call tends to be inefficient on a virtual machine and thus slows down applications that make heavy use of it; that is what we discovered in our previous benchmark. Meanwhile, workloads exploiting thread-level parallelism are not uncommon in high-performance computing, and they may be plagued by heavy performance degradation in a virtual execution environment because of the futex system call. Although a considerable number of measures have been applied to tackle various virtualization overheads, an effective solution that reduces the overhead of the futex system call in a virtual execution environment has not been found yet. Since a wide range of HPC applications could potentially benefit from improving the virtualization layer with respect to the futex system call, it is worth the effort to seek the root cause and a feasible measure for better performance.

By nature the futex system call is simple: only the manipulation of a variable known as a lock in user space and of a wait queue in kernel space is involved. However, considering that in a multi-threaded execution environment multiple threads may compete for critical hardware or software resources, the current state of the execution environment can be highly unpredictable. It is fairly hard to estimate how long a given lock will be held by one thread, how many threads are waiting on a condition variable, how fierce the competition may be at a given moment, and so on. Even on a physical machine the futex system call can be expensive in certain cases, not to mention that the virtual execution environment is much more complex in its implementation and likely to incur more overhead.

Unlike in a native execution environment, program execution in a virtual one involves cooperation between the guest kernel and the VMM. Overheads are almost inevitable with the control flow being constantly transferred from one of these software layers to another. It is therefore understandable that operations in a virtual execution environment suffer more or less performance overhead compared with the native one, and for certain kinds of operations the situation may be even worse. As different operations require different kinds and amounts of resources, the performance overhead varies from one kind to another. To investigate the reasons for the performance gap, the differences between the virtual and the physical hardware should be considered.

We examined thread synchronization, namely the futex system call, in a particular kind of workload from the PARSEC benchmark suite, and propose a scheduling measure that may alleviate the performance loss. The rest of this paper is organized as follows: In Section 2 the related work is surveyed. A brief introduction to POSIX threads and futex-related operations is presented in Section 3. In Section 4 we study in detail the performance impact of the futex system call from a number of aspects, including changes in system resource utilization and run-time behavior, and host and guest reactions under the pressure of the workload with and without futex. A discussion of the possible reasons is given in Section 5 from the thread scheduling model point of view, where corresponding measures are also proposed. Finally, in Section 6 we conclude and outline future work.
[Figure 1: dedup executions with and without futex on KVM guest machine. Panels: (a) Resource utilization of dedup with futex; (b) Behaviour of dedup with futex; (c) Resource utilization of dedup without futex; (d) Behaviour of dedup without futex. Plotted metrics: CPU and MEM (percentage of hardware utilization), BLOCK IN, BLOCK OUT, INTERRUPT, CONTEXT SWITCH.]
II. RELATED WORKS

The main research on virtual-machine thread scheduling and synchronization is surveyed below. D. Arteaga et al. [1] showed that on a multi-core, multi-threading microprocessor, threads in a virtual machine actually undergo scheduling at two granularity levels: first by the VMM, which acts as a process scheduler, and then by the thread scheduler on the hardware. The book [2] addressed the synchronization problem among virtual processors on multi-core virtual machine platforms. To reduce the latency of communication and synchronization among threads, "gang scheduling" is suggested to synchronize the scheduling of virtual processors that are mapped onto different physical processors. In [3], the performance loss on four kinds of virtual machines was examined. With a detailed analysis of the run-time behavior and the frequency of system calls, the futex and storage-I/O system calls were identified as the key factors that incur the major performance overhead in virtual execution environments. M. Danish et al. [4] describe the scheduling infrastructure of the Quest operating system, which is based on the concept of a VCPU: Quest schedules VCPUs on physical processors, while software threads are scheduled on VCPUs. U. Drepper [5] made a detailed analysis of the performance and implementation of the futex system call in a normal execution environment and described thread synchronization on top of it. Earlier, a similar effort was made by H. Franke et al. [6] based on an early Linux kernel, which also covered the connection between futex and the POSIX thread synchronization mechanism.

Among these efforts, the contribution most closely related to ours was made by N. A. Dadhania [7], who implemented an experimental gang-scheduling mechanism in the Linux kernel meant to address the limitations of KVM on Power, especially the lock-holder-preemption problem, by reducing the lock-acquisition time. This provides an important hint for dealing with the overhead experienced in our practice.

III. BRIEF INTRODUCTION TO POSIX THREADS AND FUTEX-RELATED OPERATIONS

The POSIX thread library is the standard C threading API on Linux. Because the POSIX thread specification requires sharing of almost all resources, POSIX thread mutexes are implemented on top of futex-related operations and variables in recent versions of Linux, and they are part of the POSIX standard library [8].
[Figure 2: dedup executions with and without futex on the physical machine (CPU and MEM utilization in percentage of hardware utilization, BLOCK IN, BLOCK OUT, INTERRUPT and CONTEXT SWITCH over time).]
The term futex (short for "fast userspace mutex") denotes a Linux construct that can be used to implement basic locking, or as a building block for higher-level locking abstractions such as semaphores, POSIX mutexes, and condition variables [9]. A key component of the futex mechanism is a wait queue attached to an aligned variable in user space. This aligned variable serves as the control variable mentioned above: when multiple threads or processes operate on it, they compare its value against a previously specified expected value to decide whether trapping into the kernel is still needed for a further operation such as waking up waiting threads or processes, or removing the current waiter from the wait queue, depending on the result of the comparison. A properly programmed futex-based lock should trap as rarely as possible, ideally only when the lock is contended. Since most operations do not require arbitration between processes, the futex implementation is efficient.
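To make the control-variable-plus-wait-queue idea concrete, the following is a minimal C sketch of a futex-based lock in the spirit of the examples analyzed in [5]. The identifiers (sys_futex, futex_lock, futex_unlock) are ours and illustrative only; production mutexes such as glibc's handle many more corner cases.

    /* Lock states: 0 = unlocked, 1 = locked, 2 = locked with waiters. */
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <stdatomic.h>
    #include <unistd.h>

    static long sys_futex(atomic_int *uaddr, int op, int val)
    {
        /* glibc offers no wrapper, so the raw system call is used. */
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    /* Compare-and-swap helper that returns the value observed in *p. */
    static int cmpxchg(atomic_int *p, int expected, int desired)
    {
        atomic_compare_exchange_strong(p, &expected, desired);
        return expected;
    }

    void futex_lock(atomic_int *f)
    {
        int c = cmpxchg(f, 0, 1);          /* fast path: 0 -> 1, no kernel entry */
        if (c == 0)
            return;
        do {
            /* Mark the lock contended (1 -> 2) and sleep; the kernel
             * atomically re-checks *f == 2 before putting us to sleep. */
            if (c == 2 || cmpxchg(f, 1, 2) != 0)
                sys_futex(f, FUTEX_WAIT, 2);
        } while ((c = cmpxchg(f, 0, 2)) != 0);
    }

    void futex_unlock(atomic_int *f)
    {
        /* Trap into the kernel only if there may be waiters. */
        if (atomic_fetch_sub(f, 1) != 1) {
            atomic_store(f, 0);
            sys_futex(f, FUTEX_WAKE, 1);
        }
    }

The uncontended path stays entirely in user space; only the contended cases reach the kernel, which is exactly where the virtualization overhead discussed below is paid.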
IV. BENCHMARK OBSERVATION

To demonstrate the impact of the futex system call in different cases, we also compiled dedup without the pthread-related operations, thereby closing the door to futex; naturally, no pipeline is then available for data processing any more. We compared the performance and run-time behavior in the following aspects: 1. performance and run-time behavior of the workload with and without the futex system call; 2. host and guest reactions upon heavy futex invocation in various cases; 3. time spent in futex in both the virtual and the native case. The general configuration of our test machine is listed in Table I.

A. Comparison between workloads with and without futex

As depicted in Figure 1, a great contrast between the runs with and without the futex system call can be witnessed in both performance and run-time behavior. While memory utilization remains nearly the same, the workload now runs on only a single core and its load is distributed more evenly over the whole run time. Interrupts and context switches are also reduced significantly. As the total run time shrinks by almost a half, the performance loss is likewise reduced to a great extent.
Table I: Host machine configuration

  Bogomips            5226.48
  TLB size            1024 4K pages
  Memory              DDR2 8192 MB
  Hard disk           250 GB
  Operating System    Debian Squeeze 6.0.2, Linux

[Figure 4: Time spent in futex in both virtual and native cases (native vs. virtual, for T = 1, 2, 4, 8 threads).]

Similar effects are observed on the physical machine, as suggested by Figure 2, with regard to processor core and memory utilization and the reduced interrupt and context-switch counts. However, the large cut in time cost seen on virtual machines is not observed there.
B. Host and guest reactions upon heavy invocation of futex

Because the KVM guest is a fully virtualized execution environment and thus isolated from the host, applications running in it are expected to behave just as they do on the host. Without the help of a dedicated counter or tracer it is difficult to observe what exactly happens under the hood when futex is called in the guest. As a remedy we compare the reactions of the guest and the host under the pressure of workloads with and without futex. Figure 3 illustrates this in the same aspects as before, by collecting statistics of qemu, the userspace part of the KVM guest, on the host. Just as in the virtual case, contention over the processor among threads becomes fiercer as the number of threads increases.

Figures 3(a), 3(c), 3(e) and 3(g) show the variation of processor and memory utilization observed on the host as dedup was executed in the different cases on the guest. The physical processor cores get more and more overloaded by the threads of qemu, most likely by the threads running as the virtual processors of the guest. Contention over the physical processor cores also worsens as dedup is started with more threads. Furthermore, as Figure 3(a) suggests, part of the contention is contributed by qemu itself, since no futex-related operations are issued by dedup on the guest side any more.

In addition, measurements of storage traffic, interrupts and context switches are presented in Figures 3(b), 3(d), 3(f) and 3(h). These metrics generally keep pace with the occurrence of contention among the threads the workload forked.

Finally, the time spent in futex in both the virtual and the native case is illustrated by Figure 4. In the virtual case futex may consume several times as much time as in the native case. Besides, the time cost of futex in the native case rises steadily, almost proportionally to the number of threads. In contrast, the time cost of futex in a virtual machine is highly unpredictable and not obviously related to the number of threads, as a result of the complexity added by virtualization.
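Statistics of this kind can be sampled on the host directly from the proc filesystem. The sketch below is only one possible way to do it, not necessarily the tooling behind Figure 3: it reads the accumulated user and system CPU time of a process such as qemu from /proc/<pid>/stat and its context-switch counters from /proc/<pid>/status.

    /* Illustrative only: sample CPU time and context switches of a PID
     * (e.g. the qemu process backing a KVM guest) from /proc. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* utime and stime are fields 14 and 15 of /proc/<pid>/stat (clock ticks). */
    static void read_cpu_ticks(int pid, unsigned long *utime, unsigned long *stime)
    {
        char path[64], buf[4096];
        snprintf(path, sizeof path, "/proc/%d/stat", pid);
        FILE *fp = fopen(path, "r");
        if (!fp || !fgets(buf, sizeof buf, fp)) { if (fp) fclose(fp); return; }
        fclose(fp);
        char *p = strrchr(buf, ')');          /* skip past "(comm)" */
        if (!p) return;
        p += 2;                               /* p now points at field 3 (state) */
        int field = 3;
        for (char *tok = strtok(p, " "); tok; tok = strtok(NULL, " "), field++) {
            if (field == 14) *utime = strtoul(tok, NULL, 10);
            if (field == 15) { *stime = strtoul(tok, NULL, 10); break; }
        }
    }

    /* Voluntary/involuntary context switches from /proc/<pid>/status. */
    static void read_ctxt_switches(int pid, unsigned long *vol, unsigned long *invol)
    {
        char path[64], line[256];
        snprintf(path, sizeof path, "/proc/%d/status", pid);
        FILE *fp = fopen(path, "r");
        if (!fp) return;
        while (fgets(line, sizeof line, fp)) {
            sscanf(line, "voluntary_ctxt_switches: %lu", vol);
            sscanf(line, "nonvoluntary_ctxt_switches: %lu", invol);
        }
        fclose(fp);
    }

Sampling these values periodically and taking differences yields time series of CPU usage and context switches comparable to those plotted in Figure 3.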
V. ANALYSIS OF THE PERFORMANCE OVERHEAD

A. Thread synchronization part of the algorithm in dedup

As futex serves as the building block of the mutual exclusion operations in the kernel, on which most programming libraries, including the POSIX NPTL package, depend, the futex system call is introduced into dedup through its use of pthreads. The central part of this workload, shown in Figure 5, creates a five-stage pipeline, from which a model is summarized in Figure 6.

Since threads of neighboring stages may operate on the same data structures, locking and synchronization must be applied to guarantee the validity of the data. How efficient these mutual exclusion operations are has an immediate impact on the overall performance and throughput of the pipeline. However, because the resolution of the kernel timer is not high enough to measure the time consumed by an individual futex operation, other approaches have to be used instead.

In our experiments we therefore measure the time consumed by program sections at a higher granularity level, each of which tends to include multiple futex operations. The execution time of each subroutine is measured in both the virtual and the native case, except for subroutines that are called only once during the execution and are thus of little significance for our purpose; the results are shown in Tables II and III.

Based on the data in the two tables, the subroutines that behave differently in the two kinds of execution environment are identified, of which sub_Compress and sub_ChunkProcess are the most unusual. They contain the basic thread locking and unlocking operations on the mutexes and locks of many instances of the queue and hash_table types, and they constitute the major operations of the pipeline stages Compress and ChunkProcess.
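As an illustration of this coarse-grained measurement, the following sketch wraps a program section with clock_gettime() and accumulates a histogram of its execution times in 0.01 s bins, mirroring the layout of Tables II and III. The harness is an assumed setup for illustration, not necessarily the exact instrumentation used for the measurements.

    /* Illustrative timing harness: histogram of a code section's run time
     * in 0.01 s bins, matching the column layout of Tables II and III. */
    #include <stdio.h>
    #include <time.h>

    #define NBINS 8                        /* bins 0.00 s .. 0.06 s and above */
    static unsigned long hist[NBINS];

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    static void record(double start, double end)
    {
        int bin = (int)((end - start) / 0.01);   /* 0.01 s resolution */
        if (bin >= NBINS) bin = NBINS - 1;
        hist[bin]++;
    }

    /* Usage sketch:
     *   double t0 = now_sec();
     *   sub_Compress(...);                 // section being measured
     *   record(t0, now_sec());
     */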
[Figure 3: Host reaction upon dedup with and without futex execution in KVM guest. Panels: (a) Host resource utilization without futex; (b) Host behavior without futex; (c) Host resource utilization in 1-thread case; (d) Host behaviour in 1-thread case; (e) Host resource utilization in 2-thread case; (f) Host behaviour in 2-thread case; (g) Host resource utilization in 4-thread case; (h) Host behaviour in 4-thread case. Plotted metrics: CPU, MEM, BLOCK IN, BLOCK OUT, INTERRUPT, CONTEXT SWITCH.]
    DataProcess(&data_process_args);
    /* DataProcess stage thread */
    pthread_create(&threads_process, NULL, DataProcess, &data_process_args);
    /* worker threads for the FindAllAnchors stage */
    for (i = 0; i < conf->nthreads; i++) {
        pthread_create(&threads_anchor[i], NULL, FindAllAnchors, &anchor_thread_args[i]);
    }
    ......
    /* worker threads for the ChunkProcess stage */
    for (i = 0; i < conf->nthreads; i++) {
        pthread_create(&threads_chunk[i], NULL, ChunkProcess, &chunk_thread_args[i]);
    }
    ......
    /* worker threads for the Compress stage */
    for (i = 0; i < conf->nthreads; i++) {
        pthread_create(&threads_compress[i], NULL, Compress, &compress_thread_args[i]);
    }
    /* SendBlock stage thread */
    if (!conf->preloading) {
        pthread_create(&threads_send, NULL, SendBlock, &send_block_args);
    }
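The stage threads created above hand data to one another through shared queues guarded by pthread mutexes and condition variables, and it is these operations that reach futex under contention. The following is a minimal sketch of such a hand-off queue; the type and function names are illustrative and not dedup's actual queue implementation.

    /* Minimal sketch of a bounded hand-off queue between two pipeline
     * stages. Initialise lock/conditions with PTHREAD_MUTEX_INITIALIZER
     * and PTHREAD_COND_INITIALIZER (or the *_init functions). */
    #include <pthread.h>

    #define QCAP 64

    typedef struct {
        void *items[QCAP];
        int head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t not_empty, not_full;
    } stage_queue_t;

    void queue_push(stage_queue_t *q, void *chunk)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count == QCAP)               /* producer waits when full */
            pthread_cond_wait(&q->not_full, &q->lock);
        q->items[q->tail] = chunk;
        q->tail = (q->tail + 1) % QCAP;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->lock);
    }

    void *queue_pop(stage_queue_t *q)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0)                  /* consumer waits when empty */
            pthread_cond_wait(&q->not_empty, &q->lock);
        void *chunk = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->lock);
        return chunk;
    }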
Table II: Occurrence counts of subroutine execution times (columns give the time cost in seconds)

  Threads  Subroutine        0.00    0.01   0.02  0.03  0.04  0.05  0.06  Total
  t=1      sub_Compress      165563  2791   0     0     0     0     0     168354
  t=1      sub_ChunkProcess  367888  2062   0     0     0     0     0     369950
  t=1      FindAllAnchor     1       0      0     0     0     0     0     1
  t=1      Sum               533452  4853   0     0     0     0     0     538305
  t=2      sub_Compress      165239  3115   0     0     0     0     0     168354
  t=2      sub_ChunkProcess  367482  2465   2     1     0     0     0     369950
  t=2      FindAllAnchor     2       0      0     0     0     0     0     2
  t=2      Sum               532723  5580   14    0     0     0     0     538306
  t=4      sub_Compress      164968  3383   3     0     0     0     0     168354
  t=4      sub_ChunkProcess  367133  2808   7     2     0     0     0     369950
  t=4      FindAllAnchor     3       1      0     0     0     0     0     4
  t=4      Sum               532104  6192   10    2     0     0     0     538308
  t=8      sub_Compress      164650  3667   23    1     9     4     0     168354
  t=8      sub_ChunkProcess  366342  3547   51    4     4     2     0     369950
  t=8      FindAllAnchor     8       0      0     0     0     0     0     8
  t=8      Sum               526762  7214   74    5     13    6     0     538312
Table III: Occurrence counts of subroutine execution times (columns give the time cost in seconds)

  Threads  Subroutine        0.00    0.01   0.02  0.03  0.04  0.05  0.06  Total
  t=1      sub_Compress      164933  3418   3     0     0     0     0     168354
  t=1      sub_ChunkProcess  365556  4380   14    0     0     0     0     369950
  t=1      FindAllAnchor     1       0      0     0     0     0     0     1
  t=1      Sum               530490  7798   17    0     0     0     0     538305
  t=2      sub_Compress      163019  5306   29    0     0     0     0     168354
  t=2      sub_ChunkProcess  365438  4497   15    0     0     0     0     369950
  t=2      FindAllAnchor     0       2      0     0     0     0     0     2
  t=2      Sum               528457  9805   44    0     0     0     0     538306
  t=4      sub_Compress      161158  7084   110   2     0     0     0     168354
  t=4      sub_ChunkProcess  365756  4185   9     0     0     0     0     369950
  t=4      FindAllAnchor     4       0      0     0     0     0     0     4
  t=4      Sum               526918  11269  119   0     0     0     0     538308
  t=8      sub_Compress      159238  8816   279   9     8     2     1     168353
  t=8      sub_ChunkProcess  365443  4478   27    1     0     1     0     369950
  t=8      FindAllAnchor     7       1      0     0     0     0     0     8
  t=8      Sum               524688  13295  306   10    8     3     1     538311
In these cases the time they may cost varies from 0.00 s up to 0.06 s, and the costlier cases always occur much more frequently in the virtual than in the native execution environment. The overheads of numerous small fractions then add up to the total performance overhead, which soon becomes non-negligible compared with that of the native case.

[Figure 7: A simplified thread scheduling model on a virtual machine. Labels in the figure: threads T0-T5, "Threads on guest", "Threads on host", "Guest OS Scheduler on VCPU 0", "Guest OS Scheduler on VCPU 1", "Host OS Scheduler on a physical CPU".]

B. Implementation of thread synchronization in the virtual case
Being aware of the inefficient program sections on virtual machines, it is necessary to seek the possible root causes in the implementation of thread synchronization on a virtual machine. Since the virtual processors are implemented by kernel threads, which are scheduled and executed on the physical processor cores like any other thread, their nature differs greatly from that of their physical counterparts as soon as thread synchronization across virtual processor cores is concerned.

Figure 7 depicts a much simplified scheduling model of the threads on a virtual machine. A thread in a virtual execution environment undergoes scheduling at different levels: it is first scheduled by the guest OS onto a virtual processor core, and then executed whenever that virtual processor is scheduled by the VMM or the host OS. Userspace threads on a guest are mapped by the guest OS kernel onto the virtual processors, which themselves may already have been mapped by the host OS kernel or the VMM onto physical processor cores. When two or more different userspace threads, e.g. T1, T3 and T5, are mapped onto different virtual processors, they are most likely to be scheduled at different moments on the physical processors. While T1 is active on VCPU 0 at a given moment, T3 on VCPU 1 and T5 on VCPU 2 may still be standing in the wait queues for their turn on the physical processors. In such a case, T1 has to wait longer: only after the VCPU on which the desired thread resides regains its time slice does the lock get a chance to be released. Unlike in the native case, with multiple levels of scheduling the threads are more restricted and find it harder to talk to each other.
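One crude way to mitigate the multi-level scheduling effect just described on an unmodified host, shown here only to illustrate the direction of a remedy rather than the gang-scheduling mechanism of [7], is to pin each VCPU thread of the guest to its own dedicated physical core, so that a VCPU holding a lock is not descheduled while its peers wait in futex. The thread IDs are assumed to be known, for example from the QEMU monitor or /proc/<qemu-pid>/task.

    /* Illustrative only: pin each qemu VCPU thread to a dedicated physical
     * core with sched_setaffinity(), a crude approximation of scheduling
     * all concerned VCPUs synchronously. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>

    int pin_vcpu_threads(const pid_t *vcpu_tids, int nvcpus, int first_core)
    {
        for (int i = 0; i < nvcpus; i++) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(first_core + i, &set);        /* one core per VCPU */
            if (sched_setaffinity(vcpu_tids[i], sizeof set, &set) != 0) {
                perror("sched_setaffinity");
                return -1;
            }
        }
        return 0;
    }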
VI. CONCLUSION AND OUTLOOK

We have shown the performance impact of futex in a virtual execution environment based on benchmark results with a particular workload from the PARSEC benchmark suite. From the difference between the basic implementations of thread synchronization in the virtual and the native case, we infer that asynchronous VCPU scheduling is very likely the main reason for the great performance loss of futex in the virtual execution environment. As a remedy, a possible solution is to schedule all the concerned VCPUs synchronously, so that all threads taking part in communication on the virtual processors are guaranteed to be running exactly at the time of synchronization. In the future we will implement such a scheduling strategy on a virtual machine such as KVM to prove the assumption proposed here.

ACKNOWLEDGMENT

The PARSEC group provides a benchmark suite with various representative workloads from the real world; our work is made possible by their efforts. Besides, invaluable comments and remarks from our anonymous reviewers also helped us to improve the work. We would like to extend our sincere gratitude to them.

REFERENCES

[1] D. Arteaga, M. Zhao, C. Liu, P. Thanarungroj and L. Weng, Cooperative Virtual Machine Scheduling on Multi-core Multi-threading Systems — A Feasibility Study. Florida International University, Miami, FL, Dec. 2010.
[2] Intel Corporation and Parallel Processing Research Institute of Fudan University, System Virtualization: Principles and Implementation (系统虚拟化：原理与实现). Tsinghua University Press, Beijing, China, 2009, pp. 127-128.
[3] Y. Zhang, R. Oertel and W. Rehm, Performance Loss on Virtual Machines. Studentensymposium Informatik Chemnitz 2012, Tagungsband zum 1. Studentensymposium Chemnitz, 4. Jul. 2012.
[4] M. Danish, Y. Li and R. West, Virtual-CPU Scheduling in the Quest Operating System. Computer Science Department, Boston University, Boston, MA 02215, USA, Jan. 2011.
[5] U. Drepper, Futexes Are Tricky. Dec. 2005.
[6] H. Franke and R. Russell, Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux. Proceedings of the Ottawa Linux Symposium 2002, Ottawa, Ontario, Canada, Jun. 2002.
[7] Gang scheduling in CFS, https://lkml.org/lkml/2011/12/19/44.
[8] U. Drepper and I. Molnar, The Native POSIX Thread Library for Linux. Red Hat, Inc., Feb. 2005.
[9] Futex, http://en.wikipedia.org/wiki/Futex.