Bootstrapping and Maintaining Trust in the Cloud ∗

Nabil Schear, MIT Lincoln Laboratory, nabil@ll.mit.edu
Patrick T. Cable II, Threat Stack, Inc., pat@threatstack.com
Thomas M. Moyer, MIT Lincoln Laboratory, tmoyer@ll.mit.edu
Bryan Richard, MIT Lincoln Laboratory, bryan.richard@ll.mit.edu
Robert Rudd, MIT Lincoln Laboratory, robert.rudd@ll.mit.edu

ABSTRACT
Today's infrastructure as a service (IaaS) cloud environments rely upon full trust in the provider to secure applications and data. Cloud providers do not offer the ability to create hardware-rooted cryptographic identities for IaaS cloud resources or sufficient information to verify the integrity of systems. Trusted computing protocols and hardware like the TPM have long promised a solution to this problem. However, these technologies have not seen broad adoption because of their complexity of implementation, low performance, and lack of compatibility with virtualized environments. In this paper we introduce keylime, a scalable trusted cloud key management system. keylime provides an end-to-end solution for both bootstrapping hardware-rooted cryptographic identities for IaaS nodes and for system integrity monitoring of those nodes via periodic attestation. We support these functions in both bare-metal and virtualized IaaS environments using a virtual TPM. keylime provides a clean interface that allows higher-level security services like disk encryption or configuration management to leverage trusted computing without being trusted computing aware. We show that our bootstrapping protocol can derive a key in less than two seconds, that we can detect system integrity violations in as little as 110ms, and that keylime can scale to thousands of IaaS cloud nodes.

∗This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering.

© 2016 Massachusetts Institute of Technology. Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

ACM acknowledges that this contribution was authored or co-authored by an employee, or contractor of the national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. Permission to make digital or hard copies for personal or classroom use is granted. Copies must bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. To copy otherwise, distribute, republish, or post, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ACSAC '16, December 05-09, 2016, Los Angeles, CA, USA
© 2016 ACM. ISBN 978-1-4503-4771-6/16/12. . . $15.00
DOI: http://dx.doi.org/10.1145/2991079.2991104

1. INTRODUCTION
The proliferation and popularity of infrastructure-as-a-service (IaaS) cloud computing services such as Amazon Web Services and Google Compute Engine means more cloud tenants are hosting sensitive, private, and business-critical data and applications in the cloud. Unfortunately, IaaS cloud service providers do not currently furnish the building blocks necessary to establish a trusted environment for hosting these sensitive resources. Tenants have limited ability to verify the underlying platform when they deploy to the cloud and to ensure that the platform remains in a good state for the duration of their computation. Additionally, current practices restrict tenants' ability to establish unique, unforgeable identities for individual nodes that are tied to a hardware root of trust. Often, identity is based solely on a software-based cryptographic solution or unverifiable trust in the provider. For example, tenants often pass unprotected secrets to their IaaS nodes via the cloud provider.

Commodity trusted hardware, like the Trusted Platform Module (TPM) [40], has long been proposed as the solution for bootstrapping trust, enabling the detection of changes to system state that might indicate compromise, and establishing cryptographic identities. Unfortunately, TPMs have not been widely deployed in IaaS cloud environments due to a variety of challenges. First, the TPM and related standards for its use are complex and difficult to implement. Second, since the TPM is a cryptographic co-processor and not an accelerator, it can introduce substantial performance bottlenecks (e.g., 500+ms to generate a single digital signature). Lastly, the TPM is a physical device by design and most IaaS services rely upon virtualization, which purposefully divorces cloud nodes from the hardware on which they run. At best, the limitation to physical platforms means that only the cloud provider would have access to the trusted hardware, not the tenants [17, 20, 31]. The Xen hypervisor includes a virtualized TPM implementation that links its security to a physical TPM [2, 10], but protocols to make use of the vTPM in an IaaS environment do not exist.

To address these challenges we identify the following desirable features of an IaaS trusted computing system:

• Secure Bootstrapping – the system should enable the tenant to securely install an initial root secret into each cloud node. This is typically the node's long-term cryptographic identity, and the tenant chains other secrets to it to enable secure services.
• System Integrity Monitoring – the system should allow the tenant to monitor cloud nodes as they operate and react to integrity deviations within one second.
• Secure Layering (Virtualization Support) – the system should support tenant-controlled bootstrapping and integrity monitoring in a VM using a TPM in the provider's infrastructure. This must be done in collaboration with the provider in a least-privilege manner.
• Compatibility – the system should allow the tenant to leverage hardware-rooted cryptographic keys in software to secure services they already use (e.g., disk encryption or configuration management).
• Scalability – the system should scale to support bootstrapping and monitoring of thousands of IaaS resources as they are elastically instantiated and terminated.

Prior cloud trusted computing solutions address a subset of these features, but none achieves all of them. Excalibur [31] supports bootstrapping at scale, but does not allow for system integrity monitoring or offer full support for tenant trusted computing inside a VM (i.e., layering). Manferdelli et al. created a system that supports secure layering and bootstrapping, but does not support system integrity monitoring, is incompatible with existing cryptographic services, and has not demonstrated cloud-scale operation [25]. Finally, the Cloud Verifier [34] enables system integrity measurement and cloud scalability but does not fully address secure layering or enable secure bootstrapping.

In this paper, we introduce keylime, an end-to-end IaaS trusted cloud key management service that supports all the above desired features. The key insight of our work is to utilize trusted computing to bootstrap identity in the cloud and provide integrity measurement to support revocation, but then allow high-level services that leverage these identities to operate independently. Thus, we provide a clean and easy-to-use interface that can integrate with existing security technologies (see Figure 1).

Figure 1: Interface between trusted hardware and existing software-based security services via the keylime trusted computing service layer. (Diagram layers, top to bottom: software-based cryptographic services using software ID keys, with the check "ID key revoked?"; keylime trusted computing services, with the check "Valid TPM?"; and the TPM / platform manufacturer, which supplies signed EKs for enrollment.)

We introduce a novel bootstrap key derivation protocol that combines both tenant intent and integrity measurement to install secrets into cloud nodes. We then leverage the Cloud Verifier [34] pattern of Schiffman et al. to enable periodic attestation that automatically links to identity revocation. keylime supports the above with secure layering in both bare-metal and virtualized IaaS resources in a manner that minimizes trust in the cloud provider. We demonstrate the compatibility of keylime by securely enabling cloud provisioning with cloud-init¹, encrypted communication with IPsec, configuration management with Puppet², secret management with Vault³, and storage with LUKS/dm-crypt encrypted disks. Unlike existing solutions [39, 25], these services do not need to be trusted computing aware; they just need to use an identity key and respond to key revocations.

Finally, we show that keylime can scale to handle thousands of simultaneous nodes and perform integrity checks on nodes at rates up to 2,500 integrity reports (quotes) verified per second. We present and evaluate multiple options for deploying our integrity measurement verifier: in the cloud, in a low-cost cloud appliance based on a Raspberry Pi, and on-premises. We show that securely provisioning a key using keylime takes less than two seconds. We also find that our system can detect integrity measurement violations in as little as 110ms.

2. BACKGROUND
Trusted Computing. The TPM provides the means for creating trusted systems that are amenable to system integrity monitoring. The TPM, as specified by the Trusted Computing Group (TCG)⁴, is a cryptographic co-processor that provides key generation, protected storage, and cryptographic operations. The protected storage includes a set of Platform Configuration Registers (PCRs) where the TPM stores hashes. The TPM uses these registers to store measurements of integrity-relevant components in the system. To store a new measurement in a PCR, the extend operation concatenates the existing PCR value with the new measurement, securely hashes⁵ that value, and stores the resulting hash in the register. This hash chain allows a verifier to confirm that a set of measurements reported by the system has not been altered. This report of measurements is called an attestation, and relies on the quote operation, which accepts a random nonce and a set of PCRs. These PCRs can include measurements of the BIOS, firmware, boot loader, hypervisor, OS, and applications, depending on the configuration of the system. The TPM reads the PCR values, and then signs the nonce and PCRs with a key that is only accessible by the TPM. The key the TPM uses to sign quotes is called an attestation identity key (AIK). We denote a quote from the TPM using the AIK, with the associated nonce and one or more optional PCR numbers $PCR_i$ and corresponding data $d_i$ that will be hashed and placed in $PCR_i$, as $\mathrm{Quote}_{AIK}(nonce, PCR_i : d_i, \ldots)$.

The TPM contains a key hierarchy for securely storing cryptographic keys. The root of this hierarchy is the Storage Root Key (SRK), which the owner generates during TPM initialization. The SRK in turn protects the TPM AIK(s) when they are stored outside of the TPM's nonvolatile storage (NVRAM). Each TPM also contains a permanent credential called the Endorsement Key (EK). The TPM manufacturer generates and signs the EK. The EK uniquely identifies each TPM and certifies that it is a valid TPM hardware device. The private EK never leaves the TPM, is never erased, and can only be used for encryption and decryption during AIK initialization to limit its exposure.
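The extend operation and the hash-chain check it enables can be illustrated with a short Python sketch: the verifier replays a reported measurement list and compares the result with the PCR value the TPM quoted. This is a minimal sketch, not keylime code, assuming TPM 1.2 semantics (SHA-1 per footnote 5, and a static PCR that resets to 20 zero bytes). It deliberately omits checking the quote's AIK signature and nonce. The function names `extend`, `replay`, and `measurement_log_matches` are hypothetical.

```python
import hashlib

PCR_SIZE = 20  # SHA-1 digest length used by TPM 1.2 PCRs (assumption: TPM 1.2)


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """Model the extend operation: PCR_new = SHA1(PCR_old || measurement)."""
    return hashlib.sha1(pcr + measurement).digest()


def replay(measurements: list) -> bytes:
    """Recompute the value a static PCR should hold after extending each measurement in order."""
    pcr = bytes(PCR_SIZE)  # static PCRs reset to all zeros
    for m in measurements:
        pcr = extend(pcr, m)
    return pcr


def measurement_log_matches(quoted_pcr: bytes, measurements: list) -> bool:
    """True if the reported log reproduces the quoted PCR value (signature check omitted)."""
    return replay(measurements) == quoted_pcr


# Hypothetical boot components, each measured as the SHA-1 digest of its contents.
components = [b"bootloader-image", b"kernel-image", b"initrd-image"]
log = [hashlib.sha1(c).digest() for c in components]
pcr_value = replay(log)

print(measurement_log_matches(pcr_value, log))        # True
print(measurement_log_matches(pcr_value, log[::-1]))  # False: order matters in a hash chain
```

Because each PCR value depends on every earlier measurement, a log that has been reordered, truncated, or edited no longer reproduces the signed value. That is why the verifier needs only the final PCR from the quote, not a signature over each individual entry.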
¹ http://launchpad.net/cloud-init
² http://puppetlabs.com/
³ http://hashicorp.com/blog/vault.html
⁴ http://trustedcomputinggroup.org
⁵ TPM specification version 1.2 uses SHA-1 for measurements. TPM specification version 2.0 adds SHA-256 to address cryptographic weaknesses in SHA-1.
Integrity Measurement. To measure a system component, the underlying component must be trusted computing-aware. The BIOS in systems with a TPM supports measurement of firmware and boot loaders. TPM-aware boot loaders can measure hypervisors and operating systems [22, 19, 29]. To measure applications, the operating system must support measurement of applications that are launched, such as the Linux Integrity Measurement Architecture (IMA) [30, 21]. One limitation of approaches like IMA is the inability to monitor the run-time state of the applications. Nexus aims to address this limitation with a new OS that makes trusted computing a first-class citizen and supports introspection to validate run-time state [37]. Several proposals exist for providing run-time integrity monitoring, including LKIM [24] and DynIMA [8]. These systems ensure that a running system is in a known state, allowing a verifier to validate not only that what was loaded was known, but also that it has not been tampered with while it was running.

In addition to operating system validation, others have leveraged trusted computing and integrity measurement to support higher-level services, such as protected access to data when the client is offline [23] or enforcement of access policies on data [26]. Others have proposed mechanisms to protect the server from malicious clients, e.g., in online gaming [1], or to protect applications from a malicious operating system [6, 7, 15]. However, these proposals do not account for the challenges of migrating applications to a cloud environment, and often assume existing infrastructure to support trusted computing key management.

IaaS Cloud Services. In the IaaS cloud service model, users request an individual compute resource to execute their application. For example, users can provision physical hardware, virtual machines, or containers. In this paper, we refer to any of these tenant-provisioned IaaS resources as cloud nodes. Users provision nodes either by uploading a whole image to the provider or by configuring a pared-down base image that the provider makes available. Users often begin by customizing a provider-supplied image, then create their own images (using a tool like Packer⁶) to decrease the amount of time it takes for a node to become ready.

cloud-init is a standard cross-provider (e.g., Amazon EC2, Microsoft Azure) mechanism that allows cloud tenants to specify bootstrapping data. It accepts a YAML-formatted description of what bootstrapping actions should be taken and supports plugins to take those actions. Examples of such actions include adding users, adding package repositories, or running arbitrary scripts. Users of cloud computing resources at scale typically spawn new cloud instances using an application programming interface and pass along enough bootstrapping information to allow the instance to communicate with a configuration management platform (such as Chef⁷ or Puppet) for further instance-specific configuration. These bootstrapping instructions are not encrypted, meaning that a provider could intercept secrets passed via the bootstrapping instructions. In our research, we found that organizations will either (a) send an unprotected pre-shared key for Puppet in their cloud-init bootstrapping actions, or (b) rely on some weaker method of proving identity, such as relying on the certificate's common name (hostname).

⁶ https://www.packer.io/
⁷ https://www.chef.io/chef/
⁸ This is similar to the threat model assumed by the TPM, where physical protections are not a strict requirement to be compliant with the specification.

3. DESIGN
To address the limitations of current approaches, we consider the union of trusted computing and IaaS to provide a hardware root of trust that tenants leverage to establish trust in the cloud provider's infrastructure and in their own systems running on that infrastructure. This section considers the threats that keylime addresses, and how to leverage existing trusted computing constructs in a virtualized environment while limiting complexity and overhead.

3.1 Threat Model
Our goal is to minimize trust in the cloud provider and carefully account for all concessions we must make to enable trusted computing services. We assume the cloud provider is semitrusted, i.e., they are organizationally trustworthy but are still susceptible to compromise or malicious insiders. We assume the cloud provider has processes, technical controls, and policy in place to prevent such a compromise from spreading across their entire infrastructure. Thus, in the semitrusted model, we assume that some fraction of the cloud provider's resources may be under the control of the adversary (e.g., a rogue system administrator may control a subset of racks in an IaaS region).

Specifically, we assume that the adversary can monitor or manipulate compromised portions of the cloud network or storage arbitrarily. We assume that the adversary cannot physically tamper with the CPU, bus, memory, or TPM⁸ of any host (e.g., a hypervisor or bare-metal node). In virtualized environments, the security of our system relies upon keeping cryptographic keys in VM memory. Therefore, we assume that the provider does not purposefully deploy a hypervisor with the explicit capability to spy on tenant VM memory (e.g., Ether [9]). We assume that TPM and system manufacturers have created the appropriate endorsement credentials and have some mechanism to test their validity (i.e., signed certificates).

Finally, we assume that the attacker's goal is to obtain persistent access to a tenant system in order to steal, disrupt, or deny the tenant's data and services. To accomplish persistence, the attacker must modify the code loading or running process. We assume that such modifications would be detected by load-time integrity measurement of the hypervisor or kernel [19], runtime integrity measurement of the kernel [24], and integrity measurement of applications [30].

3.2 Architecture
To introduce the architecture of keylime, we first describe a simplified architecture for managing trusted computing services for a single organization, or cloud tenant, without virtualization. We then expand this simplified architecture into the full keylime architecture, providing extensions that allow layering of provider and tenant trusted computing services and supporting multiple varieties of IaaS execution isolation (i.e., bare metal, virtual machines, or containers). Figure 3 depicts the full system architecture with layering.

The first step in bootstrapping the architecture is to create a tenant-specific registrar. The registrar stores and certifies the public AIKs of the TPMs in the tenant's infrastructure. In the simplified architecture, the tenant registrar can be hosted outside the cloud in the tenant's own infrastructure
or could be hosted on a physical system in the cloud. The registrar is only a trust root and does not store any tenant secrets. The tenant can decide to trust the registrar only after it attests its system integrity. Since the registrar is a simple component with static code, verifying its integrity is straightforward.

To create a registrar, we can leverage existing standards for the creation and validation of AIKs by creating a TCG Privacy CA [38]. To avoid the complexity of managing a full PKI, and because there is no need for privacy within a single tenant's resources, we created a registrar that simply stores valid TPM AIK public keys indexed by node UUID. Clients request public AIKs from the registrar through a server-authenticated TLS channel.

To validate the AIKs in the registrar, we developed a TPM-compatible enrollment protocol (see Figure 2). The node begins by sending its ID and standard TPM credentials ($EK_{pub}$, $AIK_{pub}$) to the registrar. The registrar then checks the validity of the TPM EK with the TPM manufacturer. Importantly, the generation and installation of the EK by the TPM manufacturer roots the trust upon which the rest of our system relies. If the EK is valid, the registrar creates an ephemeral symmetric key $K_e$ and encrypts it, along with a hash of the public AIK, denoted $H(AIK_{pub})$, with the TPM's $EK_{pub}$. The node uses the ActivateIdentity TPM command to decrypt $K_e$. The TPM will only decrypt $K_e$ if it has $EK_{priv}$ and if it has the $AIK_{priv}$ corresponding to $H(AIK_{pub})$. The node uses an HMAC, $\mathrm{HMAC}_{K_e}(ID)$, to prove that it can decrypt the ephemeral key $K_e$. The registrar then marks that AIK as valid so that it can be used to validate quotes.

Figure 2: Physical node registration protocol. (Node → Registrar: $ID, AIK_{pub}, EK_{pub}$; Registrar → Node: $\mathrm{Enc}_{EK}(H(AIK_{pub}), K_e)$; Node → Registrar: $\mathrm{HMAC}_{K_e}(ID)$.)

Table 1: Keys used by keylime and their purpose.
Key | Type | Purpose
EK | RSA 2048 | Permanent TPM credential that identifies the TPM hardware.
SRK | RSA 2048 | TPM key that protects TPM-created private keys when they are stored outside the TPM.
AIK | RSA 2048 | TPM key used to sign quotes.
$K_e$ | AES-256 | Enrollment key created by the registrar and used to activate the AIK.
$K_b$ | AES-256 | Bootstrap key the tenant creates; keylime securely delivers it to the node.
U, V | 256-bit random | Trivial secret shares of $K_b$, derived with a random 256-bit $V$: $U = K_b \oplus V$.
NK | RSA 2048 | Non-TPM software key used to protect secret shares U, V in transit.

The core component of keylime is an out-of-band cloud verifier (CV) similar to the one described by Schiffman et al. [34]. Each cloud organization will have at least one CV that is responsible for verifying the system state of the organization's IaaS resources. The tenant can host the CV in the IaaS cloud or on-premises at their own site (we give options for tenant registrar and CV deployment in Section 3.2.1). The CV relies upon the tenant registrar for validating that the AIKs used to sign TPM quotes are valid, or more specifically, that the AIKs are recognized by the tenant as being associated with machines (physical or virtual) owned by the tenant. The registrar, CV, and cloud node service are the only components in keylime that manage and use keys and public key infrastructures associated with the TPM.

The CV participates in a three-party key derivation protocol (described in detail in Section 3.2.2) in which the CV and tenant cooperate to derive a key, $K_b$, at the cloud node to support initial storage decryption. The tenant uses $K_b$ to protect tenant secrets and trust relationships. The tenant can use this key to unlock either its disk image or tenant-specific configuration provided by cloud-init.

This protocol is akin to the method by which a user can use the TPM to decrypt his or her disk on a laptop. To allow the decryption key to be used to boot the laptop, the user must enter a password (demonstrating the user's intent) and the TPM PCRs must match a set of whitelisted integrity measurements (demonstrating the validity of the system that will receive the decryption key). In an IaaS cloud environment, there is neither a trusted console where a user can enter a password nor a way to pre-seed the TPM with the storage key or measurement whitelist. Our protocol uses secret sharing to solve these problems by relying externally upon the CV for integrity measurement and by having the tenant directly interact with the cloud node to demonstrate intent to derive a key. The protocol then extends beyond bootstrapping to enable continuous system integrity monitoring. The CV periodically polls each cloud node's integrity state to determine if any runtime policies have been violated. The frequency with which the CV requests and verifies each node's integrity state defines the latency between an integrity violation and its detection.

To cleanly link trust and integrity measurement rooted in the TPM to higher-level services, we create a parallel software-only PKI and a simple service to manage it. The goal is to remove the need to make each service trusted computing-aware (e.g., integrating Trusted Network Connect into StrongSwan⁹). We refer to this parallel software-only service as the software CA. To bootstrap this service, we use the bootstrap key derivation protocol to create a cloud node to host the service. Since the bootstrap key derivation protocol ensures that the node can only derive a key if the tenant authorizes it and if the node's integrity state is approved, we can encrypt the private key for the software CA and pass it to the node upon provisioning. Once the software CA is established, we can start other cloud nodes and securely pass them keys signed by this CA. The linkage to the hardware root of trust, the secure bootstrapping of relevant keys, and user intent to create new resources are again ensured using the bootstrap key derivation protocol. Once established, standard tools and services like IPsec or Puppet can directly use the software CA identity credentials that each node now possesses.

⁹ https://wiki.strongswan.org/projects/strongswan/wiki/TrustedNetworkConnect

To complete the linkage between the trusted computing services and the software identity keys, we need a mechanism to revoke keys in the software PKI when integrity violations occur in the trusted computing layer. The CV is responsible for notifying the software CA of these violations. The CV includes metadata about the nature of the integrity violation, which allows the software CA to have a response policy. The software CA supports standardized methods for certificate revocation, such as signed revocation lists or hosting an OCSP responder. To support push notifications of failures, the software CA can also publish signed notifications to a message bus. This way, services that directly support revocation actions can subscribe to notifications (e.g., to trigger a re-key in a secret manager like Vault).

Figure 3: Layered keylime trusted computing architecture. (Diagram components: the tenant's certificate authority and bootstrap key derivation; a software CA with identity and revocation services hosted in a cloud node VM; the tenant CV with signed whitelists; tenant cloud node VMs with vTPMs and a bootstrap key $K_b$; hypervisors with hardware TPMs; the tenant registrar; the provider registrar and provider whitelist authority; and the TPM / platform manufacturer. Labeled interactions include deep quotes, virtual enrollment, enrollment, the TPM bound at manufacturing, and the checks "AIK good?", "vAIK good?", and "TPM good?".)

3.2.1 Layering Trust
We next expand this architecture to work across the layers of virtualization common in today's IaaS environments. Our goal is to create the architecture described previously, which cleanly links common security services to a trusted computing layer in a cloud tenant's environment. Thus, in a VM hosting environment like Amazon EC2 or OpenStack, we aim to create trusted computing enabled software CAs and tenant nodes inside of virtual machine instances. Note that, in a bare-metal provisioning environment like IBM Softlayer¹⁰, HaaS [13], or OpenStack Ironic¹¹, we can directly utilize the simplified architecture where there is no trust layering.

We observe that IaaS-based virtual machines or physical hosts all provide a common abstraction of isolated execution. Each form of isolated execution in turn needs a root of trust on which to build trusted computing services. Due to the performance and resource limitations of typical TPMs (e.g., taking 500 or more milliseconds to generate a quote, and only supporting a fixed number of PCRs), direct multiplexed use of the physical TPM will not scale to the numbers of virtual machines that can be hosted on a single modern system. As described by Berger et al. [2] and as implemented in Xen [10], we utilize a virtualized implementation of the TPM. Each VM has its own software TPM (called a vTPM) whose trust is in turn rooted in the hardware TPM of the hosting system. The vTPM is isolated from the guest that uses it by running in a separate Xen domain.

The vTPM interface is the same as a hardware TPM. The only exception is that the client can request a deep quote¹² that will get a quote from the hardware TPM in addition to getting a quote from the vTPM. These quotes are linked together by including a hash of the vTPM quote and nonce in the hardware TPM quote's nonce. Deep quotes suffer from the slow performance of hardware TPMs, but as we show later in this section, we can limit the use of deep quotes while still maintaining reasonable performance, scale, and security guarantees.

To assure a chain of trust that is rooted in hardware, we need the IaaS provider to replicate some of the trusted computing service infrastructure in their environment and allow the tenant trusted computing services to query it. Specifically, the provider must establish a registrar for their infrastructure, must publish an up-to-date signed list of the integrity measurements of their infrastructure (hosted by a whitelist authority service), and may even have their own CV. The tenant CV will interact with the whitelist authority service and the provider's registrar to verify deep quotes collected by the infrastructure.

Despite the fact that most major IaaS providers run closed-source hypervisors and would provide opaque integrity measurements [16], we find there is still value in verifying the integrity of the provider's services. By providing a known-good list of integrity measurements, the provider is committing to a version of the hypervisor that will be deployed widely across their infrastructure. This prevents a targeted attack where a single hypervisor is replaced with a malicious version designed to spy on the tenant (e.g., the provider is coerced by a government to monitor a specific cloud tenant). Thus, an attacker must subvert both the code loading process on all the hypervisors and the publishing and signing process for known-good measurements. In our semitrusted threat model, we assume the provider has controls and monitoring that limit the ability of a rogue individual to accomplish this.

¹⁰ http://www.softlayer.com
¹¹ https://wiki.openstack.org/wiki/Ironic
¹² We use similar notation for deep quotes as we do for quotes, $\mathrm{DeepQuote}_{AIK,vAIK}(nonce, PCR_i : d_i, vPCR_j : d_j)$, except that PCRs may be from both the physical and virtual sets of PCRs. We use virtual PCR #16 to bind data.

Figure 4: Virtual node registration protocol. (Node → Tenant Registrar: $ID, vAIK_{pub}, vEK_{pub}$; Tenant Registrar → Node: $\mathrm{Enc}_{vEK}(H(AIK_{pub}), K_e)$; Node → Tenant Registrar: $\mathrm{DeepQuote}_{AIK,vAIK}(H(K_e), vPCR_{16} : H(ID, vAIK_{pub}, vEK_{pub}))$; Tenant Registrar → Provider Registrar over server-authenticated TLS: AIK validity query (labeled ID, AIK); Provider Registrar → Tenant Registrar: OK.)

to the tenant registrar. The tenant registrar then returns $\mathrm{Enc}_{vEK}(H(AIK_{pub}), K_e)$ without additional checks. The virtual node then decrypts $K_e$ using the ActivateIdentity function of its vTPM. The node next requests a deep quote using a hash of $K_e$ as the nonce, both to demonstrate the freshness of the quote and to demonstrate knowledge of $K_e$ to the tenant registrar. It also uses virtual PCR #16 to bind the vTPM credentials and ID to the deep quote. Upon receiving the deep quote, the tenant registrar asks the provider registrar if the AIK from the deep quote is valid. The tenant registrar also requests the latest valid integrity measurement whitelists from the provider. Now the tenant registrar can check the validity of the deep quote's signature, ensure that the nonce is $H(K_e)$, confirm that the binding data in virtual PCR #16 matches what was provided in the previous step, and check the physical PCR values in the deep quote against the provider's whitelist. Only if the deep quote is valid will the tenant registrar mark the vAIK as valid.
As in Section 3.2, we begin with the establishment of a
When considering the cost of performing a deep quote,
tenant registrar and cloud verifier. There are multiple op-
the provider must carefully consider the additional latency
tions for hosting these services securely: 1) in a bare metal
of the physical TPM. Deep quotes provide a link between
IaaS instance with TPM, 2) on-tenant-premises in tenant-
the vTPM and the physical TPM of the machine, and new
owned hardware, 3) in a small trusted hardware appliance
enrollments should always include deep quotes. When con-
deployed to the IaaS cloud provider, and 4) in an IaaS vir-
sidering if deep quotes should be used as part of periodic
tual machine. The first three of these options rely upon
attestation, we must understand what trusted computing
the architecture and protocols we’ve already discussed. The
infrastructure the provider has deployed. If the provider is
last option requires the tenant to establish an on-tenant-
doing load time integrity only (e.g., secure boot), then deep
premises CV and use that to bootstrap the tenant registrar
quotes will only reflect the one-time binding at boot between
and CV. This on-tenant-premises CV identifies and checks
the vTPM and the physical TPM and the security of the
the integrity of the tenant’s virtualized registrar and CV,
vTPM infrastructure. If the provider has runtime integrity
who then in turn are responsible for the rest of the tenant’s
checking of their infrastructure, there is value in the ten-
virtualized infrastructure.
ant performing periodic attestation using deep quotes. In
The primary motivations for a tenant choosing between
the optimal deployment scenario, the provider can deploy
these options are the detection latency for integrity viola-
keylime and provide tenants with access to the integrity
tions, scale of IaaS instances in their environment, band-
state of the hypervisors that host tenant nodes. To limit
width between the tenant and the IaaS cloud, and cost. Op-
the impact of slow hardware TPM operations, the provider
tion 1 provides maximum performance but at higher cost.
can utilize techniques like batch attestation where multiple
Option 2 will by limited by bandwidth and requires more
deep quote requests from different vTPMs can be combined
costs to maintain resources outside of the cloud. Option 3 is
into a single hardware TPM operation [28, 31].
a good trade-off between cost and performance for a small
cloud tenant with only tens of nodes or who can tolerate a
longer detection latency. Finally, Option 4 provides compat- 3.2.2 Key Derivation Protocol
ibility with current cloud operations, good performance and We now introduce the details of our bootstrap key deriva-
scalability, and low cost at the expense of increased complex- tion protocol (Figure 5). The goal of this protocol is for the
ity. In Section 5, we examine the performance trade-offs of cloud tenant to obtain key agreement with a cloud node they
these options including a low-cost registrar and CV appli- have provisioned in an IaaS system. The protocol relies upon
ance (Option 3) we implemented on a Raspberry Pi. the CV to provide integrity measurement of the cloud node
Once we have created the tenant registrar and CV, we can during the protocol. The tenant also directly interacts with
begin securely bootstrapping nodes into the environment. the cloud node to demonstrate their intent to spawn that re-
As before, the first node to establish is a virtualized software source and allow it to decrypt sensitive contents. However,
CA and we do this by creating a private signing key offline the tenant does not directly perform integrity measurement.
and protecting it with a key that will be derived by the This separation of duties is beneficial because the attesta-
bootstrap key derivation protocol. The following process tion protocols may operate in parallel and it simplifies de-
will be the same for all tenant cloud nodes. When a node ployment by centralizing all integrity measurement, white
boots, it will get a vTPM from the IaaS provider. lists, and policy in the CV.
The process of enrolling a vTPM into the tenant regis- To begin the process, the tenant generates a fresh random
trar needs to securely associate the vTPM credentials, e.g., symmetric encryption key Kb . The cloud tenant uses AES-
(vEK, vAIK), with a physical TPM in the provider’s infras- GCM to encrypt the sensitive data to pass to the node d with
tructure (see Figure 4). The tenant registrar cannot directly Kb , denoted EncKb (d). The tenant then performs trivial
verify the authenticity of the vEK because it is virtual and secret sharing to split Kb into two parts U , which the tenant
has no manufacturer. To address this, we use a deep quote to will retain and pass directly to the cloud node and V , which
bind the virtual TPM credentials to a physical TPM AIK. the tenant will share with the CV to provide to the node
The vTPM enrollment protocol begins like the physical upon successful verification of the node’s integrity state. To
TPM enrollment protocol by sending ID, (EKpub , AIKpub ) obtain these shares the tenant generates a secure random
value $V$ the same length as $K_b$ and computes $U = K_b \oplus V$.

Figure 5: Three Party Bootstrap Key Derivation Protocol. (Diagram among the Tenant, Cloud Verifier, Node, and Registrar. A: the tenant sends ID, $V$, IP, port, and whitelist to the CV. B: the CV sends $n_{CV}$ and a mask to the node, receives $\mathrm{Quote}_{AIK}(n_{CV}, 16: H(NK_{pub}), x_i: y_i), NK_{pub}$, asks the registrar "Valid AIK?", and sends $\mathrm{Enc}_{NK}(V)$. C: the tenant sends $n_t$ to the node, receives $\mathrm{Quote}_{AIK}(n_t, 16: H(NK_{pub})), NK_{pub}$, asks the registrar "Valid AIK?", and sends $\mathrm{Enc}_{NK}(U), \mathrm{HMAC}_{K_b}(ID)$. Legend: mutual TLS, server TLS, no TLS.)

In the next phase of the protocol, the tenant requests the IaaS provider to instantiate a new resource (i.e., a new virtual machine). The tenant sends $\mathrm{Enc}_{K_b}(d)$ to the provider as part of the resource creation. The data $d$ may be configuration metadata like a cloud-init script.¹³ Upon creation, the provider returns a unique identifier for the node, uuid, and an IP address at which the tenant can reach the node.

¹³ Because $K_b$ may not be re-used in our protocol, the cost of re-encrypting large disk images for each node may be prohibitive. We advocate for encrypting small sensitive data packets like a cloud-init script, and then establishing local storage encryption with ephemeral keys.

After obtaining the node uuid and IP address, the tenant notifies the CV of their intent to boot a cloud node (see area A in Figure 5). The tenant connects to the CV over a secure channel, such as mutually authenticated TLS, and provides $V$, uuid, node IP, and a TPM policy. The TPM policy specifies a whitelist of acceptable PCR values to expect from the TPM of the cloud node. At this point the CV and tenant can begin the attestation protocol in parallel.

The attestation protocol of our scheme is shared between the interactions of the CV and the cloud node (B) and those of the tenant and the cloud node (C), with only minor differences between them (Figure 5). The protocol consists of two message volleys: the first for the initiator (either CV or tenant) to request a TPM quote, and the second for the initiator to provide a share of $K_b$ to the cloud node upon successful validation of the quote. Since we use this protocol to bootstrap keys into the system, there are no existing software keys with which to create a secure channel. Thus, this protocol must securely transmit a share of $K_b$ over an untrusted network. We accomplish this by creating an ephemeral asymmetric key pair on the node, denoted $NK$, outside of the TPM.¹⁴ As in Section 3.2.1, we use PCR #16's value in a TPM quote to bind $NK$ to the identity of the TPM, thereby authenticating $NK$. The initiator can then encrypt its share of $K_b$ using $NK_{pub}$ and securely return it to the cloud node.

¹⁴ $NK$ could also be generated and reside inside the TPM. However, since it is ephemeral, is used only for transport security, and is authenticated by the TPM using the quote, we found the added complexity of also storing it in the TPM unneeded.

The differences in the attestation protocol between CV and tenant arise in how each validates TPM quotes. Because we wish to centralize the adjudication of integrity measurements in the CV, the TPM quote that the tenant requests only verifies the identity of the cloud node's TPM and doesn't include any PCR hashes. Since the tenant generates a fresh $K_b$ for each cloud node, we are not concerned with leaking $U$ to a node with invalid integrity state. Furthermore, because $V$ is only one share of $K_b$, the CV cannot be subverted to decrypt resources without user intent.

We now describe the attestation protocol in detail. The initiator first sends a fresh nonce ($n_t$ for the tenant as in C from Figure 5 and $n_{CV}$ for the cloud verifier as in B from Figure 5) to the cloud node along with a mask indicating which PCRs the cloud node should include in its quote. The CV sets the mask based on the TPM policy exchanged earlier, and the tenant creates an empty mask. We extend a hash of $NK_{pub}$ into a freshly reset PCR #16. The initiator requests a quote from the TPM with the given PCR mask. The node then returns $\mathrm{Quote}_{AIK}(n, 16: H(NK_{pub}), x_i: y_i), NK_{pub}$ to the initiator. Additional PCR numbers $x_i$ and values $y_i$ are only included in the quote returned to the cloud verifier, based on the TPM policy it requested. During the protocol to provide $U$, the tenant also supplies $\mathrm{HMAC}_{K_b}(ID)$ to the node. This provides the node with a quick check to determine if $K_b$ is correct.

The initiator then confirms that the AIK is valid according to the tenant registrar over server-authenticated TLS. If the initiator is the CV, then it will also check the other PCRs to ensure they are valid according to the tenant-specified whitelist. If the node is virtual, then the quote to the CV will also include a deep quote of the underlying hardware TPM. The CV will in turn validate it as described in the previous section. Upon successful verification, the initiator can then return their share of $K_b$. Thus, the tenant sends $\mathrm{Enc}_{NK}(U)$ and the cloud verifier sends $\mathrm{Enc}_{NK}(V)$ to the node. The cloud node can now recover $K_b$ and proceed with
the boot/startup process.

The cloud node does not retain $K_b$ or $V$ after decryption of $d$. To support node reboot or migration, the cloud node stores $U$ in the TPM NVRAM to avoid needing the tenant to interact again. After rebooting, the node must again request verification by the CV to obtain $V$ and re-derive $K_b$. If migration is allowed, the provider must take care to also securely migrate vTPM state to avoid losing $U$.

4. IMPLEMENTATION
We implemented keylime in approximately 5,000 lines of Python in four components: registrar, node, CV, and tenant. We use the IBM Software Trusted Platform Module library [18] to directly interact with the TPM rather than going through a Trusted Software Stack (TSS) like TrouSerS.

The registrar presents a REST-based web service for enrolling node AIKs. It also supports a query interface for checking the keys for a given node UUID. The registrar uses HMAC-SHA384 to check the node's knowledge of $K_e$ during registration.

The node component runs on the IaaS machine, VM, or container and is responsible for responding to requests for quotes and for accepting shares of the bootstrap key $K_b$. It provides an unencrypted REST-based web service for these two functions.

To support vTPM operations, we created a service that the IaaS provider runs to manage hardware TPM activation and vTPM creation/association. This service runs in a designated Xen domain and has privileges to interact with the Xen vtpmmgr domain [11]. We then implemented a utility for the deep quote operation. Since the Xen vTPM implementation does not directly return the PCR values from the virtual TPM (i.e., the shallow quote) during a deep quote, we chose to first execute a shallow quote, hash its contents with the provided nonce, and place the result in the nonce field of the deep quote. This cryptographically binds the two quotes together. The operation is not vulnerable to man-in-the-middle attacks since there is no other interface to directly manipulate the nonce of a deep quote [36]. We then return both the shallow and deep quotes and require the verifier to check both signatures and sets of PCR values.

The cloud verifier hosts a TLS-protected REST-based web service for control. Tenants add and remove nodes to be verified and also query their current integrity state. Upon being notified of a new node, the CV enqueues metadata about the node onto the quote_request queue, where a configurable pool of worker processes will then request a deep quote from the node. Upon successful verification of the quote, the CV will use an HTTP POST to send $V$ to the node. The CV uses PKCS#1 OAEP with 2048-bit RSA keys to protect shares of $K_b$ in transit.

The tenant generates a random 256-bit AES key and encrypts and authenticates the bootstrap data using AES in Galois/Counter Mode [27]. The tenant uses trivial XOR-based secret sharing to split $K_b$ into $V$ and $U$. The tenant executes a simplified version of the same protocol that the CV uses. The tenant checks with the registrar to determine if the quote-signing AIK is valid and owned by the tenant.

Upon receiving $U$ and $V$, the node can then combine them to derive $K_b$. To limit the impact of rogue CVs or tenants connecting to the node's unauthenticated REST interface, the node stores all received $U$ and $V$ values and iteratively tries each combination to find the correct $K_b$. Once the node has correctly derived $K_b$, it mounts a small in-memory file system using tmpfs and writes the key there for other applications to access.

4.1 Integration
While the key derivation protocol of keylime is generic and can be used to decrypt arbitrary data, we believe the most natural cloud use-case for it is to decrypt a small IaaS node-specific package of data. To enable this use-case, we have integrated keylime with the cloud-init package, a combination we call trusted-cloud-init. As described in Section 2, cloud-init is a widely adopted mechanism to deploy machine-specific data to IaaS resources. To integrate keylime and cloud-init, we patched cloud-init to support AES-GCM decryption of the user-data (where cloud-init stores tenant scripts and data). We modified the upstart system in Ubuntu Linux to start the keylime node service before cloud-init. We then configure cloud-init to find the key that keylime creates in the tmpfs-mounted file system. After successful decryption, cloud-init deletes the key and scrubs it from memory.

To support applications that need node identities but do not manage their own PKIs, we implemented a simple software CA. The tenant provisions the software CA by creating the CA private key offline and delivering it to a new node using trusted-cloud-init. We also deliver certificates to the software CA that allow it and the tenant to mutually authenticate each other via trusted-cloud-init. To demonstrate the clean separation between the trusted computing layer and the software key management layer, we use the ZMQ Curve secure channel implementation [14]. This system uses an elliptic curve cryptography scheme different from the cryptographic algorithms, keys, and other techniques the TPM uses.

To enroll a new node, the tenant first generates a node identity key pair using the software CA client. The software CA supports a plugin architecture that allows the tenant to specify what type of key pairs to create (e.g., X.509 RSA 2048). The tenant then connects securely to the software CA over ZMQ and gets the node's identity certificate signed. The tenant can now provision a new node with this identity using trusted-cloud-init. The software CA also supports receiving notifications from the CV if a node later fails integrity measurement.

To support transparent integration with an IaaS platform, we patched OpenStack Nova and libvirt to support the creation of companion vTPM Xen domains for each user-created instance. We link the OpenStack UUID to the keylime provider registrar. We then implemented a wrapper for the OpenStack Nova command line interface that enables trusted-cloud-init. Specifically, our wrapper intercepts calls to nova boot and automatically encrypts the provided user-data before passing it to OpenStack. It then calls the keylime tenant, which begins the bootstrapping protocol. This allows OpenStack users to transparently use keylime and cloud-init without needing to fully trust the OpenStack provider not to tamper with or steal the sensitive contents of their user-data.

4.2 Demonstration Applications
We next describe how keylime can securely bootstrap and handle revocation for existing non-trusted computing-aware applications and services common to IaaS cloud deployments.
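Each application below consumes secrets that the node can only decrypt after recovering $K_b$ from its two shares, as described in Sections 3.2.2 and 4. The following is a minimal Python sketch of that recovery step, not keylime's actual code. It assumes 256-bit keys and SHA-384 as the hash for $\mathrm{HMAC}_{K_b}(ID)$ (the text does not name the hash for this check), and the helper names split_key, combine, key_check, and find_kb are hypothetical. The tenant splits $K_b$ with XOR sharing, and the node tries every stored $(U, V)$ pair until the tenant's HMAC tag verifies.

```python
import hashlib
import hmac
import itertools
import secrets
from typing import List, Optional, Tuple


def split_key(kb: bytes) -> Tuple[bytes, bytes]:
    """Trivial XOR secret sharing: draw a random V as long as Kb,
    then compute U = Kb XOR V."""
    v = secrets.token_bytes(len(kb))
    u = bytes(a ^ b for a, b in zip(kb, v))
    return u, v


def combine(u: bytes, v: bytes) -> bytes:
    """Recover Kb = U XOR V (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(u, v))


def key_check(kb: bytes, node_id: str) -> bytes:
    """HMAC_Kb(ID), which the tenant sends alongside U.
    ASSUMPTION: SHA-384 as the HMAC hash; the text does not specify it."""
    return hmac.new(kb, node_id.encode(), hashlib.sha384).digest()


def find_kb(
    received_u: List[Tuple[bytes, bytes]],  # (U, HMAC tag) pairs from tenants
    received_v: List[bytes],                # V shares from CVs
    node_id: str,
) -> Optional[bytes]:
    """Try every stored (U, V) combination and return the Kb whose HMAC
    matches the tag that arrived with U, or None if no pair works."""
    for (u, tag), v in itertools.product(received_u, received_v):
        if len(u) != len(v):
            continue  # malformed share, e.g. from a rogue sender
        candidate = combine(u, v)
        if hmac.compare_digest(key_check(candidate, node_id), tag):
            return candidate
    return None


# Usage: one genuine share pair plus a bogus V from a rogue CV.
kb = secrets.token_bytes(32)        # fresh 256-bit key for this node
u, v = split_key(kb)
tag = key_check(kb, "node-1")       # "node-1" is a hypothetical node ID
rogue_v = secrets.token_bytes(32)
assert find_kb([(u, tag)], [rogue_v, v], "node-1") == kb
```

With $m$ stored $U$ values and $n$ stored $V$ values, the node performs at most $m \cdot n$ HMAC computations. A share injected by a rogue sender therefore only adds failed checks and cannot yield a wrong key that passes the tag.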
IPsec. To enable secure network connectivity similar to TNC [39], we implemented trusted-cloud-init scripts to automatically encrypt all network traffic between a tenant's IaaS resources. The scripts use the OpenStack API for IP address information and then build configurations for the Linux IPsec stack and racoon¹⁵. This configuration is also easily extensible to a TLS-based VPN like OpenVPN¹⁶.

¹⁵ http://ipsec-tools.sourceforge.net/
¹⁶ https://openvpn.net/

Puppet. To enable secure system configuration management, we integrated keylime with Puppet. We do so by generating the signed RSA keys that Puppet uses to communicate with the Puppet master using the Software CA process described previously. These steps bypass the need either to use the insecure autosign option in the Puppet master to blindly accept new nodes or to have an operator manually approve or deny certificate signing requests from new nodes. To support continuous attestation and integrity measurement, we implemented a plug-in for the CV that notifies the tenant's Puppet master when a node fails its integrity measurements. The master can then revoke that node's access to check in and receive the latest configuration data.

Vault. While tools like Puppet are often used to provision secrets and keys, tenant operators can instead use a dedicated secret management system that directly supports the full lifecycle of cryptographic keys. To demonstrate this, we have integrated keylime with Vault, a cloud-compatible secret manager. Like Puppet, we use the Software CA to provision RSA certificates for each node and configure Vault to use them. We also implemented a revocation plugin for the CV that notifies Vault both to revoke access for a node that fails integrity measurement and to regenerate and redistribute any keys to which that node had access.

LUKS. Finally, to demonstrate our ability to provision secrets instead of cryptographic identities, we implemented a trusted-cloud-init script that provides the key to unlock an encrypted volume on boot.

5. EVALUATION
In this section we evaluate the overhead and scalability of keylime in a variety of scenarios. We ran our experiments on a private OpenStack cluster, a Xen host, and a Raspberry Pi. In OpenStack, we used standard instance flavors where the m1.small has 1 vCPU, 2GB RAM, and a 20GB disk, and the m1.large has 4 vCPUs, 8GB RAM, and an 80GB disk. We used Ubuntu Linux 14.10 as the guest OS in OpenStack instances. The Xen host had one Xeon E5-2640 CPU with 6 cores at 2.5 GHz, a 10 Gbit NIC, 64 GB RAM, and a WinBond TPM, and ran Xen 4.5 on Ubuntu Linux 15.04. The Raspberry Pi 2 had one 4-core ARMv7 CPU at 900 MHz, 1 GB RAM, and a 100 Mbit NIC, and ran Raspbian 7. We ran each of the following experiments for 1-2 minutes and present averages of the performance we observed.

5.1 TPM Operations
We first establish a baseline for the performance of TPM operations with the IBM client library, our Python wrapper code, the Xen virtual TPM, and the physical TPM. We benchmarked both TPM quote creation and verification on the Xen host (Table 2). We collected the physical TPM measurements on the same system with a standard (non-Xen) kernel. We collected both vTPM quote and deep quote measurements from a domain running on Xen. As expected, operations that require interaction with the physical TPM are slow. Verification times, even for deep quotes that include two RSA signature verifications, are comparatively quick.

Table 2: Average TPM Operation Latency (ms).
                  TPM     vTPM    Deep quote
  Create Quote    725     68.5    1390
  Check Quote     4.64    4.64    5.33

5.2 Key Derivation Protocol
We next investigate the CV latency of different phases of our protocol. In Figures 6 and 7, we show the averaged results of hundreds of trials of the CV with 100 vTPM-equipped VMs. Each operation includes a full REST interaction along with the relevant TPM and cryptographic operations. We also benchmarked the latency of the protocol phases emulating zero latency from the TPM (Null TPM). This demonstrates the minimum latency of our CV software architecture, including the time required to verify quotes. The results from the Null TPM trials indicate that our network protocol and other processing impose very little additional overhead, even on the relatively modestly powered Raspberry Pi. The bare metal system had a slightly larger network RTT to the nodes it was verifying, causing it to have a higher latency than the less powerful m1.large.

In Figure 7, we see that latency for the quote retrieval process is primarily affected by slow TPM operations and is comparable to prior work [31]. The bootstrapping latency is the sum of the latencies for retrieving a quote and providing $V$. We find the bootstrapping latency for bare metal and virtual machines to be approximately 793ms and 1555ms, respectively. Virtual nodes doing runtime integrity measurement after bootstrapping benefit from much lower latency for vTPM operations. Thus, for a virtual machine with a vTPM, keylime can detect integrity violations in as little as 110ms. The detection latency for a system with a physical TPM (781ms for our Xen host) is limited by the speed of the physical TPM at generating quotes.

5.3 Scalability of Cloud Verifier
Next we establish the maximum rate at which the CV can get and check quotes for sustained integrity measurement. This will define the trade-off between the number of nodes a single CV can handle and the latency between when an integrity violation occurs and when the CV detects it. Since the CV quote checking process is a simple round-robin check of each node, it is easy to parallelize across multiple CVs, further enhancing scalability. We emulate an arbitrarily large population of real cloud nodes using a fixed number of test cloud nodes. These test cloud nodes emulate a zero-latency TPM by returning a pre-constructed quote. This way the test nodes appear like a larger population where the CV will never have to block for a lengthy TPM operation to complete. We found that around 500 zero-latency nodes were sufficient to achieve the maximum quote checking rate.

We show the average number of quotes verified per second for each of our CV deployment options in Figure 8. Because of our process-based worker pool architecture, the primary factor affecting CV scalability is the number of cores and
Figure 6: Latency of keylime bootstrapping protocol. (Plot: latency in ms of the Provide V phase for the Bare Metal, m1.large, m1.small, and Raspberry Pi CV deployments.)
Figure 7: Latency of TPM operations in bootstrapping protocol. (Plot: latency in ms of the Null TPM, vTPM Quote, TPM Quote, and DeepQuote operations for the same four deployments.)
Figure 8: Maximum CV quote checking rate of keylime. (Plot: integrity check rate in quotes/s for Bare Metal, m1.large, m1.small, and Raspberry Pi.)

RAM available. These options provide a variety of choices for deploying the CV. For small cloud tenants, a low-cost VM or inexpensive hardware appliance can easily verify hundreds of virtual machines with moderate detection latency (5-10s). For larger customers, a well-resourced VM or dedicated hardware can scale to thousands with similar latency. For high security environments, all options can provide sub-second detection and response time. In future work, we plan to implement a priority scheme that allows the tenant to set the rate of verifications for different classes of nodes.

We next show how our CV architecture can scale by adding more cores and parallelism. We use the bare metal CV and show the average rate of quotes retrieved and checked per second for 500 test nodes in Figure 9. We see linear speed-up until we exhaust the parallelism of the host CPU and the concurrent delay of waiting for many cloud nodes. This represents a modest performance improvement over Schiffman et al.'s CV, which was able to process approximately 2,000 quotes per second on unspecified hardware [34], and a substantial improvement over the Excalibur monitor's ability to check approximately 633 quotes per second [31].

Figure 9: Scaling the CV on bare metal. (Plot: rate in quotes/s versus number of CV processes, from 0 to 300.)

5.4 On-Premises Cloud Verifier
Finally, we investigate the performance of the CV when hosted at the tenant site away from the cloud. We show the results of our bare metal system's quote verification rate and the bandwidth used for a variety of network delays we inserted using the comcast¹⁷ tool in Table 3. These results show that it is possible to run the CV on-premises at the tenant site at the cost of a reduction in quote checking rate (and therefore higher detection latency) and several Mbit/s of bandwidth. As such, we recommend running the CV in the cloud alongside the nodes it will verify, as this is the highest-performance and lowest-cost option.

Table 3: On-Premises bare metal CV verifying 250 Cloud Nodes using 50 CV processes.
  Network RTT (ms)    Rate (quotes/s)    Bandwidth (Kbits/s)
  4 (native)          937                3085
  25                  613                2017
  50                  398                1310
  75                  282                928.3
  100                 208                684.7
  150                 141                464.2

¹⁷ https://github.com/tylertreat/Comcast

6. RELATED WORK
Many of the challenges that exist in traditional enterprise networks exist in cloud computing environments as well. However, there are new challenges and threats, including shared tenancy and additional reliance on the cloud provider to provide a secure foundation [4, 16]. To address some of these challenges, many have proposed trusted computing.

The existing specifications for trusted computing rely on trusted hardware and assume a single owner of the system. With the advent of cloud computing, this assumption is no longer valid. While both the standards community [41] and prior work [2] are beginning the process of supporting virtualization, no end-to-end solution exists. For example, the cTPM system [5] assumes a trustworthy cloud provider and requires modifications to trusted computing standards. Another proposal for higher-level validation of services provides a cryptographically signed audit trail that the hypervisor provides to auditors [12]. The audit trail captures the execution of the applications within the virtual machine. This proposal does not provide a trusted foundation for the audit trail, and assumes a benign hypervisor. Bleikertz et al. propose to use trusted computing to provide cryptographic services for cloud tenants [3]. Their Cryptography-as-a-Service (CaaS) system relies on trusted computing, but does not address bootstrapping and requires hypervisor modifications
that cloud providers are unlikely to support.

To address the issues of scalability, several proposals exist to monitor a cloud infrastructure and allow for validation of the virtual machines controlled by the tenants of the cloud [34, 32, 33, 35]. The cloud verifier pattern proposed by Schiffman et al. allows a single verifier to validate trust in the cloud infrastructure, and in turn the cloud verifier “vouches” for the integrity of the cloud nodes. This removes the need for tenants to validate the integrity of the hypervisor hosts prior to instantiating cloud nodes on them and avoids the need for nodes to mutually attest each other before communicating. The tenant simply provides their integrity verification criteria to the verifier, and the verifier ensures that the tenant's integrity criteria are satisfied as part of scheduling resources. We utilize the cloud verifier pattern in our work, with some important differences. First, we extend it to support secure system bootstrapping for both bare metal and virtualized IaaS environments. Second, unlike Schiffman et al., we do not host any tenant-owned parts of the integrity measurement infrastructure in the provider's control. This means that our solution is substantially less invasive to the cloud provider's infrastructure (e.g., they required nearly 5,000 lines of code to be added to OpenStack) and is less prone to compromise. For example, keylime relies upon the vTPM integrity measurements inside tenant nodes rather than enabling the cloud provider to have explicit virtual machine introspection (i.e., secret stealing) capabilities.

Excalibur works to address the scalability problems of trusted computing by leveraging ciphertext-policy attribute-based encryption (CP-ABE) [31]. This encryption scheme allows data to be encrypted using keys that represent attributes of the hypervisor hosts in the IaaS environment (e.g., software version, country, zone). Using Excalibur, clients can encrypt sensitive data and be assured that a hypervisor will only be given access to the data if the policy (the specified set of attributes) is satisfied. Excalibur only addresses trusted bootstrapping for the underlying cloud platform. Therefore, a compromised tenant node would be neither detected nor prevented. The Excalibur monitor is a provider-owned (but attested) component that holds the encryption keys that allow a node to boot on a particular hypervisor. keylime uses secret sharing to avoid having bootstrap keys stored (and therefore vulnerable) in any cloud systems except the cloud nodes for which they are intended.

The CloudProxy Tao system provides building blocks to establish trusted services in a layered cloud environment [25]. The Tao environment relies on the TPM to establish identity and load-time integrity of the nodes and software in the system. Their system does not support system integrity monitoring, as they assume that all interactions will only be with other trusted programs running in the Tao environment. Tao relies upon mutual attestation for all communicating nodes, but is unable to use TPM-based keys because they are not fast enough to support mutual attestation. Using the out-of-band CV, we avoid mutual attestation while maintaining rapid detection of integrity violations. The Key Server in Tao holds all the secret keys to the system, must interact with hosts to load new applications, and must be fully trusted. The Key Server does not offer compatible deployment options for IaaS environments, especially for small tenants who cannot afford secure facilities or hardware security modules. Furthermore, CloudProxy Tao does not detail the secure bootstrapping of their Key Server or Privacy CA component for TPM initialization. keylime explicitly describes bootstrapping of all relevant components and enables multiple realistic secure deployment options for CV and registrar hosting.

7. CONCLUSION
In this paper, we have shown that keylime provides a fully integrated solution to bootstrap and maintain hardware-rooted trust in elastically provisioned IaaS clouds. We have demonstrated that we can bootstrap hardware-rooted cryptographic identities into both physical and virtual cloud nodes, and leverage those identities in higher-level security services, without requiring each service to become trusted computing-aware. keylime uses a novel key derivation protocol that incorporates a tenant's intent to provision new cloud resources with integrity measurement. Finally, we have demonstrated and evaluated several deployment scenarios for our system's critical component, the cloud verifier. keylime can securely derive a key in less than two seconds during the provisioning and bootstrapping process, and requires as little as 110ms to respond to an integrity violation. Furthermore, we have shown that keylime can scale to support thousands of IaaS nodes while maintaining quick response to integrity violations.

8. REFERENCES
[1] S. Balfe and A. Mohammed. Final fantasy – securing on-line gaming with trusted computing. In B. Xiao, L. Yang, J. Ma, C. Muller-Schloer, and Y. Hua, editors, Autonomic and Trusted Computing, volume 4610 of Lecture Notes in Computer Science, pages 123–134. Springer Berlin Heidelberg, 2007.
[2] S. Berger, R. Cáceres, K. A. Goldman, R. Perez, R. Sailer, and L. van Doorn. vTPM: Virtualizing the trusted platform module. In Proceedings of the 15th Conference on USENIX Security Symposium - Volume 15, USENIX-SS'06, Berkeley, CA, USA, 2006. USENIX Association.
[3] S. Bleikertz, S. Bugiel, H. Ideler, S. Nürnberger, and A.-R. Sadeghi. Client-controlled cryptography-as-a-service in the cloud. In M. Jacobson, M. Locasto, P. Mohassel, and R. Safavi-Naini, editors, Applied Cryptography and Network Security, volume 7954 of Lecture Notes in Computer Science, pages 19–36. Springer Berlin Heidelberg, 2013.
[4] S. Bouchenak, G. Chockler, H. Chockler, G. Gheorghe, N. Santos, and A. Shraer. Verifying cloud services: Present and future. SIGOPS Oper. Syst. Rev., 47(2):6–19, July 2013.
[5] C. Chen, H. Raj, S. Saroiu, and A. Wolman. cTPM: A cloud TPM for cross-device trusted applications. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), Seattle, WA, Apr. 2014. USENIX Association.
[6] X. Chen, T. Garfinkel, E. C. Lewis, P. Subrahmanyam, C. A. Waldspurger, D. Boneh, J. Dwoskin, and D. R. Ports. Overshadow: A virtualization-based approach to retrofitting protection in commodity operating systems. SIGPLAN Not., 43(3):2–13, Mar. 2008.
[7] J. Criswell, N. Dautenhahn, and V. Adve. Virtual ghost: Protecting applications from hostile operating systems. SIGARCH Comput. Archit. News, 42(1):81–96, Feb. 2014.
[8] L. Davi, A.-R. Sadeghi, and M. Winandy. Dynamic integrity measurement and attestation: towards defense against return-oriented programming attacks. In Proceedings of the 2009 ACM workshop on Scalable trusted computing, pages 49–54. ACM, 2009.
[9] A. Dinaburg, P. Royal, M. Sharif, and W. Lee. Ether: Malware analysis via hardware virtualization extensions. In Proceedings of the 15th ACM Conference on Computer and Communications Security, CCS ’08, pages 51–62, New York, NY, USA, 2008. ACM.
[10] M. Fioravante and D. D. Graaf. vTPM. http://xenbits.xen.org/docs/unstable/misc/vtpm.txt, November 2012.
[11] D. D. Graaf and Q. Xu. vTPM manager. http://xenbits.xen.org/docs/unstable/misc/vtpmmgr.txt.
[12] A. Haeberlen, P. Aditya, R. Rodrigues, and P. Druschel. Accountable virtual machines. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10, pages 1–16, Berkeley, CA, USA, 2010. USENIX Association.
[13] J. Hennessey, C. Hill, I. Denhardt, V. Venugopal, G. Silvis, O. Krieger, and P. Desnoyers. Hardware as a service - enabling dynamic, user-level bare metal provisioning of pools of data center resources. In 2014 IEEE High Performance Extreme Computing Conference, Waltham, MA, USA, Sept. 2014.
[14] P. Hintjens. CurveZMQ authentication and encryption protocol. http://rfc.zeromq.org/spec:26, 2013.
[15] O. S. Hofmann, S. Kim, A. M. Dunn, M. Z. Lee, and E. Witchel. InkTag: Secure applications on an untrusted operating system. SIGPLAN Not., 48(4):265–278, Mar. 2013.
[16] W. Huang, A. Ganjali, B. H. Kim, S. Oh, and D. Lie. The state of public infrastructure-as-a-service cloud security. ACM Comput. Surv., 47(4):68:1–68:31, June 2015.
[17] IBM. IBM and Intel bring new security features to the cloud. http://www.softlayer.com/press/ibm-and-intel-bring-new-security-features-cloud, September 2004.
[18] IBM. Software trusted platform module. http://sourceforge.net/projects/ibmswtpm/, April 2014.
[19] Intel. Intel Trusted Boot (tboot). https://software.intel.com/en-us/articles/intel-trusted-execution-technology.
[20] Intel. Cloud integrity technology. http://www.intel.com/p/en_US/support/highlights/sftwr-prod/cit, 2015.
[21] T. Jaeger, R. Sailer, and U. Shankar. PRIMA: Policy-reduced integrity measurement architecture. In Proceedings of the Eleventh ACM Symposium on Access Control Models and Technologies, SACMAT ’06, pages 19–28, New York, NY, USA, 2006. ACM.
[22] B. Kauer. OSLO: Improving the security of trusted computing. In Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, SS’07, pages 16:1–16:9, Berkeley, CA, USA, 2007. USENIX Association.
[23] R. Kotla, T. Rodeheffer, I. Roy, P. Stuedi, and B. Wester. Pasture: Secure offline data access using commodity trusted hardware. In Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), pages 321–334, Hollywood, CA, 2012. USENIX.
[24] P. A. Loscocco, P. W. Wilson, J. A. Pendergrass, and C. D. McDonell. Linux kernel integrity measurement using contextual inspection. In Proceedings of the 2007 ACM Workshop on Scalable Trusted Computing, STC ’07, pages 21–29, New York, NY, USA, 2007. ACM.
[25] J. Manferdelli, T. Roeder, and F. Schneider. The CloudProxy Tao for trusted computing. Technical Report UCB/EECS-2013-135, EECS Department, University of California, Berkeley, Jul 2013.
[26] P. Maniatis, D. Akhawe, K. Fall, E. Shi, S. McCamant, and D. Song. Do you know where your data are?: Secure data capsules for deployable data protection. In Proceedings of the 13th USENIX Conference on Hot Topics in Operating Systems, HotOS’13, pages 22–22, Berkeley, CA, USA, 2011. USENIX Association.
[27] D. A. McGrew and J. Viega. The Galois/Counter Mode of operation (GCM). NIST, 2005.
[28] T. Moyer, K. Butler, J. Schiffman, P. McDaniel, and T. Jaeger. Scalable web content attestation. IEEE Transactions on Computers, Mar 2011.
[29] S. Munetoh. GRUB TCG Patch to Support Trusted Boot. http://trousers.sourceforge.net/grub.html.
[30] R. Sailer, X. Zhang, T. Jaeger, and L. van Doorn. Design and implementation of a TCG-based integrity measurement architecture. In Proceedings of the 13th Conference on USENIX Security Symposium - Volume 13, SSYM’04, pages 16–16, Berkeley, CA, USA, 2004. USENIX Association.
[31] N. Santos, R. Rodrigues, K. P. Gummadi, and S. Saroiu. Policy-sealed data: A new abstraction for building trusted cloud services. In Presented as part of the 21st USENIX Security Symposium (USENIX Security 12), pages 175–188, Bellevue, WA, 2012. USENIX.
[32] J. Schiffman, T. Moyer, C. Shal, T. Jaeger, and P. McDaniel. Justifying integrity using a virtual machine verifier. In Computer Security Applications Conference, 2009. ACSAC ’09. Annual, pages 83–92, Dec 2009.
[33] J. Schiffman, T. Moyer, H. Vijayakumar, T. Jaeger, and P. McDaniel. Seeding clouds with trust anchors. In Proceedings of the 2010 ACM Workshop on Cloud Computing Security Workshop, CCSW ’10, pages 43–46, New York, NY, USA, 2010. ACM.
[34] J. Schiffman, Y. Sun, H. Vijayakumar, and T. Jaeger. Cloud verifier: Verifiable auditing service for IaaS clouds. In Services (SERVICES), 2013 IEEE Ninth World Congress on, pages 239–246, June 2013.
[35] J. Schiffman, H. Vijayakumar, and T. Jaeger. Verifying system integrity by proxy. In Proceedings of the 5th International Conference on Trust and Trustworthy Computing, TRUST’12, pages 179–200, Berlin, Heidelberg, 2012. Springer-Verlag.
[36] A. Segall. Using the TPM: Machine authentication and attestation. http://opensecuritytraining.info/IntroToTrustedComputing_files/Day2-1-auth-and-att.pdf, Oct 2012.
[37] E. G. Sirer, W. de Bruijn, P. Reynolds, A. Shieh, K. Walsh, D. Williams, and F. B. Schneider. Logical attestation: An authorization architecture for trustworthy computing. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP ’11, pages 249–264, New York, NY, USA, 2011. ACM.
[38] Trusted Computing Group. TCG Infrastructure Working Group A CMC Profile for AIK Certificate Enrollment. http://www.trustedcomputinggroup.org/files/resource_files/738DF0BB-1A4B-B294-D0AF6AF9CC023163/IWG_CMC_Profile_Cert_Enrollment_v1_r7.pdf.
[39] Trusted Computing Group. Trusted Network Communications. http://www.trustedcomputinggroup.org/developers/trusted_network_communications.
[40] Trusted Computing Group. Trusted Platform Module. http://www.trustedcomputinggroup.org/developers/trusted_platform_module.
[41] Trusted Computing Group. Virtualized Platform. http://www.trustedcomputinggroup.org/developers/virtualized_platform.
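In the comparison with Excalibur, keylime is said to use secret sharing so that no cloud system other than the target node ever holds a bootstrap key. As we understand the keylime protocol, the tenant splits the bootstrap key into two shares: one is delivered directly to the tenant node, and the other is released by the cloud verifier only after the node's integrity has been verified. The Python sketch below assumes a simple 2-of-2 XOR split. The function names `split_key` and `combine_shares` and the share names `u` and `v` are hypothetical illustrations, not keylime's actual code.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split `key` into two shares such that key == u XOR v.

    Hypothetical helper: u is drawn uniformly at random, so each share
    on its own is statistically independent of the key.
    """
    u = secrets.token_bytes(len(key))          # random share U
    v = bytes(k ^ r for k, r in zip(key, u))   # V = K XOR U
    return u, v


def combine_shares(u: bytes, v: bytes) -> bytes:
    """Reconstruct the key from both shares: K = U XOR V."""
    if len(u) != len(v):
        raise ValueError("shares must have the same length")
    return bytes(a ^ b for a, b in zip(u, v))


if __name__ == "__main__":
    # A 256-bit bootstrap key (length chosen only for illustration).
    bootstrap_key = secrets.token_bytes(32)
    u, v = split_key(bootstrap_key)

    # Hypothetical flow: the tenant sends u straight to the node, and the
    # verifier releases v only after attestation of the node succeeds.
    recovered = combine_shares(u, v)
    print("key recovered:", recovered == bootstrap_key)
```

Because each share alone is a uniformly random string of the key's length, a party holding only one share (for example, a single provider-operated system) learns nothing about the key; the node can reconstruct it only once both shares have arrived.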
