Tci Acm PDF
ABSTRACT
Today’s infrastructure as a service (IaaS) cloud environments rely upon full trust in the provider to secure applications and data. Cloud providers do not offer the ability to create hardware-rooted cryptographic identities for IaaS cloud resources or sufficient information to verify the integrity of systems. Trusted computing protocols and hardware like the TPM have long promised a solution to this problem. However, these technologies have not seen broad adoption because of their complexity of implementation, low performance, and lack of compatibility with virtualized environments. In this paper we introduce keylime, a scalable trusted cloud key management system. keylime provides an end-to-end solution for both bootstrapping hardware rooted cryptographic identities for IaaS nodes and for system integrity monitoring of those nodes via periodic attestation. We support these functions in both bare-metal and virtualized IaaS environments using a virtual TPM. keylime provides a clean interface that allows higher level security services like disk encryption or configuration management to leverage trusted computing without being trusted computing aware. We show that our bootstrapping protocol can derive a key in less than two seconds, we can detect system integrity violations in as little as 110ms, and that keylime can scale to thousands of IaaS cloud nodes.

∗This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering.
© 2016 Massachusetts Institute of Technology. Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.
ACM acknowledges that this contribution was authored or co-authored by an employee, or contractor of the national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. Permission to make digital or hard copies for personal or classroom use is granted. Copies must bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. To copy otherwise, distribute, republish, or post, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ACSAC ’16, December 05-09, 2016, Los Angeles, CA, USA
© 2016 ACM. ISBN 978-1-4503-4771-6/16/12. . . $15.00
DOI: http://dx.doi.org/10.1145/2991079.2991104

1. INTRODUCTION
The proliferation and popularity of infrastructure-as-a-service (IaaS) cloud computing services such as Amazon Web Services and Google Compute Engine means more cloud tenants are hosting sensitive, private, and business critical data and applications in the cloud. Unfortunately, IaaS cloud service providers do not currently furnish the building blocks necessary to establish a trusted environment for hosting these sensitive resources. Tenants have limited ability to verify the underlying platform when they deploy to the cloud and to ensure that the platform remains in a good state for the duration of their computation. Additionally, current practices restrict tenants’ ability to establish unique, unforgeable identities for individual nodes that are tied to a hardware root of trust. Often, identity is based solely on a software-based cryptographic solution or unverifiable trust in the provider. For example, tenants often pass unprotected secrets to their IaaS nodes via the cloud provider.
   Commodity trusted hardware, like the Trusted Platform Module (TPM) [40], has long been proposed as the solution for bootstrapping trust, enabling the detection of changes to system state that might indicate compromise, and establishing cryptographic identities. Unfortunately, TPMs have not been widely deployed in IaaS cloud environments due to a variety of challenges. First, the TPM and related standards for its use are complex and difficult to implement. Second, since the TPM is a cryptographic co-processor and not an accelerator, it can introduce substantial performance bottlenecks (e.g., 500+ms to generate a single digital signature). Lastly, the TPM is a physical device by design and most IaaS services rely upon virtualization, which purposefully divorces cloud nodes from the hardware on which they run. At best, the limitation to physical platforms means that only the cloud provider would have access to the trusted hardware, not the tenants [17, 20, 31]. The Xen hypervisor includes a virtualized TPM implementation that links its security to a physical TPM [2, 10], but protocols to make use of the vTPM in an IaaS environment do not exist.
   To address these challenges we identify the following desirable features of an IaaS trusted computing system:
    • Secure Bootstrapping – the system should enable the tenant to securely install an initial root secret into each cloud node. This is typically the node’s long term cryptographic identity and the tenant chains other secrets to it to enable secure services.
    • System Integrity Monitoring – the system should allow the tenant to monitor cloud nodes as they operate and react to integrity deviations within one second.
    • Secure Layering (Virtualization Support) – the
[Figure: architecture diagram; caption not recovered. Labels: Software-based Cryptographic Services; Software ID Keys; ID key revoked?; Certificate authority; Software CA (Cloud Node VM); Revocation Service; Trust; Cloud Node VM; Identity Key; Bootstrap Key Kb; Bootstrap Key derivation; unwrap; vTPM; Hypervisor; TPM; Deep Quote; Virtual Enrollment; Enrollment; Bound at manufacturing; Tenant CV; Tenant Registrar; Provider Registrar; Provider Whitelist Authority; TPM / Platform Manufacturer; Signed whitelists; AIK good?; vAIK good?; TPM Good?]
bootstrap key derivation protocol. Once established, standard tools and services like IPsec or Puppet can now directly use the software CA identity credentials that each node now possesses.
   To complete the linkage between the trusted computing services and the software identity keys, we need a mechanism to revoke keys in the software PKI when integrity violations occur in the trusted computing layer. The CV is responsible for notifying the software CA of these violations. The CV includes metadata about the nature of the integrity violation, which allows the software CA to have a response policy. The software CA supports standardized methods for certificate revocation like signed revocation lists or by hosting an OCSP responder. To support push notifications of failures, the software CA can also publish signed notifications to a message bus. This way services that directly support revocation actions can subscribe to notifications (e.g., to trigger a re-key in a secret manager like Vault).

3.2.1 Layering Trust
   We next expand this architecture to work across the layers of virtualization common in today’s IaaS environments. Our goal is to create the architecture described previously that cleanly links common security services to a trusted computing layer in a cloud tenant’s environment. Thus, in a VM hosting environment like Amazon EC2 or OpenStack, we aim to create trusted computing enabled software CAs and tenant nodes inside of virtual machine instances. Note that, in a bare-metal provisioning environment like IBM Softlayer^10, HaaS [13], or OpenStack Ironic^11, we can directly utilize the simplified architecture where there is no trust layering.
   We observe that IaaS-based virtual machines or physical hosts all provide a common abstraction of isolated execution. Each form of isolated execution in turn needs a root of trust on which to build trusted computing services. Due to the performance and resource limitations of typical TPMs (e.g., taking 500 or more milliseconds to generate a quote, and only supporting a fixed number of PCRs), direct multiplexed use of the physical TPM will not scale to the numbers of virtual machines that can be hosted on a single modern system. As described by Berger et al. [2] and as implemented in Xen [10], we utilize a virtualized implementation of the TPM. Each VM has its own software TPM (called a vTPM) whose trust is in turn rooted in the hardware TPM of the hosting system. The vTPM is isolated from the guest that uses it by running in a separate Xen domain.
   The vTPM interface is the same as a hardware TPM. The only exception to this is that the client can request a deep quote^12 that will get a quote from the hardware TPM in addition to getting a quote from the vTPM. These quotes are linked together by including a hash of the vTPM quote and nonce in the hardware TPM quote’s nonce. Deep quotes suffer from the slow performance of hardware TPMs, but as we’ll show later in this section, we can limit the use of deep quotes while still maintaining reasonable performance and scale and maintaining security guarantees.
   To assure a chain of trust that is rooted in hardware, we need the IaaS provider to replicate some of the trusted computing service infrastructure in their environment and allow the tenant trusted computing services to query it. Specifically, the provider must establish a registrar for their infrastructure, must publish an up-to-date signed list of the integrity measurements of their infrastructure (hosted by a whitelist authority service), and may even have their own CV. The tenant CV will interact with the whitelist authority service and the provider’s registrar to verify deep quotes collected by the infrastructure.
   Despite the fact that most major IaaS providers run closed-source hypervisors and would provide opaque integrity measurements [16], we find there is still value in verifying the integrity of the provider’s services. By providing a known-good list of integrity measurements, the provider is committing to a version of the hypervisor that will be deployed widely across their infrastructure. This prevents a targeted attack where a single hypervisor is replaced with a malicious version designed to spy on the tenant (e.g., the provider is coerced by a government to monitor a specific cloud tenant). Thus, an attacker must subvert both the code loading process on all the hypervisors and the publishing and signing process for known-good measurements. In our semitrusted threat model, we assume the provider has controls and monitoring which limit the ability of a rogue individual to accomplish this.

10 http://www.softlayer.com
11 https://wiki.openstack.org/wiki/Ironic
12 We use similar notation for quotes as we do for deep quotes, $\mathrm{DeepQuote}_{AIK,vAIK}(nonce, PCR_i : d_i, vPCR_j : d_j)$, except that PCRs may be from both physical and virtual sets of PCRs. We use virtual PCR #16 to bind data.

[Figure 4 diagram. Participants: Node, Tenant Registrar, Provider Registrar. Messages shown: $ID, vAIK_{pub}, vEK_{pub}$; $\mathrm{Enc}_{vEK}(H(AIK_{pub}), K_e)$; $\mathrm{DeepQuote}_{AIK,vAIK}(H(K_e), v16 : H(ID, vAIK_{pub}, vEK_{pub}))$; ID; AIK; OK. Channel label: Server TLS.]
Figure 4: Virtual node registration protocol.

to the tenant registrar. The tenant registrar then returns $\mathrm{Enc}_{vEK}(H(AIK_{pub}), K_e)$ without additional checks. The virtual node then decrypts $K_e$ using the ActivateIdentity function of its vTPM. The node next requests a deep quote using a hash of $K_e$ as the nonce to both demonstrate the freshness of the quote and knowledge of $K_e$ to the tenant registrar. It also uses virtual PCR #16 to bind the vTPM credentials and ID to the deep quote. Upon receiving the deep quote, the tenant registrar asks the provider registrar if the AIK from the deep quote is valid. The tenant registrar also requests the latest valid integrity measurement whitelists from the provider. Now the tenant registrar can check the validity of the deep quote’s signature, ensure that the nonce is $H(K_e)$, confirm the binding data in PCR #16 matches what was provided in the previous step, and check the physical PCR values in the deep quote against the provider’s whitelist. Only if the deep quote is valid will the tenant registrar mark the vAIK as being valid.
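The checks that the tenant registrar applies to a deep quote during virtual node registration can be sketched as follows. This is a minimal illustration under stated assumptions, not the keylime implementation: the `DeepQuote` class is a hypothetical, already-parsed structure rather than a real TPM format, SHA-256 stands in for $H$, signature verification and the provider registrar's answer about the AIK are assumed to be produced elsewhere and passed in as booleans, and the value of virtual PCR #16 is compared directly to the binding hash without modelling the TPM extend operation.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


def h(*parts: bytes) -> bytes:
    """Hash length-prefixed parts so that field boundaries are unambiguous."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


@dataclass
class DeepQuote:
    """Hypothetical parsed deep quote; field names are illustrative only."""
    nonce: bytes
    vpcr16: bytes                                      # binding data in virtual PCR #16
    physical_pcrs: dict = field(default_factory=dict)  # PCR index -> reported value
    signature_ok: bool = False                         # result of signature verification


def check_deep_quote(quote, k_e, node_id, vaik_pub, vek_pub, aik_is_valid, whitelist):
    """Return the names of failed checks; an empty list means the vAIK can be marked valid."""
    failures = []
    if not quote.signature_ok:
        failures.append("signature")
    if not aik_is_valid:                               # answer from the provider registrar
        failures.append("aik")
    if not hmac.compare_digest(quote.nonce, h(k_e)):   # nonce must equal H(K_e)
        failures.append("nonce")
    if not hmac.compare_digest(quote.vpcr16, h(node_id, vaik_pub, vek_pub)):
        failures.append("binding")
    for index, allowed in whitelist.items():           # physical PCRs vs. provider whitelist
        if quote.physical_pcrs.get(index) not in allowed:
            failures.append(f"pcr{index}")
    return failures


k_e = b"\x01" * 32
whitelist = {0: {b"good-bios"}}
good = DeepQuote(
    nonce=h(k_e),
    vpcr16=h(b"node-1", b"vAIK", b"vEK"),
    physical_pcrs={0: b"good-bios"},
    signature_ok=True,
)
print(check_deep_quote(good, k_e, b"node-1", b"vAIK", b"vEK", True, whitelist))  # → []
```

Returning the list of failed checks, rather than a single boolean, is a design choice of this sketch that makes it easier to see which condition rejected a quote.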
   As in Section 3.2, we begin with the establishment of a tenant registrar and cloud verifier. There are multiple options for hosting these services securely: 1) in a bare metal IaaS instance with TPM, 2) on-tenant-premises in tenant-owned hardware, 3) in a small trusted hardware appliance deployed to the IaaS cloud provider, and 4) in an IaaS virtual machine. The first three of these options rely upon the architecture and protocols we’ve already discussed. The last option requires the tenant to establish an on-tenant-premises CV and use that to bootstrap the tenant registrar and CV. This on-tenant-premises CV identifies and checks the integrity of the tenant’s virtualized registrar and CV, which in turn are responsible for the rest of the tenant’s virtualized infrastructure.
   The primary motivations for a tenant choosing between these options are the detection latency for integrity violations, scale of IaaS instances in their environment, bandwidth between the tenant and the IaaS cloud, and cost. Option 1 provides maximum performance but at higher cost. Option 2 will be limited by bandwidth and requires more costs to maintain resources outside of the cloud. Option 3 is a good trade-off between cost and performance for a small cloud tenant with only tens of nodes or who can tolerate a longer detection latency. Finally, Option 4 provides compatibility with current cloud operations, good performance and scalability, and low cost at the expense of increased complexity. In Section 5, we examine the performance trade-offs of these options including a low-cost registrar and CV appliance (Option 3) we implemented on a Raspberry Pi.
   Once we have created the tenant registrar and CV, we can begin securely bootstrapping nodes into the environment. As before, the first node to establish is a virtualized software CA and we do this by creating a private signing key offline and protecting it with a key that will be derived by the bootstrap key derivation protocol. The following process will be the same for all tenant cloud nodes. When a node boots, it will get a vTPM from the IaaS provider.
   When considering the cost of performing a deep quote, the provider must carefully consider the additional latency of the physical TPM. Deep quotes provide a link between the vTPM and the physical TPM of the machine, and new enrollments should always include deep quotes. When considering if deep quotes should be used as part of periodic attestation, we must understand what trusted computing infrastructure the provider has deployed. If the provider is doing load time integrity only (e.g., secure boot), then deep quotes will only reflect the one-time binding at boot between the vTPM and the physical TPM and the security of the vTPM infrastructure. If the provider has runtime integrity checking of their infrastructure, there is value in the tenant performing periodic attestation using deep quotes. In the optimal deployment scenario, the provider can deploy keylime and provide tenants with access to the integrity state of the hypervisors that host tenant nodes. To limit the impact of slow hardware TPM operations, the provider can utilize techniques like batch attestation where multiple deep quote requests from different vTPMs can be combined into a single hardware TPM operation [28, 31].

3.2.2 Key Derivation Protocol
   We now introduce the details of our bootstrap key derivation protocol (Figure 5). The goal of this protocol is for the cloud tenant to obtain key agreement with a cloud node they have provisioned in an IaaS system. The protocol relies upon the CV to provide integrity measurement of the cloud node during the protocol. The tenant also directly interacts with the cloud node to demonstrate their intent to spawn that resource and allow it to decrypt sensitive contents. However, the tenant does not directly perform integrity measurement. This separation of duties is beneficial because the attestation protocols may operate in parallel and it simplifies deployment by centralizing all integrity measurement, whitelists, and policy in the CV.
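One building block of the key derivation protocol in this section is the split of the bootstrap key $K_b$ into two shares, $U$ and $V$, with $U = K_b \oplus V$, so that neither share alone reveals the key. A minimal sketch of the split and its recombination, using only Python's standard library, is shown below. The function names and the 32-byte key length are assumptions made for illustration and do not come from the paper.

```python
import secrets
from typing import Tuple


def split_key(k_b: bytes) -> Tuple[bytes, bytes]:
    """Split k_b into shares (u, v) such that u = k_b XOR v."""
    v = secrets.token_bytes(len(k_b))         # random share with the same length as k_b
    u = bytes(a ^ b for a, b in zip(k_b, v))  # the other share
    return u, v


def combine_shares(u: bytes, v: bytes) -> bytes:
    """Recover k_b from both shares; either share alone is uniformly random."""
    if len(u) != len(v):
        raise ValueError("shares must have equal length")
    return bytes(a ^ b for a, b in zip(u, v))


k_b = secrets.token_bytes(32)                 # assumed key length for this sketch
u, v = split_key(k_b)
assert combine_shares(u, v) == k_b
```

Because XOR is its own inverse, the holder of both shares recovers the key with the same operation that produced $U$.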
   The process of enrolling a vTPM into the tenant regis-                    To begin the process, the tenant generates a fresh random
trar needs to securely associate the vTPM credentials, e.g.,              symmetric encryption key Kb . The cloud tenant uses AES-
(vEK, vAIK), with a physical TPM in the provider’s infras-                GCM to encrypt the sensitive data to pass to the node d with
tructure (see Figure 4). The tenant registrar cannot directly             Kb , denoted EncKb (d). The tenant then performs trivial
verify the authenticity of the vEK because it is virtual and              secret sharing to split Kb into two parts U , which the tenant
has no manufacturer. To address this, we use a deep quote to              will retain and pass directly to the cloud node and V , which
bind the virtual TPM credentials to a physical TPM AIK.                   the tenant will share with the CV to provide to the node
   The vTPM enrollment protocol begins like the physical                  upon successful verification of the node’s integrity state. To
TPM enrollment protocol by sending ID, (EKpub , AIKpub )                  obtain these shares the tenant generates a secure random
                 Tenant
                     Cloud Veri0ier
                               Node
   Registrar
                                                                           Valid AIK?
                                                                     EncNK(V)
                           C
                         nt 
                                       QuoteAIK(nt,16:H(NKpub))),NKpub
value V the same length as K_b and computes U = K_b ⊕ V.
   In the next phase of the protocol, the tenant requests the IaaS provider to
instantiate a new resource (i.e., a new virtual machine). The tenant sends
Enc_Kb(d) to the provider as part of the resource creation. The data d may be
configuration metadata like a cloud-init script¹³. Upon creation, the provider
returns a unique identifier for the node (uuid) and an IP address at which the
tenant can reach the node.
   After obtaining the node uuid and IP address, the tenant notifies the CV of
their intent to boot a cloud node (see area A in Figure 5). The tenant connects
to the CV over a secure channel, such as mutually authenticated TLS, and
provides V, uuid, node IP, and a TPM policy. The TPM policy specifies a
whitelist of acceptable PCR values to expect from the TPM of the cloud node. At
this point the CV and tenant can begin the attestation protocol in parallel.
   The attestation protocol of our scheme is shared between the interactions of
the CV and the cloud node (B) and that of the tenant and the cloud node (C) with
only minor differences between them (Figure 5). The protocol consists of two
message volleys: the first for the initiator (either CV or tenant) to request a
TPM quote, and the second for the initiator to provide a share of K_b to the
cloud node upon successful validation of the quote. Since we use this protocol
to bootstrap keys into the system, there are no existing software keys with
which we can create a secure channel. Thus, this protocol must securely transmit
a share of K_b over an untrusted network. We accomplish this by creating an
ephemeral asymmetric key pair on the node, denoted NK, outside of the TPM¹⁴. As
in Section 3.2.1, we use PCR #16's value in a TPM quote to bind NK to the
identity of the TPM, thereby authenticating NK. The initiator can then encrypt
its share of K_b using NK_pub and securely return it to the cloud node.

¹³ Because K_b may not be re-used in our protocol, the cost of re-encrypting
   large disk images for each node may be prohibitive. We advocate encrypting
   small sensitive data packets like a cloud-init script, and then establishing
   local storage encryption with ephemeral keys.
¹⁴ NK could also be generated and reside inside the TPM. However, since it is
   ephemeral, is only used for transport security, and is authenticated by the
   TPM using the quote, we found the added complexity of also storing it in the
   TPM unneeded.

   The differences in the attestation protocol between CV and tenant arise in
how each validates TPM quotes. Because we wish to centralize the adjudication of
integrity measurements to the CV, the TPM quote that the tenant requests only
verifies the identity of the cloud node's TPM and does not include any PCR
hashes. Since the tenant generates a fresh K_b for each cloud node, we are not
concerned with leaking U to a node with invalid integrity state. Furthermore,
because V is only one share of K_b, the CV cannot be subverted to decrypt
resources without user intent.
   We now describe the attestation protocol in detail. The initiator first sends
a fresh nonce (n_t for the tenant as in B from Figure 5 and n_CV for the cloud
verifier as in C from Figure 5) to the cloud node along with a mask indicating
which PCRs the cloud node should include in its quote. The CV sets the mask
based on the TPM policy exchanged earlier and the tenant creates an empty mask.
We extend a hash of NK_pub into a freshly reset PCR #16. The initiator requests
a quote from the TPM with the given PCR mask. The node then returns
Quote_AIK(n, 16 : H(NK_pub), x_i : y_i), NK_pub to the initiator. Additional PCR
numbers x_i and values y_i are only included in the quote returned to the cloud
verifier, based on the TPM policy it requested. During the protocol to provide
U, the tenant also supplies HMAC_Kb(ID) to the node. This provides the node with
a quick check to determine if K_b is correct.
   The initiator then confirms that the AIK is valid according to the tenant
registrar over server-authenticated TLS. If the initiator is the CV, then it
will also check the other PCRs to ensure they are valid according to the
tenant-specified whitelist. If the node is virtual, then the quote to the CV
will also include a deep quote of the underlying hardware TPM. The CV will in
turn validate it as described in the previous section. Upon successful
verification, the initiator can then return their share of K_b. Thus, the tenant
sends Enc_NK(U) and the cloud verifier sends Enc_NK(V) to the node. The cloud
node can now recover K_b and proceed with
the boot/startup process.
   The cloud node does not retain K_b or V after decryption of d. To support
node reboot or migration, the cloud node stores U in the TPM NVRAM to avoid
needing the tenant to interact again. After rebooting, the node must again
request verification by the CV to obtain V and re-derive K_b. If migration is
allowed, the provider must take care to also securely migrate vTPM state to
avoid losing U.

4.   IMPLEMENTATION
   We implemented keylime in approximately 5,000 lines of Python in four
components: registrar, node, CV, and tenant. We use the IBM Software Trusted
Platform Module library [18] to directly interact with the TPM rather than going
through a Trusted Software Stack (TSS) like Trousers. The registrar presents a
REST-based web service for enrolling node AIKs. It also supports a query
interface for checking the keys for a given node UUID. The registrar uses
HMAC-SHA384 to check the node's knowledge of K_e during registration.
   The node component runs on the IaaS machine, VM, or container and is
responsible for responding to requests for quotes and for accepting shares of
the bootstrap key K_b. It provides an unencrypted REST-based web service for
these two functions.
   To support vTPM operations, we created a service the IaaS provider runs to
manage hardware TPM activation and vTPM creation/association. This service runs
in a designated Xen domain and has privileges to interact with the Xen vtpmmgr
domain [11]. We then implemented a utility for the deep quote operation. Since
the Xen vTPM implementation does not directly return the PCR values from the
virtual TPM (i.e., the shallow quote) during a deep quote, we chose to first
execute a shallow quote, hash its contents with the provided nonce, and place
the result in the nonce field of the deep quote. This operation
cryptographically binds the two quotes together. It is not vulnerable to a
man-in-the-middle attack since there is no other interface to directly
manipulate the nonce of a deep quote [36]. We then return both the shallow and
deep quotes and require that the verifier check both signatures and both sets of
PCR values.
   The cloud verifier hosts a TLS-protected REST-based web service for control.
Tenants add and remove nodes to be verified and also query their current
integrity state. Upon being notified of a new node, the CV enqueues metadata
about the node onto the quote_request queue, where a configurable pool of worker
processes will then request a deep quote from the node. Upon successful
verification of the quote, the CV will use an HTTP POST to send V to the node.
The CV uses PKCS#1 OAEP with RSA 2048 keys to protect shares of K_b in transit.
   The tenant generates a random 256-bit AES key and encrypts and authenticates
the bootstrap data using AES with Galois Counter Mode [27]. The tenant uses
trivial XOR-based secret sharing to split K_b into V and U. The tenant executes
a simplified version of the same protocol that the CV uses. The tenant checks
with the registrar to determine if the quote signing AIK is valid and owned by
the tenant.
   Upon receiving U and V, the node can then combine them to derive K_b. To
limit the impact of rogue CVs or tenants connecting to the node's
unauthenticated REST interface, the node stores all received U and V values and
iteratively tries each combination to find the correct K_b. Once the node has
correctly derived K_b, it mounts a small in-memory file system using tmpfs and
writes the key there for other applications to access.

4.1   Integration
   While the key derivation protocol of keylime is generic and can be used to
decrypt arbitrary data, we believe the most natural cloud use-case for it is to
decrypt a small IaaS node-specific package of data. To enable this use-case we
have integrated keylime with the cloud-init package, a combination we call
trusted-cloud-init. As described in Section 2, cloud-init is a widely adopted
mechanism to deploy machine-specific data to IaaS resources. To integrate
keylime and cloud-init, we patched cloud-init to support AES-GCM decryption of
the user-data (where cloud-init stores tenant scripts and data). We modified the
upstart system in Ubuntu Linux to start the keylime node service before
cloud-init. We then configure cloud-init to find the key that keylime creates in
the tmpfs-mounted file system. After successful decryption, cloud-init deletes
the key and scrubs it from memory.
   To support applications that need node identities but do not manage their own
PKIs, we implemented a simple software CA. The tenant provisions the software CA
by creating the CA private key offline and delivering it to a new node using
trusted-cloud-init. We also deliver certificates to the software CA that allow
it and the tenant to mutually authenticate each other via trusted-cloud-init. To
demonstrate the clean separation between the trusted computing layer and the
software key management layer, we use the ZMQ Curve secure channel
implementation [14]. This system uses an elliptic curve cryptography scheme
dissimilar from the cryptographic algorithms, keys, and other techniques the TPM
uses.
   To enroll a new node, the tenant first generates a node identity key pair
using the software CA client. The software CA supports a plugin architecture
that allows the tenant to specify what type of key pairs to create (e.g., X.509
RSA 2048). The tenant then connects securely to the software CA over ZMQ and
gets the node's identity certificate signed. The tenant can now provision a new
node with this identity using trusted-cloud-init. The software CA also supports
receiving notifications from the CV if a node later fails integrity measurement.
   To support transparent integration with an IaaS platform, we patched
OpenStack Nova and libvirt to support the creation of companion vTPM Xen domains
for each user-created instance. We link the OpenStack UUID to the keylime
provider registrar. We then implemented a wrapper for the OpenStack Nova command
line interface that enables trusted-cloud-init. Specifically, our wrapper
intercepts calls to nova boot and automatically encrypts the provided user-data
before passing it to OpenStack. It then calls the keylime tenant, which begins
the bootstrapping protocol. This allows OpenStack users to transparently use
keylime and cloud-init without needing to fully trust the OpenStack provider not
to tamper with or steal the sensitive contents of their user-data.

4.2   Demonstration Applications
   We next describe how keylime can securely bootstrap and handle revocation for
existing non-trusted-computing-aware applications and services common to IaaS
cloud deployments.
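   The XOR-based split of the bootstrap key K_b and the node-side search over
stored shares (Section 4) can be illustrated with a short sketch. This is not
the keylime source: the function names are hypothetical, and the use of
HMAC-SHA384 for the K_b check value is an assumption (the text states
HMAC-SHA384 only for the registrar's K_e check).

```python
import hashlib
import hmac
import secrets
from itertools import product
from typing import Iterable, Optional, Tuple


def split_key(kb: bytes) -> Tuple[bytes, bytes]:
    """Split kb into two XOR shares (u, v) such that u XOR v == kb."""
    v = secrets.token_bytes(len(kb))         # random share, same length as kb
    u = bytes(a ^ b for a, b in zip(kb, v))  # U = Kb XOR V
    return u, v


def key_check(kb: bytes, node_id: bytes) -> bytes:
    """Check value HMAC_Kb(ID); SHA-384 is an assumed hash choice."""
    return hmac.new(kb, node_id, hashlib.sha384).digest()


def recover_key(us: Iterable[bytes], vs: Iterable[bytes],
                node_id: bytes, expected_mac: bytes) -> Optional[bytes]:
    """Try every stored (u, v) pair until one matches the check value."""
    for u, v in product(list(us), list(vs)):
        if len(u) != len(v):
            continue  # shares of different length cannot belong together
        candidate = bytes(a ^ b for a, b in zip(u, v))
        if hmac.compare_digest(key_check(candidate, node_id), expected_mac):
            return candidate
    return None  # no valid combination received yet
```

   Because the node's REST interface is unauthenticated, bogus shares may
arrive alongside genuine ones. Checking each candidate against the
tenant-supplied HMAC lets the node discard them without trusting the sender.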
   IPsec. To enable secure network connectivity similar to TNC [39], we
implemented trusted-cloud-init scripts to automatically encrypt all network
traffic between a tenant's IaaS resources. The scripts use the OpenStack API for
IP address information and then build configurations for the Linux IPsec stack
and racoon¹⁵. This configuration is also easily extensible to a TLS-based VPN
like OpenVPN¹⁶.

¹⁵ http://ipsec-tools.sourceforge.net/
¹⁶ https://openvpn.net/

   Puppet. To enable secure system configuration management, we integrated
keylime with Puppet. We do so by generating the signed RSA keys that Puppet uses
to communicate with the Puppet master using the software CA process described
previously. These steps bypass the need to either use the insecure autosign
option in the Puppet master to blindly accept new nodes or to have an operator
manually approve/deny certificate signing requests from new nodes. To support
continuous attestation and integrity measurement, we implemented a plug-in for
the CV that notifies the tenant's Puppet master when a node fails its integrity
measurements. The master can then revoke that node's access to check in and
receive the latest configuration data.
   Vault. While tools like Puppet are often used to provision secrets and keys,
tenant operators can instead use a dedicated secret management system that
supports the full lifecycle of cryptographic keys directly. To demonstrate this,
we have integrated keylime with Vault, a cloud-compatible secret manager. As
with Puppet, we use the software CA to provision RSA certificates for each node
and configure Vault to use them. We also implemented a revocation plugin for the
CV that notifies Vault to both revoke access to a node that fails integrity
measurement and to re-generate and re-distribute any keys to which that node had
access.
   LUKS. Finally, to demonstrate our ability to provision secrets instead of
cryptographic identities, we implemented a trusted-cloud-init script that
provides the key to unlock an encrypted volume on boot.

5.   EVALUATION
   In this section we evaluate the overhead and scalability of keylime in a
variety of scenarios. We ran our experiments on a private OpenStack cluster, a
Xen host, and a Raspberry Pi. In OpenStack, we used standard instance flavors
where the m1.small has 1 vCPU, 2GB RAM, and a 20GB disk, and the m1.large has 4
vCPUs, 8GB RAM, and an 80GB disk. We used Ubuntu Linux 14.10 as the guest OS in
OpenStack instances. The Xen host had one Xeon E5-2640 CPU with 6 cores at
2.5GHz, 10Gbit NIC, 64GB RAM, a WinBond TPM, and ran Xen 4.5 on Ubuntu Linux
15.04. The Raspberry Pi 2 had one ARMv7 with 4 cores at 900MHz, 1GB RAM, 100Mbit
NIC, and ran Raspbian 7. We ran each of the following experiments for 1-2
minutes and present averages of the performance we observed.

5.1   TPM Operations
   We first establish a baseline for the performance of TPM operations with the
IBM client library, our Python wrapper code, the Xen virtual TPM, and the
physical TPM. We benchmarked both TPM quote creation and verification on the Xen
host (Table 2). We collected the physical TPM measurements on the same system
with a standard (non-Xen) kernel. We collected both vTPM quote and deep quote
measurements from a domain running on Xen. As expected, operations that require
interaction with the physical TPM are slow. Verification times, even for deep
quotes that include two RSA signature verifications, are comparatively quick.

Table 2: Average TPM Operation Latency (ms).
                  TPM     vTPM    Deep quote
   Create Quote   725     68.5    1390
   Check Quote    4.64    4.64    5.33

5.2   Key Derivation Protocol
   We next investigate the CV latency of different phases of our protocol. In
Figures 6 and 7, we show the averaged results of hundreds of trials of the CV
with 100 vTPM-equipped VMs. Each operation includes a full REST interaction
along with the relevant TPM and cryptographic operations. We also benchmarked
the latency of the protocol phases emulating zero latency from the TPM (Null
TPM). This demonstrates the minimum latency of our CV software architecture
including the time required to verify quotes. The results from the Null TPM
trials indicate that our network protocol and other processing impose very
little additional overhead, even on the relatively modestly powered Raspberry
Pi. The bare metal system had a slightly larger network RTT to the nodes it was
verifying, causing it to have a higher latency than the less powerful m1.large.
   In Figure 7, we see that latency for the quote retrieval process is primarily
affected by slow TPM operations and is comparable to prior work [31]. The
bootstrapping latency is the sum of the latencies for retrieving a quote and
providing V. We find the bootstrapping latency for bare metal and virtual
machines to be approximately 793ms and 1555ms respectively. Virtual nodes doing
runtime integrity measurement after bootstrapping benefit from much lower
latency for vTPM operations. Thus, for a virtual machine with a vTPM, keylime
can detect integrity violations in as little as 110ms. The detection latency for
a system with a physical TPM (781ms for our Xen host) is limited by the speed of
the physical TPM at generating quotes.

5.3   Scalability of Cloud Verifier
   Next we establish the maximum rate at which the CV can get and check quotes
for sustained integrity measurement. This will define the trade-off between the
number of nodes a single CV can handle and the latency between when an integrity
violation occurs and the CV detects it. Since the CV quote checking process is a
simple round-robin check of each node, it is easy to parallelize across multiple
CVs, further enhancing scalability. We emulate an arbitrarily large population
of real cloud nodes using a fixed number of test cloud nodes. These test cloud
nodes emulate a zero-latency TPM by returning a pre-constructed quote. This way
the test nodes appear like a larger population where the CV will never have to
block for a lengthy TPM operation to complete. We found that around 500
zero-latency nodes were sufficient to achieve the maximum quote checking rate.
   We show the average number of quotes verified per second for each of our CV
deployment options in Figure 8. Because of our process-based worker pool
architecture, the primary factor affecting CV scalability is the number of cores
and
Figure 6: Latency of keylime bootstrapping protocol.
Figure 7: Latency of TPM operations in bootstrapping protocol.
Figure 8: Maximum CV quote checking rate of keylime.
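   The trade-off in Section 5.3 between the number of nodes a CV handles and
its detection latency can be made concrete with a small calculation. This
sketch assumes a simplified model that the paper does not state: one
round-robin pass over N nodes at a sustained rate of R quotes per second takes
N / R seconds, and the worst-case detection latency is one full pass plus the
time to create a single quote. The function names and the figures in the usage
note are hypothetical, not measurements.

```python
import math


def sweep_time_s(num_nodes: int, quotes_per_second: float) -> float:
    """Seconds for one round-robin pass over all nodes (assumed model)."""
    if num_nodes < 0 or quotes_per_second <= 0:
        raise ValueError("need num_nodes >= 0 and quotes_per_second > 0")
    return num_nodes / quotes_per_second


def worst_case_detection_s(num_nodes: int, quotes_per_second: float,
                           quote_latency_s: float) -> float:
    """A violation just after a check waits one full pass plus one quote."""
    return sweep_time_s(num_nodes, quotes_per_second) + quote_latency_s


def max_nodes(quotes_per_second: float, target_detection_s: float,
              quote_latency_s: float) -> int:
    """Largest node count that stays within the target detection latency."""
    budget = target_detection_s - quote_latency_s
    if budget <= 0:
        return 0  # a single quote already exceeds the target
    return math.floor(budget * quotes_per_second)
```

   For example, with a hypothetical rate of 2000 quotes per second and a quote
creation time of 0.25 s, worst_case_detection_s(1000, 2000.0, 0.25) gives
0.75 s for 1000 nodes. Adding CVs in parallel raises the aggregate rate and
shortens the pass accordingly.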