The Serverless Computing Survey: A Technical Primer For Design Architecture
ZIJUN LI, LINSONG GUO, JIAGAN CHENG, and QUAN CHEN, Shanghai Jiao Tong University
BINGSHENG HE, National University of Singapore
MINYI GUO, Shanghai Jiao Tong University
The development of cloud infrastructures inspires the emergence of cloud-native computing. As the most
promising architecture for deploying microservices, serverless computing has recently attracted growing attention in both industry and academia. Due to its inherent scalability and flexibility, serverless computing has become attractive and increasingly pervasive for ever-growing Internet services. Despite the momentum in the cloud-native community, existing challenges and compromises still await more advanced research and solutions to further explore the potential of the serverless computing model. As a contribution to this
knowledge, this article surveys and elaborates the research domains in the serverless context by decoupling
the architecture into four stack layers: Virtualization, Encapsule, System Orchestration, and System Coordi-
nation. Inspired by the security model, we highlight the key implications and limitations of these works in
each layer, and make suggestions for potential challenges to the field of future serverless computing.
CCS Concepts: • Computer systems organization → Cloud computing; n-tier architectures; • Networks
→ Cloud computing; • Theory of computation → Parallel computing models;
Additional Key Words and Phrases: Serverless computing, architecture design, FaaS, Lambda paradigm
1 INTRODUCTION
1.1 Definition of Serverless Computing
Traditional Infrastructure-as-a-Service (IaaS) deployment mode demands a long-term running
server for sustainable service delivery. However, this exclusive allocation needs to retain resources
regardless of whether the user application is running or not. Consequently, resource utilization in current data centers stays low, at only about 10% on average, especially for online services with a diurnal pattern. This contradiction motivates the development of a platform-managed, on-demand service model that attains higher resource utilization and lowers cloud computing costs.
To this end, serverless computing was put forward, and most large cloud vendors such as Amazon,
Google, Microsoft, IBM, and Alibaba have already offered such elastic computing services.
In the following, we will first review the definition given in Berkeley View [65], and then we
will give a broader definition. We believe that a narrow perception of the Function-as-a-Service
(FaaS)-based serverless model may weaken its advancement. So far, there is no formal definition of serverless computing. The commonly acknowledged definitions from the Berkeley View [65] are presented as follows:
• Serverless Computing = FaaS (Function-as-a-Service) + BaaS (Backend-as-a-Service). One fallacy is that serverless is interchangeable with FaaS, as revealed in a recent interview [78]. To be precise, both are essential to serverless computing. The FaaS model enables function isolation and invocation, whereas Backend-as-a-Service (BaaS) provides overall backend support for online services.
• In the FaaS model (aka the Lambda paradigm), an application is sliced into functions or
function-level microservices [26, 45, 57, 65, 117, 141]. The function identifier, the language
runtime, the memory limit of one instance, and the function code blob URI (Uniform Re-
source Identifier) together define the existence of a function [94].
• BaaS covers the wide range of services that an application relies on—for example, cloud storage (Amazon S3 and DynamoDB), message bus systems for passing (Google Cloud Pub/Sub), message notification services (Amazon SNS), and DevOps tools (Microsoft Azure DevOps).
To depict the serverless computing model, we take the asynchronous invocation in Figure 1 as an
example. The serverless system receives triggered API queries from the users, validates them, and
invokes the functions by creating new sandboxes (aka the cold startup [15, 28, 65]) or reusing run-
ning warm ones (aka the warm startup). The isolation ensures that each function invocation runs
in an individual container or a Virtual Machine (VM) assigned from an access-control controller.
Due to the event-driven and single-event processing nature, the serverless system can be triggered
to provide on-demand isolated instances and scale them horizontally according to the actual ap-
plication workload. Afterward, each execution worker accesses a backend database to save execu-
tion results [23]. By further configuring triggers and bridging interactions, users can customize the execution for complex applications (e.g., building internal event calls in a {Fn_A, Fn_B, Fn_C} pipeline).
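To make this dispatch logic concrete, the following minimal sketch (our own illustration; names such as Sandbox, warm_pool, and results_db are hypothetical and do not come from any vendor's implementation) decides between a cold and a warm startup and saves the result to a backend store:

```python
# Minimal sketch of the asynchronous invocation path described above.
import time, uuid

warm_pool = {}    # function name -> list of idle warm sandboxes
results_db = {}   # stands in for the backend result store (BaaS)

class Sandbox:
    def __init__(self, fn_name):
        self.fn_name = fn_name
        time.sleep(0.5)   # cold startup: create the sandbox, load runtime and code

def invoke(fn_name, handler, event):
    idle = warm_pool.setdefault(fn_name, [])
    sandbox = idle.pop() if idle else Sandbox(fn_name)   # warm vs. cold startup
    try:
        results_db[uuid.uuid4().hex] = handler(event)    # save result to backend
    finally:
        idle.append(sandbox)                             # keep the sandbox warm

invoke("hello", lambda ev: f"hi {ev['user']}", {"user": "alice"})
```

The second invocation of the same function reuses the warm sandbox and skips the 0.5-second cold startup penalty, which is exactly the behavior the scaling-to-zero discussion below trades against.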
In the broader scenario, we believe that the serverless computing model should be identified
with the following features:
ACM Computing Surveys, Vol. 54, No. 10s, Article 220. Publication date: September 2022.
The Serverless Computing Survey 220:3
• Auto-scaling: Auto-scalability should not be only narrowed to the FaaS model (e.g., container
black boxes as scheduling units in OpenWhisk [134]). The indispensable factor in identify-
ing a serverless system is performing horizontal and vertical scaling when accommodat-
ing workload dynamics. Allowing an application to scale the number of instances to zero
also introduces a worrisome challenge—cold startup. When a function experiences the cold
startup, instances need to start from scratch, initialize the software environment, and load
application-specific code. These steps can significantly drag down the service response, lead-
ing to QoS (Quality-of-Service) violations.
• Flexible scheduling: Since the application is no longer bound to a specific server, the server-
less controller dynamically schedules applications according to the resource usage in the
cluster while ensuring load balancing and performance assurances. Moreover, the server-
less platform also takes the multi-region collaboration into account [154]. For a more robust
and available serverless system, flexible scheduling allows the workload queries to be dis-
tributed across a broader range of regions [119]. It avoids serious performance degradation
or damage to the service continuity in case of unavailable or crash nodes.
• Event-driven: The serverless application is triggered by events such as the arrival of RESTful
HTTP queries, the update of a message queue, or new data to a storage service. By binding
events to functions with triggers and rules, the controller and functions can use metadata
encapsulated in context attributes. It makes relationships between events and the system
detectable, enabling different collaboration responses to different events. The Cloud-Native
Computing Foundation (CNCF) serverless group also published CloudEvents specifications
for commonly describing event metadata to provide interoperability.
• Transparent development: On the one hand, managing underlying host resources will no
longer be a bother for application maintainers, as they are agnostic about the execution
environment. Simultaneously, cloud vendors should ensure isolated sandboxes, reliable ex-
ecution environment, available physical nodes, software runtimes, and computing power
while making them transparent to maintainers. On the other hand, serverless computing
should also integrate DevOps tools to help deploy and iterate more efficiently.
• Pay-as-you-go: The serverless billing model shifts the cost of computing power from a capital
expense to an operating expense. This model eliminates the requirement from users to buy
exclusive servers based on the peak load. By sharing network, disk, CPU, memory, and other resources, the pay-as-you-go model bills only the resources that applications actually use [1, 2, 26], no matter whether the instances are running or idle.
We regard an elastic computing model with the preceding five features incorporated as the
key to the definition of serverless computing. Along with the serverless emergence, application
maintainers would find it more attractive that resource pricing is billed based on the actual pro-
cessing events of an application rather than the pre-assigned resources [2]. Today, serverless computing is commonly applied in backend scenarios for batch jobs, including data analytics (e.g., the distributed computing model in PyWren [64]), ML (Machine Learning) tasks (e.g., deep learning) [78, 111], and event-driven web applications.
Fig. 2. General layered implementation of the serverless architecture, and security models (bottom-up logic)
in the Virtualization, Encapsule, and System layers.
Without a common architectural abstraction, implementations will lack high portability and compatibility across various serverless systems. To this end, this survey proposes a layered design and summarizes the research domains from different views, helping researchers and practitioners further understand the nature of serverless computing.
As shown in Figure 2, we analyze its design architecture with a bottom-up logic and decouple the
serverless computing architecture into four stack layers: Virtualization, Encapsule, System Orches-
tration, and System Coordination. We also abstract the security model in each layer (the System
Orchestration layer and System Coordination layer are merged).
Virtualization layer. The Virtualization layer enables function isolation within a performance
and functionality secured sandbox. The sandbox serves as the runtime for application service
code, runtime environment, dependencies, and system libraries. To prevent access to resources
in the multi-application or multi-tenant scenarios, cloud vendors usually adopt containers/VMs
to achieve isolation. Currently, the popular sandbox technologies are Docker [41], gVisor [49],
Kata [67], Firecracker [3], and Unikernel [86]. The security model answers how to provide reliable
runtime environments for different tenants and guarantee security on the cloud platform. Section 2
introduces these function isolation solutions and analyzes their pros and cons.
Encapsule layer. Various middlewares in the Encapsule layer enable customized function trig-
gers and executions, as well as collecting data metrics for communicating and monitoring. We call
all these additional middlewares the sidecar. It separates other features from the service business
logic and enables loose coupling between the functions and the underlying platform. Meanwhile,
to speed up instance startup and initialization, the prewarm pool is commonly used in the En-
capsule layer [44, 97, 104, 105, 118, 146]. Serverless systems may use prediction by analyzing the load pattern to prewarm each function in a one-to-one approach, or build a template for all functions that dynamically installs requirements (REQs) according to runtime characteristics in a one-for-all approach. The security model resolves privacy concerns by introducing a user-level or system-level
analyzer when loading users’ private requirements. We introduce those concepts in Section 3.
System Orchestration layer. The System Orchestration layer allows users to configure triggers
and bind rules, ensuring the high availability and stability of the user application by dynamically
adjusting as load changes. Through the cloud orchestrator, the combination of online and offline
scheduling can avoid resource contention, recycle idle resources, and ease the performance degra-
dation for co-located functions. The preceding implementations are also typically integrated into
container orchestration services (e.g., Google Kubernetes and Docker Swarm). However, in the
serverless system, the resource monitor, controller, and load balancer are consolidated to resolve
scheduling challenges [4, 32, 50, 57, 66, 70, 88, 139]. They enable the serverless system to achieve
scheduling optimizations in three different levels: resource-level, instance-level, and application-
level, respectively. The security model deals with robust performance when serverless applications
have more fragmented boundaries. Section 4 analyzes the methodology from three angles.
System Coordination layer. The System Coordination layer consists of a series of BaaS compo-
nents that use unified APIs and SDKs to integrate backend services into functions. Distinctly, it
differs from the traditional middlewares that use local physical services outside the cloud. These
BaaS services provide the storage, queue service [94, 99], trigger binding [75, 77], API gateway,
data cache [6, 7], DevOps tools [24, 25, 63, 122], and other customized components for better meet-
ing the System Orchestration layer’s flexibility requirements. Section 5 discusses these essential
BaaS components in a serverless system.
Each stack layer plays an essential role in the serverless architecture. Therefore, based on the
preceding hierarchy, we conclude the contributions of this survey as follows:
(1) Introduce the serverless definition and summarize the features.
(2) Elaborate the architecture design based on a four-layer hierarchy, and review the significant
and representative works in each layer.
(3) Analyze the security model of each layer based on the four-layered architecture.
(4) Explore the challenges, limitations, and opportunities in serverless computing.
The rest of the survey is organized as follows. Sections 2 through 5 introduce the four stack
layers and elaborate current research domains in serverless computing. Section 6 analyzes several
factors that degrade performance and compares the current production serverless systems. Finally,
we summarize and outline the challenges and opportunities in Sections 7 and 8.
2 VIRTUALIZATION LAYER
A user function invoked in the serverless runtime will be loaded and executed within a virtualized
sandbox. A function can either reuse a warm sandbox or create a new one, but usually not co-run
with different user functions. On this premise, most of the concerns in virtualization are isolation,
flexibility, and low startup latency. The isolation ensures that each application process runs in
the demarcated resource space, and the running process can avoid interference by others. The
flexibility requires the ability to test and debug, and the additional support for extending the system.
Low startup latency requires a fast response for the sandbox creation and initialization. The current
sandboxing mechanism in the Virtualization layer is broken into four representative categories:
traditional VM, container, secure container, and Unikernel. Table 1 compares these mainstream
approaches in several respects.
In the table, “Startup Latency” represents the response latency of cold startup. “Isolation Level”
indicates the capacity of functions to run without interference from others. “OS kernel” shows whether the guest kernel is shared with the host. “Hotplug” allows the function instance to start with
minimal resources (CPU, memory, virtio blocks) and add additional resources at runtime. “OCI Supported” indicates whether it complies with the Open Container Initiative (OCI), an open governance structure for expressing container formats and runtimes. Moreover, a “✓” in all survey tables means that the technique or strategy is used; a blank cell means it is not.
Table 1. Comparison of Mainstream Virtualization Approaches

| Virtualization | Startup Latency (ms) | Isolation Level | OS kernel | Hotplug | Hypervisor | OCI Supported | Backed by |
|---|---|---|---|---|---|---|---|
| Traditional VM | >1,000 | Strong | Unsharing | ✓ | ✓ | | / |
| Docker [41] | 50–500 | Weak | Host-sharing | ✓ | | ✓ | Docker |
| SOCK [101] | 10–50 | Weak | Host-sharing | ✓ | | ✓ | / |
| Hyper-V [58] | >1,000 | Strong | Unsharing | ✓ | ✓ | ✓ | Microsoft |
| gVisor [49] | 100–500 | Strong | Unsharing | ✓ | | ✓ | Google |
| Kata [67] | 100–500 | Strong | Unsharing | ✓ | ✓ | ✓ | OpenStack |
| FireCracker [3] | 100–500 | Strong | Unsharing | ✓ | ✓ | | Amazon |
| Unikernel [86] | 10–50 | Strong | Built-in | | ✓ | | Docker |
The traditional VM-based isolation adopts a Virtual Machine Manager (VMM) (e.g., hyper-
visor) that provides virtualization capabilities to guests. It can also mediate access to all shared
resources through provided interfaces (or using QEMU/KVM). With snapshots, the VM shows high flexibility in quick failsafe when patches are performed on applications within each VM instance. Though the VM provides a strong isolation mechanism and flexibility, it lacks the benefit of low startup latency for user applications (usually >1,000 ms). This tradeoff is fundamental in serverless computing, where a function itself is negligible in size and duration while the relative overhead of the VMM and guest kernel is high.
Container customization: Provide high flexibility and performance. Another common
function isolation mechanism in serverless computing is using containers. The container engine
leverages the Linux kernel to isolate resources and create containers as different processes in the
host [19, 92]. Each container shares the host kernel with the read-only attribute, typically includ-
ing binaries and libraries. High flexibility also comes from the UnionFS (Union File System), which combines read-only and read-write layers into a layered container image. Essentially, a container achieves isolation through Linux namespaces, which let processes share the same system kernel while remaining separated, and cgroups, which set resource limits. Without
hardware isolation, container-based sandboxing shows lower startup latency than coarse-grained
consolidation strategies [11, 147] in hypervisor-based VMs.
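The two kernel mechanisms just named can be exercised directly; the sketch below is our own minimal illustration (it assumes a Linux host with cgroup v2 mounted and root privileges, and the cgroup path /sys/fs/cgroup/fn-sandbox is hypothetical), not how any production container engine is implemented:

```python
# Minimal sketch: namespaces provide the isolation "walls", cgroups the limits.
import ctypes, os

CLONE_NEWUTS, CLONE_NEWNS, CLONE_NEWPID = 0x04000000, 0x00020000, 0x20000000
libc = ctypes.CDLL("libc.so.6", use_errno=True)

def run_isolated(cmd, mem_limit="256M"):
    # Give subsequent children new UTS/mount/PID namespaces via unshare(2).
    if libc.unshare(CLONE_NEWUTS | CLONE_NEWNS | CLONE_NEWPID) != 0:
        raise OSError(ctypes.get_errno(), "unshare failed")
    pid = os.fork()
    if pid == 0:                       # child: PID 1 inside the new namespace
        os.execvp(cmd[0], cmd)
    # Parent: cap the child's memory through a cgroup v2 limit.
    cg = "/sys/fs/cgroup/fn-sandbox"   # hypothetical cgroup path
    os.makedirs(cg, exist_ok=True)
    with open(os.path.join(cg, "memory.max"), "w") as f:
        f.write(mem_limit)
    with open(os.path.join(cg, "cgroup.procs"), "w") as f:
        f.write(str(pid))              # host-view PID of the sandboxed process
    os.waitpid(pid, 0)

run_isolated(["/bin/sh", "-c", "hostname fn-box && hostname"])
```

The hostname change is visible only inside the new UTS namespace, which is the same property that keeps one function's environment invisible to another on a shared host kernel.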
The representative container engine is Docker [41]. Docker packages software into a standard-
ized RunC container adapted to the environment requirements, including libraries, system tools,
code, and runtime. The Docker container has been widely employed in various serverless systems
for its lightweight nature. Some works further optimize the container runtime for better adaption
to the application requirements in the serverless system. SOCK [101] proposes an integration solu-
tion for serverless RunC containers, where redundant features in Docker containers are discarded
in this lean container. By only constructing a root file system, creating communication channels,
and imposing isolation boundaries, the SOCK container makes serverless systems run more effi-
ciently in startup latency and throughput. The startup latency of the SOCK container is reduced
to 10 to 50 ms, compared with Docker containers that usually take 50 to 500 ms. Taking the opposite approach to condensing redundancy in lean containers, since additional tools (e.g., debuggers, editors, coreutils, shell) enrich
the container and increase the image size, CNTR [130] splits the container image into “fat” and
“slim” parts. A user can independently deploy the “slim” image and expand it with additional tools
by dynamically attaching the “fat” image to the former. The evaluation of CNTR shows that the
proposed mechanism can significantly improve the overall performance and effectively reduce the
image size when extensively applied in the data center.
Secure container: Compromise security with high flexibility and performance. By re-
viewing our security model of the Virtualization layer in Figure 2, security concerns arise for
the relatively low isolation level of containers. Side-channel attacks such as Meltdown [84],
Zombieload [114], and Spectre [72] prompt mitigation approaches toward vulnerabilities. On the
one hand, container isolation should prevent privilege escalation and information or communication disclosure through side channels [3]. On the other hand, untrusted code from user functions should not be allowed full access to the host kernel. Any process-based solution implies a relaxation of the security model, which makes it insufficient for mutually untrusted functions. It requires carefully crafting function containers and arbitrarily restricting permissions in the case of a shared kernel architecture. The state-of-the-art solution to this issue is leveraging secure containers. For
example, Microsoft proposes their Hyper-V Container for Windows [58]. Hyper-V offers enhanced
security and broader compatibility. Each instance runs inside a highly optimized microVM and does
not share its kernel with others on the same host. However, it is still a heavyweight virtualization that can introduce more than 1,000 ms of startup latency. In Google gVisor [49], a guest kernel acts as a nonprivileged process that intercepts and restricts syscalls issued in userspace. However, the overhead introduced by intercepting and processing syscalls in the sandbox is high. As a result, it is not well suited for applications with heavy syscalls. To isolate different tenants with affordable over-
head, FireCracker [3] creates microVMs by customizing VMM for cloud-native applications. Each
Firecracker sandbox runs in userspace and is restricted by Seccomp, cgroup, and Namespace poli-
cies. Hardware and hypervisor-based virtualization help FireCracker limit access to the privileged
domain and host kernel for guests. With a container engine built-in microVMs, Kata [67] adopts
an agent to communicate with the kata-proxy located on the host through the hypervisor, thus
achieving a secure environment in a lightweight manner. Both FireCracker and Kata containers can significantly reduce startup latency and memory consumption; each needs only 100 to 500 ms to start a sandbox. Secure containers can provide complete and strong isolation for the host
kernel and other tenants, at the cost of the limited flexibility in condensed microVMs. However,
the startup latency of an instance is still long due to the additional application initialization, such
as JVM or Python interpreter setup.
Specialized Unikernel: Enhance flexibility with high security and performance. Another
emerging virtualization technique is Unikernel [86], which leverages libraryOS, including a series
of essential dependent libraries to construct a specialized, single-address-space machine image.
Because the Unikernel runs as a built-in GuestOS, the compile-time invariance rules out runtime
management, which significantly reduces the applicability and flexibility of Unikernel. However,
unnecessary programs or tools such as ls, cd, and tar are not included, so the image size of a Unikernel is smaller (e.g., 2 MB for mirage-skeleton [95] compiled on Xen), the startup latency is much lower (e.g., starting within 10 ms), and the security is stronger than containers. Based on it, LightVM [90] replaces the time-consuming XenStore and implements a split toolstack, separating functionality that can be prepared periodically in advance from that which must be carried out at VM creation time, thus
improving efficiency and reducing VM startup latency. From the perspective of software ecosys-
tem, to solve the challenge that traditional applications are struggling to be transplanted to the
Unikernel model [86, 113], Olivier et al. [102] propose HermitTux, a Unikernel model compati-
ble with Linux binary. HermitTux makes the Unikernel model compatible with Linux Application
Binary Interface while retaining the benefits of Unikernel. However, Unikernel is not adaptable
for developers once built, making it inherently inflexible for applications, let alone the terrible
DevOps environment. Furthermore, in heterogeneous clusters, the heterogeneity of the underly-
ing hardware forces Unikernel to update as drivers change, making it the antithesis of serverless
philosophy.
Tradeoffs among security, performance, and flexibility. Last, we plot these four technologies in Figure 3 to show the tradeoffs among security, performance,
and flexibility. To conclude, hypervisor-based VM shows better isolation and flexibility, whereas
the container can make the instance start faster and flexible to customize the runtime environ-
ment. The secure container offers high security and relatively low startup latency with flexibility
Fig. 3. The flexibility, startup latency, and isolation level of four virtualization mechanisms.
compromise. Unikernel demonstrates great potential in terms of performance and security, but it
loses flexibility. When offering adaptable images in the production environment under any virtualization mechanism, it is also critical to ensure that built images are signed and do not originate from an unsafe pedigree, for example with solutions [69, 128] that keep a continuous vulnerability assessment and remediation program.
3 ENCAPSULE LAYER
A cold startup in serverless computing may occur when the function fails to capture a warm run-
ning container or experiences a bursty load. In the former, a function is invoked for the first time or
scheduled with a longer invocation interval than the instance lifetime. The typical characteristic is
that instances (or pods) must start from scratch. In the latter case of a bursty load, instances need to
perform horizontal scaling during a surge in user workloads. Function instances will auto-scale as
load changes to ensure adequate resource allocation. While preparing a sandbox in the Virtualization layer takes less than 1 second, the initialization of the software environment (such as loading Python libraries) and application-specific user code can dwarf the former [42, 65, 83, 101, 117]. Although
we can provide a more lightweight sandboxing mechanism to reduce the cold startup latency in
the Virtualization layer, the state-of-the-art sandboxing mechanism may not demonstrate perfect
compatibility for containers or VMs when migrated to the existing serverless architecture. In re-
sponse to the tradeoff between performance and compatibility, an efficient solution is to prewarm
instances in the Encapsule layer. This approach is known as the prewarm startup, which has been
widely researched. Representative work about instance prewarm is listed in Table 2.
Before giving a detailed analysis and comparison, we first describe the taxonomy in each column.
“Template” reflects whether the cold startup instance comes from a template. “Static Image” shows
whether the VM/container image for prewarm disables dynamically updating in each cold startup.
“Pool” indicates whether a prewarm pool is used for function cold startups. “Exclusive” and “Fixed
Size” represent whether the prewarmed instance is exclusive and the prewarm pool is size-fixed.
“Predict/Heuristic” indicates whether the prediction algorithm or heuristic-based method are used
to prewarm instances. “REQs” reflects whether the runtime requirements are dynamically loading
and updating in the prewarm instance. “C/R” reflects whether it supports checkpoint and restore
to accelerate the startup. “Sidecar Based” represents whether the relevant technologies can be
implemented or integrated into the sidecar. “Imp” indicates where it is implemented.
There are two common prewarm startup approaches: one-to-one prewarm startup and one-for-all
prewarm startup. In the one-to-one prewarm startup, each function instance is prewarmed from
a size-fixed pool or by dynamic prediction based on the historical workload traces, whereas in
the one-for-all prewarm startup, instances of all functions are prewarmed from cached sandboxes,
which are pre-generated according to a common configuration file. When a cold startup occurs,
the function only needs to specialize these pre-initialized sandboxes by importing function-specific
code blob URI and settings. C/R (Checkpoint/Restore) is also used with prewarmed instances in
a serverless system for higher scalability and lower instance initialization latency. C/R is a tech-
nique that can freeze a running instance, make a checkpoint into a list of files, and then restore
the running state of the instance at the frozen point. A common pattern in serverless implemen-
tations is to pause the instance when idle to save resources and then recover it for reusing when
invoked [55, 94].
One-to-one prewarm by size-fixed pool: Makes sense but resource-unfriendly. The one-to-one strategy prewarms exclusive instances in a size-fixed prewarm pool for each function and loads code whenever invocations arrive. The security model in the Encapsule layer is usually referred to
as privacy concerns. In the one-to-one prewarm pattern, the user-level analyzer for each function
makes user privacy inviolable, and only this user-related analyzer has access to the private pack-
ages. The function portrait cannot leak from the one-to-one prewarm pool, and hardware-based
isolation further ensures that malicious code cannot access the user-level analyzer through priv-
ilege escalation. It is a safe strategy without introducing other security concerns. By building an
exclusive and over-subscribed prewarm pool for each function, serverless providers can maximize
the availability and stability of the user applications. For example, Azure Functions [105] warms
up instances of each function by setting up a fixed-size prewarm pool. Once the always-ready in-
stance is occupied, prewarmed instances will be active and continue to buffer until reaching the
limit. The open-sourced Fission [44] also prewarms like Azure Functions. It introduces a component called poolmgr, which manages a pool of generic instances with a fixed pool size and injects function code into the idle instances to reduce the cold startup latency.
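The fixed-size pool pattern can be sketched in a few lines; this is our own illustration (the names PrewarmPool, POOL_SIZE, and the load method on sandbox objects are assumptions), not Fission's poolmgr or Azure's actual code:

```python
# Minimal sketch of a fixed-size one-to-one prewarm pool (illustrative only).
import collections, threading

POOL_SIZE = 3   # "always-ready" buffer per function, as in a fixed-size pool

class PrewarmPool:
    def __init__(self, start_sandbox):
        self.start_sandbox = start_sandbox          # cold-start routine
        self.pools = collections.defaultdict(list)  # fn name -> idle generics
        self.lock = threading.Lock()

    def refill(self, fn_name):
        # Keep the pool topped up to POOL_SIZE in the background.
        with self.lock:
            while len(self.pools[fn_name]) < POOL_SIZE:
                self.pools[fn_name].append(self.start_sandbox())

    def acquire(self, fn_name, code_blob):
        with self.lock:
            idle = self.pools[fn_name]
            inst = idle.pop() if idle else self.start_sandbox()
        inst.load(code_blob)   # specialize the generic instance with user code
        threading.Thread(target=self.refill, args=(fn_name,)).start()
        return inst

class DemoSandbox:
    def load(self, blob): self.code = blob

pool = PrewarmPool(DemoSandbox)
inst = pool.acquire("resize", b"def handler(event): ...")
```

The resource cost is visible in the sketch itself: POOL_SIZE idle instances are held per function regardless of demand, which is exactly the "resource-unfriendly" aspect discussed next.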
One-to-one prewarm by predictive warm-up: Ways to make it resource-friendly. The one-
to-one strategy prewarms instances for each function, which means that it is crucial to determine
the warm-up time. Otherwise, a slow warm-up cycle can reduce cold startup efficiency, whereas
a quick cycle will produce massive idle instances in the background and make the serverless sys-
tem resource-unfriendly. Such a deficiency inspires researchers to propose more flexible prewarm
strategies like using prediction-based and heuristic-based methods. Xu et al. [146] design an AWU
(Adaptive Warm-up) strategy by leveraging the LSTM (Long Short-Term Memory) networks to
discover the dependence relationships based on the historical traces. It predicts the invoking time
of each function to prewarm instances and initializes the prewarmed containers according to the
ACPS (Adaptive Container Pool Scaling) strategy once AWU fails. Shahrad et al. [118] propose a
practical resource management policy for the one-to-one prewarm startup. By characterizing the
FaaS workloads, they dynamically change the instance lifetime of the recycling and provisioning
instances according to the time series prediction. CRIU (Checkpoint/Restore In Userspace) [39]
is a software tool on Linux to implement C/R functions. Replayable Execution [140] makes im-
provements based on CRIU, using mmap to map checkpoint files to memory and leveraging the
Copy-on-Write in OS to share cold data among multiple containers. By exploiting the intensive-
deflated execution characteristics, it reduces the container’s cold startup time and memory usage.
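As a concrete illustration of such resource-friendly warm-up, the sketch below implements a simple histogram-style policy in the spirit of Shahrad et al. [118]; it is our simplification, and the percentile cut-offs, margins, and class names are assumptions rather than the published policy:

```python
# Simplified keep-alive/prewarm policy derived from a function's invocation
# inter-arrival history (inspired by, but not identical to, [118]).
import bisect

class KeepAlivePolicy:
    def __init__(self):
        self.idle_times = []      # sorted inter-arrival times, in seconds
        self.last_invoke = None

    def observe(self, now):
        if self.last_invoke is not None:
            bisect.insort(self.idle_times, now - self.last_invoke)
        self.last_invoke = now

    def _pct(self, p):
        i = min(int(p * len(self.idle_times)), len(self.idle_times) - 1)
        return self.idle_times[i]

    def window(self):
        # Prewarm shortly before the 5th-percentile gap elapses; keep the
        # instance alive until the 99th-percentile gap, then release it.
        if len(self.idle_times) < 10:
            return 0.0, 600.0     # not enough history: conservative keep-alive
        return 0.9 * self._pct(0.05), 1.1 * self._pct(0.99)
```

A regular (e.g., diurnal) function yields a tight window and frees memory quickly, whereas an erratic one falls back to the conservative default, matching the strengths and weaknesses of prediction-based prewarming discussed later in this section.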
One-for-all prewarm with caching-aware: Try to make the prewarm generalized and
resource-friendly with privacy guaranteed. The one-for-all prewarm startup shares a similar mechanism with the template method: a template instance is hatched and has already pre-imported most of the bins/libs after being informed over the socket. When a new invocation arrives and requires a new instance, it only needs to initialize or specialize from the templates. For example, the famous open-
sourced Apache OpenWhisk [103] resolves it by allowing that users can assign private packages
in a zip or virtualenv to specialize the prewarmed container dynamically [104]. Catalyzer [42]
optimizes the restore process in C/R by accelerating the recovery on the retrenched critical path.
Meanwhile, it proposes a sandbox fork to leverage a template sandbox that already has pre-loaded
the specific function for state reusing. To reduce cold startup initialization and flatten the startup latency, Mohan et al. [97] propose a self-evolving pause-container pool that pre-allocates virtual network interfaces with lazy binding.
As performance improves, so arises vulnerability. The security model of the one-for-all prewarm
is weakened by introducing the system-level analyzer where different function portraits may ag-
gregate, and the pre-imported and pre-allocated requirements will implicitly embody user privacy.
Therefore, when designing a one-for-all based prewarm strategy, the security model should answer
how to make private packages/libraries (REQs) inaccessible and avoid potential privacy disclosure
in case of malicious codes reusing a prewarm container. SOCK [101] explicitly seeks to address
this problem by introducing a tree cache for packages and using the benefit-to-cost model to dy-
namically update packages in the prewarm containers. Although SOCK still uses a system-level
analyzer to collect the internal characteristics of workloads and prewarm zygotes, each handler
container may be only forked from a zygote that has not imported any additional packages other
than the ones the handler specifies/needs. Given that a zygote with a superset of packages needed
by the function may exist, SOCK does not use it for security reasons. The cache tree-based security
model of the one-for-all prewarm in SOCK provides a minimal set of user privacy.
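The subset rule that SOCK's security model enforces can be illustrated as follows; this is a minimal sketch under our own naming, and the real SOCK tree cache additionally tracks benefit-to-cost scores when deciding which packages to keep:

```python
# Illustrative zygote selection under a SOCK-style security rule [101]:
# a handler may only fork from a zygote whose pre-imported packages are a
# SUBSET of what the handler itself declares -- never a superset.
def pick_zygote(required_pkgs, zygotes):
    """required_pkgs: set of package names the function declares.
    zygotes: list of (pre_imported_pkgs, sandbox) cached templates."""
    best = None
    for pkgs, sandbox in zygotes:
        if pkgs <= required_pkgs:                # subset check: no privacy leak
            if best is None or len(pkgs) > len(best[0]):
                best = (pkgs, sandbox)           # most-specialized safe zygote
    return best                                  # None -> fall back to cold start

zygotes = [(set(), "base"), ({"numpy"}, "z1"), ({"numpy", "pandas"}, "z2")]
print(pick_zygote({"numpy", "requests"}, zygotes))   # picks z1, never z2
```

Rejecting the superset zygote z2 costs some startup time (requests must still be imported), but it prevents a handler from observing packages it never asked for, which is the privacy property discussed above.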
One-to-one and one-for-all prewarm: The challenging points. For one-to-one prewarm
startup and one-for-all prewarm startup, both can be beneficial for optimizing the cold startup
in the Encapsule layer of serverless architecture. Their respective flaws are also apparent. The
one-to-one prewarm startup achieves significantly lower initialization latency by trading memory resources for it. According to the research [118], it meets the challenge that the right warm-up time point is usually hard to measure or predict while ensuring a reasonable allocation of memory re-
sources. On the one hand, prediction-based and heuristic-based methods are particularly effective
when historical data is sufficient to build an accurate model but degrade when the trace is scarce.
On the other hand, the prediction and iteration operations can introduce high CPU overhead when
massive applications and function chains co-exist.
The template mechanism in the one-for-all prewarm startup is adopted to ease the high cost
of functions cold startup from scratch. In addition, maintaining a global prewarm pool introduces
less additional memory resource consumption than the one-to-one prewarm startup. However, it
still suffers from several challenges, including the huge template image size [8, 51], conflicts among various pre-imported libraries, and potential privacy disclosure. It may also reveal the vicinity in which applications with a similar portrait are widely deployed. It is nontrivial to
“suit the remedy to the case” for cold startups in different scenarios. For example, it is much more
efficient to generate a template by one-for-all prewarm startup when the function is invoked for
the first time or with poor predictions during the trace analysis. The one-to-one prewarm startup
performs better for functions with general rules or diurnal patterns, and vice versa.
4 ORCHESTRATION LAYER
The main challenge in the System Orchestration layer is the friendly and elastic support for dif-
ferent services. Even though the current serverless orchestrators are implemented differently, the
challenges they face are much the same. As hundreds of functions co-exist on a serverless node,
scheduling massive functions with inextricable dependencies becomes challenging. In addition, managing granular permissions for hundreds or thousands of functions is hard. Therefore, the secu-
rity model is more referred to as performance security than functional security. It resolves the
challenges in making “just the right amount” of resource provision robust to performance while
answering the colocation interference and load balancing for applications. Similar to the tradi-
tional solutions [26, 35, 59, 76, 126], the serverless model should concern the ability to predict the
on-demand computing resources and an efficient scheduling strategy for services. As shown in
Figure 4, researchers usually propose to introduce the load balancer and resource monitor compo-
nents into the controller to resolve provision and scheduling challenges. The load balancer aims
to coordinate resource usage to avoid overloading any single resource. Meanwhile, the resource
monitor keeps watching the resource utilization of each node and passes the updated information
to the load balancer. With the resource monitor and load balancer, a serverless controller can per-
form better scheduling strategies in three levels: resource-level, instance-level, and application-level.
We summarize the hierarchy in Table 3.
Specifically, “Focused Hierarchy” indicates that the resource adjusting is designed in addition to
the essential resource auto-provision, which can be classified into “R” (resource-level), “I” (instance-
level), or “A” (application-level). “Resource Adjusting” shows whether the scheduling provides an
adjustment for resource provision. “SLO” reflects whether SLO constraints are considered. “Intf”
represents whether the resource contention or interference is discussed. “Usage Feedback” re-
flects whether the resource feedback in a physical node is considered. “Dynamic Strategy” indi-
cates whether it is a dynamic or runtime scheduling strategy. “Trace Driven” indicates whether
making choices depends on traces or collected data metrics. “Predict/Heuristic” reflects whether a
prediction/heuristic-based method is used.
serverless scheduler based on DRL for ML training jobs. It can dynamically adjust the number of
function instances needed and their memory size to balance high model quality and the training
cost.
The keys to making resource provision robust to performance. With a view to the perfor-
mance robustness requirement of the security model in the System Orchestration layer, recent
works take the SLA into account to ensure stability and reliability when functions are invoked
in a shared-resource cloud. CherryPick [5] leverages the Bayesian optimization, which estimates
a confidence interval of an application’s running time and cost, to help search the optimal re-
source configurations. Unlike static searching solutions, it builds a performance model to distin-
guish the unnecessary iteration trials, thus accelerating the convergence. However, CherryPick’s performance model targets big data applications specifically and does not generalize to other applications.
Similarly, Lin and Khazaei [81] build an analytical model to help general serverless applications
deployment. It can predict the application’s end-to-end response time and the average cost un-
der a given configuration. They also propose a PRCP (Probability Refined Critical Path Greedy)
algorithm based on the transition probability, recursively searching the critical path of execution
order. With PRCP, they can achieve the best performance with a specific configuration under bud-
get constraints or less cost under QoS constraints. Besides SLA, shared-resource contention should
also be noticed in the multi-tenant environment. HoseinyFarahabady et al. [57] discuss this topic. Their proposed MPC approach optimizes the serverless controller for predictive resource allocation by introducing a set of cost functions. It reduces QoS violations, stabilizes CPU utilization, and
avoids serious resource contention. However, these resource and workload estimations based on
ML or AI (Artificial Intelligence) usually achieve a tradeoff between an optimal global solution and
robust performance to inaccurate workload information [26, 31, 60, 133]. Whether they can avoid
fragile robustness and improve resource utilization in the production environment is unknown
and remains a critical avenue to explore.
for the Kubernetes-based system. It can provide a variety of runtime information to the scheduler,
including system resource utilization and the QoS performance of an application. The flaw of the study is that it does not provide a sophisticated resource scheduling algorithm. Kaffes et al. [66] pro-
pose a centralized and core-granular scheduler. Centralization provides a global view of the clus-
ter to the scheduler so that it can eliminate heavy-weight function migrations. Core-granularity
binds cores with functions and therefore avoids core-sharing among functions and promises per-
formance stability. However, they only consider the scheduling of CPU resources but ignore other important resources like memory. FnSched [127] regulates CPU shares to accommodate the in-
coming application invocations by checking the available resource. A key advantage of employing
a greedy algorithm is that fewer invoker instances are scheduled by concentrating invocations in
response to varying workloads. Although FnSched makes a tradeoff between scalable efficiency
and acceptable response latencies, it is limited by the assumption that function execution times are
not variable. Guan et al. [50] propose an AODC-based (Application Oriented Docker Container) re-
source allocation algorithm by considering both the available resources and the required libraries.
They model the container placement and task assignment as an optimization problem, then take a
Linear Programming Solver to find the feasible solution. The Pallet container performs the AODC
algorithm, serving as both a load balancer and resource monitor. The downside is that plenty of
containers will occupy the memory space as the number of functions increases.
Take the performance interference and QoS constraints into consideration. While im-
proving utilization, load balancing strategies also bring the interference challenge that sharing
resources between instances may result in performance degradation and QoS violation. The per-
formance robustness of the security model drives the scheduling to make tradeoffs between higher
resource utilization and fewer user QoS violation due to the interference. Different functions’ sen-
sitivities to different resources may vary, which means that we should avoid physical colocation of
functions that are sensitive to the same resource (e.g., CPU-sensitive containers may cause serious
CPU contention when co-located). The load balancer should notice and moderate the interference
when scheduling containers. McDaniel et al. [93] manage the I/O of containers at both the cluster
and node levels to effectively reduce resource contention and eliminate performance degradation.
Based on a resource monitor in Docker Swarm, it refines the container I/O management by pro-
viding a client-side API, thus enforcing proportional shares among containers for I/O contention.
Kim et al. [70] present a fine-grained CPU cap control solution by automatically and distributedly
adjusting the allocation of CPU capacity. Based on performance metrics, applications are grouped
and allowed to make adjustment decisions, and application processes of the group consume only
up to the quota of CPU time. Hence, it minimizes the response time skewness and improves the
robustness of the controller to performance degradation. Smart spread [88] proposes an ML-based
function placement strategy that considers several resource utilization statistics. It can predictively find the best-performing placement and incur the least performance degradation for the instance.
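To make the colocation idea concrete, the sketch below scores candidate nodes by penalizing the placement of functions sensitive to the same resource; this is our own toy model, and the sensitivity vectors and scoring are assumptions, not taken from Smart spread [88] or the other cited systems:

```python
# Toy interference-aware placement: prefer the node where the new function's
# resource sensitivities overlap least with the functions already placed.
def interference_score(candidate, node_functions):
    """candidate/node_functions entries: dict of resource -> sensitivity [0,1]."""
    score = 0.0
    for placed in node_functions:
        for res, s in candidate.items():
            score += s * placed.get(res, 0.0)   # co-sensitivity raises the score
    return score

def place(candidate, nodes):
    # nodes: mapping node name -> list of sensitivity vectors already placed
    return min(nodes, key=lambda n: interference_score(candidate, nodes[n]))

nodes = {"n1": [{"cpu": 0.9}], "n2": [{"io": 0.8}]}
print(place({"cpu": 0.7, "mem": 0.2}, nodes))    # -> "n2": avoids CPU contention
```

A CPU-heavy newcomer lands on the I/O-bound node and vice versa, which is the colocation rule the preceding paragraph motivates.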
Fig. 5. Two invocation patterns for functions and two execution models of workflows.
Invocation patterns and workflow execution models. As shown in Figure 5(a), if a function
is invoked from user queries via the RESTful API or other triggers, it is called external invocation.
The instance-level load balancing can perform well in external invocation scenarios. However, the
emerging cloud applications may consist of several functions, and there are data dependencies be-
tween multiple functions. For example, the implementation of a real-world social network consists
of around 170 functions [2]. In this case, functions in such an application will get active by various
triggers from the user query or another function. If a function is initialized or invoked by other functions, it follows the internal invocation pattern. Currently, researchers raise their vision to the
data-driven scheduling for internal invocations from the perspective of application-level topology.
Workflow is the most common implementation of internal invocations, where functions are ex-
ecuted in a specified order to satisfy complex business logic. The execution models of these data-
driven workflows can be extracted into two approaches: sequence-based workflow and DAG (Di-
rected Acyclic Graph)-based workflow. As shown in Figure 5(b), functions are invoked in a pipeline
through a registered dataflow in the sequence-based workflow. The sequence-based workflow is
the basic and the most common pattern in the serverless workflow, and most cloud vendors pro-
vide such execution mode for application definition. Obviously, there is more than one sequenced
workflow in one complex application, and the same functions can be executed in various sequences.
If we regard each function as a node and dataflow between nodes as a vector edge, such an appli-
cation with multiple interlaced sequenced workflows can be defined by the DAG (hence the name
“DAG-based workflow”). Today, few cloud vendors provide services for the application definition
in the DAG form, aka serverless workflows [1, 21, 27].
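A DAG-based workflow and its execution model can be expressed compactly; the sketch below is purely illustrative (the dag structure and the invoke stub are our assumptions, not any vendor's workflow DSL) and executes functions in topological order so every node sees its upstream outputs:

```python
# Illustrative DAG-based workflow executor: nodes are functions, edges carry
# dataflow; a node runs once all of its upstream outputs are ready.
from graphlib import TopologicalSorter  # Python 3.9+

def invoke(fn_name, inputs):
    print(f"invoking {fn_name} with {inputs}")
    return f"out({fn_name})"            # stands in for a real FaaS invocation

# node -> set of upstream dependencies: two interlaced sequences (A->B->D and
# A->C->D) sharing nodes, which only a DAG -- not a single pipeline -- expresses.
dag = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

outputs = {}
for fn in TopologicalSorter(dag).static_order():
    outputs[fn] = invoke(fn, [outputs[dep] for dep in dag[fn]])
```

Registering the edges once and letting the engine derive the order is what distinguishes a DAG-based workflow service from recursively chaining internal invocations by hand.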
The scheduling overhead introduced in serverless sequences. With massive functions com-
municating with each other, scheduling dataflow introduces more complexities. However, the
existing serverless systems in the production environment commonly treat these workflows as
simple recursion of internal invocations. It raises the challenge of reducing the overhead in the
System Orchestration layer by scheduling function sequences [16]. Current policy to manage the
function sequences is quite simple—functions are triggered following the first-come-first-served
algorithm [129]. However, as the length of the function sequence increases, cascading cold start
overheads should be addressed to avoid seriously end-to-end latencies degradation of sequenced
workflows [20, 40]. To this end, Xanadu [40] combines the prewarm strategy with a most-likely-
path (MLP) estimation in the workflow execution. It prewarms instances by a speculative-based
strategy and makes just-in-time resource provisioning. However, the prediction miss would in-
troduce additional memory waste, especially in the scenario of multi-branch or DAGs. Moreover,
serverless workflow engines prefer the Master-Worker architecture where ready functions are
identified by the state and invoked directly by the master without a queue [9, 17, 30, 47, 89], in-
cluding AWS Step Functions [43] and Fission Workflows [46]. As shown in Figure 5(a), the defi-
ciency is that the additional overhead is introduced in the function workflow through unnecessary
middlewares (e.g., unnecessary storage in an internal invocation).
Enhance the data locality for efficient serverless DAG executions. To help function work-
flow avoid undesired middlewares, researchers usually co-locate the functions into subgraphs to
enhance the data locality, as shown in Figure 4(c). For example, Viil and Srirama [136] use multi-
level k-way graph partitioning to provision and configure scientific workflows automatically into
multi-cloud environments. However, their partition algorithm may not match well with serverless
applications, where each node in the graph can auto-scale multiple replicas in such as foreach
steps. In this case, the connections and edge weights become unpredictable. In serverless context,
WUKONG [29, 30] implements a decentralized DAG engine based on AWS Lambda, which com-
bines static and dynamic scheduling. It divides the workflow of an application into subgraphs, be-
fore and during execution, thus improving parallelism and data locality simultaneously. However,
WUKONG’s colocation of multi-functions within a Lambda executor may introduce additional
security vulnerabilities due to its weakened isolation. SAND [4] presents a new idea of grouping these workflow functions into the same instance so that libraries can be shared across forked processes to reduce initialization cost, and additional transmission can be eliminated in the workflow due to the data locality. SAND provides a better isolation mechanism than WUKONG by using process forking for function invocations; however, it ignores the colocation interference resulting from
the resource contention. When exchanging intermediate data of DAGs, SONIC [87] proposes to
use the VM-storage-based transmission strategy when functions are co-located on the same node.
The optimal transferring depends on application-specific parameters, such as the input size and
node parallelism. SONIC dynamically performs the data passing selection with a communication-
aware function placement, predicting such runtime metrics of functions in the workflow. Glob-
alFlow [154] considers a geographically distributed scenario where functions reside in one region
and data in another. It groups the co-located functions into subgraphs and connects them with
lightweight functions, so it improves data locality and reduces transmission latency. The combina-
tion of local and cross-region strategies in a holistic manner makes sense.
Summary of the challenges in the scheduling of serverless workflows. Workflow schedul-
ing is an NP-hard problem, and researchers have been designing various strategies for it [1, 91].
Such optimization in the workflow aims to minimize the makespan, reduce the execution cost,
and improve resource utilization while satisfying single or multiple constraints. Considering the
preceding challenges, serverless computing focuses on leveraging enhanced data locality. The chal-
lenge is that the end-to-end latency of a workflow query could increase significantly due to fre-
quent interactions with the storage from different nodes. Resource volatility becomes another fo-
cus in the serverless system, which can be unpredictable as the number of functions increases in the
production environment. It introduces more difficulty to find an efficient workflow placement and
scheduling strategy within a short decision time (e.g., 10 ms for load balancing). To evaluate the efficiency and performance of future workflow-based research, DAG-based or DG-based serverless benchmarks also urgently need to be published. They would better be adapted from real applications rather than simple microbenchmarks [110] or function self-loops [81, 148]. Keeping guaranteed QoS performance is also significant for applications in serverless computing.
5 SYSTEM COORDINATION LAYER
Fig. 6. Techniques and works about BaaS components in the System Coordination layer.
Different phases of storage during the function execution. During a serverless invocation,
there are three phases where the database service is required: Authentication, In-Function, and
Log. Authentication is usually performed ahead of controller scheduling to avoid security issues,
and it should get fast response for access. Using an MMDB (Main Memory DataBase) to imple-
ment the Authentication phase is recommended in a serverless system, such as Redis, a high-
performance key-value database. During the function execution, the calls of storage APIs make
up the In-Function phase. Users can choose to use either a DRDB (Disk-Resident DataBase, e.g.,
MySQL) or an MMDB by different BaaS interfaces for ephemeral storage. The Log phase builds
the bridge for users to retrieve invocation results, especially for functions invoked in an asynchronous manner. A detailed record in JSON format, including runtime, execution time, queue time, and states, will be ephemerally or permanently stored and returned (e.g., in CouchDB for OpenWhisk). It is recommended that serverless storage follow the invocation patterns, billing only the queries consumed by storage operations and the storage space consumed when logging. How-
ever, the throughput of existing storage is a major bottleneck due to the frequent and vast functions
interactions [64, 65]. Although current serverless systems support NAS (Network Attached Stor-
age) to help reduce storage API calls, these shared access protocols are still network-based data
communication essentially.
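The three phases can be sketched as follows; this is our own illustration using the real redis-py and sqlite3 APIs (it assumes a running Redis server, and the token:* key scheme and activations schema are hypothetical):

```python
# Illustrative mapping of the three storage phases: Authentication against an
# MMDB (Redis), the In-Function phase, and a JSON record in the Log phase.
import json, sqlite3, time
import redis  # pip install redis

r = redis.Redis()                         # MMDB for fast authentication lookups
log_db = sqlite3.connect(":memory:")      # stands in for a DRDB/CouchDB log store
log_db.execute("CREATE TABLE activations (id TEXT, record TEXT)")

def handle(invocation_id, token, handler, event):
    if not r.get(f"token:{token}"):       # Authentication phase: fast key lookup
        raise PermissionError("invalid token")
    start = time.time()
    result = handler(event)               # In-Function phase: user code runs
    record = {"runtime": "python3", "executionTime": time.time() - start,
              "state": "success", "result": result}
    log_db.execute("INSERT INTO activations VALUES (?, ?)",
                   (invocation_id, json.dumps(record)))   # Log phase
    return invocation_id                  # async callers poll the log by id
```

The split also shows why the phases want different backends: the per-invocation auth lookup is latency-critical (hence an MMDB), while the activation record tolerates a slower, durable store.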
IO bottlenecks in storage: Modeling in serverless context. Traditional solutions use predictive
methods [38, 100, 137] and active storage [109, 131, 132, 143, 145, 152] to automatically scale re-
sources and optimize the data locality on demand. Researchers also explore using a hybrid method
to ease the I/O bottleneck for serverless storage. For example, Pocket [71] strictly separates responsibilities across the control, metadata, and data planes. Using heuristics and combining
several storage technologies, it dynamically rightsizes resources and achieves a balance between
cost and performance efficiency. To alleviate the extremely inefficient execution for data analyt-
ics workloads in the serverless platform, Locus [106] models a mixture of cheap but slow storage
with expensive but fast storage. It makes a cost-performance tradeoff to choose the most appro-
priate configuration variable and shuffle implementation. Middleware Zion [108] enables a data-
driven serverless computing model for stateless functions. It optimizes the elasticity and resource
contention by injecting computations into data pipelines and running on dataflows in a scalable
manner.
Due to the data-shipping architecture of serverless applications, current works usually focus on
designing more elastic serverless storage and enhancing data locality to ease the I/O contention of
function communication on the DB side. However, given the potential heterogeneity of different
functions, this uncertainty still makes such techniques challenging to apply in practice.
A common way to handle internal events is using the queue trigger, by which functions get triggered whenever an
invocation enqueues. For instance, Kubeless [74] provides a Kafka-based queue trigger bound to
a Kafka topic so that users can invoke the function by writing messages into the topic. Specific
purposes also require more extensive triggers. For example, a timer trigger in Kubernetes can in-
voke a function periodically. It creates a CronJob [75] object, written as a Cron expression
representing the set of invocation times, to schedule a job accordingly. An event trigger invokes a
function in response to an event, the atomic piece of information that describes something
that happened in the system. A convincing example of such an implementation is Triggerflow [85],
which maps a workflow by setting an event trigger on each edge.
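The queue-trigger semantics reduce to "invoke the function for every message written to the bound topic," which Kubeless wires up declaratively. A minimal hand-rolled sketch using the kafka-python client (an assumed dependency; topic and broker address are illustrative) looks like this:

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python (assumption)

def handler(event):
    # Stands in for the user function bound to the trigger.
    print("function invoked with", event)

consumer = KafkaConsumer(
    "my-function-topic",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:  # blocks until an invocation enqueues
    handler(message.value)
```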
Checkpoint cache: Enabling functions with fault tolerance. The demand for fault tolerance
also inspires researchers to adapt relevant techniques to the serverless context, such as
C/R-based (checkpoint/restore) [80, 153] and log-based [85, 142] approaches. One example of such an
implementation in serverless computing is AFT [124], which builds an interposition layer between a
storage engine and a common serverless platform by providing an atomic fault-tolerance shim. It
leverages a data cache and the shared storage to guarantee atomic read isolation, avoid storage
lookups for frequently accessed data, and prevent significant consistency anomalies.
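As a schematic illustration of the shim idea (not AFT's real API), the sketch below buffers writes and caches reads inside one invocation so that shared storage never observes partial state; the `storage` object is an assumed key-value interface with `get`/`put_batch` methods.

```python
class AtomicShim:
    """Schematic read-atomic, all-or-nothing shim for one invocation."""

    def __init__(self, storage):
        self.storage = storage
        self.read_cache = {}    # avoids repeated storage lookups
        self.write_buffer = {}  # nothing reaches storage before commit

    def get(self, key):
        if key in self.write_buffer:      # read-your-own-writes
            return self.write_buffer[key]
        if key not in self.read_cache:    # first read fixes the snapshot
            self.read_cache[key] = self.storage.get(key)
        return self.read_cache[key]

    def put(self, key, value):
        self.write_buffer[key] = value

    def commit(self):
        # The whole write set is installed as one batch; a failure before
        # this point leaves no partial invocation state in shared storage.
        self.storage.put_batch(self.write_buffer)
        self.write_buffer.clear()
```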
In addition to the implementations we discussed earlier, other caching mechanisms following
the pay-as-you-go mode can be explored and integrated into any layer of our proposed serverless
architecture. In summary, data caching is still an essential component for higher flexibility and
better performance.
Fig. 7. Cold startup latency under different language runtimes, container runtimes, and memory limits.
Table 4. Comparing Metrics of Four Serverless Vendors [77, 94] ("CCI" Means the Concurrent Invocations)

| Item | Amazon Lambda | Google Functions | Microsoft Azure Functions | IBM OpenWhisk |
| GFLOPS per function | 19.63 | 4.35 | 2.15 | 3.19 |
| TFLOPS in 3,000 CCI | 66.30 | 13.04 | 7.94 | 12.30 |
| Throughput of 1–5 CCI | 20–55 TPS | 1–25 TPS | 60–150 TPS | 1 TPS |
| Throughput of 2,000 CCI | 400 TPS | 40 TPS | 120 TPS | 210 TPS |
| CCI tail latency | Best | Superior | Worst | Inferior |
| CI/CD performance | Best | Fails frequently | Long latency | Balanced |
| Read/Write (1–100 CCI) | 153/83–93/39.5 MB/s | 56/9.5–54/3.5 MB/s | 424/44 MB/s–NA | 68/8–34/0.5 MB/s |
| File I/O (1–100 CCI) | 2–3.5 s | 10–30 s | 3.5 s–NA | 15–60 s |
| Object I/O (1–100 CCI) | 1.3–2.4 s | 5–8 s | 12 s–NA | 1–30 s |
| Trigger throughput (HTTP-Object-DB) | 55-25-860 | 20-25-NA | 145-250-NA | 50-NA-40 |
| Language runtime overhead | Balanced, 0.05 s avg | (−0.06) 0.22 s (+0.1) | (−0.02) 0.22 s (+0.03) | (−0.02) 0.17 s (+0.02) |
| Dependencies overhead | (−0.5) 1.1 s (+0.2) avg | (−0.5) 1.9 s (+0.4) | (−1.3) 3.4 s (NA) | NA |
| Maximum memory | 3,008 MB | 2,048 MB | 1,536 MB | 512 MB |
| Execution timeout | 5 minutes | 9 minutes | 10 minutes | 5 minutes |
| Price per memory | $0.0000166/GB-s | $0.0000165/GB-s | $0.0000016/GB-s | $0.000017/GB-s |
| Price per execution | $0.2 per 1M | $0.4 per 1M | $0.2 per 1M | NA |
| Free tier | First 1M Exec | First 2M Exec | First 1M Exec | Free Exec/40,000 GB-s |
Besides the cold startup analysis of different language runtimes and memory limits, SAND [4]
also measures several sandbox isolation mechanisms for function executions, and we show their
results in Figure 7(c). Native executions (exec and fork) are the fastest methods, whereas a Unikernel
(Xen MirageOS) performs similarly to a Docker container. Because the paused container retains the
user code in memory, using the Docker client interface to start a warm function
(Docker exec C) is much faster than a cold startup (Docker run C).
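This cold-versus-warm gap can be roughly reproduced against a local Docker daemon; the sketch below times `docker run` (cold) against `docker exec` into an already-running container (warm). The timings include client-side overhead and the image name is an assumption, so the absolute numbers are only indicative.

```python
import subprocess
import time

def timed(cmd):
    start = time.time()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.time() - start

# Cold startup: create and run a fresh container ("Docker run C").
cold = timed(["docker", "run", "--rm", "python:3.9-slim",
              "python", "-c", "print('hi')"])

# Warm startup: exec into a long-running container ("Docker exec C").
subprocess.run(["docker", "run", "-d", "--name", "warm",
                "python:3.9-slim", "sleep", "600"], check=True)
warm = timed(["docker", "exec", "warm", "python", "-c", "print('hi')"])
subprocess.run(["docker", "rm", "-f", "warm"], check=True)

print(f"cold: {cold:.2f}s, warm: {warm:.2f}s")
```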
As a supplement to the preceding factors that affect serverless cold startup performance,
Shahrad et al. [117] explore other factors that may affect function cold startup and execution
time, such as MKPI (mispredictions per kilo-instruction), LLC (Last-Level Cache) size, and memory
bandwidth. First, they find that a longer execution time usually comes with a noticeably lower
branch MKPI within a function. This is easy to understand: functions with short execution
times spend most of their time on language runtime startup, and thus the branch predictor misses
more often while it is still being trained. Second, the LLC size is not a significant factor affecting cold
startup latency and execution time. A larger LLC cannot improve serverless function execution
performance because functions are largely insensitive to it. Only when the LLC size is very small (e.g., less than 2 MB)
does it become a bottleneck for function execution and cold startup. Therefore, cloud vendors
usually set a default LLC size and pre-profile it in the serverless system to avoid serious performance
degradation. BabelFish [121] also finds that lazy page table management can result in heavy TLB
stress in a containerized environment. Therefore, to avoid the redundant kernel work produced in page
table management, it shares translations across containers in the TLB and page tables.
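One way to approximate the branch MKPI measurement on Linux is to wrap a function run with `perf stat` (assumed to be installed and permitted; counter names and the CSV layout can vary across kernels), as in the hedged sketch below.

```python
import subprocess

def branch_mkpi(cmd):
    """Branch mispredictions per kilo-instruction for one command run."""
    # `-x ,` makes perf stat emit CSV lines of the form
    # <value>,<unit>,<event>,... on stderr.
    out = subprocess.run(
        ["perf", "stat", "-x", ",",
         "-e", "branch-misses,instructions"] + cmd,
        capture_output=True, text=True).stderr
    counts = {}
    for line in out.splitlines():
        fields = line.split(",")
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2]] = int(fields[0])
    return 1000 * counts["branch-misses"] / counts["instructions"]

# Short-running functions spend most of their time in runtime startup,
# where the predictor is still training, so their MKPI tends to be higher.
print(branch_mkpi(["python3", "-c", "print(sum(range(10**6)))"]))
```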
According to Table 4, AWS Lambda shows higher capacity and throughput for concurrent function invocations, although
it performs poorly in trigger throughput. From another aspect, Microsoft Azure Functions enables
fast read and write speeds when queries are invoked in sequence, but shows relatively higher func-
tion cold startup latency. Undoubtedly, all cloud vendors are aware of the challenges in the serverless
architecture and are actively optimizing the function invocation bottlenecks.
The API lock-in problem is derived from the tight coupling between the user functions and other BaaS components, which
can add difficulty to code migration between different FaaS platforms.
The over-simplified benchmark is another problem that accompanies API lock-in. Easy-to-build microbench-
marks are over-emphasized and used in 75% of current works [110]. We call for the estab-
lishment and open sourcing of cross-platform, real-world application benchmarks beyond scientific
workflows [64, 89, 119]. However, when decomposing a large service into different functions and
then building fine-grained node interconnections, the mismatch between the pre-defined control
plane and the actual data plane makes the granularity of each function challenging to determine.
9 CONCLUSION
The rapid development of the cloud-native concept inspires developers to reorganize cloud ap-
plications into microservices, and elastic serverless computing has become the best practice for
deploying them. This survey explicates and reviews the fundamental aspects of serverless comput-
ing and provides a comprehensive depiction of its four-layered design architecture: the Virtualization,
Encapsule, System Orchestration, and System Coordination layers. We elaborate on the respon-
sibility and significance of each layer, enumerate relevant works, and give practical implications
for adopting these state-of-the-art techniques. Serverless computing will undoubtedly continue
to gain prominence, and much of its potential remains to be unlocked in the forthcoming years.
REFERENCES
[1] Mainak Adhikari, Tarachand Amgoth, and Satish Narayana Srirama. 2019. A survey on scheduling strategies for
workflows in cloud environment and emerging trends. ACM Comput. Surv. 52, 4 (2019), Article 68, 36 pages. https:
//doi.org/10.1145/3325097
[2] Gojko Adzic and Robert Chatley. 2017. Serverless computing: Economic and architectural impact. In Proceedings of
the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE’17). ACM, New York, NY, 884–889.
https://doi.org/10.1145/3106237.3117767
[3] Alexandru Agache, Marc Brooker, Alexandra Iordache, and Anthony Liguori. 2020. Firecracker: Lightweight virtu-
alization for serverless applications. In Proceedings of the 17th USENIX Symposium on Networked Systems Design and
Implementation (NSDI’20). 419–434.
[4] Istemi Ekin Akkus, Ruichuan Chen, Ivica Rimac, Manuel Stein, Klaus Satzke, Andre Beck, Paarijaat Aditya, and
Volker Hilt. 2018. SAND: Towards high-performance serverless computing. In Proceedings of the 2018 USENIX Annual
Technical Conference (ATC’18). 923–935.
[5] Omid Alipourfard, Hongqiang Harry Liu, and Jianshu Chen. 2017. CherryPick: Adaptively unearthing the best cloud
configurations for big data analytics. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and
Implementation (NSDI’17). 469–482.
[6] Amazon. 2021. Enabling API caching to enhance responsiveness. AWS. Retrieved February 8, 2022 from https://docs.
aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html.
[7] Amazon. 2021. Amazon DynamoDB Accelerator (DAX): A fully managed, highly available, in-memory cache service.
AWS. Retrieved February 8, 2022 from https://aws.amazon.com/dynamodb/dax/.
[8] Ali Anwar, Mohamed Mohamed, Vasily Tarasov, Michael Littley, and Lukas Rupprecht. 2018. Improving Docker
registry design based on production workload analysis. In Proceedings of the 16th USENIX Conference on File and
Storage Technologies (FAST’18). 265–278.
[9] Lixiang Ao, Liz Izhikevich, Geoffrey M. Voelker, and George Porter. 2018. Sprocket: A serverless video processing
framework. In Proceedings of the ACM Symposium on Cloud Computing (SoCC’18). ACM, New York, NY, 263–274.
[10] Apex. 2021. Home Page. Retrieved February 8, 2022 from https://apex.sh/.
[11] Vincent Armant, Milan De Cauwer, Kenneth N. Brown, and Barry O’Sullivan. 2018. Semi-online task assignment
policies for workload consolidation in cloud computing systems. Future Gener. Comput. Syst. 82 (2018), 89–103. https:
//doi.org/10.1016/j.future.2017.12.035
[12] Dulcardo Arteaga, Jorge Cabrera, Jing Xu, Swaminathan Sundararaman, and Ming Zhao. 2016. CloudCache: On-
demand flash cache management for cloud computing. In Proceedings of the 14th USENIX Conference on File and
Storage Technologies (FAST’16). 355–369.
[13] Naylor G. Bachiega, Paulo S. L. Souza, Sarita Mazzini Bruschi, and Simone do Rocio Senger de Souza. 2018. Container-
based performance evaluation: A survey and challenges. In Proceedings of the 2018 IEEE International Conference on
Cloud Engineering (IC2E’18). IEEE, Los Alamitos, CA, 398–403.
[14] M. Bacis, R. Brondolin, and M. D. Santambrogio. 2020. BlastFunction: An FPGA-as-a-service system for accelerated
serverless computing. In Proceedings of the 2020 Design, Automation, and Test in Europe Conference and Exhibition
(DATE’20). 852–857. https://doi.org/10.23919/DATE48585.2020.9116333
[15] Ioana Baldini, Paul Castro, Kerry Chang, Perry Cheng, Stephen Fink, Vatche Ishakian, Nick Mitchell, et al. 2017.
Serverless computing: Current trends and open problems. In Research Advances in Cloud Computing. Springer, 1–20.
[16] Ioana Baldini, Perry Cheng, Stephen J. Fink, and Nick Mitchell. 2017. The serverless trilemma: Function composition
for serverless computing. In Proceedings of the 2017 ACM SIGPLAN International Symposium on New Ideas, New
Paradigms, and Reflections on Programming and Software, Onward! ACM, New York, NY, 89–103.
[17] Bartosz Balis. 2016. HyperFlow: A model of computation, programming approach and enactment engine for complex
distributed workflows. Future Gener. Comput. Syst. 55 (2016), 147–162. https://doi.org/10.1016/j.future.2015.08.015
[18] Christian Bargmann and Marina Tropmann-Frick. 2019. A survey on secure container isolation approaches for multi-
tenant container workloads and serverless computing. In Proceedings of the 8th Workshop on Software Quality Anal-
ysis, Monitoring, Improvement, and Applications (SQAMIA’19). http://ceur-ws.org/Vol-2508/paper-bar.pdf
[19] S. Barlev, Z. Basil, S. Kohanim, R. Peleg, S. Regev, and Alexandra Shulman-Peleg. 2016. Secure yet usable: Protecting
servers and Linux containers. IBM J. Res. Dev. 60, 4 (2016), 12. https://doi.org/10.1147/JRD.2016.2574138
[20] David Bermbach, Ahmet-Serdar Karakaya, and Simon Buchholz. 2020. Using application knowledge to reduce cold
starts in FaaS services. In Proceedings of the 35th ACM/SIGAPP Symposium on Applied Computing (SAC’20). ACM,
New York, NY, 134–143. https://doi.org/10.1145/3341105.3373909
[21] Kahina Bessai, Samir Youcef, Ammar Oulamara, Claude Godart, and Selmin Nurcan. 2012. Bi-criteria workflow tasks
allocation and scheduling in cloud computing environments. In Proceedings of the 2012 IEEE 5th International Con-
ference on Cloud Computing. IEEE, Los Alamitos, CA, 638–645. https://doi.org/10.1109/CLOUD.2012.83
[22] Nilton Bila, Paolo Dettori, Ali Kanso, Yuji Watanabe, and Alaa Youssef. 2017. Leveraging the serverless architecture
for securing Linux containers. In Proceedings of the 37th IEEE International Conference on Distributed Computing
Systems Workshops (ICDCS Workshops’17). IEEE, Los Alamitos, CA, 401–404.
[23] Sol Boucher, Anuj Kalia, David G. Andersen, and Michael Kaminsky. 2018. Putting the “Micro” back in microservice.
In Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC’18). 645–650. https://www.usenix.org/
conference/atc18/presentation/boucher.
[24] Mark Boyd. 2021. Serverless: IOpipe Launches a Monitoring Tool for AWS Lambda. Retrieved February 8, 2022 from
https://thenewstack.io/iopipe-launches-lambda-monitoring-tool-aws-summit/.
[25] Frank Budinsky. 2021. Canary Deployments Using Istio. Retrieved February 8, 2022 from https://istio.io/latest/blog/
2017/0.1-canary/.
[26] Rajkumar Buyya, Satish Narayana Srirama, Giuliano Casale, and Rodrigo N. Calheiros. 2019. A manifesto for future
generation cloud computing: Research directions for the next decade. ACM Comput. Surv. 51, 5 (2019), Article 105,
38 pages. https://doi.org/10.1145/3241737
[27] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. 2009. Cloud computing
and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Gener. Comput.
Syst. 25, 6 (2009), 599–616. https://doi.org/10.1016/j.future.2008.12.001
[28] James Cadden, Thomas Unger, Yara Awad, and Han Dong. 2020. SEUSS: Skip redundant paths to make serverless
fast. In Proceedings of the 15th EuroSys Conference (EuroSys’20). ACM, New York, NY, Article 32, 15 pages.
[29] Benjamin Carver, Jingyuan Zhang, Ao Wang, Ali Anwar, Panruo Wu, and Yue Cheng. 2020. Wukong: A scalable and
locality-enhanced framework for serverless parallel computing. In Proceedings of the 11th ACM Symposium on Cloud
Computing (SoCC’20). ACM, New York, NY, 1–15. https://doi.org/10.1145/3419111.3421286
[30] Benjamin Carver, Jingyuan Zhang, Ao Wang, and Yue Cheng. 2019. In search of a fast and efficient serverless DAG
engine. CoRR abs/1910.05896 (2019). http://arxiv.org/abs/1910.05896.
[31] Israel Casas, Javid Taheri, Rajiv Ranjan, and Albert Y. Zomaya. 2017. PSO-DS: A scheduling engine for scientific
workflow managers. J. Supercomput. 73, 9 (2017), 3924–3947. https://doi.org/10.1007/s11227-017-1992-z
[32] Chia-Chen Chang, Shun-Ren Yang, En-Hau Yeh, Phone Lin, and Jeu-Yih Jeng. 2017. A Kubernetes-based monitoring
platform for dynamic cloud resource provisioning. In Proceedings of the 2017 IEEE Global Communications Conference
(GLOBECOM’17). IEEE, Los Alamitos, CA, 1–6.
[33] Liuhua Chen and Haiying Shen. 2017. Considering resource demand misalignments to reduce resource over-
provisioning in cloud datacenters. In Proceedings of the 2017 IEEE Conference on Computer Communications
(INFOCOM’17). IEEE, Los Alamitos, CA, 1–9.
[34] Liuhua Chen, Haiying Shen, and Stephen Platt. 2016. Cache contention aware virtual machine placement and mi-
gration in cloud datacenters. In Proceedings of the 24th IEEE International Conference on Network Protocols (ICNP’16).
IEEE, Los Alamitos, CA, 1–10.
[35] Shuang Chen, Christina Delimitrou, and José F. Martínez. 2019. PARTIES: QoS-aware resource partitioning for multi-
ple interactive services. In Proceedings of the 24th International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS’19). ACM, New York, NY, 107–120.
[36] Xinyu Chen, Yao Chen, Ronak Bajaj, Jiong He, Bingsheng He, Weng-Fai Wong, and Deming Chen. 2020. Is FPGA
useful for hash joins? In Proceedings of the 10th Conference on Innovative Data Systems Research (CIDR’20).
[37] Xinyu Chen, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, and Deming Chen. 2021. ThunderGP: HLS-
based graph processing framework on FPGAs. In Proceedings of the 2021 ACM/SIGDA International Symposium on
Field-Programmable Gate Arrays (FPGA’21). ACM, New York, NY, 69–80. https://doi.org/10.1145/3431920.3439290
[38] Eli Cortez, Anand Bonde, and Alexandre Muzio. 2017. Resource central: Understanding and predicting workloads for
improved resource management in large cloud platforms. In Proceedings of the 26th Symposium on Operating Systems
Principles. ACM, New York, NY, 153–167.
[39] GitHub. 2021. CRIU: A Utility to Checkpoint/Restore Linux Tasks in Userspace. Retrieved February 8, 2022 from
https://github.com/checkpoint-restore/criu.
[40] Nilanjan Daw, Umesh Bellur, and Purushottam Kulkarni. 2020. Xanadu: Mitigating cascading cold starts in serverless
function chain deployments. In Proceedings of the 21st International Middleware Conference (Middleware’20). ACM,
New York, NY, 356–370.
[41] Docker. 2021. Home Page. Retrieved February 8, 2022 from https://www.docker.com/.
[42] Dong Du, Tianyi Yu, Yubin Xia, Binyu Zang, Guanglu Yan, Chenggang Qin, Qixuan Wu, and Haibo Chen. 2020.
Catalyzer: Sub-millisecond startup for serverless computing with initialization-less booting. In Architectural Support
for Programming Languages and Operating Systems (ASPLOS’20). ACM, New York, NY, 467–481. https://doi.org/10.
1145/3373376.3378512
[43] AWS. 2021. Elastic Load Balancing: Application Load Balancers. Retrieved February 8, 2022 from https://docs.aws.
amazon.com/elasticloadbalancing/latest/application/elb-ag.pdf.
[44] Fission. 2021. Execute Mode in Fission. Retrieved February 8, 2022 from https://fission.io/docs/usage/function/
executor/.
[45] Erwin Van Eyk, Lucian Toader, and Sacheendra Talluri. 2018. Serverless is more: From PaaS to present cloud com-
puting. IEEE Internet Comput. 22, 5 (2018), 8–17.
[46] GitHub. 2021. Fission Workflows: Fast, Reliable and Lightweight Function Composition for Serverless Functions.
Retrieved February 8, 2022 from https://github.com/fission/fission-workflows.
[47] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Balasubramaniam, William Zeng, Rahul Bhalerao,
Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, fast and slow: Low-latency video processing
using thousands of tiny threads. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and
Implementation (NSDI’17). 363–376.
[48] Etienne Tremel. 2021. Deployment Strategies on Kubernetes. Retrieved February 8, 2022 from https://www.cncf.io/
wp-content/uploads/2020/08/CNCF-Presentation-Template-K8s-Deployment.pdf.
[49] GitHub. 2021. Google Container Runtime Sandbox. Retrieved February 8, 2022 from https://github.com/google/
gvisor.
[50] Xinjie Guan, Xili Wan, Baek-Young Choi, Sejun Song, and Jiafeng Zhu. 2017. Application oriented dynamic resource
allocation for data centers using Docker containers. IEEE Commun. Lett. 21, 3 (2017), 504–507.
[51] Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. Slacker:
Fast distribution with lazy Docker containers. In Proceedings of the 14th USENIX Conference on File and Storage
Technologies (FAST’16). 181–195. https://www.usenix.org/conference/fast16/technical-sessions/presentation/harter.
[52] Hassan B. Hassan, Saman A. Barakat, and Qusay I. Sarhan. 2021. Survey on serverless computing. J. Cloud Comput.
10, 1 (2021), 39. https://doi.org/10.1186/s13677-021-00253-7
[53] Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga Govindaraju, Qiong Luo, and Pedro Sander. 2008. Relational joins
on graphics processors. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data
(SIGMOD’08). ACM, New York, NY, 511–524. https://doi.org/10.1145/1376616.1376670
[54] Joseph M. Hellerstein, Jose M. Faleiro, and Joseph Gonzalez. 2019. Serverless computing: One step forward, two steps
back. In Proceedings of the 9th Biennial Conference on Innovative Data Systems Research (CIDR’19).
[55] Scott Hendrickson, Stephen Sturdevant, Edward Oakes, Tyler Harter, Venkateshwaran Venkataramani, Andrea C.
Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. Serverless computation with OpenLambda. Login Usenix Mag.
41, 4 (2016), 14–19. https://www.usenix.org/publications/login/winter2016/hendrickson.
[56] Honeycomb. 2021. Home Page. Retrieved February 8, 2022 from https://www.honeycomb.io/.
[57] M. Reza HoseinyFarahabady, Albert Y. Zomaya, and Zahir Tari. 2018. A model predictive controller for managing
QoS enforcements and microarchitecture-level interferences in a lambda platform. IEEE Trans. Parallel Distrib. Syst.
29, 7 (2018), 1442–1455.
[58] Microsoft. 2021. Isolation Modes. Retrieved February 8, 2022 from https://docs.microsoft.com/en-us/virtualization/
windowscontainers/manage-containers/hyperv-container.
[59] Shigeru Imai, Thomas Chestna, and Carlos A. Varela. 2013. Accurate resource prediction for hybrid IaaS clouds using
workload-tailored elastic compute units. In Proceedings of the IEEE/ACM 6th International Conference on Utility and
Cloud Computing (UCC’13). IEEE, Los Alamitos, CA, 171–178. https://doi.org/10.1109/UCC.2013.40
[60] Shigeru Imai, Stacy Patterson, and Carlos A. Varela. 2018. Uncertainty-aware elastic virtual machine scheduling for
stream processing systems. In Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud, and Grid
Computing (CCGRID’18). IEEE, Los Alamitos, CA, 62–71. https://doi.org/10.1109/CCGRID.2018.00021
[61] Vitalii Ivanov and Kari Smolander. 2018. Implementation of a DevOps pipeline for serverless applications. In Product-
Focused Software Process Improvement. Lecture Notes in Computer Science, Vol. 11271. Springer, 48–64.
[62] David Jackson and Gary Clynch. 2018. An investigation of the impact of language runtime on the performance
and cost of serverless functions. In Proceedings of the 2018 IEEE/ACM International Conference on Utility and Cloud
Computing Companion (UCC Companion’18). IEEE, Los Alamitos, CA, 154–160.
[63] Jenkins. 2021. DevOps CI Tool. Retrieved February 8, 2022 from https://www.jenkins.io/.
[64] Eric Jonas, Qifan Pu, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. 2017. Occupy the cloud: Distributed
computing for the 99%. In Proceedings of the 2017 Symposium on Cloud Computing (SoCC’17). ACM, New York, NY,
445–451. https://doi.org/10.1145/3127479.3128601
[65] Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar,
et al. 2019. Cloud programming simplified: A Berkeley view on serverless computing. CoRR abs/1902.03383 (2019).
http://arxiv.org/abs/1902.03383.
[66] Kostis Kaffes, Neeraja J. Yadwadkar, and Christos Kozyrakis. 2019. Centralized core-granular scheduling for server-
less functions. In Proceedings of the ACM Symposium on Cloud Computing (SoCC’19). ACM, New York, NY, 158–164.
[67] Kata Containers. 2021. Home Page. Retrieved February 8, 2022 from https://katacontainers.io/.
[68] Alireza Keshavarzian, Saeed Sharifian, and Sanaz Seyedin. 2019. Modified deep residual network architecture de-
ployed on serverless framework of IoT platform based on human activity recognition application. Future Gener.
Comput. Syst. 101 (2019), 14–28.
[69] Asif Khan. 2017. Key characteristics of a container orchestration platform to enable a modern application. IEEE Cloud
Comput. 4, 5 (2017), 42–48. https://doi.org/10.1109/MCC.2017.4250933
[70] Young Ki Kim, M. Reza HoseinyFarahabady, Young Choon Lee, and Albert Y. Zomaya. 2020. Automated fine-grained
CPU cap control in serverless computing platform. IEEE Trans. Parallel Distrib. Syst. 31, 10 (2020), 2289–2301. https:
//doi.org/10.1109/TPDS.2020.2989771
[71] Ana Klimovic, Yawen Wang, Patrick Stuedi, and Animesh Trivedi. 2018. Pocket: Elastic ephemeral storage for
serverless analytics. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation
(OSDI’18). 427–444.
[72] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, et al. 2020. Spectre
attacks: Exploiting speculative execution. Commun. ACM 63, 7 (2020), 93–101.
[73] Ricardo Koller and Alan Dawson. 2021. Vulnerability Advisor—Secure your Dev + Ops Across Containers. Retrieved
February 8, 2022 from https://www.ibm.com/blogs/cloud-archive/2016/11/vulnerability-advisor-secure-your-dev-
ops-across-containers/.
[74] GitHub. 2021. Kubeless. Retrieved February 8, 2022 from https://kubeless.io/.
[75] Kubernetes. 2021. CronJob. Retrieved February 8, 2022 from https://kubernetes.io/docs/concepts/workloads/
controllers/cron-jobs/.
[76] Anthony Kwan, Jonathon Wong, Hans-Arno Jacobsen, and Vinod Muthusamy. 2019. HyScale: Hybrid and network
scaling of dockerized microservices in cloud data centres. In Proceedings of the 39th IEEE International Conference on
Distributed Computing Systems (ICDCS’19). IEEE, Los Alamitos, CA, 80–90. https://doi.org/10.1109/ICDCS.2019.00017
[77] Hyungro Lee, Kumar Satyam, and Geoffrey C. Fox. 2018. Evaluation of production serverless computing environ-
ments. In Proceedings of the 11th IEEE International Conference on Cloud Computing (CLOUD’18). IEEE, Los Alamitos,
CA, 442–450.
[78] Philipp Leitner, Erik Wittern, Josef Spillner, and Waldemar Hummer. 2019. A mixed-method empirical study of
function-as-a-service software development in industrial practice. J. Syst. Softw. 149 (2019), 340–359. https://doi.org/
10.1016/j.jss.2018.12.013
[79] Huiba Li, Yifan Yuan, Rui Du, Kai Ma, Lanzheng Liu, and Windsor Hsu. 2020. DADI: Block-level image service for
agile and elastic application deployment. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX
ATC’20). 727–740.
[80] Wubin Li and Ali Kanso. 2015. Comparing containers versus virtual machines for achieving high availability. In
Proceedings of the 2015 IEEE International Conference on Cloud Engineering (IC2E’15). IEEE, Los Alamitos, CA, 353–
358.
[81] Changyuan Lin and Hamzeh Khazaei. 2021. Modeling and optimization of performance and cost of serverless appli-
cations. IEEE Trans. Parallel Distrib. Syst. 32, 3 (2021), 615–632.
[82] W. Ling, L. Ma, C. Tian, and Z. Hu. 2019. Pigeon: A dynamic and efficient serverless and FaaS framework for private
cloud. In Proceedings of the 2019 International Conference on Computational Science and Computational Intelligence
(CSCI’19). 1416–1421. https://doi.org/10.1109/CSCI49370.2019.00265
[83] David Lion, Adrian Chu, Hailong Sun, Xin Zhuang, Nikola Grcevski, and Ding Yuan. 2017. Don’t get caught in
the cold, warm up your JVM. Login Usenix Mag. 42, 1 (2017), 46–51. https://www.usenix.org/publications/login/
spring2017/lion.
[84] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Jann Horn, Stefan Mangard, et al. 2020.
Meltdown: Reading kernel memory from user space. Commun. ACM 63, 6 (2020), 46–56.
[85] Pedro García López, Aitor Arjona, Josep Sampé, Aleksander Slominski, and Lionel Villard. 2020. Triggerflow: Trigger-
based orchestration of serverless workflows. In Proceedings of the 14th ACM International Conference on Distributed
and Event-Based Systems (DEBS’20). ACM, New York, NY, 3–14. https://doi.org/10.1145/3401025.3401731
[86] Anil Madhavapeddy, Richard Mortier, Charalampos Rotsos, David J. Scott, Balraj Singh, Thomas Gazagnaire, Steven
Smith, Steven Hand, and Jon Crowcroft. 2013. Unikernels: Library operating systems for the cloud. In Proceedings of
Architectural Support for Programming Languages and Operating Systems (ASPLOS’13). ACM, New York, NY, 461–472.
https://doi.org/10.1145/2451116.2451167
[87] Ashraf Mahgoub, Karthick Shankar, Subrata Mitra, Ana Klimovic, Somali Chaterji, and Saurabh Bagchi. 2021. SONIC:
Application-aware data passing for chained serverless applications. In Proceedings of the 2021 USENIX Annual Tech-
nical Conference (USENIX ATC’21). 285–301.
[88] Nima Mahmoudi, Changyuan Lin, Hamzeh Khazaei, and Marin Litoiu. 2019. Optimizing serverless computing: In-
troducing an adaptive function placement algorithm. In Proceedings of the 29th Annual International Conference on
Computer Science and Software Engineering (CASCON’19). ACM, New York, NY, 203–213.
[89] Maciej Malawski, Adam Gajek, Adam Zima, Bartosz Balis, and Kamil Figiela. 2020. Serverless execution of scientific
workflows: Experiments with HyperFlow, AWS Lambda and Google Cloud Functions. Future Gener. Comput. Syst.
110 (2020), 502–514.
[90] Filipe Manco, Costin Lupu, Florian Schmidt, Jose Mendes, Simon Kuenzer, Sumit Sati, Kenichi Yasukata, Costin Raiciu,
and Felipe Huici. 2017. My VM is lighter (and safer) than your container. In Proceedings of the 26th Symposium on
Operating Systems Principles. ACM, New York, NY, 218–233. https://doi.org/10.1145/3132747.3132763
[91] Mohammad Masdari, Sima ValiKardan, Zahra Shahi, and Sonay Imani Azar. 2016. Towards workflow scheduling in
cloud computing: A comprehensive analysis. J. Netw. Comput. Appl. 66 (2016), 64–82.
[92] Massimiliano Mattetti, Alexandra Shulman-Peleg, Yair Allouche, Antonio Corradi, Shlomi Dolev, and Luca Foschini.
2015. Securing the infrastructure and the workloads of Linux containers. In Proceedings of the 2015 IEEE Conference
on Communications and Network Security (CNS’15). IEEE, Los Alamitos, CA, 559–567. https://doi.org/10.1109/CNS.
2015.7346869
[93] Sean McDaniel, Stephen Herbein, and Michela Taufer. 2015. A two-tiered approach to I/O quality of service in Docker
containers. In Proceedings of the 2015 IEEE International Conference on Cluster Computing (CLUSTER’15). IEEE, Los
Alamitos, CA, 490–491.
[94] M. Garrett McGrath and Paul R. Brenner. 2017. Serverless computing: Design, implementation, and performance.
In Proceedings of the 37th IEEE International Conference on Distributed Computing Systems Workshops (ICDCS Work-
shops’17). IEEE, Los Alamitos, CA, 405–410. https://doi.org/10.1109/ICDCSW.2017.36
[95] GitHub. 2021. Mirage-Skeleton with Simple MirageOS Applications. Retrieved February 8, 2022 from https://github.
com/mirage/mirage-skeleton.
[96] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A.
Riedmiller. 2013. Playing Atari with deep reinforcement learning. CoRR abs/1312.5602 (2013).
[97] Anup Mohan, Harshad Sane, Kshitij Doshi, and Saikrishna Edupuganti. 2019. Agile cold starts for scalable serverless.
In Proceedings of the 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud’19). https://www.usenix.
org/conference/hotcloud19/presentation/mohan.
[98] Diana M. Naranjo, Sebastián Risco, Carlos de Alfonso, Alfonso Pérez, Ignacio Blanquer, and Germán Moltó. 2020.
Accelerated serverless computing based on GPU virtualization. J. Parallel Distrib. Comput. 139 (2020), 32–42. https:
//doi.org/10.1016/j.jpdc.2020.01.004
[99] Hylson Vescovi Netto, Lau Cheuk Lung, Miguel Correia, Aldelir Fernando Luiz, and Luciana Moreira Sá de Souza.
2017. State machine replication in containers managed by Kubernetes. J. Syst. Archit. 73 (2017), 53–59.
[100] Hiep Nguyen, Zhiming Shen, Xiaohui Gu, Sethuraman Subbiah, and John Wilkes. 2013. AGILE: Elastic distributed
resource scaling for infrastructure-as-a-service. In Proceedings of the 10th International Conference on Autonomic
Computing (ICAC’13). 69–82.
[101] Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, Andrea C. Arpaci-Dusseau, and Remzi H.
Arpaci-Dusseau. 2018. SOCK: Rapid task provisioning with serverless-optimized containers. In Proceedings of
the 2018 USENIX Annual Technical Conference (USENIX ATC’18). 57–70. https://www.usenix.org/conference/atc18/
presentation/oakes.
[102] Pierre Olivier, Daniel Chiba, Stefan Lankes, Changwoo Min, and Binoy Ravindran. 2019. A binary-compatible uniker-
nel. In Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
(VEE’19). ACM, New York, NY, 59–73. https://doi.org/10.1145/3313808.3313817
[103] GitHub. 2021. OpenWhisk: Serverless Functions Platform for Building Cloud Applications. Retrieved February 8,
2022 from https://github.com/apache/openwhisk.
[104] GitHub. 2021. Prewarm in Apache OpenWhisk. Retrieved February 8, 2022 from https://github.com/apache/
openwhisk/blob/master/docs/actions-python.md.
[105] Microsoft. 2021. Azure Functions Premium Plan. Retrieved February 8, 2022 from https://docs.microsoft.com/en-
us/azure/azure-functions/functions-premium-plan.
[106] Qifan Pu, Shivaram Venkataraman, and Ion Stoica. 2019. Shuffling, fast and slow: Scalable analytics on server-
less infrastructure. In Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation
(NSDI’19). 193–206.
[107] K. V. Rashmi, Mosharaf Chowdhury, Jack Kosaian, Ion Stoica, and Kannan Ramchandran. 2016. EC-cache: Load-
balanced, low-latency cluster caching with online erasure coding. In Proceedings of the 12th USENIX Symposium on
Operating Systems Design and Implementation (OSDI’16). 401–417.
[108] Josep Sampé, Marc Sánchez Artigas, Pedro García López, and Gerard París. 2017. Data-driven serverless functions
for object storage. In Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference. ACM, New York, NY, 121–133.
[109] Josep Sampé, Pedro García López, and Marc Sánchez Artigas. 2016. Vertigo: Programmable micro-controllers
for software-defined object storage. In Proceedings of the 9th IEEE International Conference on Cloud Computing
(CLOUD’16). IEEE, Los Alamitos, CA, 180–187.
[110] Joel Scheuner and Philipp Leitner. 2020. Function-as-a-service performance evaluation: A multivocal literature re-
view. J. Syst. Softw. 170 (2020), 110708.
[111] Joel Scheuner and Philipp Leitner. 2020. The state of research on function-as-a-service performance evaluation: A
multivocal literature review. CoRR abs/2004.03276 (2020). https://arxiv.org/abs/2004.03276.
[112] Johann Schleier-Smith, Vikram Sreekanti, Anurag Khandelwal, Joao Carreira, Neeraja Jayant Yadwadkar, Raluca Ada
Popa, Joseph E. Gonzalez, Ion Stoica, and David A. Patterson. 2021. What serverless computing is and should become:
The next phase of cloud computing. Commun. ACM 64, 5 (2021), 76–84.
[113] Florian Schmidt. 2017. Uniprof: A unikernel stack profiler. In Posters and Demos Proceedings of the Conference of the
ACM Special Interest Group on Data Communication (SIGCOMM’17). ACM, New York, NY, 31–33. https://doi.org/10.
1145/3123878.3131976
[114] Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Stecklina, Thomas Prescher, and Daniel Gruss.
2019. ZombieLoad: Cross-privilege-boundary data sampling. In Proceedings of the 2019 ACM SIGSAC Conference on
Computer and Communications Security (CCS’19). ACM, New York, NY, 753–768. https://doi.org/10.1145/3319535.
3354252
[115] Srinath T. V. Setty, Chunzhi Su, and Jacob R. Lorch. 2016. Realizing the fault-tolerance promise of cloud storage using
locks with intent. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation
(OSDI’16). 501–516.
[116] Hossein Shafiei, Ahmad Khonsari, and Payam Mousavi. 2021. Serverless computing: A survey of opportunities, chal-
lenges and applications. arXiv:1911.01296 [cs.NI].
[117] Mohammad Shahrad, Jonathan Balkind, and David Wentzlaff. 2019. Architectural implications of function-as-a-
service computing. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO’19). ACM, New York, NY, 1063–1075. https://doi.org/10.1145/3352460.3358296
[118] Mohammad Shahrad, Rodrigo Fonseca, Iñigo Goiri, and Gohar Chaudhry. 2020. Serverless in the wild: Characterizing
and optimizing the serverless workload at a large cloud provider. In Proceedings of the 2020 USENIX Annual Technical
Conference (USENIX ATC’20). 205–218. https://www.usenix.org/conference/atc20/presentation/shahrad.
[119] Vaishaal Shankar, Karl Krauth, and Qifan Pu. 2018. Numpywren: Serverless linear algebra. CoRR abs/1810.09679
(2018).
[120] Arjun Singhvi, Junaid Khalid, Aditya Akella, and Sujata Banerjee. 2020. SNF: Serverless network functions. In Pro-
ceedings of the ACM Symposium on Cloud Computing (SoCC’20). ACM, New York, NY, 296–310.
[121] Dimitrios Skarlatos, Umur Darbaz, Bhargava Gopireddy, and Nam Sung Kim. 2020. BabelFish: Fusing address transla-
tions for containers. In Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture
(ISCA’20). IEEE, Los Alamitos, CA, 501–514.
[122] Sonarqube. 2021. Code Quality and Code Security. Retrieved February 8, 2022 from https://www.sonarqube.org/.
[123] Sparta. 2021. A Go Framework for AWS Lambda Microservices. Retrieved February 8, 2022 from http://gosparta.io/.
[124] Vikram Sreekanti, Chenggang Wu, and Saurav Chhatrapati. 2020. A fault-tolerance shim for serverless computing.
In Proceedings of the 15th EuroSys Conference (EuroSys’20). ACM, New York, NY, Article 15, 15 pages.
[125] Vikram Sreekanti, Chenggang Wu, Xiayue Charles Lin, and Johann Schleier-Smith. 2020. Cloudburst: Stateful
functions-as-a-service. Proc. VLDB Endow. 13, 11 (2020), 2438–2452.
[126] Satish Narayana Srirama and Alireza Ostovar. 2018. Optimal cloud resource provisioning for auto-scaling enterprise
applications. Int. J. Cloud Comput. 7, 2 (2018), 129–162. https://doi.org/10.1504/IJCC.2018.10014880
[127] Amoghavarsha Suresh and Anshul Gandhi. 2019. FnSched: An efficient scheduler for serverless functions. In Pro-
ceedings of the 5th International Workshop on Serverless Computing (WOSC@Middleware’19). ACM, New York, NY,
19–24.
[128] Byungchul Tak, Canturk Isci, Sastry Duri, Nilton Bila, Shripad Nadgowda, and James Doran. 2017. Understanding
security implications of using containers in the cloud. In Proceedings of the 2017 USENIX Annual Technical Conference
(USENIX ATC’17). 313–319.
[129] Ali Tariq, Austin Pahl, Sharat Nimmagadda, Eric Rozner, and Siddharth Lanka. 2020. Sequoia: Enabling quality-of-
service in serverless computing. In Proceedings of the ACM Symposium on Cloud Computing (SoCC’20). ACM, New
York, NY, 311–327.
[130] Jörg Thalheim, Pramod Bhatotia, Pedro Fonseca, and Baris Kasikci. 2018. Cntr: Lightweight OS containers. In Proceed-
ings of the 2018 USENIX Annual Technical Conference (USENIX ATC’18). 199–212. https://www.usenix.org/conference/
atc18/presentation/thalheim.
[131] Raúl Gracia Tinedo, Pedro García López, Marc Sánchez Artigas, and Josep Sampé. 2016. IOStack: Software-defined
object storage. IEEE Internet Comput. 20, 3 (2016), 10–18.
[132] Raúl Gracia Tinedo, Josep Sampé, and Edgar Zamora-Gómez. 2017. Crystal: Software-defined storage for multi-
tenant object stores. In Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST’17). 243–256.
[133] László Toka, Gergely Dobreff, Balázs Fodor, and Balázs Sonkoly. 2020. Adaptive AI-based auto-scaling for Kubernetes.
In Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud, and Internet Computing (CCGRID’20).
IEEE, Los Alamitos, CA, 599–608.
[134] GitHub. 2021. Creating and Invoking Docker Actions. Retrieved February 8, 2022 from https://github.com/apache/
openwhisk/blob/master/docs/actions-docker.md.
[135] Alexandre Verbitski, Anurag Gupta, and Debanjan Saha. 2017. Amazon Aurora: Design considerations for high
throughput cloud-native relational databases. In Proceedings of the 2017 ACM International Conference on Manage-
ment of Data (SIGMOD’17). ACM, New York, NY, 1041–1052.
[136] Jaagup Viil and Satish Narayana Srirama. 2018. Framework for automated partitioning and execution of scientific
workflows in the cloud. J. Supercomput. 74, 6 (2018), 2656–2683.
[137] Muhammad Wajahat, Anshul Gandhi, Alexei A. Karve, and Andrzej Kochut. 2016. Using machine learning for black-
box autoscaling. In Proceedings of the 7th International Green and Sustainable Computing Conference (IGSC’16). IEEE,
Los Alamitos, CA, 1–8.
[138] Ao Wang, Jingyuan Zhang, Xiaolong Ma, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Vasily Tarasov, Feng Yan,
and Yue Cheng. 2020. InfiniCache: Exploiting ephemeral serverless functions to build a cost-effective memory cache.
In Proceedings of the 18th USENIX Conference on File and Storage Technologies (FAST’20). 267–281.
[139] Hao Wang, Di Niu, and Baochun Li. 2019. Distributed machine learning with a serverless architecture. In Proceedings
of the 2019 IEEE Conference on Computer Communications (INFOCOM’19). IEEE, Los Alamitos, CA, 1288–1296.
[140] Kai-Ting Amy Wang, Rayson Ho, and Peng Wu. 2019. Replayable execution optimized for page sharing for a managed
runtime environment. In Proceedings of the 14th EuroSys Conference (EuroSys’19). ACM, New York, NY. https://doi.
org/10.1145/3302424.3303978
[141] Liang Wang, Mengyuan Li, Yinqian Zhang, Thomas Ristenpart, and Michael M. Swift. 2018. Peeking behind the
curtains of serverless platforms. In Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC’18).
133–146.
[142] Stephanie Wang, John Liagouris, and Robert Nishihara. 2019. Lineage stash: Fault tolerance off the critical path. In
Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP’19). ACM, New York, NY, 338–352.
[143] Jake Wires and Andrew Warfield. 2017. Mirador: An active control plane for datacenter storage. In Proceedings of the
15th USENIX Conference on File and Storage Technologies (FAST’17). 213–228.
[144] Mingyu Wu, Zeyu Mi, and Yubin Xia. 2020. A survey on serverless computing and its implications for JointCloud
computing. In Proceedings of the 2020 IEEE International Conference on Joint Cloud Computing. 94–101. https://doi.
org/10.1109/JCC49151.2020.00023
[145] Yulai Xie, Dan Feng, Yan Li, and Darrell D. E. Long. 2016. Oasis: An active storage framework for object storage
platform. Future Gener. Comput. Syst. 56 (2016), 746–758.
[146] Zhengjun Xu, Haitao Zhang, Xin Geng, Qiong Wu, and Huadong Ma. 2019. Adaptive function launching acceleration
in serverless computing platforms. In Proceedings of the 25th IEEE International Conference on Parallel and Distributed
Systems (ICPADS’19). IEEE, Los Alamitos, CA, 9–16. https://doi.org/10.1109/ICPADS47876.2019.00011
[147] Kejiang Ye, Zhaohui Wu, Chen Wang, Bing Bing Zhou, Weisheng Si, Xiaohong Jiang, and Albert Y. Zomaya. 2015.
Profiling-based workload consolidation and migration in virtualized data centers. IEEE Trans. Parallel Distrib. Syst.
26, 3 (2015), 878–890. https://doi.org/10.1109/TPDS.2014.2313335
[148] Tianyi Yu, Qingyuan Liu, and Dong Du. 2020. Characterizing serverless platforms with serverlessbench. In Proceed-
ings of the ACM Symposium on Cloud (SoCC’20). ACM, New York, NY, 30–44.
[149] Yinghao Yu, Renfei Huang, Wei Wang, Jun Zhang, and Khaled Ben Letaief. 2018. SP-Cache: Load-balanced,
redundancy-free cluster caching with selective partition. In Proceedings of the International Conference for High Per-
formance Computing, Networking, Storage, and Analysis. IEEE, Los Alamitos, CA, Article 1, 13 pages.
[150] Chengliang Zhang, Minchen Yu, Wei Wang, and Feng Yan. 2019. MArk: Exploiting cloud services for cost-effective,
SLO-aware machine learning inference serving. In Proceedings of the 2019 USENIX Annual Technical Conference
(USENIX ATC’19). 1049–1062. https://www.usenix.org/conference/atc19/presentation/zhang-chengliang.
[151] Haoran Zhang, Adney Cardoza, and Peter Baile Chen. 2020. Fault-tolerant and transactional stateful serverless work-
flows. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI’20).
1187–1204.
[152] Tian Zhang, Dong Xie, Feifei Li, and Ryan Stutsman. 2019. Narrowing the gap between serverless and its state with
storage functions. In Proceedings of the ACM Symposium on Cloud Computing (SoCC’19). ACM, New York, NY, 1–12.
[153] Wen Zhang, Vivian Fang, Aurojit Panda, and Scott Shenker. 2020. Kappa: A programming framework for serverless
computing. In Proceedings of the ACM Symposium on Cloud Computing (SoCC’20). ACM, New York, NY, 328–343.
[154] Ge Zheng and Yang Peng. 2019. GlobalFlow: A cross-region orchestration service for serverless computing services.
In Proceedings of the 12th IEEE International Conference on Cloud Computing (CLOUD’19). IEEE, Los Alamitos, CA,
508–510.
[155] Wenjia Zheng, Michael Tynes, Henry Gorelick, Ying Mao, Long Cheng, and Yantian Hou. 2019. FlowCon: Elastic
flow configuration for containerized deep learning applications. In Proceedings of the 48th International Conference
on Parallel Processing (ICPP’19). ACM, New York, NY, Article 87, 10 pages.
[156] Jianlong Zhong and Bingsheng He. 2014. Medusa: Simplified graph processing on GPUs. IEEE Trans. Parallel Distrib.
Syst. 25, 6 (June 2014), 1543–1552. https://doi.org/10.1109/TPDS.2013.111