Joint Service Caching and Computation Offloading Scheme Based On Deep Reinforcement Learning in Vehicular Edge Computing Systems
Publication date
05-01-2023
Licence
This work is made available under the Copyright not evaluated licence and should only be used in accordance
with that licence. For more information on the specific terms, consult the repository record for this item.
Document Version
Accepted version
Published in
IEEE Transactions on Vehicular Technology
   Abstract—Vehicular edge computing (VEC) is a new computing paradigm that enhances vehicular performance by introducing both computation offloading and service caching to resource-constrained vehicles and ubiquitous edge servers. Recent developments of autonomous vehicles enable a variety of applications that demand high computing resources and low latency, such as automatic driving, auto navigation, etc. However, the highly dynamic topology of vehicular networks and the limited caching space at resource-constrained edge servers call for an intelligent design of caching placement and computation offloading. Meanwhile, service caching decisions are highly correlated with the computation offloading decisions, which poses a great challenge to effectively designing service caching and computation offloading strategies. In this paper, we investigate a joint optimization problem by integrating service caching and computation offloading in a general VEC scenario with time-varying task requests. To minimize the average task processing delay, we formulate the problem as a long-term mixed integer non-linear programming (MINLP) problem and propose an algorithm based on deep reinforcement learning to obtain a suboptimal solution with low computation complexity. The simulation results demonstrate that our proposed scheme exhibits an effective performance improvement in task processing delay compared with other representative benchmark methods.

   Index Terms—Vehicular edge computing, service caching, computation offloading, deep reinforcement learning.

   Copyright (c) 2022 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

   Manuscript received June 2, 2022; revised September 4, 2022; accepted December 9, 2022. This work was supported in part by the Guangzhou Science and Technology Plan Project under Grant 202201010239; in part by the Graduate Education & Innovation Project of Guangdong Province under Grant 2022XSLT022; in part by the Guangdong Introducing Innovative and Entrepreneurial Teams of The Pearl River Talent Recruitment Program (2021ZT09X044); in part by the Guangdong Introducing Outstanding Young Scholars of The Pearl River Talent Recruitment Program (2021QN02X546); in part by the Guangdong Provincial Key Laboratory of Photonics Information Technology (No. 2020B121201011); in part by the Science and Technology Program of Guangzhou under Grant 202102020869; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grants 2022A1515110602 and 2022A1515010153. (Corresponding author: Chang Liu.)

   Zheng Xue, Chang Liu, Canliang Liao and Guojun Han are with the School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China (e-mail: xuezheng@mail2.gdut.edu.cn; liuchang@gdut.edu.cn; canliang148@gmail.com; gjhan@gdut.edu.cn).

   Zhengguo Sheng is with the Department of Engineering and Design, University of Sussex, Brighton, BN1 9RH, U.K. (e-mail: z.sheng@sussex.ac.uk).

                          I. INTRODUCTION

   THE rapid development of the Internet of Things (IoT) and artificial intelligence (AI) has recently led to the emergence of diverse computation-intensive and latency-sensitive vehicular applications, such as augmented reality (AR) navigation and autonomous driving. However, offloading all the computing tasks to the remote cloud results in a heavy backhaul load and unacceptable latency. Through vehicular edge computing (VEC) [1], vehicular users can offload their tasks via vehicle-to-infrastructure (V2I) communications to edge nodes to reduce response latency, which caters for unprecedentedly exploding data traffic and increasingly stringent requirements of vehicular applications [2].

   Task computation requires both input task data from users and program data installed on edges, which are defined as content caching and service caching, respectively. Content caching refers to caching of the input data needed and the output data generated (e.g., in computational or infotainment applications) at vehicles and edge nodes [3]–[7]. Since these data dominate mobile data traffic, content caching at edge servers can effectively alleviate mobile traffic on backhaul links and reduce content delivery latency [3]. On the other hand, service caching refers to caching the specific programs for task execution [8]–[14]. As a motivating example, in object detection, the input data consists of videos and radar sensor data, and task execution requires the corresponding object detection service program to be cached in the vehicle or the edge server. The input data of the object detection service is typically unique and hardly reusable for other executions. In comparison, service program data in the cache is evidently reusable by future executions of the same type of tasks. Because edge servers have limited caching space, how to selectively cache service programs over space (e.g., at multiple edge servers) and time for achieving optimum transmission and computing performance is crucial for efficient task computation.

   The design of optimal computation offloading and service caching faces many challenges in vehicular networks. First, vehicles and edge servers can only cache a small number of service programs at a time due to limited storage space. Thus, which service programs should be cached needs to be decided judiciously. Second, the computing resources on different edge servers may be unevenly distributed. It is critical to balance the computation load by cooperative offloading. Third, the computation offloading decisions and the service caching decisions are closely correlated. Intuitively, we tend to offload a task if the required program is already cached at
the edge.1 Besides, the network status and available resources of edge servers change dynamically during the movement of vehicles. Therefore, it becomes significant and yet very challenging to design an appropriate service caching and computation offloading strategy in VEC systems.

   1 In this paper, we use the terms "edge", "edge node" and "edge server" interchangeably.

   In this paper, we investigate a joint optimization of service caching and computation offloading for a VEC system with limited storage and computing capacities, taking account of time-varying task requests and dynamic network topology. In order to make full use of the limited caching and computing resources of each node (i.e., vehicles and edge servers) as well as the cooperative offloading between edge servers, we propose a deep reinforcement learning (DRL)-based service caching and computation offloading scheme to provide low-complexity decision making and adaptive resource management. The main contributions of this paper can be summarized as follows:

   1) We design a novel edge service caching and computation offloading framework with cooperation among the cloud, edge servers and vehicles, which not only balances the computation load among edge servers, but also enables integration of caching and computing resources combined with edge intelligence.

   2) We minimize the cumulative average task processing delay over a long time-slotted horizon, considering dynamic task requesting, offloading and service caching, as well as dynamic channel conditions between vehicles and edge servers at each time slot. To solve the formulated long-term mixed integer non-linear programming (MINLP) problem, we propose an edge caching and offloading scheme based on deep deterministic policy gradient (DDPG) [15] to efficiently make decisions on task offloading and service caching.

   3) We carry out extensive simulations to evaluate the average task processing delay and energy consumption of the proposed scheme in VEC. Numerical results demonstrate that our proposed scheme can achieve better performance compared with the other benchmark approaches.

   The rest of the paper is organized as follows. We review the related work in Section II. The system model and the problem formulation are described in Section III. In Section IV, the proposed scheme is presented in detail. Section V provides numerical results, and Section VI concludes this paper.

                          II. RELATED WORK

   In the past few years, computation offloading in mobile edge computing (MEC) has been intensively discussed [16]–[19]. Zhang et al. [16] minimize the average bandwidth consumption with a novel bidirectional computation task model by joint caching and computing offloading policy optimization. Tout et al. [17] find the optimal dissemination of computational tasks within MEC-based infrastructures while satisfying personal needs on a wider range of devices and assuring minimal additional fees imposed by remote execution. Multi-user multi-task computation offloading and resource allocation for mobile edge computing are proposed in [19]. Task offloading in VEC has also been widely studied [1], [2], [20]–[24]. Tan et al. [2] propose a joint communication, caching and computing strategy to minimize the system cost. To guarantee the reliability of communication systems, powerful error correction codes, such as low-density parity-check (LDPC) codes, can be applied to enhance the anti-noise and anti-fading capability [25]–[27]. Qiao et al. [20] minimize the content access cost for a novel cooperative edge caching framework. Ning et al. [21] design a mobility-aware edge computing and caching scheme to maximize mobile network operators' profits. However, all these works are based on the implicit assumption that all the services are available in edge servers, which is impractical due to their limited storage capacities.

   Service caching, which refers to the caching of related programs for computational task execution, can significantly affect the performance of MEC systems since service caching strategies and computation offloading strategies are always coupled. There has been considerable research focusing on joint service caching, computation offloading and resource allocation for mobile users in MEC systems [8]–[12]. Yan et al. [8] propose an MEC service pricing scheme to coordinate with service caching decisions and control wireless devices' task offloading behaviors in a cellular network. Ko et al. [9] maximize a sum-utility for multi-mobile service caching enabled MEC. Bi et al. [10] formulate a mixed integer nonlinear programming (MINLP) problem that jointly optimizes service caching placement, computation offloading decisions, and system resource allocation to minimize the computation delay and energy consumption of a mobile user. Zhang et al. [12] investigate the joint service caching, computation offloading and resource allocation problem in a general multi-user multi-task scenario and aim to minimize the weighted sum of all users' computation cost and delay cost. However, the joint caching and computing strategy optimization in MEC systems cannot be directly applied to VEC systems. The high mobility of vehicles results in complicated and dynamic topology as well as frequent server switching.

   There have been emerging efforts on content caching and computation offloading in VEC [20], [21], [28]–[31]. Tian et al. [30] propose a collaborative computation offloading and content caching method by leveraging DRL for a self-driving system. Wu et al. [31] propose a multi-agent reinforcement learning (RL) algorithm to make decisions on task offloading and edge caching to optimize both service latency and energy consumption of vehicles. The important difference between content caching and service caching is that the latter concerns not only the storage capacity but also the computing capacity. Thus, service caching brings more challenges to the VEC system design. To the best of our knowledge, so far only a few works [13], [14] have investigated the problem of joint optimization of service caching and computation offloading in VEC systems. Tang et al. [13] apply application caching to VEC to optimize the response time for the outsourced applications while satisfying a time-slot-spanned energy consumption constraint. The Lyapunov optimization technique is adopted to tackle this constraint issue. Finally, two greedy heuristic algorithms are incorporated into the drift-plus-penalty based algorithm to help find the approximate optimal solution. Lan et al. [14] propose a fog-based vehicular architecture featuring computation offloading with service caching and design the offloading strategies based on a DRL algorithm to reduce the task computation delay and energy cost. However, the limitation of storage and computing resources in vehicles or edge servers is not considered in [13], [14]. Based on the existing literature, service caching and computation offloading scenarios involving vehicles and edge servers with limited resources in VEC have not been explored.

   In recent years, AI-based algorithms have been successfully applied in numerous related works on emerging vehicular applications and services, which are mostly delay-sensitive. Such smart driving assistance can significantly improve driving safety, reduce energy consumption, and enhance traffic management efficiency [32], [33]. Specifically, RL and DRL have been demonstrated to significantly improve the performance of vehicular task offloading [14], [20], [21], [29]–[31], [34]–[39]. Zhu et al. [35] propose a multiagent DRL-based computation offloading scheme to minimize the total task execution delay of the considered system during a certain period. Since current vehicular services always contain multiple tasks with data dependency, a knowledge-driven service offloading framework is proposed in [36] by exploring the DRL model to find the long-term optimal service offloading policy. Since DRL agents can react to vehicular environment changes in milliseconds to achieve real-time decision making, they have superiority in complex and highly dynamic vehicular networks with fast-varying channels and computation load.

   The above studies on VEC are mostly based on the assumption that service programs are completely available in vehicles or edge servers with limited resources, which is not always feasible in practice. To the best of our knowledge, Refs. [13] and [14] are the only existing literature that do not assume full availability of service programs. However, they have not taken the resource limitation into consideration. To fill this gap, we aim at joint computation offloading and service caching, taking full advantage of the limited resources of vehicles and edge nodes as well as edge cooperative offloading to minimize the average task processing time.

                III. SYSTEM MODEL AND PROBLEM FORMULATION

   In this section, we first provide the system overview, the communication model, as well as the service caching and task offloading model. Then we analytically derive the computation delay and energy consumption, and provide extreme value analysis of the single-task processing delay and edge node energy consumption. Last, we present our problem formulation, which is essential for service caching and task offloading scheme decision making. The primary notations utilized in the following are summarized in Table I.

                                TABLE I
                          PRIMARY NOTATIONS

   Notation                 Definition
   V                        The set of vehicles V = {1, 2, ..., v, ...}
   E                        The set of edge nodes E = {1, 2, ..., e, ..., N_E}
   K                        The set of service programs K = {1, 2, ..., k, ..., N_K}
   B                        The total bandwidth of each edge server
   B_{v,e}(t)               The bandwidth of edge server e allocated to vehicle v in time slot t
   d_k                      The input data size of task k
   θ_k                      The required storage space of service program k
   S_v^V (S_e^E)            The storage capacity of each vehicle (edge)
   γ_{v,e}                  The signal-to-interference-plus-noise ratio between vehicle v and edge e
   g_{v,e}(t)               The uplink channel gain between edge e and vehicle v in time slot t
   f^V (f^E)                The fixed CPU frequency of each vehicle (edge)
   κ                        The computing energy efficiency parameter
   λ                        The factor characterizing the nature of the service application
   c_{v,k}^V (c_{e,k}^E)    The caching decision of vehicle v (edge e) for service program k
   o_{v,k}^V (o_{e,k}^E)    The offloading decision of vehicle v (edge e) for task k
   N_e^{task}(t)            The total number of tasks received at edge e in time slot t
   R_{edge} (R_{cloud})     The transmission rate from edge to edge (cloud)
   p_{edge} (p_{cloud})     The transmission power from edge to edge (cloud)

   [Fig. 1. System illustration: vehicles, an edge pool of interconnected edge servers, and the cloud; service programs (A, B, C, D, E, ...) consume caching and computing resources, linked by task offloading, service caching, and edge cooperative offloading.]

A. System Overview

   As illustrated in Fig. 1, we consider a general vehicular network consisting of an edge pool (the set of edge nodes is denoted as E = {1, 2, ..., e, ..., N_E}), the cloud and a number of moving vehicles (the set of vehicles is denoted as V = {1, 2, ..., v, ...}). Suppose that there are N_K service programs (e.g., executable .EXE files) corresponding to service-dependent tasks in the system (i.e., running these tasks requires precaching of their corresponding service programs). If the associated service program of a requested task has been cached at the cloud or the edge pool, vehicles can offload a portion of the computing tasks via wireless communication (e.g., cellular vehicle-to-everything (C-V2X)) to the edge pool or the cloud, depending on the trajectory of the vehicle and the location of the cached service program. The edge pool consists of a set of interconnected edge servers to balance different computing and caching resources. Due to the mobility of vehicles, cooperative edge offloading between multiple edge servers can further improve the caching efficiency. However, unlike the cloud, which has abundant computing and storage resources, the limited computing and caching resources of edge nodes allow only a small set of service programs to be cached at the same time. Therefore, an AI-based management controller (i.e., agent) is typically deployed at the edge pool, which collects information from vehicles and edge servers, and makes decisions on service caching and task offloading. To this end, time is divided into a set of discrete time slots, indexed
[Fig. 2. Task offloading decision flow: depending on whether the service program is cached at the vehicle, the nearest edge node, and the edge pool, a task is not offloaded (Cases 1 and 3), partially offloaded (Cases 2 and 4), or completely offloaded (Cases 5 and 6).]
by {1, 2, ..., t, ..., t_end}, each of which has an equal duration Δt. The service caching and task offloading strategy can be updated at each time slot.

B. Communication Model

   To characterize a practical vehicle moving environment, vehicles with driving speeds ranging from V_min to V_max are considered [4], [5], and edge nodes are assumed to be aware of all the vehicles belonging to their coverage areas. We consider a multi-channel uplink model, where each edge has an overall bandwidth of B, and each channel has two possible states (i.e., occupied and unoccupied). Note that the allocated spectrum is the same for different edge nodes and the allocated channels are all orthogonal, so that the interference is negligible in the coverage area of the same edge node. We assume that each vehicle can only send one task request in time slot t, with N_e^{task}(t) denoting the total number of tasks received at edge node e in time slot t. In this case, the signal-to-interference-plus-noise ratio (SINR) between vehicle v and edge e in time slot t is given by

   γ_{v,e}(t) = g_{v,e}(t) p_{v,e}(t) / ( Σ_{e=1}^{N_E} g_{v,e}(t) Σ_{v=1}^{N_e^{task}(t)} p_{v,e}(t) + σ² ),   (1)

where σ² refers to the variance of the additive white Gaussian noise, g_{v,e}(t) denotes the average channel gain, and p_{v,e}(t) represents the average channel transmission power of edge node e.

   If vehicle v needs to offload data to the nearest edge e, the wireless transmission rate at time slot t is calculated based on the Shannon formula:

   R_{v,e}(t) = B_{v,e}(t) log₂(1 + γ_{v,e}(t)),   (2)

where B_{v,e}(t) is the bandwidth allocated to vehicle v and B_{v,e}(t) = B / N_e^{task}(t).

C. Service Caching and Task Offloading Model

   At the beginning of each time slot t, vehicles entering the range of edge nodes update task requests, and complete task offloading and computation in this time slot. Before the end of each time slot, service caching decisions for edge nodes are made by the management controller, while those for vehicles are made by the vehicles themselves based on their interests. Let K = {1, 2, ..., k, ..., N_K} denote the set of service programs. Each task can be represented as {d_k, θ_k}, k ∈ K, where d_k denotes the data size of the input data and θ_k denotes the required storage space of service program k. Then we have the following storage capacity constraints:

   Σ_{k=1}^{N_K} c_{v,k}^V(t) θ_k ≤ S_v^V, ∀v, t,   (3)

and

   Σ_{k=1}^{N_K} c_{e,k}^E(t) θ_k ≤ S_e^E, ∀e, t,   (4)

where c_{v,k}^V(t) ∈ {0, 1} and c_{e,k}^E(t) ∈ {0, 1} are binary decision variables denoting whether service program k is cached (i.e., c_{v,k}^V(t) = 1, c_{e,k}^E(t) = 1) or not (i.e., c_{v,k}^V(t) = 0, c_{e,k}^E(t) = 0) on vehicle v and edge node e in time slot t. S_v^V and S_e^E
[TABLE II. Task offloading ratio of the different caching cases.]
are the storage capacities of each vehicle and each edge server, respectively.

   In our system model, a partial offloading strategy is used for vehicles' tasks in each time slot, as illustrated in Fig. 2. For example, when vehicle v within communication range of edge e initiates an offloading request for task k, if the corresponding service program is not locally cached, this task will be completely offloaded to the nearest edge. Then, if the edge pool does not have the needed service program for task k precached, the task will be uploaded to the cloud for execution (Case 6 in Fig. 2). If program k is cached in both the vehicle and its nearest edge node, the task can be handled at edge nodes by partial offloading (Case 2 in Fig. 2). If program k is cached in the nearest edge node and the edge pool but not in the vehicle itself, the vehicle completely offloads task k to the nearest edge node, and then task k is partially offloaded from the nearest edge node to the edge pool (Case 4 in Fig. 2). As can be seen from Fig. 2, tasks are offloaded to different computing nodes based on the corresponding service caching strategies. Therefore, caching and computing resources need to be properly allocated by the agent to achieve maximum benefits.

D. Computation Delay and Energy Consumption

   Following [10], the time and energy consumption for the computation of task k are calculated as

exponent parameter. ω_k denotes the number of cycles needed for service program k, which is expressed as the amount of computation input data d_k multiplied by a factor λ, i.e., ω_k = λd_k. Here λ (λ > 0) is determined based on the nature of the service application (e.g., its computational complexity) [40]. f^V and f^E denote the fixed CPU frequencies of the vehicle and the edge server, respectively.

   When vehicle v within communication range of the nearest edge e initiates an offloading request for task k, the agent selects the device nodes that task k should be offloaded to according to the cache placement of the service program, and determines the offloading ratio for resource allocation. Let o_{v,k}^V(t) ∈ [0, 1] and o_{e,k}^E(t) ∈ [0, 1] be continuous decision variables denoting the ratios of task k offloaded to the nearest edge and to the edge pool, and let (1 − o_{v,k}^V(t)) and (1 − o_{e,k}^E(t)) be the remaining portions of the task executed locally at vehicle v and at the nearest edge e, respectively. Table II lists the mathematical expressions of the task offloading ratio in the different caching cases (corresponding to those in Fig. 2). Based on those, the computation time and transmission time of tasks involving different caching cases are derived as follows.

   Then the local execution delay of task k in vehicle v at time slot t can be given as

   T_{v,e,k}^{local}(t) = c_{v,k}^V(t) (1 − o_{v,k}^V(t)) λd_k / f^V.   (7)
                                      ωk                                     If task k needs to be offloaded to the nearest edge e, the time
                              Dk =       ,                            (5)    consumption for the input data offloading of task k is
                                      fk
                                                                                            ( P
                                                                                                  NE E                      dk
and                                                                              up          ϕ( e=1    ce,k (t))oVv,k (t) Rv,e (t) , Rv,e (t) > 0,
                                                                               Tv,e,k (t) =
                   εk = κfkα Dk = κωk (fk )α−1 ,                      (6)                                     0,                     Rv,e (t) = 0,
                                                                                                                                                (8)
respectively, where fk denotes the CPU frequency and is
                                                                             where
constrained by a maximum frequency fmax (i.e., fk < fmax ),                                                (
κ (κ > 0) denotes the computing energy efficiency parameter,                                                  1, if x > 0,
                                                                                                 ϕ(x) =                                         (9)
and α (in this work we assume that α = 2) denotes the                                                         0, if x = 0.
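To make the computation model concrete, the per-task delay (5) and energy (6) can be sketched in Python; the numeric parameter values below are illustrative assumptions, not figures taken from the paper:

```python
# Sketch of the per-task computation model in (5)-(6).
# alpha = 2 as assumed in the text; all numbers are illustrative.

def compute_delay(d_k, lam, f_k):
    """Eq. (5): D_k = omega_k / f_k, with omega_k = lam * d_k cycles."""
    omega_k = lam * d_k
    return omega_k / f_k

def compute_energy(d_k, lam, f_k, kappa, alpha=2):
    """Eq. (6): eps_k = kappa * omega_k * f_k**(alpha - 1)."""
    omega_k = lam * d_k
    return kappa * omega_k * f_k ** (alpha - 1)

# Example: 1 Mbit input, 500 cycles/bit, 2 GHz CPU, kappa = 1e-27.
d_k, lam, f_k, kappa = 1e6, 500.0, 2e9, 1e-27
D = compute_delay(d_k, lam, f_k)            # 0.25 s
eps = compute_energy(d_k, lam, f_k, kappa)  # about 1e-9 J
```

Note how, with $\alpha = 2$, energy grows linearly in both the cycle count $\omega_k$ and the chosen frequency $f_k$, which is the delay/energy trade-off the later case analysis exploits.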
6                                                                                 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. XX, NO. XX, XX 2023
In (8), $R_{v,e}(t) = 0$ indicates that the caching scheme belongs to Case 1, and tasks do not need to be offloaded to edge nodes.

The execution time on the nearest edge is expressed as

$$T^{edge}_{v,e,k}(t) = c^E_{e,k}(t)\, o^V_{v,k}(t)\,(1-o^E_{e,k}(t))\,\frac{\lambda d_k}{f^E}. \qquad (10)$$

If the nearest edge server e is unavailable and task k needs to be further offloaded to other edge servers in the edge pool, the uploading time is obtained as

$$T^{uppool}_{v,e,k}(t) = (1-c^V_{v,k}(t))\,\varphi\Big(\sum_{i=1, i\neq e}^{N_E} c^E_{i,k}(t)\Big)\, o^V_{v,k}(t)\, o^E_{e,k}(t)\,\frac{d_k}{R_{edge}}, \qquad (11)$$

where $R_{edge}$ denotes the transmission rate among edge servers. Then, the delay resulting from the computation at the edge pool can be given as

$$T^{pool}_{v,e,k}(t) = (1-c^V_{v,k}(t))\,\varphi\Big(\sum_{i=1, i\neq e}^{N_E} c^E_{i,k}(t)\Big)\, o^V_{v,k}(t)\, o^E_{e,k}(t)\,\frac{\lambda d_k}{f^E}. \qquad (12)$$

Since the output data is typically much smaller than the input task size, we ignore the delay of the output return process [31]. As the computing and storage resources are abundant in the cloud, we ignore the queuing and computing latency incurred by the cloud. However, the latency for offloading tasks to the cloud must still be considered. Taking into account the transmission delay and computation delay under different caching and offloading decisions, the total delay of processing task k in time slot t can be expressed as

$$T^{total}_{v,e,k}(t) = \max\Big\{T^{local}_{v,e,k}(t),\; T^{up}_{v,e,k}(t) + \max\big\{T^{edge}_{v,e,k}(t),\; T^{uppool}_{v,e,k}(t) + T^{pool}_{v,e,k}(t)\big\}\Big\} + (1-c^V_{v,k}(t))\Big(1-\varphi\Big(\sum_{e=1}^{N_E} c^E_{e,k}(t)\Big)\Big)\Big(\frac{d_k}{R_{v,e}(t)} + \frac{d_k}{R_{cloud}}\Big). \qquad (13)$$

We define the average delay of task processing in time slot t as

$$T_d(t) = \frac{1}{N_E}\sum_{e=1}^{N_E} \frac{1}{N^{task}_e(t)} \sum_{v=1}^{N^{task}_e(t)} T^{total}_{v,e,k}(t). \qquad (15)$$

Similarly, the average energy consumption of edge servers in time slot t can be calculated as

$$\varepsilon(t) = \frac{1}{N_E}\sum_{e=1}^{N_E} \frac{1}{N^{task}_e(t)} \sum_{v=1}^{N^{task}_e(t)} \varepsilon^{total}_{v,e,k}(t). \qquad (16)$$

E. Problem Formulation

With the emergence of interactive AR/VR services, user experience is crucial for user retention, which is key to the success of such services. As an important aspect of the user experience, the task processing delay is becoming a crucial wireless network performance metric. The main goal of our work is to design a joint computation offloading and service caching scheme that minimizes the long-term cumulative average task processing time. Specifically, the optimization problem can be described as:

$$\min_{\{c^E_{e,k}(t),\, o^V_{v,k}(t),\, o^E_{e,k}(t)\}} \; \sum_{t=1}^{t_{end}} \gamma^{t-1}\Big(\frac{T_d(t)}{T_d^{max}}\Big), \qquad (17)$$

s.t.

$$c^V_{v,k}(t) \in \{0,1\}, \quad \forall v \in V, \forall k \in K, \forall t \in \{1,2,\dots,t_{end}\}, \qquad (17a)$$
$$c^E_{e,k}(t) \in \{0,1\}, \quad \forall e \in E, k, t, \qquad (17b)$$
$$0 \le o^V_{v,k}(t) \le 1, \quad \forall v, k, t, \qquad (17c)$$
$$0 \le o^E_{e,k}(t) \le 1, \quad \forall e, k, t, \qquad (17d)$$
$$0 \le \gamma \le 1; \qquad (17e)$$
$$\sum_{k=1}^{N_K} c^V_{v,k}(t)\,\theta_k \le S^V_v, \quad \forall v, t, \qquad (17f)$$
$$\sum_{k=1}^{N_K} c^E_{e,k}(t)\,\theta_k \le S^E_e, \quad \forall e, t. \qquad (17g)$$
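A minimal sketch of how the component delays (7)-(12) compose into the total delay (13). The function and parameter names are ours, the cache indicators of all edge nodes are modeled as a simple list, and all numeric values are illustrative assumptions:

```python
def phi(x):
    """Eq. (9): indicator that at least one edge caches the program."""
    return 1 if x > 0 else 0

def total_delay(d_k, lam, fV, fE, R_ve, R_edge, R_cloud,
                cV, cE_list, e, oV, oE):
    """Eq. (13): total processing delay of task k in one time slot.

    cE_list[e] is the nearest edge's cache indicator; the cloud term
    applies only when neither the vehicle nor any edge caches program k.
    """
    n_cached = sum(cE_list)
    pool_cached = sum(c for i, c in enumerate(cE_list) if i != e)

    T_local = cV * (1 - oV) * lam * d_k / fV                          # (7)
    T_up = phi(n_cached) * oV * d_k / R_ve if R_ve > 0 else 0.0       # (8)
    T_edge = cE_list[e] * oV * (1 - oE) * lam * d_k / fE              # (10)
    T_uppool = (1 - cV) * phi(pool_cached) * oV * oE * d_k / R_edge   # (11)
    T_pool = (1 - cV) * phi(pool_cached) * oV * oE * lam * d_k / fE   # (12)
    T_cloud = (1 - cV) * (1 - phi(n_cached)) * (d_k / R_ve + d_k / R_cloud)
    return max(T_local, T_up + max(T_edge, T_uppool + T_pool)) + T_cloud
```

For example, with the program cached only at the nearest edge and the task fully offloaded (Case 3), the function reduces to $d_k/R_{v,e}(t) + \lambda d_k/f^E$, matching (20); with no cache anywhere it reduces to the Case 6 cloud path of (23).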
                                                                                                                 total
In order to evaluate the maximum and minimum values of $T^{total}_{v,e,k}(t)$, we derive the task processing delay in time slot t under the different caching cases, in which the caching decision variables $c^V_{v,k}(t)$, $c^E_{e,k}(t)$ are no longer relevant.

1) Case 1:
$$T_d^{Case1}(t) = \frac{\lambda d_k}{f^V}. \qquad (18)$$

2) Case 2:
$$T_d^{Case2}(t) = \max\Big\{(1-o^V_{v,k}(t))\frac{\lambda d_k}{f^V},\; o^V_{v,k}(t)\Big(\frac{d_k}{R_{v,e}(t)} + \frac{\lambda d_k}{f^E}\Big)\Big\}. \qquad (19)$$

3) Case 3:
$$T_d^{Case3}(t) = \frac{d_k}{R_{v,e}(t)} + \frac{\lambda d_k}{f^E}. \qquad (20)$$

4) Case 4:
$$T_d^{Case4}(t) = \frac{d_k}{R_{v,e}(t)} + \max\Big\{(1-o^E_{e,k}(t))\frac{\lambda d_k}{f^E},\; o^E_{e,k}(t)\Big(\frac{d_k}{R_{edge}} + \frac{\lambda d_k}{f^E}\Big)\Big\}. \qquad (21)$$

5) Case 5:
$$T_d^{Case5}(t) = \frac{d_k}{R_{v,e}(t)} + \frac{d_k}{R_{edge}} + \frac{\lambda d_k}{f^E}. \qquad (22)$$

6) Case 6:
$$T_d^{Case6}(t) = \frac{d_k}{R_{v,e}(t)} + \frac{d_k}{R_{cloud}}. \qquad (23)$$

From the above analysis, we have

$$\max\{T^{total}_{v,e,k}(t)\} = \max_{i=1,\dots,6}\{T_d^{Casei}(t)\} = \max\{T_d^{Case1}(t), T_d^{Case5}(t), T_d^{Case6}(t)\} = \max\Big\{\frac{\lambda d_k}{f^V},\; \frac{d_k}{R_{v,e}(t)} + \frac{d_k}{R_{edge}} + \frac{\lambda d_k}{f^E},\; \frac{d_k}{R_{v,e}(t)} + \frac{d_k}{R_{cloud}}\Big\}, \qquad (24)$$

$$\min\{T^{total}_{v,e,k}(t)\} = \min_{i=1,\dots,6}\{T_d^{Casei}(t)\} = \min\{T_d^{Case2}(t), T_d^{Case4}(t), T_d^{Case6}(t)\}. \qquad (25)$$

In order to eliminate the offloading decision variables, the minimum delays of Case 2 and Case 4 are easily obtained from their properties as piecewise linear functions:

$$\min\{T_d^{Case2}(t)\} = \frac{d_k\,(f^E + \lambda R_{v,e}(t))}{\frac{f^V f^E}{\lambda} + f^V R_{v,e}(t) + f^E R_{v,e}(t)}, \qquad (26)$$

where $o_{v,k}(t) = 1\big/\big(1 + \frac{f^V}{\lambda}\big(\frac{1}{R_{v,e}(t)} + \frac{\lambda}{f^E}\big)\big)$, and

$$\min\{T_d^{Case4}(t)\} = \frac{d_k}{R_{v,e}(t)} + \frac{\lambda d_k\,(\lambda R_{edge} + f^E)}{f^E\,(2\lambda R_{edge} + f^E)}, \qquad (27)$$

where $o_{e,k}(t) = 1\big/\big(2 + \frac{f^E}{\lambda R_{edge}}\big)$.

Hence, the minimum value of $T^{total}_{v,e,k}(t)$ can be expressed as

$$\min\{T^{total}_{v,e,k}(t)\} = \min\big\{\min\{T_d^{Case2}(t)\},\; \min\{T_d^{Case4}(t)\},\; T_d^{Case6}(t)\big\}. \qquad (28)$$

The analysis above indicates that the maximum and minimum task processing delays are related to the communication, computing and caching capabilities of each device. Meanwhile, aiming at minimizing the delay, caching decisions corresponding to the maximum tolerated delay should be avoided as much as possible.

Similarly, in order to evaluate the maximum and minimum values of $\varepsilon^{total}_{v,e,k}(t)$, we need to derive the energy consumption of edge servers in time slot t under the different caching cases.

1) Case 1:
$$\varepsilon^{Case1}(t) = 0. \qquad (29)$$

2) Case 2:
$$\varepsilon^{Case2}(t) = \kappa (f^E)^2\Big(o_{v,k}(t)\frac{\lambda d_k}{f^E}\Big). \qquad (30)$$

3) Case 3:
$$\varepsilon^{Case3}(t) = \kappa (f^E)^2\Big(\frac{\lambda d_k}{f^E}\Big). \qquad (31)$$

4) Case 4:
$$\varepsilon^{Case4}(t) = \kappa (f^E)^2\Big((1-o_{e,k}(t))\frac{\lambda d_k}{f^E} + o_{e,k}(t)\frac{\lambda d_k}{f^E}\Big) + o_{e,k}(t)\, p_{edge}\frac{d_k}{R_{edge}} = \kappa f^E \lambda d_k + o_{e,k}(t)\, p_{edge}\frac{d_k}{R_{edge}}. \qquad (32)$$

5) Case 5:
$$\varepsilon^{Case5}(t) = \kappa f^E \lambda d_k + p_{edge}\frac{d_k}{R_{edge}}. \qquad (33)$$

6) Case 6:
$$\varepsilon^{Case6}(t) = p_{cloud}\frac{d_k}{R_{cloud}}. \qquad (34)$$

Based on the above analysis, we have

$$\max\{\varepsilon^{total}_{v,e,k}(t)\} = \max_{i=1,\dots,6}\{\varepsilon^{Casei}(t)\} = \max\{\varepsilon^{Case5}(t), \varepsilon^{Case6}(t)\} = \max\Big\{\kappa f^E \lambda d_k + p_{edge}\frac{d_k}{R_{edge}},\; p_{cloud}\frac{d_k}{R_{cloud}}\Big\}. \qquad (35)$$

From (29) it can be easily observed that $\min\{\varepsilon^{total}_{v,e,k}(t)\} = 0$.

From the analysis of the maximum and minimum values of $T^{total}_{v,e,k}(t)$, it can be seen that in order to reduce the task processing delay, the caching decisions of Case 2 and Case 4 should be made as often as possible; that is, tasks should be offloaded to the edge nodes as much as possible. On the other hand, energy consumption is also a critical indicator for edge server operators. Offloading tasks to edge nodes can reduce task processing delays, but it increases the energy consumption of edge servers.
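The closed-form ratios behind (26)-(27) follow from balancing the two branches of the piecewise-linear max. A quick numeric check for Case 2, using illustrative parameter values of our own choosing:

```python
# Check (with illustrative parameters) that the closed-form ratio of (26)
# minimizes the piecewise-linear Case 2 delay of (19).
d_k, lam, fV, fE, R = 1e6, 500.0, 1e9, 4e9, 1e7

def T_case2(o):
    """Eq. (19): max of the local-remainder delay and the offloaded-path delay."""
    return max((1 - o) * lam * d_k / fV,
               o * (d_k / R + lam * d_k / fE))

# Optimal ratio: the two branches of the max are equal at o_star.
o_star = 1.0 / (1.0 + (fV / lam) * (1.0 / R + lam / fE))

# Closed-form minimum delay from (26).
T_star = d_k * (fE + lam * R) / (fV * fE / lam + fV * R + fE * R)

# A grid search over the offloading ratio confirms the closed form.
grid_best = min(T_case2(i / 10000) for i in range(10001))
assert abs(T_case2(o_star) - T_star) < 1e-9
assert grid_best >= T_star - 1e-9
```

The same balancing argument applied to the inner max of (21) yields $o_{e,k}(t) = 1/(2 + f^E/(\lambda R_{edge}))$ and the minimum in (27).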
IV. DEEP REINFORCEMENT LEARNING FOR EDGE CACHING AND OFFLOADING

Due to diverse user demands and the constrained computation and caching resources, it is complex to minimize the cumulative system average delay in (17). Most traditional optimization methods (e.g., convex optimization, game theory) assume knowledge of key factors in vehicular networks, such as channel conditions and content popularity. However, these key factors are time-varying and unavailable in reality. Such methods can merely achieve optimal or near-optimal results for one snapshot, since they ignore how the current decision exerts long-term influence on resource allocation [21]. DRL is viewed as an efficient way to solve complicated problems in a dynamic environment by optimizing the expected cumulative reward. In DRL, an agent collects the needed information regarding the diverse demands of users and the available resources in vehicular networks. Then the agent takes an action to manage offloading and caching decisions and optimize resource allocation. We formulate the joint optimization problem as a discrete-time Markov decision process (MDP) and propose a DDPG-based edge caching and offloading scheme for joint service caching and computation offloading strategy design.

A. Problem Formulation Based on DRL

1) State space: At the initial phase of each time slot t, each edge server gathers all the environmental parameters, which contain the following parts:
• $I_{v,k}(t)$: The request indicator for task k by vehicle v within the coverage range of edge node e at time slot t.
• $c^V_{v,k}(t)$: The service caching indicator for vehicle v within the coverage range of edge node e at time slot t.
• $B_{v,e}(t)$, $\gamma_{v,e}(t)$: The bandwidth that edge node e allocates to vehicle v and the received SINR of edge node e at time slot t.
• $c^E_{e,k}(t)$: The service caching indicator of edge node e at time slot t.

The state of edge node e at time slot t is denoted as

$$s_e(t) = \big\{[\mathbf{I}(t)]_{\rho_{max}\times N_K},\, [\mathbf{c}^V(t)]_{\rho_{max}\times N_K},\, [\mathbf{B}(t)]_{\rho_{max}\times N_K},\, [\boldsymbol{\gamma}(t)]_{\rho_{max}\times N_K},\, [\mathbf{c}^E(t)]_{\rho_{max}\times N_K}\big\}, \qquad (36)$$

where $\rho_{max}$ denotes the maximum vehicle density within each edge node (bold letters are used to denote matrices). The system state at time slot t consists of the states of all edge nodes, defined as

$$s_t = \{s_1(t), s_2(t), \dots, s_{N_E}(t)\}, \qquad (37)$$

where the state of each edge node $s_e(t)$ is taken after dimensionality reduction and normalization.

2) Action space: The agent obtains the states of service caching and the communication information between vehicles and edge nodes, and then decides the offloading ratio of all tasks and updates the service caching of all edge nodes. Here, the corresponding action contains the following parts:
• $c^E_{e,k}(t)$: The service caching indicator of edge node e after completing all task offloading and calculation in time slot t.
• $o^V_{v,k}(t)$: The proportion of vehicle v offloading task k to edge node e in time slot t.
• $o^E_{e,k}(t)$: The proportion of edge node e offloading task k to the edge pool in time slot t.

The action of edge node e in time slot t is denoted as

$$a_e(t) = \big\{[\mathbf{c}^E(t)]_{\rho_{max}\times N_K},\, [\mathbf{o}^V(t)]_{\rho_{max}\times N_K},\, [\mathbf{o}^E(t)]_{\rho_{max}\times N_K}\big\}. \qquad (38)$$

The actions of all edge nodes are likewise denoted as $a_e(t)$ after dimensionality reduction and normalization. The system action at time slot t consists of the actions of all edge nodes, defined as

$$a_t = \{a_1(t), a_2(t), \dots, a_{N_E}(t)\}. \qquad (39)$$

3) Reward: The agent's behavior is reward-driven, and the reward should correspond with the objective function. Hence, we set the reward in time slot t as

$$r_t = r(s_t, a_t) = -\Big(\frac{T_d(t)}{T_d^{max}}\Big). \qquad (40)$$

B. DDPG-Based Edge Caching and Offloading Scheme

Since the state space consists of a large amount of dynamic environmental information and the action space contains continuous values, we exploit the deep deterministic policy gradient (DDPG) algorithm, a model-free actor-critic algorithm, to solve the joint computation offloading and service caching problem. The framework of the DDPG-based method is illustrated in Fig. 3, consisting of primary networks, target networks and a replay buffer.

Based on the deterministic actor-critic model, we leverage deep neural networks to provide accurate estimates of the deterministic policy function $\mu(s_t)$ and the action-value function $Q(s_t, a_t)$, which should satisfy the following condition:

$$Q(s_t, \mu(s_t|\theta^{\mu})|\theta^{Q}) \approx r(s_t, a_t) + \gamma Q(s_{t+1}, \mu(s_{t+1}|\theta^{\mu'})|\theta^{Q'}). \qquad (41)$$

As shown in Fig. 3, the primary networks use the actor network $\mu(s_t|\theta^{\mu})$ and the critic network $Q(s_t, a_t|\theta^{Q})$ to approximate the policy function and the Q-value function, respectively. In addition, the target networks contain an actor network $\mu(s_t|\theta^{\mu'})$ and a critic network $Q(s_t, a_t|\theta^{Q'})$ with the same structure. The target Q-value can be represented as

$$y_t = r(s_t, a_t) + \gamma Q^{\mu}(s_{t+1}, \mu(s_{t+1}|\theta^{\mu'})|\theta^{Q'}). \qquad (42)$$

We utilize the actor network to explore the policies and the critic network to evaluate them. The actor network architecture is illustrated in Fig. 4, which takes the state $s_t$ as input and outputs an action $a_t$. The action variable $c^E_{e,k}(t)$ needs to be discretized (i.e., $\lceil c^E_{e,k}(t) \rceil$).

At the beginning of each time slot t, the agent collects environmental information to get the current system state $s_t$. In order to address the exploration problem of the deterministic policy, we construct the action by adding behavior noise $n_t$ to obtain $a_t = \mu(s_t|\theta^{\mu}) + n_t$. After vehicles and edge nodes perform the action, the agent obtains the reward $r_t$ and the next state $s_{t+1}$, and stores the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer.
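The reward (40), the noisy exploration action, and the ceiling-based discretization of the cache action can be sketched as follows. The Gaussian form of the behavior noise $n_t$ and the clipping of actions to [0, 1] are our assumptions, since the paper does not fix them here:

```python
import math
import random

def reward(T_d, T_d_max):
    """Eq. (40): negative normalized average delay, so lower delay
    yields higher reward."""
    return -(T_d / T_d_max)

def explore_action(mu_out, sigma=0.1):
    """a_t = mu(s_t | theta_mu) + n_t, clipped to the valid range [0, 1].
    Gaussian behavior noise is an assumption for illustration."""
    return [min(1.0, max(0.0, a + random.gauss(0.0, sigma))) for a in mu_out]

def discretize_cache(cE_cont):
    """Cache actions in [0, 1] are discretized with a ceiling, as in
    Section IV-B: any positive value maps to 1 (cache the program)."""
    return [math.ceil(c) for c in cE_cont]
```

For instance, a continuous cache output of 0.3 becomes the binary decision 1, while exactly 0 stays 0, which keeps constraint (17b) satisfied.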
XUE et al.: JOINT SERVICE CACHING AND COMPUTATION OFFLOADING SCHEME                                                        9

                    θQ′ ← τ θQ + (1 − τ )θQ′ ,
                                                                                              (45)
                    θµ′ ← τ θµ + (1 − τ )θµ′ ,

where τ is the updating coefficient. The whole process of the DDPG-based edge caching and offloading scheme is presented in Algorithm 1.

Fig. 3. The framework of the DDPG-based method for vehicular edge caching and offloading. [Figure: the agent maintains primary networks (Actor_P, Critic_P), target networks (Actor_T, Critic_T), and an experience replay buffer from which mini-batches of N transitions (st , at , rt , st+1 ) are sampled; the primary actor outputs at = µ(st |θµ ) and the target actor outputs at+1 = µ(st+1 |θµ′ ); the critics evaluate Qµ (st , µ(st |θµ )|θQ ) and Qµ (st+1 , µ(st+1 |θµ′ )|θQ′ ); θµ and θQ are updated via the policy gradient and the loss function, respectively.]

Algorithm 1 DDPG-Based Edge Caching and Offloading Algorithm
Input: The initial parameters: γ, θµ , θµ′ , θQ , θQ′ , τ , M , tend , D, N .
Output: Primary optimal actor network parameter θµ .
 1: Initialize primary networks and target networks.
 2: Empty the experience replay buffer D.
 3: for episode = 1, 2, ..., M do
 4:    Initialize observation state s0 .
 5:    Add a random Gaussian distributed behavior noise nt for action exploration.
 6:    for t = 1, 2, ..., tend do
 7:       Agent receives normalized observation state st .
 8:       Select action at = µ(st |θµ ) + nt .
 9:       Perform action at , calculate immediate reward rt , and obtain the next normalized state st+1 .
10:       if the replay buffer is not full then
11:          Store transition (st , at , rt , st+1 ) in replay buffer D.
12:       else
13:          Randomly replace a transition in replay buffer D with (st , at , rt , st+1 ).
14:          Randomly sample a mini-batch of N transitions (sj , aj , rj , sj+1 ), ∀j = 1, 2, ..., N from D.
15:          Set yj = r(sj , aj ) + γQµ (sj+1 , µ(sj+1 |θµ′ )|θQ′ ).

[Figure: the structure of the actor network — input layer: vehicles' requested tasks, vehicles' caching indicators, channel bandwidth allocated to vehicles, the received SINR of edge nodes, and edges' caching indicators; hidden layers with 512, 256, 128, and 64 neurons; output layer: edges' caching indicators, the proportion of tasks offloaded to the nearest edges, and the proportion of tasks offloaded to edge pools.]
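The target computation in step 15 of Algorithm 1 and the soft update in (45) can be sketched with plain NumPy arrays standing in for mini-batches and network parameter vectors. The function names and toy values below are illustrative only; the actual networks are trained with TensorFlow.

```python
import numpy as np

def td_target(r, q_target_next, gamma=0.99):
    """Step 15: y_j = r(s_j, a_j) + gamma * Q(s_{j+1}, mu(s_{j+1}|theta_mu')|theta_Q')."""
    return r + gamma * q_target_next

def soft_update(theta, theta_target, tau=0.01):
    """Eq. (45): theta' <- tau * theta + (1 - tau) * theta'."""
    return tau * theta + (1.0 - tau) * theta_target

# toy mini-batch of rewards and target-critic values
y = td_target(np.array([1.0, 2.0]), np.array([10.0, 20.0]))    # -> [10.9, 21.8]
theta_new = soft_update(np.array([1.0, -1.0]), np.zeros(2))    # -> [0.01, -0.01]
```

With τ = 0.01, the target networks track the primary networks slowly, which stabilizes the bootstrapped targets yj used to train the critic.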
                                  TABLE III
                           SIMULATION PARAMETERS

      Parameter                                                  Value
      Total number of time slots (tend )                         40
      The duration of each time slot (∆t)                        30 s
      Density of vehicles within range of edge (ρ)               [2, 5]
      Communication range of each edge server                    500 m
      Bandwidth of each edge server (B)                          20 MHz
      The value of SINR                                          4∼5 dB
      Number of edge servers (NE )                               3
      Number of service types (NK )                              5
      Data size of each task (dk )                               20 Mb
      Storage space of each service program (θk )                50 Mb
      CPU cycles of each vehicle (f V )                          5 × 10⁸ cycles/s
      CPU cycles of each edge server (f E )                      1 × 10⁹ cycles/s
      Computation intensity of each task (λ)                     10⁵ cycles/bit
      Storage capacity of each vehicle (SvV )                    50 Mb
      Storage capacity of each edge server (SeE )                100 Mb
      The computing energy efficiency parameter (κ)              1 × 10⁻²⁶
      Transmission rate between edge servers (Redge )            15 Mbps
      Transmission rate from edge to the cloud (Rcloud )         10 Mbps
      Transmission power between edge servers (pedge )           1 W
      Transmission power from edge to the cloud (pcloud )        2 W
      Discount factor (γ)                                        0.99
      Learning rate of the actor network (lra )                  0.001
      Learning rate of the critic network (lrc )                 0.002
      Soft update coefficient (τ )                               0.01
      Size of mini-batch sample (N )                             128
      Size of the experience replay buffer (D)                   10000

A. Experimental Settings

   The involved parameters along with their corresponding values are listed in Table III. The settings of several parameters, including the duration of each time slot ∆t and the energy efficiency parameter κ, draw on previous works [10], [12], [31]. In particular, ∆t is set such that, on the one hand, the agent has enough time to make caching and offloading decisions and, on the other hand, tasks offloaded to edge servers can be completely accomplished by the end of the time slot. We use Python 3.6 to create a simulation environment for the considered vehicular edge caching and task offloading system, and TensorFlow 1.14.0 to implement the DDPG-based edge caching and offloading scheme. In the simulation, each vehicle randomly requests its task of interest at the beginning of each time slot.
   We consider the following benchmark methods for performance comparison:
   1) Offloading without edge caching: Tasks requested by the vehicle are computed locally or offloaded to the cloud.
   2) Offloading based on latency minimization: Task offloading is performed according to the ratio that achieves the minimum latency in each case, based on the DDPG learning algorithm.
   3) Offloading based on energy minimization: Task offloading is performed according to the ratio that achieves the minimum energy consumption in each case, based on the DDPG learning algorithm.
   4) Random edge caching and offloading: The service caching and task offloading ratios are random in each time slot.
   5) Least recently used (LRU) edge caching and offloading: The services requested at the edge server in the previous time slot continue to be cached in the next time slot, services that have not been requested are randomly replaced [41], and the offloading ratio is determined according to the offloading scheme based on latency minimization.
   6) Executing all tasks in the cloud: All tasks are offloaded to the cloud for execution.

B. Simulation Results

   First, we compare the total delay per episode of the different schemes based on the DDPG learning algorithm. It can be seen in Fig. 5 that all schemes approach a stable cumulative average delay as the number of episodes increases. Meanwhile, since the energy consumption is related to the task processing delay, we evaluate the total energy consumption per episode in Fig. 6. As the number of episodes increases, except for the offloading without edge caching scheme, the total delay of all the other considered schemes decreases to a stable value, while the total energy consumption increases to a stable value. As analyzed in Sec. III-F, to reduce latency, it is necessary to offload tasks to edge nodes as much as possible, which in turn increases the energy consumption of edge nodes. This is verified in Fig. 5 and Fig. 6. Another notable point is that our proposed scheme achieves the lowest task processing latency for the same energy consumption of edge nodes. The offloading without edge caching scheme keeps the maximum delay, which is approximately the same as the total delay to which the offloading scheme based on energy minimization converges. This is because, when aiming to minimize energy consumption, the offloading ratios are chosen to minimize energy consumption in all the considered cases. However, the offloading without edge caching scheme consumes the maximum energy; its curve remains flat because the agent cannot participate in decision-making, and edge servers cannot provide computing and caching resources. Our proposed scheme yields the lowest total delay among all the benchmark schemes, which demonstrates the efficiency of the DDPG-based edge caching and offloading scheme.
   Next, we investigate the effects of vehicle density on the total delay and energy consumption of the different schemes in Fig. 7. As the vehicle density ρ increases, the number of task requests increases, while the bandwidth allocated to each vehicle and the transmission rate of tasks uploaded to edge nodes decrease, resulting in an increase in the total task processing delay. When ρ = 2, 3, the total latency of the executing-all-tasks-in-the-cloud scheme is lower than that of the offloading without edge caching scheme. This is because, as each vehicle occupies more bandwidth resources, tasks can be uploaded faster to the cloud, which has powerful computing resources. As the vehicle density continues to increase, this superiority vanishes. It can be observed that our proposed scheme outperforms the other methods in terms of the total delay over the period tend .
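The LRU-style caching benchmark described above can be sketched as follows. This is a simplified, illustrative version under stated assumptions: services are identified by integer IDs, all services have equal size, and each unrequested cache entry is replaced by a randomly chosen uncached service; the function name is hypothetical.

```python
import random

def lru_update(cached, requested, all_services):
    """Keep cached services that were requested in the previous slot;
    replace the unrequested entries with randomly chosen services
    that are not currently cached."""
    kept = [s for s in cached if s in requested]
    candidates = [s for s in all_services if s not in cached]
    random.shuffle(candidates)
    n_replace = len(cached) - len(kept)
    return kept + candidates[:n_replace]

# services 1 and 2 were requested, so service 0 is evicted and replaced
new_cache = lru_update(cached=[0, 1, 2], requested={1, 2}, all_services=range(5))
```

The DDPG-based scheme, by contrast, learns the caching indicators jointly with the offloading ratios rather than relying on such a fixed replacement rule.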