Mobility-Aware Vehicle-to-Grid Control Algorithm in Microgrids

Abstract— In a vehicle-to-grid (V2G) system, electric vehicles (EVs) can be efficiently used as power consumers and suppliers to achieve microgrid (MG) autonomy. Since EVs can act as energy transporters among different regions (i.e., MGs), deciding where and when EVs are charged or discharged is an important issue for achieving optimal performance in a V2G system. In this paper, we propose a mobility-aware V2G control algorithm (MACA) that considers the mobility of EVs, the states of charge (SOCs) of EVs, and the estimated/actual demands of MGs, and then determines charging and discharging schedules for EVs. To optimize the performance of MACA, a Markov decision process (MDP) problem is formulated and the optimal charging/discharging policy is obtained by a value iteration algorithm. Since the mobility of EVs and the estimated/actual demand profiles of MGs may not be easily obtained, a reinforcement learning approach is also introduced. Evaluation results demonstrate that MACA with the optimal and learning-based policies can effectively achieve MG autonomy and provide higher satisfaction on the charging.

Index Terms— Vehicle-to-grid (V2G), electric vehicle (EV), microgrid, Markov decision process (MDP), reinforcement learning (RL).

I. INTRODUCTION

IN TRADITIONAL power systems, electricity can be generated according to the demands of consumers. That is, a day-ahead schedule can be generated by predicting consumers' load profiles [1]. Nowadays, renewable energy sources (RESs) have received high attention [2] and microgrids (MGs) have been developed as localized groupings of electricity generation, energy storage, and loads [3], [4]. Even with these new trends, demand response management (DRM)1 is still challenging owing to the properties of RES production (e.g., variability, discontinuity, and poor predictability) and of MGs (e.g., distributed generation).

To achieve efficient DRM in MGs, there is increasing interest in vehicle-to-grid (V2G) technology, which provides reliability by supplying power from the energy storage in electric vehicles (EVs) [6], [7]. Specifically, EVs can adjust their charging or discharging behaviors depending on the load profile and their current states of charge (SOC) [8], [9]. That is, EVs can play a dual role in the electricity market [10]: 1) power consumer when their batteries are charged and 2) power supplier when they sell excess energy from their batteries. Since most EVs are parked for a long time (up to 22 hours per day) [11], they can effectively perform both roles. However, since each EV has different conditions (e.g., arrival/departure time and SOC), an efficient V2G control algorithm is needed to determine the operation (i.e., charging or discharging) for each EV. To address this issue, several works have been reported in the literature [12]–[15]. Shi and Wong [12] proposed a V2G control algorithm based on a Markov decision process (MDP), where price uncertainty is considered by exploiting a Q-learning algorithm. Deilami et al. [13] suggested a real-time smart load management control strategy to minimize the total cost of generating energy and the associated grid energy losses. Chen and Duan [14] introduced a two-stage solution algorithm based on a genetic algorithm to find the optimal number of parking spaces under the optimal scheduling of EVs in MGs. Liu et al. [15] proposed a V2G control algorithm to achieve frequency regulation and maintain the battery energy above a certain level.

Even though these works can improve the performance of the V2G control algorithm, they cannot effectively exploit a salient feature of EVs: EVs can travel across different regions and thus act as energy transporters among different MGs. In particular, when MGs are isolated from the main grid, this feature of EVs can be exploited in a more efficient manner. For example, most EVs move to working regions in the morning, and thus higher electric demand is observed in the working regions than in residential regions. In such a situation, EVs can transport energy from the residential regions to the working ones to satisfy the high demand of the working regions. To balance the difference in power demand among regions, the mobility of EVs was exploited in [10] and energy transport problems were investigated. However, how to optimize the performance of the V2G control algorithm under mobility was not studied.

Manuscript received April 25, 2017; revised October 19, 2017; accepted March 14, 2018. This work was supported in part by the Korean Government (MSIP) through the National Research Foundation (NRF) of Korea under Grant 2017R1E1A1A01073742 and in part by the Basic Science Research Program through the NRF of Korea supported by the Ministry of Education under Grant 2017R1A6A3A03006846. The Associate Editor for this paper was C. Sommer. (Corresponding author: Sangheon Pack.)

H. Ko is with the Smart Quantum Communication Research Center, Korea University, Seoul 02841, South Korea, and also with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: st_basket@korea.ac.kr).

S. Pack is with the School of Electrical Engineering, Korea University, Seoul 02841, South Korea (e-mail: shpack@korea.ac.kr).

V. C. M. Leung is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: vleung@ece.ubc.ca).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TITS.2018.2816935

1 DRM aims to shape the load profile to balance energy demand and supply [5].
to be charged or discharged following the policy table at the current time (Steps 4-5). In this example, since MG 1 has surplus energy that can be used to charge only one EV, and EVs 1 and 2 are estimated to move to energy-scarce MG 3 and energy-abundant MG 2, respectively, aggregator 1 commands a charging operation to EV 1 (Step 4(a)) while EV 2 is not charged (Step 4(b)). Since the actual demand of MG 3 is higher than its estimated demand, aggregator 3 commands a discharging operation to EV 3 in Step 4(c). However, in Step 5, since the EVs are expected to move soon,4 the SOCs of the EVs should not be decreased below a certain level. Therefore, aggregator 3 does not command any discharging operation to EV 3 (Step 5(c)). After the EVs' movements, they report their SOCs to the aggregators, and the aggregators forward the EVs' SOCs and mobility profiles to the controller (Steps 6-7). Based on the updated information, the controller can reconstruct its policy table. If there is any update in the policy table, the controller sends the updated one to the aggregators (Step 8). After that, in Step 9, since EV 1 is in energy-scarce MG 3, EV 1 is discharged (Step 9(a)). On the contrary, EV 2 and EV 3, which are now in energy-abundant MGs, are charged (Steps 9(b) and (c)).

III. MDP FORMULATION

To achieve the autonomy of MGs, EVs should be charged or discharged with consideration of the estimated/actual demand profiles of MGs and the EVs' mobility. To this end, we formulate an MDP model5 with five elements: 1) decision epoch; 2) action; 3) state; 4) transition probability; and 5) reward and cost functions [19], [20]. We also introduce an optimality equation and a value iteration algorithm to solve the equation. Then, an RL approach is presented for the case in which the transition probabilities cannot be easily obtained.

4 At this time, the policy table is constructed based on the high probability that EVs will move to other MGs.

5 The MDP model represents a mathematical framework for modeling decision-making in situations in which outcomes are partially random and partially under the control of the decision maker [18]. Therefore, the MDP model is suitable for deciding the charging and discharging schedules of EVs.
C is denoted by

\[ \mathcal{C} = \{C_1, C_2, \ldots, C_{N_{P,C}}\} \tag{2} \]
\[ C_k = [c_1, c_2, \ldots, c_{N_{EV}}] \tag{3} \]

and G is denoted by

\[ \mathcal{G} = \{G_1, G_2, \ldots, G_{N_{P,G}}\} \tag{4} \]
\[ G_k = [g_1, g_2, \ldots, g_{N_{EV}}] \tag{5} \]

where C means the vector set that describes the EVs' SOCs. In addition, G represents the vector set of the MG identifications where the EVs are located (e.g., MG 1, 2, or 3 in Figure 1), and H denotes the vector set that represents the movement phases of the EVs. D describes the vector set for the difference between the estimated demand and the actual demand (excluding EV demands) of the MGs. D_k is given by

\[ D_k = [d_1, d_2, \ldots, d_{N_{MG}}] \tag{9} \]

where d_l describes the difference of the lth MG and N_MG is the total number of MGs in the system. That is, d_l = ID_l − AD_l, where ID_l and AD_l denote the estimated demand and the actual demand of the lth MG, respectively. Hence, if ID_l is larger than AD_l (i.e., when d_l > 0), |d_l| is the surplus volume of electricity in the lth MG. Otherwise (i.e., when d_l < 0), |d_l| represents the shortage volume of electricity in the lth MG.
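To make the formulation concrete, the following Python sketch (ours, not the authors' implementation; the field names and example values are illustrative) shows one way to represent a single decision-epoch state S = (C_k, G_k, H_k, D_k) and a joint action:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class State:
    """One decision-epoch state S = (Ck, Gk, Hk, Dk)."""
    soc: List[float]    # c_j: SOC of the j-th EV; c_j = -1 encodes "insufficient SOC"
    mg: List[int]       # g_j: identifier of the MG where the j-th EV is located
    phase: List[int]    # h_j: movement phase (0 = parked, 1 = moving, 2 = just arrived)
    diff: List[float]   # d_l = ID_l - AD_l: estimated minus actual demand of the l-th MG

# A joint action assigns a_j in {-1, 0, +1} (discharge, idle, charge) to each EV.
Action = List[int]

# Example: two EVs, three MGs; EV 0 is charged, EV 1 is discharged.
s = State(soc=[0.6, 0.8], mg=[0, 2], phase=[0, 0], diff=[1.5, -0.4, -1.1])
a: Action = [1, -1]
```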
\[
P[c_j', g_j' \mid c_j, g_j, h_j = 0, a_j = 1] =
\begin{cases}
1, & \text{if } c_j' = c_j + \zeta\tau \text{ and } g_j' = g_j\\
0, & \text{otherwise}
\end{cases} \tag{13}
\]

\[
P[c_j', g_j' \mid c_j, g_j, h_j = 0, a_j = 0] =
\begin{cases}
1, & \text{if } c_j' = c_j \text{ and } g_j' = g_j\\
0, & \text{otherwise}
\end{cases} \tag{14}
\]

\[
P[c_j', g_j' \mid c_j, g_j, h_j = 0, a_j = -1] =
\begin{cases}
1, & \text{if } c_j' = c_j - \zeta\tau \text{ and } g_j' = g_j\\
0, & \text{otherwise}
\end{cases} \tag{15}
\]

\[
P[c_j', g_j' \mid c_j \ge \eta_{g_j g_k}, g_j, h_j = 1, a] =
\begin{cases}
p_{g_j g_k}, & \text{if } c_j' = c_j - \eta_{g_j g_k} \text{ and } g_j' = g_k\\
0, & \text{otherwise}
\end{cases} \tag{16}
\]

\[
P[c_j', g_j' \mid c_j < \eta_{g_j g_k}, g_j, h_j = 1, a] =
\begin{cases}
p_{g_j g_k}, & \text{if } c_j' = -1 \text{ and } g_j' = g_k\\
0, & \text{otherwise}
\end{cases} \tag{17}
\]
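The per-EV transition probabilities (13)–(17) can be sketched as follows (our illustration; eta and p_move are lookup tables for η_{g_j g_k} and p_{g_j g_k}, and the h = 2 case, which (13)–(17) do not cover, is assumed here to leave c and g unchanged):

```python
def ev_transition_prob(c_next, g_next, c, g, h, a, zeta_tau, eta, p_move):
    """P[c', g' | c, g, h, a] following (13)-(17).

    zeta_tau:      SOC change per decision epoch (charge/discharge volume)
    eta[g][gk]:    SOC consumed by a trip from MG g to MG gk (eta_{g gk})
    p_move[g][gk]: probability that an EV leaving MG g arrives at MG gk (p_{g gk})
    """
    if h == 0:
        # (13)-(15): parked, so the location is fixed and the SOC follows the action
        return 1.0 if (g_next == g and c_next == c + a * zeta_tau) else 0.0
    if h == 1:
        # (16)-(17): moving; the destination gk is reached with probability p_{g gk}
        gk = g_next
        if c >= eta[g][gk]:
            return p_move[g][gk] if c_next == c - eta[g][gk] else 0.0
        # insufficient SOC for the trip: the SOC state collapses to c' = -1
        return p_move[g][gk] if c_next == -1 else 0.0
    # h == 2 is not covered by (13)-(17); we assume c and g stay unchanged
    return 1.0 if (c_next == c and g_next == g) else 0.0
```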
Since EVs move independently, P[H_k' | H_k, G_k] = ∏_j P[h_j' | h_j, g_j]. The transition probability of h_j can be defined as follows. We assume that the residence time of the jth EV in g_j (i.e., the MG where the jth EV is currently located) follows an exponential distribution with mean 1/μ_{j,g_j}. Then, the transition probability from h_j = 0 to h_j' = 1 is given by μ_{j,g_j}τ [24]. Therefore, when h_j = 0, the transition probability of h_j can be derived as

\[
P[h_j' \mid h_j = 0, g_j] =
\begin{cases}
1 - \mu_{j,g_j}\tau, & \text{if } h_j' = 0\\
\mu_{j,g_j}\tau, & \text{if } h_j' = 1\\
0, & \text{otherwise.}
\end{cases} \tag{20}
\]

Note that a different μ_{j,g_j} is used depending on the time to reflect the mobility variance. We also assume that the MGs are sufficiently close to each other, so an EV can move to another MG within the duration of a decision epoch. That is, when an EV is in the movement phase (i.e., h_j = 1), h_j' is always 2. Since consecutive movements do not generally occur, h_j' should always be 0 when h_j = 2. Therefore, P[h_j' | h_j = 1, g_j] and P[h_j' | h_j = 2, g_j] can be represented as

\[
P[h_j' \mid h_j = 1, g_j] =
\begin{cases}
1, & \text{if } h_j' = 2\\
0, & \text{otherwise}
\end{cases} \tag{21}
\]

and

\[
P[h_j' \mid h_j = 2, g_j] =
\begin{cases}
1, & \text{if } h_j' = 0\\
0, & \text{otherwise.}
\end{cases} \tag{22}
\]
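A small sketch of (20)–(22) (ours; mu and tau are passed in for the EV's current MG and the decision-epoch duration):

```python
def phase_transition_prob(h_next, h, mu, tau):
    """P[h' | h, g] following (20)-(22); mu is mu_{j,g_j} for the current MG."""
    if h == 0:
        # (20): the EV starts moving within the epoch with probability mu * tau
        if h_next == 1:
            return mu * tau
        return 1.0 - mu * tau if h_next == 0 else 0.0
    if h == 1:
        # (21): a movement always completes within one decision epoch
        return 1.0 if h_next == 2 else 0.0
    # (22): consecutive movements do not occur, so the EV parks again
    return 1.0 if h_next == 0 else 0.0
```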
Since the estimated/actual demands of the MGs change independently, the differences between their estimated and actual demands also change independently. Therefore, P[D_k' | D_k] = ∏_l P[d_l' | d_l], where P[d_l' | d_l] can be defined in a statistical manner.

E. Reward and Cost Functions

To define the reward and cost functions, we consider both the grid perspective and the EV perspective. In terms of the grid perspective, the autonomy of the MGs can be taken into account. In terms of the EV perspective, the price of electricity and the satisfaction on the charging are considered. Therefore, the total reward function, r(S, A), is defined as

\[ r(S, A) = \omega_1 r_G(S, A) - (1 - \omega_1)\, r_{EV}(S, A) \tag{23} \]

where r_G(S, A) and r_EV(S, A) are the reward functions with respect to the grid and EV perspectives, respectively. In addition, ω1 (0 ≤ ω1 ≤ 1) is a weighting factor that balances r_G(S, A) and r_EV(S, A). The relative importance of r_G(S, A) and r_EV(S, A) can be changed depending on the perspective of either the grid or the EV. For example, if the autonomy of the MG is more important than the satisfaction of the EV owing to a low electricity price, a larger value of ω1 can be set. Meanwhile, if the autonomy of the MG can be achieved without any V2G control, a small value of ω1 can be used.

When the actual and estimated demands of an MG are comparable, high autonomy of the MG can be achieved. Therefore, when the actual demand of the lth MG is lower than its estimated demand (i.e., d_l > 0), charging the EVs located in the lth MG helps improve the autonomy of the MG. In contrast, if the estimated demand of the lth MG is lower than its actual demand (i.e., d_l < 0), the EVs located in the lth MG should be discharged to supply electricity to the MG. Specifically, when the total sum of the charging and discharging volumes in the lth MG is close to d_l, high autonomy of the lth MG can be obtained. Since the total sum of the charging and discharging volumes in the lth MG can be calculated by ∑_j a_j δ(g_j = l) ζτ, where δ(g_j = l) is a delta function that returns 1 if the condition g_j = l is true,6 r_G(S, A) can be defined as

\[ r_G(S, A) = \frac{1}{N_{MG}} \sum_l \exp\left(-\left|\, d_l - \sum_j a_j\, \delta(g_j = l)\, \zeta\tau \,\right|\right) \tag{24} \]

where exp(·) returns a higher value as its argument becomes closer to 0. Note that EVs can be charged or discharged by ζτ over the duration of a decision epoch.

Since the price of electricity and the satisfaction on the charging are considered to define the reward function with respect to the EV perspective, r_EV(S, A) can be expressed as

\[ r_{EV}(S, A) = \omega_2 f_P(S, A) + (1 - \omega_2) f_L(S, A) \tag{25} \]

where f_P(S, A) and f_L(S, A) are the reward functions for the electricity price and the satisfaction on the charging, respectively. Also, ω2 (0 ≤ ω2 ≤ 1) is the weighting factor between f_P(S, A) and f_L(S, A). Note that ω2 can be decided based on the driver's preference. For example, if the driver is sensitive to the price of electricity, a large ω2 is set to weight f_P(S, A). Otherwise, a small ω2 can be used to maximize the satisfaction on the charging.

The electricity price is influenced by the difference between the estimated demand and the actual demand (i.e., d). That is, the electricity price is a non-increasing function of d. Moreover, when an EV is charged (discharged), the EV owner pays (receives) an electricity fee. That is, the electricity price is affected by d_{g_j} and a_j. For example, an EV owner should pay an expensive electricity fee to charge its EV in an energy-scarce jth MG. Meanwhile, if an EV located in the jth MG is discharged, the EV owner can receive a high profit. Then, f_P(S, A) can be described by

\[ f_P(S, A) = \frac{1}{N_{EV}} \sum_j PM(d_{g_j}, a_j) \tag{26} \]

where PM(d, a) is the price model, which can be defined according to the policy of the grid operators.

6 If the condition g_j = l is not true, the delta function returns 0.
If an EV is sufficiently charged while parked, the driver does not need to stop by any charging station during the trip. Otherwise, the driver should visit a charging station to recharge the EV. Since charging is a time-consuming procedure, the latter case degrades the satisfaction of the driver in the V2G system. f_L(S, A) is defined by considering this situation. We assume that the jth EV stops by a charging station and is charged as much as necessary to move to an MG if it does not have sufficient SOC (i.e., c_j = −1). In such a situation, the satisfaction of the jth EV's user is degraded. Therefore, f_L(S, A) can be represented by

\[ f_L(S, A) = \frac{1}{N_{EV}} \sum_j L_j\, \delta(c_j = -1) \tag{27} \]

where L_j denotes the satisfaction degradation degree of the jth EV.
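Putting (23)–(27) together, the total reward can be sketched in Python as follows (ours; it reuses the State container sketched earlier, and price_model stands in for the operator-defined PM(d, a)):

```python
import math

def total_reward(state, action, omega1, omega2, zeta_tau, price_model, L):
    """r(S, A) = w1 * rG - (1 - w1) * rEV, following (23)-(27).

    price_model(d, a): the grid operator's price model PM(d, a) in (26)
    L[j]:              satisfaction degradation degree of the j-th EV in (27)
    """
    n_ev = len(state.soc)
    n_mg = len(state.diff)

    # (24): exp(-|.|) approaches 1 as the net charging volume in MG l approaches d_l
    r_g = sum(
        math.exp(-abs(state.diff[l]
                      - sum(a_j * zeta_tau
                            for a_j, g_j in zip(action, state.mg) if g_j == l)))
        for l in range(n_mg)) / n_mg

    # (26): average electricity price paid (or earned) by the EV owners
    f_p = sum(price_model(state.diff[g_j], a_j)
              for g_j, a_j in zip(state.mg, action)) / n_ev

    # (27): average satisfaction degradation over EVs without sufficient SOC
    f_l = sum(L[j] for j, c_j in enumerate(state.soc) if c_j == -1) / n_ev

    r_ev = omega2 * f_p + (1.0 - omega2) * f_l      # (25)
    return omega1 * r_g - (1.0 - omega1) * r_ev     # (23)
```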
F. Optimality Equation

The optimality equation is given by

\[ v(S) = \max_{A \in \mathcal{A}} \Big[ r(S, A) + \lambda \sum_{S' \in \mathcal{S}} P[S' \mid S, A]\, v(S') \Big] \]

where λ is a discount factor in the MDP model; λ closer to 1 gives more weight to future rewards. The solution of the optimality equation corresponds to the maximum expected total reward and the optimal policy. To solve the optimality equation and obtain the optimal policy, δ, we use a value iteration algorithm, as shown in Algorithm 1, where ‖v‖ = max_{S∈S} v(S).
Algorithm 1 Value Iteration Algorithm

1: Set v^0(S) = 0 for each state S. Specify ε > 0, and set k = 0.
2: For each state S, compute v^{k+1}(S) by
\[ v^{k+1}(S) = \max_{A \in \mathcal{A}} \Big[ r(S, A) + \lambda \sum_{S' \in \mathcal{S}} P[S' \mid S, A]\, v^k(S') \Big] \]
3: If ‖v^{k+1} − v^k‖ < ε(1 − λ)/2λ, go to step 4. Otherwise, increase k by 1 and return to step 2.
4: For each state S ∈ S, compute the stationary optimal policy
\[ \delta(S) = \arg\max_{A \in \mathcal{A}} \Big[ r(S, A) + \lambda \sum_{S' \in \mathcal{S}} P[S' \mid S, A]\, v^{k+1}(S') \Big] \]
and stop.
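For concreteness, a compact Python transcription of Algorithm 1 (ours; it assumes the state and action sets are small enough to enumerate, and that r(S, A) and P[S' | S, A] are available as functions):

```python
def value_iteration(states, actions, reward, trans_prob, lam, eps):
    """Algorithm 1: returns the stationary optimal policy delta and values v.

    reward(s, a)         -> r(S, A)
    trans_prob(s2, s, a) -> P[S' | S, A]
    """
    def backup(s, a, v):
        return reward(s, a) + lam * sum(trans_prob(s2, s, a) * v[s2]
                                        for s2 in states)

    v = {s: 0.0 for s in states}                                   # step 1
    while True:
        v_new = {s: max(backup(s, a, v) for a in actions)          # step 2
                 for s in states}
        gap = max(abs(v_new[s] - v[s]) for s in states)
        v = v_new
        if gap < eps * (1.0 - lam) / (2.0 * lam):                  # step 3
            break
    delta = {s: max(actions, key=lambda a: backup(s, a, v))        # step 4
             for s in states}
    return delta, v
```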
G. Reinforcement Learning (RL)

If P[S' | S, A] cannot be easily obtained for some reason,7 an RL approach, in which an agent (i.e., the controller) learns what action to take by trial and error, can be used; it turns out to be a good solution for deriving a near-optimal policy [27]. In this paper, Q-learning [28] is exploited owing to its simplicity. The agent interacts with its environment over the sequence T_e. The quality function (i.e., Q-value) of a state-action pair, Q(S, A), is defined as the expected long-term discounted reward of state S under policy π. The objective of the Q-learning algorithm is to find an optimal policy π_opt that maximizes the Q-value of each state S, i.e., π_opt = arg max_{A∈A} Q(S, A). To this end, the agent iteratively learns the optimal Q-values without knowledge of P[S' | S, A]. That is, when the agent in state S conducts action A, the agent receives reward r and updates the Q-value of the state-action pair (S, A) as

\[ Q(S, A) \leftarrow (1 - \alpha)\, Q(S, A) + \alpha \Big( r + \lambda \max_{A' \in \mathcal{A}} Q(S', A') \Big) \]

where α (0 < α ≤ 1) is the learning rate and S' is the observed next state.

7 Some EVs may not want to submit their mobility profiles to the controller, and some MGs (aggregators) may not provide their actual/estimated demands due to privacy issues. In these situations, it is not easy to derive P[S' | S, A]. In addition, the exact transition probability P[S' | S, A] cannot be obtained if sufficient statistics are not collected.
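A minimal sketch of the Q-learning loop described above (ours, not the authors' code; the env object with reset()/step() methods, the learning rate alpha, and the epsilon-greedy exploration schedule are assumptions for the sketch; the paper specifies only the update rule):

```python
import random
from collections import defaultdict

def q_learning(env, actions, lam, alpha=0.1, epsilon=0.1, episodes=10_000):
    """Model-free Q-learning: learns Q(S, A) without knowing P[S' | S, A].

    env.reset() -> S and env.step(A) -> (S', r, done) are assumed;
    lam is the MDP discount factor, alpha the learning rate.
    """
    q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration: mostly exploit, occasionally explore
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: q[(s, x)])
            s_next, r, done = env.step(a)
            # temporal-difference update toward r + lam * max_a' Q(S', a')
            target = r + lam * max(q[(s_next, x)] for x in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
    return q  # the greedy policy is pi_opt(S) = argmax_A Q(S, A)
```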
IV. EVALUATION RESULTS

For the performance evaluation, we compare the proposed scheme, S_MACA, with the following three schemes: 1) S_FC, where EVs are fully charged and do not conduct any discharging actions; 2) S_MC, where the SOCs of EVs are maintained at the minimum level (i.e., EVs are charged or discharged to the minimum level); and 3) S_DBC, where EVs are charged based on the difference between the estimated demand and the actual demand, i.e., EVs located in MGs whose actual demands are lower than their estimated demands are charged, while EVs located in MGs whose actual demands are higher than their estimated demands are discharged (see the sketch below).

The number of MGs is set to three. We assume that the estimated and actual demands change dynamically. The price is proportional to the difference d_{g_j} between the estimated demand and the actual demand in the MG where the jth EV is located. The other default parameter settings are summarized in Table II.
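A sketch of the per-EV decision rules of the three baselines, under our reading of the descriptions above (soc_min and soc_max are illustrative bounds not specified in the excerpt):

```python
def baseline_action(scheme, soc, d_mg, soc_min, soc_max):
    """Per-EV action of the comparison schemes: +1 charge, 0 idle, -1 discharge."""
    if scheme == "FC":   # S_FC: charge toward full, never discharge
        return 1 if soc < soc_max else 0
    if scheme == "MC":   # S_MC: pin the SOC at the minimum level
        return 1 if soc < soc_min else (-1 if soc > soc_min else 0)
    if scheme == "DBC":  # S_DBC: follow the sign of d = ID - AD in the EV's MG
        return 1 if d_mg > 0 else (-1 if d_mg < 0 else 0)
    raise ValueError(f"unknown scheme: {scheme}")
```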
Fig. 3. Effect of ω1 to balance the grid perspective reward function and the EV perspective reward. (a) Expected total reward. (b) Autonomy degree.

TABLE II
DEFAULT PARAMETER SETTINGS
A. Effect of ω1

The effect of the weighting factor ω1, which balances the reward functions with respect to the grid perspective and the EV perspective, on the expected total reward is shown in Figure 3(a). From Figure 3(a), it can be found that S_MACA has the highest expected total reward regardless of ω1. This can be explained as follows. S_MACA chooses the most appropriate action by comprehensively considering the estimated and actual demands of the MGs, the electricity price, and the satisfaction on the charging. In other words, S_MACA actively commands charging actions to EVs located in an energy-abundant MG with a low electricity price. In addition, in S_MACA, EVs are discharged when they are in an energy-scarce MG while the satisfaction on the charging is still considered (i.e., drivers need not stop by a charging station during their trips, because S_MACA does not command excessive discharging actions even when EVs are in an energy-scarce MG). Meanwhile, the other comparison schemes follow fixed actions without consideration of these parameters. Specifically, in S_DBC, EVs located in MGs whose actual demands are lower than their estimated demands are always charged, while EVs located in MGs whose actual demands are higher than their estimated demands are always discharged. This operation does not consider any EV perspectives such as the electricity price and the satisfaction on the charging. Therefore, when the reward with respect to the EV perspective is important (i.e., ω1 is small), the expected total reward of S_DBC can be very low.

Figure 3(b) shows the autonomy degree as a function of ω1. Note that the autonomy degree of S_DBC is normalized to 1. In Figure 3(b), it can be found that S_MACA operates adaptively even when ω1 is changed. Specifically, when ω1 is large (i.e., 0.7 ∼ 0.9), the autonomy degree of S_MACA is larger than 1. This can be explained as follows. A large ω1 represents that the grid perspective reward (i.e., the autonomy of the MG) is more important than the EV perspective reward. In such a situation, S_MACA preferentially commands charging (or discharging) operations to EVs in energy-abundant (or energy-scarce) MGs. Moreover, EVs expected to move to energy-scarce (or energy-abundant) MGs are aggressively charged (or discharged) to efficiently act as energy transporters among different MGs. In so doing, energy can be naturally transported from energy-abundant MGs to energy-scarce MGs, and therefore S_MACA can achieve high MG autonomy. On the other hand, S_MC has the lowest autonomy degree among the comparison schemes. This can be explained as follows. In S_MC, the SOCs of EVs are maintained at the minimum level (i.e., EVs are charged or discharged to the minimum level). Therefore, EVs in S_MC cannot efficiently play the role of energy transporters.

Fig. 4. Effect of satisfaction degradation degree L on the expected number of situations where sufficient SOC is not supported.

B. Effect of L

Figure 4 shows the expected number of situations where sufficient SOC is not supported, E[δ(c = −1)], as a function of L, which represents the satisfaction degradation degree when sufficient SOC is not supported. In this result, ω1 is set to 0.2. From Figure 4, it can be found that E[δ(c = −1)]
[15] H. Liu, Z. Hu, Y. Song, and J. Lin, "Decentralized vehicle-to-grid control for primary frequency regulation considering charging demands," IEEE Trans. Power Syst., vol. 28, no. 3, pp. 3480–3489, Aug. 2013.
[16] J. A. Jardini, C. M. V. Tahan, M. R. Gouvea, S. U. Ahn, and F. M. Figueiredo, "Daily load profiles for residential, commercial and industrial low voltage consumers," IEEE Trans. Power Del., vol. 15, no. 1, pp. 375–380, Jan. 2000.
[17] W. Kempton and J. Tomić, "Vehicle-to-grid power implementation: From stabilizing the grid to supporting large-scale renewable energy," J. Power Sources, vol. 144, no. 1, pp. 268–279, Jun. 2005.
[18] H. Ko, G. Lee, D. Suh, S. Pack, and X. Shen, "An optimized and distributed data packet forwarding in LTE/LTE-A networks," IEEE Trans. Veh. Technol., vol. 65, no. 5, pp. 3462–3473, May 2016.
[19] M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. Hoboken, NJ, USA: Wiley, 1994.
[20] E. A. Feinberg and A. Shwartz, Handbook of Markov Decision Processes: Methods and Applications. Norwell, MA, USA: Kluwer, 2002.
[21] X. Hu, S. E. Li, and Y. Yang, "Advanced machine learning approach for lithium-ion battery state estimation in electric vehicles," IEEE Trans. Transport. Electrific., vol. 2, no. 2, pp. 140–149, Jun. 2016.
[22] X. Wu, D. Freese, A. Cabrera, and W. Kitch, "Electric vehicles' energy consumption measurement and estimation," Transp. Res. D, Transp. Environ., vol. 34, no. 1, pp. 52–67, Jan. 2015.
[23] T. Hyodo, D. Watanabe, and M. Wu, "Estimation of energy consumption equation for electric vehicle and its implementation," in Proc. World Conf. Transp. Res. (WCTR), Jul. 2013, pp. 1–12.
[24] T. Guo, A. Ul Quddus, N. Wang, and R. Tafazolli, "Local mobility management for networked femtocells based on X2 traffic forwarding," IEEE Trans. Veh. Technol., vol. 62, no. 1, pp. 326–340, Jan. 2013.
[25] J. Pan and W. Zhang, "An MDP-based handover decision algorithm in hierarchical LTE networks," in Proc. IEEE Veh. Technol. Conf. (VTC-Fall), Sep. 2012, pp. 1–5.
[26] H. Tabrizi, G. Farhadi, and J. Cioffi, "A learning-based network selection method in heterogeneous wireless systems," in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), Dec. 2011, pp. 1–5.
[27] M. E. Helou, M. Ibrahim, S. Lahoud, K. Khawam, D. Mezher, and B. Cousin, "A network-assisted approach for RAT selection in heterogeneous cellular networks," IEEE J. Sel. Areas Commun., vol. 33, no. 6, pp. 1055–1067, Jun. 2015.
[28] C. Watkins and P. Dayan, "Technical note: Q-learning," Mach. Learn., vol. 8, no. 3, pp. 279–292, May 1992.
[29] H. Ko, J. Lee, and S. Pack, "MALM: Mobility-aware location management scheme in femto/macrocell networks," IEEE Trans. Mobile Comput., vol. 16, no. 11, pp. 3115–3125, Nov. 2017.

Haneul Ko received the B.S. and Ph.D. degrees from the School of Electrical Engineering, Korea University, Seoul, South Korea, in 2011 and 2016, respectively. From 2016 to 2017, he was a Post-Doctoral Fellow with the Mobile Network and Communications Laboratory, Korea University. He is currently a Visiting Post-Doctoral Fellow with the University of British Columbia, Vancouver, BC, Canada. He is also with the Smart Quantum Communication Research Center, Korea University. His research interests include 5G networks, mobility management, mobile cloud computing, SDN/NFV, and Future Internet.

Sangheon Pack (SM'11) received the B.S. and Ph.D. degrees in computer engineering from Seoul National University, Seoul, South Korea, in 2000 and 2005, respectively. From 2005 to 2006, he was a Post-Doctoral Fellow with the Broadband Communications Research Group, University of Waterloo, Waterloo, ON, Canada. In 2007, he joined the faculty of Korea University, Seoul, South Korea, where he is currently a Full Professor with the School of Electrical Engineering. His current research interests include Future Internet, softwarized networking (SDN/NFV), information-centric networking/delay-tolerant networking, and vehicular networks. He was the recipient of the IEEE/Institute of Electronics and Information Engineers Joint Award for IT Young Engineers Award 2017, the Korean Institute of Information Scientists and Engineers Young Information Scientist Award 2017, the Korean Institute of Communications and Information Sciences Haedong Young Scholar Award 2013, the LG Yonam Foundation Overseas Research Professor Program in 2012, and the IEEE ComSoc APB Outstanding Young Researcher Award in 2009. He served as a Publicity Co-Chair of IEEE SECON 2012, a Co-Chair of the IEEE VTC 2010-Fall Transportation Track and the IEEE WCSP 2013 Wireless Networking Symposium, a Publication Co-Chair of IEEE INFOCOM 2014 and ACM MobiHoc 2015, and a TPC Chair of EAI Qshine 2016. He is an Editor of the Journal of Communications and Networks and IET Communications, and a Guest Editor of the IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING.

Victor C. M. Leung (S'75–M'89–SM'97–F'03) received the B.A.Sc. (Hons.) and Ph.D. degrees in electrical engineering from the University of British Columbia (UBC) in 1977 and 1982, respectively. He received the APEBC Gold Medal as the Head of the graduating class in the Faculty of Applied Science. He attended the Graduate School, UBC, on a Canadian Natural Sciences and Engineering Research Council Postgraduate Scholarship.

From 1981 to 1987, he was a Senior Member of Technical Staff and a Satellite System Specialist at MPR Teltech Ltd., Canada. In 1988, he was a Lecturer with the Department of Electronics, The Chinese University of Hong Kong. He returned to UBC as a Faculty Member in 1989, and currently holds the positions of a Professor and the TELUS Mobility Research Chair in Advanced Telecommunications Engineering with the Department of Electrical and Computer Engineering. He has co-authored over 1000 journal/conference papers and 39 book chapters, and co-edited 14 book titles. Several of his papers have been selected for best paper awards. His research interests include the broad areas of wireless networks and mobile systems.

Dr. Leung is a Fellow of the Royal Society of Canada, the Engineering Institute of Canada, and the Canadian Academy of Engineering. He is a registered Professional Engineer in the Province of British Columbia, Canada. He was a Distinguished Lecturer of the IEEE Communications Society. He received the IEEE Vancouver Section Centennial Award, the 2011 UBC Killam Research Prize, and the 2017 Canadian Award for Telecommunications Research. He has co-authored papers that received the 2017 IEEE ComSoc Fred W. Ellersick Prize and the 2017 IEEE Systems Journal Best Paper Award. He has served on the editorial boards of the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS (Wireless Communications Series and Series on Green Communications and Networking), IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, IEEE TRANSACTIONS ON COMPUTERS, IEEE WIRELESS COMMUNICATIONS LETTERS, and the Journal of Communications and Networks. He has guest-edited many journal special issues, and provided leadership to the organizing and technical program committees of numerous conferences and workshops. He is serving on the editorial boards of the IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING, IEEE TRANSACTIONS ON CLOUD COMPUTING, IEEE ACCESS, Computer Communications, and several other journals.