Federated Edge Learning: Design Issues and Challenges

Afaf Taïk, Student Member, IEEE, and Soumaya Cherkaoui, Senior Member, IEEE

Afaf Taïk and Soumaya Cherkaoui are with the INTERLAB Research Laboratory, Faculty of Engineering, Department of Electrical and Computer Science Engineering, University of Sherbrooke, Sherbrooke, QC J1K 2R1, Canada (e-mail: afaf.taik@usherbrooke.ca, soumaya.cherkaoui@usherbrooke.ca).

Abstract—Federated Learning (FL) is a distributed machine learning technique, where each device contributes to the learning model by independently computing the gradient based on its local training data. It has recently become a hot research topic, as it promises several benefits related to data privacy and scalability. However, implementing FL at the network edge is challenging due to system and data heterogeneity and resource constraints. In this article, we examine the existing challenges and trade-offs in Federated Edge Learning (FEEL). The design of FEEL algorithms for resource-efficient learning raises several challenges, which are essentially related to the multidisciplinary nature of the problem. As data is the key component of the learning, this article advocates a new set of considerations for data characteristics in wireless scheduling algorithms in FEEL. Hence, we propose a general framework for data-aware scheduling as a guideline for future research directions. We also discuss the main axes and requirements for data evaluation, and some exploitable techniques and metrics.

Keywords—Challenges; Data Diversity; Device Scheduling; Design; Federated Learning; Resource Allocation.
I. INTRODUCTION

The growing interest in intelligent services motivates the integration of artificial intelligence (AI) in Internet of Things (IoT) applications. The collection of large volumes of data from different devices and sensors is necessary for training AI models. However, uploading massive data generated by connected devices to the cloud is usually impractical, mainly due to issues including privacy, network congestion, and latency. Federated Edge Learning (FEEL) [1] is a Machine Learning (ML) setting that utilizes edge computing [2, 3] to tackle these concerns. In contrast to centralized ML, Federated Learning (FL) [4] consists of training the model on the devices, under the orchestration of a central entity, where only the resulting model parameters are sent to the edge servers to be aggregated. FEEL refers to the use of FL at the edge of the network, which makes it a promising solution for privacy-preserving ML.

An important design decision for a FEEL algorithm is whether to choose asynchronous or synchronous aggregation. Recent works tend to promote synchronous training, where, for instance, synchronization among participating devices is required for update averaging [4] and privacy preservation [5]. However, there are many challenges in using synchronous FL in edge environments.

To begin with, the heterogeneity of resources across different devices sparks new system challenges. For instance, significant delays can be caused by stragglers. Moreover, communication loads across devices limit the scalability of FL for large models. Participating devices communicate full model updates during every training iteration, which are of the same size as the trained model. For large models, such as deep neural networks, the model size can be in the range of gigabytes. As a result, if communication bandwidth is limited or communication is costly, FEEL can be deemed impractical or unfeasible, as communication overhead becomes a bottleneck.

Furthermore, end devices have limited battery lives and varying available energy levels. As training ML models is a computation-heavy task, only devices that have enough energy can be solicited to participate. Energy and computational constraints also limit both the size of the models that can be trained on-device and the number of local training iterations.

Additionally, as the data collected by the clients depends on their local environment and usage pattern, both the size and the distribution of the local datasets will typically vary between different clients. This non-Independently and Identically Distributed (non-IID) and unbalanced nature of data across the network imposes significant challenges linked to model convergence.

Consequently, the design of an efficient FEEL algorithm should take into account the limited and heterogeneous nature of the resources, alongside the non-IID and unbalanced aspect of the data distributions. In general, proposed FEEL algorithms target efficient selection of participating devices, optimization of resource allocation and usage, or adequate update aggregation. However, it is hard to capture both the resource problems and the learning goal, as there is no direct relation between the model's loss function and the resource optimization. A manageable approach found in current works is to focus on resource optimization with certain learning guarantees, such as maximizing the number of collected updates while maintaining a given level of local accuracy [6]. Nonetheless, these guarantees are not sufficient, as a significant drop in accuracy is observed when data is non-IID and unbalanced. Therefore, we propose to mitigate the effects of design trade-offs through the direct integration of data properties into the device selection and resource optimization algorithms. In fact, data properties were at the heart of FL since its inception, but they have been largely overlooked in the design of FEEL algorithms. Moreover, data diversity has long been relied upon in active learning, where models can be trained using few labelled data samples if highly diverse data is selectively added to the training set. Thus, data diversity should be considered in the design of FEEL algorithms, as we advocate in this article.
The main contributions of this article can be summarized as follows:
• We discuss the FEEL challenges imposed by the nature of the edge environment, from an algorithm design perspective. We review the challenges related to computational and communication capacities, as well as data properties, as they are at the core of the trade-offs in learning and resource optimization algorithms.
• We propose a general framework for incorporating data properties in FEEL, by providing a guideline for a thorough algorithm design, and criteria for the choice of diversity measures in both datasets and models.
• We present several possible measures and techniques to evaluate data and model diversity, which can be applied in different scenarios (e.g., classification, time series forecasting), in an effort to assist fellow researchers in further addressing FEEL challenges.

The remainder of this article is organized as follows. In Section II, we review the challenges found in designing FEEL algorithms, and we derive the main trade-offs. Then, we shed light on a new data-aware design direction for FEEL algorithms in Section III, where some possible techniques and methods to evaluate diversity are also detailed. At last, a conclusion and final remarks are presented in Section IV.
II. DESIGN CHALLENGES: OVERVIEW

FEEL has several constraints related to the nature of the edge environment. In fact, FEEL involves the participation of heterogeneous devices that have different computation and communication capabilities, energy states, and dataset characteristics. Under device and data heterogeneity, in addition to resource constraints, participant selection [7] and resource allocation [8] have to be optimized for an efficient FEEL solution.

A. Design Challenges

The core challenges associated with solving the distributed optimization problem are twofold: resources and data. These challenges increase the complexity of the FEEL setting compared to similar problems, such as distributed learning in data centers.

Resources: The challenges related to the resources, namely computation, storage and communication, are mainly in terms of their heterogeneity and scarcity.
Heterogeneity of the resources: The computation, storage and communication capabilities vary from one device to another. Devices may be equipped with different hardware (CPU and memory) and network connectivity (e.g., 4G/5G, Wi-Fi), and may differ in available power (battery level). The gap in computational resources creates challenges such as delays caused by stragglers. FEEL algorithms must therefore be adaptive to the heterogeneous hardware and be tolerant toward device drop-out and low or partial participation. A potential solution to the straggler problem is asynchronous learning. However, the reliability of asynchronous FL and the model convergence in this setting are not always guaranteed. Thus, synchronous FL remains the preferred approach.

Limited Resources: In contrast to the cloud, the computing and storage resources of the devices are very limited. Therefore, the models that can be trained on-device are relatively simpler and smaller than the models trained on the cloud. Furthermore, devices are frequently offline or unavailable, either due to low battery levels, or because their resources are fully or partially used by other applications.

As for the communication resources, the available bandwidth is limited. It is therefore important to develop communication-efficient methods that allow sending compressed or partial model updates. To further reduce communication cost in FEEL settings, two potential directions are generally considered: 1) reducing the total number of communication rounds until convergence [9], and 2) reducing the size of the transmitted updates through compression and partial updates [4].
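To make the second direction concrete, the sketch below shows magnitude-based top-k sparsification of a flattened update, one common compression technique; the function names and the 1% sparsity ratio are our own illustrative choices, not a method prescribed in the works cited above.

```python
import numpy as np

def topk_sparsify(update: np.ndarray, k_ratio: float = 0.01):
    """Keep only the k largest-magnitude entries of a flattened update.

    The (indices, values) pair is what a device would transmit instead
    of the dense update, shrinking the payload by roughly 1/k_ratio.
    """
    flat = update.ravel()
    k = max(1, int(k_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # top-k by magnitude
    return idx, flat[idx]

def topk_densify(idx: np.ndarray, vals: np.ndarray, shape: tuple):
    """Server-side reconstruction: scatter the sparse update back."""
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = vals
    return flat.reshape(shape)

# A 1M-parameter update compressed to 1% of its entries.
rng = np.random.default_rng(0)
w = rng.normal(size=(1000, 1000))
idx, vals = topk_sparsify(w)
w_hat = topk_densify(idx, vals, w.shape)
```

Note that the indices themselves must also be transmitted, so the effective compression ratio is somewhat lower than 1/k_ratio; in practice, dropped coordinates are often accumulated locally as an error-feedback term.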
Data: In most cases, data distributions depend on the users' behaviour [10]. As a result, the local datasets are massively distributed, statistically heterogeneous (i.e., non-IID and unbalanced), and highly redundant. Additionally, the raw generated data is often privacy-sensitive, as it can reveal personal and confidential information.

Small and widely distributed datasets: In FEEL scenarios, a large number of devices participate in the FL training, with a small average number of data samples per client. Learning from small datasets makes local models prone to overfitting.

Non-IID: The training data on a given device is typically based on the usage of the device by a particular user, and hence any particular user's local dataset will not be representative of the population distribution. This data-generation paradigm fails to comply with the independent and identically distributed (IID) assumptions of distributed optimization, and thus adds complexity to the problem formulation and convergence analysis. The empirical evaluation of FEEL algorithms on non-IID data is usually performed on artificial partitions of MNIST or CIFAR-10, which do not provide a realistic model of a federated scenario.

Unbalance: Similarly to the nature of the distributions, the size of the generated data depends on the user. Depending on how users use their devices, clients may have varying amounts of local training data.

Redundancy: The unbalance of the data is also observed within the local dataset of a single device. In fact, IoT data is highly redundant. In sequential data (e.g., video surveillance, sensor data), for instance, only a subset of the data is informative or useful for the training.

Privacy: The privacy-preserving aspect is an essential requirement in FL applications. The raw data generated on each device is protected by sharing only model updates instead of the raw data. However, model updates communicated throughout the training process can still be reverse-engineered to reveal sensitive information, either by a third party or a malicious central server.
B. Design Trade-offs

Several efforts were made to tackle the aforementioned challenges. However, FEEL is a multi-dimensional problem that brings about several trade-offs. As a result, algorithms designed to address one issue at a time are deemed impractical. Perhaps a tractable solution is to combine several techniques when developing and deploying FEEL algorithms.

In general, an end-to-end FEEL solution should cover device selection, resource allocation, and update aggregation. In the following, we discuss major trade-offs that should be considered when designing solutions in the FEEL setting.

1) General FEEL solution: Given the wide range of applications that can benefit from FEEL, there is no one-size-fits-all solution. However, in general, a FEEL solution needs to act on the following aspects:

Device selection: Participant selection refers to the selection of devices to receive and train the model in each training round. Ideally, a set of participants is randomly selected by the server to participate. The server then aggregates the parameter updates from all participants in the round by taking a weighted average of the models. However, due to communication bottlenecks and the desire to tame the training latency, the device selection should be optimized in terms of resource [7] and data criteria.

Resource allocation: Device selection should not be considered independently from resource allocation, especially computation and bandwidth. We refer to the joint selection and resource allocation as a scheduling algorithm. Indeed, the number of scheduled devices is limited by the available bandwidth that can be allocated. Additionally, for an optimal learning round duration and energy consumption, both bandwidth and computation resources should be adapted based on the number of local iterations at each device and the number of global iterations (i.e., learning rounds) [11]. Due to the fast-changing aspect of the FEEL environment, the computational complexity of scheduling algorithms should be especially low. Therefore, the use of heuristics and meta-heuristics should be encouraged.

Updates aggregation: This aspect of the solution design refers to how the updates are aggregated and how frequently they are aggregated. For instance, the frequency of communication and aggregation can be reduced with more local computation [11], or through selective communication of gradients [9]. FedAvg [12] is one of the most used aggregation methods; it takes a weighted average of the Stochastic Gradient Descent updates, where the corresponding weights are determined by the volume of each device's training dataset. While FedAvg uses synchronous aggregation, in the FedAsync [13] algorithm newly received local updates are weighted according to their staleness: stale updates received from stragglers are weighted less based on how many rounds have elapsed. It should also be noted that proposing new aggregation methods requires theoretical and empirical convergence analysis to guarantee that the learning loss function will converge to a global optimum. Update aggregation should also be communication-efficient [4, 9] and secured by means of techniques such as differential privacy [5].
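To make the two aggregation styles concrete, the following is a minimal sketch of data-volume-weighted averaging in the style of FedAvg and of staleness-discounted mixing in the spirit of FedAsync [13]; the polynomial discount is one of the staleness functions discussed by the FedAsync authors, while the function and variable names are our own.

```python
import numpy as np

def fedavg_aggregate(models, sizes):
    """Synchronous aggregation: weighted average of local models,
    with weights proportional to each device's training-set size."""
    w = np.asarray(sizes, dtype=float)
    w /= w.sum()
    return sum(wi * m for wi, m in zip(w, models))

def fedasync_mix(global_model, local_model, staleness, alpha=0.5, a=0.5):
    """Asynchronous aggregation: mix one late update into the global
    model, discounted by how many rounds old its base model is."""
    alpha_t = alpha * (1.0 + staleness) ** (-a)  # polynomial staleness decay
    return (1.0 - alpha_t) * global_model + alpha_t * local_model

# Toy example: three devices holding 100, 50 and 10 samples.
models = [np.full(4, v) for v in (1.0, 2.0, 3.0)]
print(fedavg_aggregate(models, sizes=[100, 50, 10]))
print(fedasync_mix(np.zeros(4), models[0], staleness=3))
```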
2) Optimization axes: In addressing FEEL challenges, three optimization axes are often considered: time, energy and learning. In many cases, the FEEL algorithm can be viewed as a Pareto optimization problem [14]. The relation between the three axes and the challenges is illustrated in Figure 1.

Fig. 1: FEEL algorithms, challenges and optimization axes

Time optimization: Accelerating the learning can be evaluated through different lenses: learning round duration and time until learning convergence. Due to the synchronous model aggregation of FEEL, the total duration of a round is determined by the slowest device among all the scheduled devices [15]. For this reason, more bandwidth should be allocated for transmission to stragglers and less to faster devices. This can, to some extent, equalize their total update times (computing plus communication time). Furthermore, to avoid squandering bandwidth on extremely slow devices, scheduling (i.e., joint selection and resource allocation) should exclude the slowest devices by applying thresholds on their expected completion times, which can be inferred from their computing capacities and channel states. From a learning perspective, the learning latency is determined by the number of rounds until convergence. The optimization techniques centered on this aspect mainly focus either on the selective upload of updates, or on maximizing the number of participating devices in each round.

Energy optimization: Optimizing the energy consumption across the network is necessary to reduce the rate of device drop-out caused by battery drainage. In fact, training and transmission of large-scale models are energy-consuming, while most edge and end devices have limited battery lives. Additionally, using the maximum capacity of the devices would make users less willing to participate in the training. A design goal of a scheduling algorithm (i.e., joint selection and resource allocation) would be to allocate bandwidth based on the devices' channel states and battery levels. As a result, more bandwidth should be allocated to devices with weaker channels or poorer power states, to maximize the number of collected updates [6].

Learning optimization: In contrast to centralized learning, optimizing the learning in the FEEL setting cannot be seen independently from time and energy optimization. However, capturing the optimization of time, energy and the learning goal in the same optimization problem is hard, because there is no direct relation between the objective function of the learning (i.e., the loss function) and the time and energy minimization goals. A manageable approach is to minimize time and energy under a certain convergence speed guarantee. For instance, some works argue that the number of collected updates in each round is inversely proportional to the convergence time, and can therefore be used as a guarantee [6]. Indeed, multi-user diversity (i.e., collecting a maximum of updates) can yield a high convergence speed, especially in IID environments; however, there is a significant chance of choosing the same sets of devices repeatedly. To avoid this issue, a goal of the FEEL algorithm can be to maximize fairness in terms of the number of updates collected among devices [16]. The fairness measure maximizes the chance of reaching more diverse data sources, thus achieving gradient diversity. Nonetheless, the number of collected updates in this setting might be low. Fairness is also considered in the aggregation by q-Fair FL (q-FFL) [17], which reweighs the objective function in FedAvg to assign higher weights in the loss function to devices with higher loss. Another approach is to use data size priority, which maximizes the size of the data used in the training by using a selection probability proportional to the available dataset's size. In the background, these scheduling algorithms all share the same idea: if the size of the training data is large, then the training will converge faster. However, IoT data is highly redundant and inherently unbalanced. Thus, many of the proposed algorithms witness a drop in performance in non-IID and unbalanced experiments. Therefore, the data properties should be considered throughout the FEEL algorithm.
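As a rough illustration of such loss-based reweighing, the sketch below scales each device's aggregation weight by its local loss raised to a power q, in the spirit of q-FFL [17]; q = 0 recovers plain data-size weighting. The exact update rule in [17] differs, notably in its normalization, so this is only a schematic.

```python
import numpy as np

def qffl_weights(sizes, losses, q=1.0):
    """Aggregation weights that up-weight devices with higher local loss.

    q = 0 reduces to FedAvg's data-size weighting; larger q pushes the
    objective toward a more uniform (fairer) loss profile across devices.
    """
    w = np.asarray(sizes, float) * np.asarray(losses, float) ** q
    return w / w.sum()

# Two devices with equal data but unequal losses: the struggling
# device receives the larger weight as soon as q > 0.
print(qffl_weights(sizes=[100, 100], losses=[0.2, 0.8], q=1.0))
```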
III. DATA-AWARE FEEL DESIGN: FUTURE DIRECTION

Even though FL was first proposed with data as a central aspect, data has been overlooked in the design of proposed FEEL scheduling algorithms. Given the significant drop in accuracy of models trained with resource-aware FEEL algorithms in non-IID and unbalanced settings, it becomes clear that the data aspect should be considered. Henceforth, we propose a new possible data-aware end-to-end FEEL solution based on the diversity properties of the different datasets. In general, diversity consists of two aspects, namely richness and uncertainty. Richness quantifies the size of the data, while uncertainty quantifies the information contained in the data. In fact, it has long been proven in Active Learning that by choosing highly uncertain data samples, a model can be trained using fewer labelled data samples. This fact suggests that data uncertainty should be incorporated into the design of FL scheduling algorithms. Nonetheless, the uncertainty measures used in Active Learning target individual samples from unlabeled data in a centralized setting; thus, these measures cannot be directly integrated in FEEL. In the FEEL setting, update scheduling can take place either before the training or after it; therefore, the diversity measures should be selected depending on the time of scheduling. If scheduling before the training is preferred, then the datasets' diversity is to be considered. Otherwise, if the scheduling is set after the training is over, the diversity to be considered is model diversity, as the diversity of the dataset can be reflected by the resulting model. In both cases, in addition to maximizing diversity through careful selection of participating devices, the scheduling algorithm can focus on minimizing the consumed resources in terms of FL completion time and transmission energy of participating devices. For pre-training scheduling, local computation energy can also be optimized. Furthermore, the scheduling problem's constraints are to be derived from the environment's properties concerning resources and data.

In this section, to better illustrate data-aware solutions, we consider the architecture illustrated in Figure 2: a cellular network composed of one base station (BS) equipped with a parameter server (PS), and N devices that collaboratively train a shared model. In the following, we discuss different constraints related to the scheduling algorithms in this setting. Then, we present guidelines for pre-training and post-training algorithms, where we detail the key criteria for the design of data-aware FEEL solutions, and we present some potential measures and methods to enable a variety of data-aware FEEL applications, which are summarized in Figure 3.
Fig. 2: The proposed FEEL system model

A. Scheduling Constraints

The scheduling algorithms must consider the following constraints, which arise from the FEEL environment's properties:

Energy consumption: Due to the limited energy levels and the high computational requirements of training algorithms, it is necessary to evaluate a device's battery level before scheduling it for a training round. When FL was first proposed, the selected devices were limited to the ones plugged in for charging. However, this criterion limits the number of devices that can be selected, leading to a slow convergence of the learning.

Radio channel state: It is important to consider radio channel state changes in the scheduling. The quality of the communication is critical for both the device selection and the resource allocation.

Expected completion time: The available computation resources, alongside the data size, can be used to estimate the completion time of a device. Potential stragglers can thus be discarded even before the training process.

Number of participants: A communication round cannot be considered valid unless a minimum number of updates is obtained. Therefore, a training round can be dropped if there are not enough devices to schedule.

Data size: If the available data on a device is smaller than a required minimum, the device can be immediately discarded from the selection process. For instance, if the number of samples is less than the selected mini-batch size, the device should be excluded.
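Taken together, these constraints translate naturally into a pre-selection filter. The sketch below is our own schematic of such a filter; the field names and threshold values are illustrative assumptions rather than recommendations from the literature.

```python
from dataclasses import dataclass

@dataclass
class DeviceState:
    battery: float      # remaining battery level, in [0, 1]
    snr_db: float       # current radio channel quality
    est_time_s: float   # estimated computation + upload time
    n_samples: int      # local dataset size

def eligible(d: DeviceState, min_battery=0.2, min_snr_db=5.0,
             deadline_s=60.0, batch_size=32) -> bool:
    """Apply the per-device constraints: energy, channel state,
    expected completion time and minimum data size."""
    return (d.battery >= min_battery
            and d.snr_db >= min_snr_db
            and d.est_time_s <= deadline_s
            and d.n_samples >= batch_size)

devices = [DeviceState(0.9, 12.0, 30.0, 500),
           DeviceState(0.1, 15.0, 20.0, 800),   # dropped: low battery
           DeviceState(0.8, 14.0, 25.0, 10)]    # dropped: too few samples
candidates = [d for d in devices if eligible(d)]
MIN_PARTICIPANTS = 1  # the round itself is dropped below this count
assert len(candidates) >= MIN_PARTICIPANTS
```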
B. Pre-training scheduling: Dataset Diversity

The pre-training scheduling that we propose uses dataset diversity to choose the devices that will conduct the training and send the updates. Scheduling the devices before the training allows eliminating potential stragglers and adapting the number of epochs based on the battery levels available at the participating devices.

1) Scheduling algorithm: In this algorithm, the global model is initialized by the BS. Afterwards, the following steps are repeated until the model converges or a maximum number of rounds is attained (a server-side sketch of one round follows the list):
• Step 1: At the beginning of each training round, the devices send their diversity indicators and battery levels to the server.
• Step 2: Based on the received information, alongside the evaluated channel state indicator, the server schedules a subset of devices and sends them the current global model.
• Step 3: Each device in the subset uses its local data to train the model.
• Step 4: The updated models are sent to the server to be aggregated.
• Step 5: The PS aggregates the updates and creates the new model.
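The sketch below condenses Steps 1-5 into a single server-side round; the scoring rule (diversity gated by battery and channel gain), the placeholder local-training step, and all names are illustrative assumptions, with data-size-weighted averaging standing in for Step 5.

```python
import numpy as np

def run_round(global_model, reports, bandwidth_slots=3):
    """One pre-training-scheduled FEEL round (Steps 1-5).

    `reports` maps device id -> (diversity, battery, channel_gain, data),
    i.e., the indicators collected in Step 1 plus the local dataset that
    in reality never leaves the device."""
    # Step 2: rank devices and schedule the best ones within the budget.
    scores = {i: div * bat * gain
              for i, (div, bat, gain, _) in reports.items()}
    scheduled = sorted(scores, key=scores.get, reverse=True)[:bandwidth_slots]

    # Steps 3-4: scheduled devices train locally and return their models.
    updates, sizes = [], []
    for i in scheduled:
        data = reports[i][3]
        # Placeholder "training": nudge the model toward the local mean.
        updates.append(global_model + 0.1 * (data.mean() - global_model))
        sizes.append(len(data))

    # Step 5: the PS aggregates with data-size weights (FedAvg-style).
    w = np.asarray(sizes, float) / sum(sizes)
    return sum(wi * ui for wi, ui in zip(w, updates))

rng = np.random.default_rng(1)
reports = {i: (rng.random(), rng.random(), rng.random(),
               rng.normal(loc=i, scale=1.0, size=200)) for i in range(6)}
model = np.zeros(1)
for _ in range(5):
    model = run_round(model, reports)
```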
2) Dataset Diversity Measures: In pre-training scheduling, dataset diversity essentially serves as a lead for device selection: it should prioritize devices that have potentially informative datasets with less redundancy, in order to speed up the learning process. While the richness of a dataset can easily be quantified through the total number of samples, the uncertainty of a dataset depends strongly on the application. For supervised learning, the uncertainty can be evaluated through the evenness of the dataset (i.e., the degree of balance between the classes in classification problems), which can be calculated through entropy measures. For sequence data, the uncertainty is reflected by the regularity of the series. Moreover, for unsupervised learning, local dissimilarity between pseudo-classes or randomly sampled data points can be considered. Furthermore, it is essential to consider privacy as a component of the used index. Sending the number of samples from each class, for instance, is a violation of the privacy principle of FEEL. In the following, we introduce some potential methods to evaluate dataset diversity.

Diversity measures for classification: Diversity measures have long been used in Active Learning, where uncertainty is used to choose the samples that should be labeled, as labeling is costly. However, in FL, client selection does not concern independent samples; instead, diversity should be evaluated at the level of the entire dataset. Moreover, under the premise of supervised FL, the labels are already known, which makes it possible to use more informed measures. For instance, Shannon entropy and the Gini-Simpson index are suitable measures of a dataset's uncertainty in classification problems. Both favor IID partitions: the maximum of each index is obtained for balanced distributions, and datasets with a single class have the minimum possible value. Shannon entropy quantifies the uncertainty (entropy, or degree of surprise) of a prediction. It was first proposed to quantify the information content of strings of text. The underlying idea is that when a text contains more different letters, in almost equal proportional abundances, it becomes more difficult to correctly predict which letter will come next in the string. However, Shannon entropy is not defined for classes with no representative samples. Therefore, it may not be practical in scenarios with high unbalance. Another possible measure is the Gini-Simpson index. The Simpson index λ measures the probability that two samples taken at random from the dataset of interest are from the same class. The Gini-Simpson index is its transformation 1 − λ, which represents the probability that the two samples belong to different classes. Nonetheless, if the number of classes is large, distinguishing datasets using this index becomes hard.
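Both indexes are straightforward to compute from local label counts; per the privacy note above, the computation would happen on-device, with only the resulting scalar shared. A minimal sketch (the 0 · log 0 = 0 convention used for empty classes is an assumption on our part):

```python
import numpy as np

def shannon_entropy(counts):
    """Evenness of a label distribution; maximal for balanced classes.
    Zero-count classes are dropped, i.e., 0 * log(0) is taken as 0."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log(p)).sum())

def gini_simpson(counts):
    """1 - sum(p_i^2): the probability that two samples drawn at
    random from the dataset belong to different classes."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return float(1.0 - (p ** 2).sum())

balanced, skewed = [50, 50, 50, 50], [197, 1, 1, 1]
print(shannon_entropy(balanced), gini_simpson(balanced))  # high diversity
print(shannon_entropy(skewed), gini_simpson(skewed))      # low diversity
```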
Diversity measures for time-series forecasting: In time series problems, other methods can be used, such as Approximate Entropy (ApEn) and Sample Entropy (SampEn). In sequential data, statistical measures such as the mean and the variance are not enough to illustrate regularity, as they are influenced by system noise. ApEn was proposed to quantify the amount of regularity and the unpredictability of time-series data. It is based on the comparison of values in successive vectors, quantifying how many data points vary by more than a defined threshold. SampEn was proposed as a modification of ApEn. It is used for assessing the complexity of time-series data, with the advantage of being independent of the length of the vectors.
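A compact (quadratic-time) sketch of SampEn follows; the embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation are common defaults rather than values fixed by the measure itself.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn = -ln(A / B), where B counts pairs of length-m templates
    within Chebyshev distance r, and A the same count for length m + 1.
    Self-matches are excluded, which is what makes SampEn less biased
    than ApEn and largely independent of the series length."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()

    def count_matches(mm):
        # All overlapping templates (subsequences) of length mm.
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        count = 0
        for i in range(len(templ) - 1):
            # Chebyshev distance to all *later* templates only.
            d = np.max(np.abs(templ[i + 1:] - templ[i]), axis=1)
            count += int((d <= r).sum())
        return count

    B, A = count_matches(m), count_matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else float("inf")

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 20 * np.pi, 500))  # low SampEn: predictable
noisy = rng.normal(size=500)                        # high SampEn: irregular
print(sample_entropy(regular), sample_entropy(noisy))
```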
Diversity measures for clustering tasks: For clustering tasks, a similarity measure between data points from a randomly sampled subset should be considered. The measure can be distance-based (e.g., Euclidean distance, heat kernel) or angular-based (e.g., cosine similarity). A higher value is obtained if most of the data points in the sample are dissimilar, in which case the dataset should be considered more diverse. It should be noted that angular-based measures are invariant to scale, translation, rotation, and orientation, which makes them suitable for a wide range of applications, particularly multivariate datasets.
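A sketch of such a sample-level score, using the mean pairwise cosine dissimilarity of a random subset (the subset size of 100 points is an arbitrary illustrative choice):

```python
import numpy as np

def cosine_diversity(X, subset=100, seed=0):
    """Mean pairwise cosine dissimilarity over a random sample of rows.
    Values near 1 mean most sampled points are mutually dissimilar,
    i.e., the local dataset looks diverse; values near 0 mean redundancy."""
    rng = np.random.default_rng(seed)
    S = X[rng.choice(len(X), size=min(subset, len(X)), replace=False)]
    S = S / np.linalg.norm(S, axis=1, keepdims=True)  # unit-normalize rows
    sim = S @ S.T                                     # cosine similarities
    n = len(S)
    off_diag = sim.sum() - np.trace(sim)              # drop self-similarity
    return float(1.0 - off_diag / (n * (n - 1)))

rng = np.random.default_rng(1)
spread = rng.normal(size=(500, 16))                        # diverse cloud
clumped = rng.normal(size=(1, 16)) + 0.01 * rng.normal(size=(500, 16))
print(cosine_diversity(spread), cosine_diversity(clumped))
```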
Fig. 3: Diversity measures that can be used in pre-training and post-training scheduling

C. Post-training scheduling: Model Diversity

The post-training setting uses model diversity to choose the devices that will send their updates. Model diversity is evaluated on two different aspects: 1) by comparing the dissimilarity between the local model's parameters and the previous global model's parameters, and 2) by comparing the diversity within the model's parameters. In fact, choosing the local models that are divergent from the previous global model will possibly improve the representational ability of the global model directly, by aggregating updates that carry potentially new information. Furthermore, if a dataset is highly unbalanced and limited in size, the model's parameters will be very similar to one another. This redundancy within the parameters negatively affects the model's representational ability. It is therefore necessary to prioritize updates with high diversity. In the following, we detail the post-training scheduling algorithm, then we present some possible measures for model diversity.

1) Scheduling algorithm: Similarly to pre-training scheduling, the global model is initialized by the BS. Afterwards, the following steps are repeated until the model converges or a maximum number of rounds is attained:
• Step 1: At the beginning of each training round, the devices receive the current model.
• Step 2: Each device uses its local data to train the model.
• Step 3: The server sends an update request to the devices, to which each device responds by sending its model diversity index.
• Step 4: Based on the received information, alongside the evaluated channel state indicator, the server schedules a subset of devices to upload their models. Then, the updated models are sent to the server to be aggregated.
• Step 5: The PS aggregates the updates and creates the new global model.

2) Model Diversity Measures: While the richness aspect of diversity is irrelevant for model diversity, due to the fixed model size across devices, the information contained in the models can be quantified through how the local models vary compared to the global model, and how the parameters within the same model differ from each other. Some possible measures are as follows:

Local and global models' dissimilarity: Choosing the local models that are divergent from the previous global model will possibly improve the representational ability of the global model directly [9]. Pairwise similarity measures such as cosine similarity and Euclidean distance can be used to evaluate the similarity of the new local parameters and the global parameters. Moreover, divergence, a Bayesian method used to measure the difference between data distributions, can be used to evaluate the diversity of the learned model compared to the global model. Nonetheless, relying on model dissimilarity might lead to collecting updates from outliers. It is thereby necessary to regulate these diversity measures, in particular through the use of thresholds.
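For instance, a per-device dissimilarity index between the local update and the previous global model could be computed as follows; the cap used to guard against outliers is an arbitrary illustrative threshold.

```python
import numpy as np

def model_dissimilarity(local, global_, cap=0.9):
    """Cosine dissimilarity between flattened parameter vectors,
    clipped at `cap` so that extreme (potentially outlier) updates
    cannot dominate the diversity-based ranking."""
    l, g = np.ravel(local), np.ravel(global_)
    cos = float(l @ g / (np.linalg.norm(l) * np.linalg.norm(g)))
    return min(1.0 - cos, cap)

g = np.ones(10)
print(model_dissimilarity(g + 0.05, g))  # near-duplicate update: low index
print(model_dissimilarity(-g, g))        # divergent update: capped at 0.9
```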
Parameters Dissimilarity: To evaluate the redundancy within a model's parameters, the same similarity measures used for clustering can also be applied to the parameters. Additionally, the L2,1 norm can be used to obtain a group-wise sparse representation of the dissimilarity [18]. The internal L1 norm encourages the different parameters to be sparse, while the external L2 norm is used to control the complexity of the entire model.
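Such a group norm can be computed directly; since subscript conventions for mixed norms vary across papers, the sketch below follows the description above literally, with an inner L1 over each group (row) and an outer L2 across groups.

```python
import numpy as np

def group_norm_l1_l2(W):
    """Inner L1 per row (group), outer L2 across rows.
    The inner L1 pushes individual parameters within a group toward
    zero (sparsity), while the outer L2 bounds the overall magnitude
    of the whole parameter matrix."""
    row_l1 = np.abs(W).sum(axis=1)        # internal L1 for each group
    return float(np.linalg.norm(row_l1))  # external L2 across groups

W = np.array([[3.0, 4.0],   # group with two active parameters
              [0.0, 0.0],   # an entirely pruned group
              [1.0, 0.0]])  # nearly sparse group
print(group_norm_l1_l2(W))  # sqrt(7**2 + 0**2 + 1**2) ~ 7.07
```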
IV. CONCLUSION

Federated Learning is a promising machine learning technique by virtue of its privacy-preserving aspect and its ability to handle unbalanced and non-IID data. However, deploying federated-learning-based solutions at the edge of the network is subject to several challenges. In fact, FEEL is a multi-disciplinary problem that requires optimization over both the resources and the data. Nonetheless, data properties are overlooked in many parts of the proposed algorithms, despite being the essence of federated learning. Several FEEL design challenges and issues are introduced and discussed in terms of trade-offs. Furthermore, a new research direction is presented in an effort to incorporate the datasets' diversity properties into the design of FEEL algorithms. Our proposed method supposes that data quality and veracity are guaranteed, which requires leveraging other techniques, such as a blockchain acting as a trusted third party for data verification.

ACKNOWLEDGEMENT

The authors would like to thank the Natural Sciences and Engineering Research Council of Canada for the financial support of this research.

REFERENCES

[1] G. Zhu, Y. Wang, and K. Huang, "Broadband Analog Aggregation for Low-Latency Federated Edge Learning (Extended Version)," arXiv:1812.11494 [cs, math], Jan. 2019. [Online]. Available: http://arxiv.org/abs/1812.11494
[2] A. Filali et al., "Multi-Access Edge Computing: A Survey," IEEE Access, vol. 8, pp. 197017–197046, 2020.
[3] A. Abouaomar et al., "Resource Provisioning in Edge Computing for Latency-Sensitive Applications," IEEE Internet of Things Journal, vol. 8, no. 14, pp. 11088–11099, Jul. 2021.
[4] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, "Federated Learning: Strategies for Improving Communication Efficiency," arXiv:1610.05492 [cs], Oct. 2017. [Online]. Available: http://arxiv.org/abs/1610.05492
[5] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farhad, S. Jin, T. Q. S. Quek, and H. V. Poor, "Federated Learning with Differential Privacy: Algorithms and Performance Analysis," arXiv:1911.00222 [cs], Nov. 2019.
[6] Q. Zeng, Y. Du, K. K. Leung, and K. Huang, "Energy-Efficient Radio Resource Allocation for Federated Edge Learning," arXiv:1907.06040 [cs, math], Jul. 2019. [Online]. Available: http://arxiv.org/abs/1907.06040
[7] T. Nishio and R. Yonetani, "Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge," in ICC 2019, May 2019, pp. 1–7. [Online]. Available: http://arxiv.org/abs/1804.08333
[8] A. Abouaomar et al., "A Resources Representation for Resource Allocation in Fog Computing Networks," in 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1–6.
[9] L. Wang, W. Wang, and B. Li, "CMFL: Mitigating Communication Overhead for Federated Learning," in IEEE ICDCS 2019, Dallas, TX, USA, Jul. 2019, pp. 954–964. [Online]. Available: https://ieeexplore.ieee.org/document/8885054/
[10] A. Taïk and S. Cherkaoui, "Electrical Load Forecasting Using Edge Computing and Federated Learning," in ICC 2020 - 2020 IEEE International Conference on Communications (ICC), Jun. 2020, pp. 1–6.
[11] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, "Adaptive Federated Learning in Resource Constrained Edge Computing Systems," IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1205–1221, Jun. 2019.
[12] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, "Federated Learning with Non-IID Data," arXiv:1806.00582 [cs, stat], Jun. 2018. [Online]. Available: http://arxiv.org/abs/1806.00582
[13] C. Xie, S. Koyejo, and I. Gupta, "Asynchronous Federated Optimization," arXiv:1903.03934 [cs], Sep. 2019. [Online]. Available: http://arxiv.org/abs/1903.03934
[14] J. Wang, C. Jiang, H. Zhang, Y. Ren, K.-C. Chen, and L. Hanzo, "Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks," IEEE Communications Surveys & Tutorials, pp. 1–1, 2020.
[15] W. Shi, S. Zhou, and Z. Niu, "Device Scheduling with Fast Convergence for Wireless Federated Learning," arXiv:1911.00856 [cs, math], Nov. 2019. [Online]. Available: http://arxiv.org/abs/1911.00856
[16] H. H. Yang, A. Arafa, T. Q. S. Quek, and H. V. Poor, "Age-Based Scheduling Policy for Federated Learning in Mobile Edge Networks," in ICASSP 2020, May 2020.
[17] T. Li, M. Sanjabi, A. Beirami, and V. Smith, "Fair Resource Allocation in Federated Learning," arXiv:1905.10497 [cs, stat], Feb. 2020. [Online]. Available: http://arxiv.org/abs/1905.10497
[18] Z. Gong, P. Zhong, and W. Hu, "Diversity in Machine Learning," IEEE Access, vol. 7, pp. 64323–64350, 2019.
