2018 14th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
Towards a context-aware Wi-Fi-based Fog Node
discovery scheme using cellular footprints
Zeineb Rejiba∗ , Xavier Masip-Bruin∗ , Eva Marı́n-Tordera∗
∗ Advanced Network Architectures Lab (CRAAX), Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
{zeinebr, xmasip, eva}@ac.upc.edu
Abstract—Recently, new computing paradigms such as fog and broadcasting process, the FN advertises discovery information
edge computing started to emerge in an attempt to cope with within specific fields of the Wi-Fi beacon; whereas a device
the low latency requirements of new classes of IoT applications. willing to make use of that FN’s resources performs a scanning
In order for these paradigms to fully realize their potential, an
important challenge to address is the discovery of fog nodes (FNs) process to detect such beacons and consequently connect with
with spare computational resources that can be used to host time- that FN.
sensitive and computationally intensive application components.
We particularly focus on this problem in this paper, considering When evaluating the use of such a WiFi-based discovery
the case of a WiFi-based FN discovery process. More specifically, using a real smartphone-based implementation [4], we found
we evaluate how practical it is to trigger the discovery of fog that the periodic scan process it is based on consumes a
nodes based on a mobile phone’s historical cellular footprints in
order to obtain a high discovery rate and a low energy overhead.
significant amount of power, given its wireless nature. In fact,
To this end, we conducted a small-scale cellular data collection a too high scan frequency incurs a high power consumption
to be used to test different learning approaches, including the while a too low frequency may lead to potential FNs being
K-Nearest Neighbors and the decision tree algorithms as well as missed. Thus, instead of a periodic scan process, we study in
a Hidden Markov Model (HMM). According to our evaluation this paper the possibility of realizing a context-aware scanning
results, HMM was found to achieve the maximum discovery and
energy saving ratios. The impact of initial FN misdetections on
mechanism in order to achieve a smarter discovery process
the user-FN contact ratio has also been studied. with increased energy efficiency and a high discovery rate so
Index Terms—device discovery, fog computing, context- that the promised benefits of a fog-based execution can be
awareness, cellular footprints guaranteed.
I. I NTRODUCTION In order to achieve the proposed context-awareness scan, the
main idea consists in using previously-observed information
New classes of pervasive mobile and IoT applications are
about the surrounding cellular context of a mobile device (i.e.
constantly being developed and continuously raising tight QoS
the cellular tower ID and the signal strength) to infer locations
requirements. Such applications usually rely on a cloud-based
with a high likelihood of presence of a FN. Since a FN is
backend for smart data processing and long-term storage.
predicted to be present within such locations, then the scan
However, such an approach fails to cope with the aforemen-
should be enabled. However, if the current cellular information
tioned QoS constraints, especially in terms of perceived end-
has been correlated with a location where no FN has been seen
to-end latency. Besides, given the frequency of application-
previously, the scan should be disabled to save energy.
to-cloud data transmissions, this also leads to an unnecessary
volume of core network traffic overwhelming the underlying To evaluate the feasibility of such an approach in the
infrastructure, while such data is mostly needed at the edge of FN discovery process, we conducted a small-scale cellular
the network where it was generated. data collection experiment in a small section of the city and
These observations have led to the advent of new computing used the obtained data to train and test different learning
paradigms, such as the Fog computing [1] and the Fog-to- approaches, including the K-Nearest Neighbors (KNN) and the
Cloud computing [2], where the use of cloud resources is decision tree algorithms as well as a Hidden Markov Model
complemented by idle resources provided by distributed nodes (HMM). The discovery ratio, the saving ratio and the user-FN
at the edge of the network, generally referred to as fog nodes contact ratio have been provided as performance metrics for
(FNs). Leveraging the edge presence of such FNs and their the three considered approaches.
vicinity to end users, increased QoS levels can be guaranteed.
However, in order to fully take advantage of such FNs, the The remainder of this paper is organized as follows: Section
considered applications need to be equipped with a suitable II provides an overview of related works about cellular-
discovery service allowing the detection of nearby FNs, even based context-awareness mechanisms and discovery solutions
when no knowledge about their potential presence has been proposed in fog computing scenarios. Section III describes the
previously acquired. Within this context, we proposed in a pre- proposed cellular context-based fog node discovery. Section IV
vious work [3] a 802.11 beaconing-based discovery solution, highlights the evaluation results and some concluding remarks
comprised of two processes: broadcasting and scanning. In the are provided in Section V.
978-1-5386-6876-4/18/$31.00 ©2018 IEEE
2018 14th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
II. R ELATED WORK B. Discovery in fog computing scenarios
In this section, we provide an overview of related works Aligned with our efforts in the area of discovery in an
about cellular-based context-awareness mechanisms in addi- edge/fog computing scenario and in addition to recent re-
tion to reporting recent contributions addressing the discovery lated efforts reported in our previous work [3], additional
problem in fog computing scenarios. contributions have been made that we describe next. Kinaara
A. The use of cellular footprints for context inference is proposed in [9] as a framework for distributed discovery
and allocation of mobile edge resources. Those resources
Different mechanisms have been studied in the literature
are organized within proximal clusters managed by trusted
to enable context-awareness. A trivial method would be the
mediator entities. Device discovery is facilitated by the use
use of the GPS coordinates. However, it incurs a high energy
of a distributed indexing scheme allowing the organization
cost while not being suitable for indoor environments. Another
of resources into a logical ring structure based on resource
option would be to use the Wi-Fi footprint of a given place
similarity. Although the use of such a similarity-based index-
(consisting in the list of detected APs, their characteristics and
ing yields fast discovery times, the paper does not mention
their signal strength), which was shown to provide promis-
how the mediator entities are actually discovered. eDisco
ing accuracy values. However, since it also uses the Wi-Fi
is proposed in [10], where DNS SRV records are used to
interface for context inference, it does not solve the energy
enable discovery of nearby edge servers on a client’s network
consumption issue that we intend to address. Finally, the use of
path. Those records need only to be added to the DNS zone
the cellular footprint of a certain location (the set of observed
of the entity in charge of the edge server’s installation. A
cellular tower IDs and their corresponding signal strengths)
rather different but related scenario is addressed in [11],
as a context indicator has been widely-studied in the fields of
where the use of a fog-based MQTT server is considered to
localization [5], [6], efficient discovery of access points [7]
assist IoT devices in the Bluetooth Low Energy (BLE)-based
and human places of interest [8]. This approach can achieve
discovery process. In fact, the fog server’s role is to keep track
promising results in terms of accuracy, while not causing an
of the BLE advertiser device trajectory and it controls the
additional energy cost, since cellular information is already
enabling/disabling of the BLE scanner’s interface according
continuously received by a mobile phone.
to the advertiser’s geolocation, transmitted through Wi-Fi to
This is proved for instance in the study conducted in [7],
the broker. This allows obtaining a high discoverability rate
where authors propose to exploit the correlation between
and significant power savings. However, this paper puts the
surrounding information (such as the cell tower ID, the Blue-
focus on increasing the energy-efficiency of the BLE scanning
tooth ID and the user speed) and previous Wi-Fi contacts
process, without considering the potential energy overhead
in history data to estimate the remaining time until the next
at the BLE advertiser’s side because of frequent location
Wi-Fi AP encounter. The authors found that the information
transmissions to the broker.
gains obtained from the cell ID approach are larger than
the ones from Bluetooth IDs in addition to being the most
energy-efficient. Authors in [5] leverage cellular information III. C ELLULAR CONTEXT- BASED FOG NODE DISCOVERY
for localization purposes and their proposed system was able
to estimate the position of a mobile user with a very small error
(20m) using information from three cell towers (one connected
and two neighboring). A dedicated application server is used
to serve localization requests, being sent by the device every
five seconds. In order to improve the location’s precision, a
continuous update of a cellular fingerprinting database is made
in addition to considering the previous estimated location to
predict the actual one. However, it is worth noting that more
recently, most modern phones no longer provide programmatic
access to neighboring cell information and only the serving
cell information is available for use. That is why, the work
conducted in [6] proposes the use of a single cell tower Fig. 1. Illustration of FN deployment in a city
information (the associated one) to provide localization. The As fog nodes start to be deployed in geo-distributed lo-
proposed solution is based on a Hidden Markov Model and it cations in a given city (an example deployment is shown in
allows achieving accurate GSM localization with a 50m error Fig 1), it is highly important for a mobile device to know
approximately in urban areas, which is very promising, given where these nodes are located in order to take advantage of
the limited information used for making predictions in nearby their resources to achieve enhanced application executions.
locations, where cellular footprints are likely to overlap. Therefore, leveraging the wide deployment of cellular towers
Similar to [6], we also consider the use of the serving cell in cities nowadays, our objective is to be able to predict the
information only in our proposed approach, as described in presence of a specific FN, given the cellular footprint that is
Section III. currently observed by a mobile device. A cellular footprint
2018 14th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
here corresponds to the (serving cell id, signal strength) as possible. The Gini index [12] is used to evaluate
pair. the purity of the performed splits. To avoid potential
In order to achieve this objective, two phases are needed. overfitting, the splitting process stops when a given tree
First, the previously observed cellular footprints are stored depth or when a minimum number of data entries is
along with the information about the presence or absence of a reached. The prediction then corresponds to the majority
FN while that footprint is seen. Such information is acquired class found in the data entries associated to the considered
from initial non-context-aware discovery executions. Then, the tree node. Similar to what we did in the KNN approach,
collected data is given as an input to a learning algorithm to if the current cell ID is unknown, the scan is enabled.
predict the appropriate locations, in which the scan should be • HMM: A Hidden Markov Model is a statistical model
triggered. As a result, the scan for FNs should be enabled in allowing predicting the most likely sequence of states
the cases where the algorithm predicts the existence of a FN. (s1 , s2 , ..., sN ), given an input sequence of observations
On the other hand, in case the algorithm infers that the user is (o1 , o2 , ..., oN ), where N is the considered sequence
present at a non-FN location, the scan should be kept disabled length. In our case, the observations are the set of
in order to reduce the number of unnecessary scans, possibly cellular footprints whereas the hidden states to be inferred
affecting the device’s energy consumption. from those observations are the presence/absence of a
Since we are dealing with labeled data, where the predic- given FN. More specifically, the predictions can be made
tion values are previously known, the use of a supervised based on the following HMM characteristics: (i) Emission
classification algorithm is considered. More specifically, we probabilities referring to the probability of a certain
select the K-nearest neighbors (KNN) and Decision Tree (DT) observation (cellular footprint) given a state (FN ID /
algorithms for our evaluation, since they do not assume a none); (ii) transition probabilities corresponding to the
specific distribution for the data. We additionally evaluate the probability of transition from one state at time t to another
use of a probabilistic approach based on a Hidden Markov state at (t+1) and (iii) prior probabilities corresponding
Model (HMM) to learn from historical transition probabilities to the general probability of occurrence of a certain state.
from one state to the other (e.g. “none” → FN). The prediction in this case corresponds to the last item
In the following, we provide an overview of how the three of the state sequence, i.e. sN .
considered approaches have been used: A qualitative comparison of the three considered models is
• KNN: The basic idea behind the original KNN algorithm provided in Table I.
is quite simple. In fact, for each new data entry, the
IV. P ERFORMANCE EVALUATION
distance to each of the historical data entries is calculated.
Then, the data is sorted by decreasing order of the In this section, we detail our experimentation approach and
calculated distances and the top K entries are retrieved. we then analyze the obtained results.
Finally, the prediction corresponds to the majority class A. Experimentation approach
found within the top K elements. We further customize
the KNN algorithm in the following two ways. First, in For the purposes of this work, we assume a small-scale fog
order to avoid extensive distance calculations to each node deployment in specific locations of the city, including our
historical data item each time a scan decision has to research lab, another university building (engineering school)
be made, we reduce the considered dataset to a smaller as well as the city hall (all shown in Fig 2). This placement
subset only containing entries having the same cell ID as is justified by the fact that, in a real deployment, these
the one in the new cellular footprint. Then, we calculate locations could potentially be candidates for hosting fog-based
the distance in terms of signal strength, as follows: applications such as smart building management solutions or
enhanced entertainment services.
|ss1 − ss2|
distance =
max(|ss1| ,|ss2|)
where ssi is the signal strength at time ti .
Second, in the cases when the current cell ID has not
been previously seen in the historical data, no prediction
occurs and the scan is triggered in order to avoid missing
a FN, at the expense of potentially causing an energy
overhead.
• Decision Tree (DT): In our case, we use a classification
tree where predictions can be made based on the cell ID
and signal strength input variables. We initially split the
dataset based on the cell ID variable, thus obtaining a
subtree for each unique cell ID. Then, for each subtree, Fig. 2. Considered data collection scenario
we further split the data entries according to the signal In order to obtain the cellular context information corre-
strength variable to obtain splits, which are as pure sponding to this city area, we used an Android-based mobile
2018 14th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
TABLE I
Q UALITATIVE COMPARISON OF THE 3 CONSIDERED MODELS
Algorithm Pros Cons
KNN • No training needed • Distance is calculated to each item in
• No assumption about underlying data the dataset
distribution
DT • Simple to interpret • Parameters needed such the max
depth
• Training phase needed to build the
tree
HMM • Capturing state transition sequences • Complexity associated with keeping
beneficial for time-series data track of the different probabilities and
finding the most likely state sequence.
• Long sequence means increased accu-
racy at the cost of delaying the initial
prediction [6]
network monitoring application1 , which logs cellular traces TABLE II
including the associated cell ID and its corresponding signal A LGORITHM PARAMETERS
strength every minute (or whenever the serving cell changes). Algorithm Parameters
Then, we repeatedly move (at walking speed) following the
path marked with the orange arrows in Fig 2 while keeping KNN • k=5
the monitoring application ON. A stop of a duration of 10
minutes is made at each FN location to account for the fact that
users generally stay for a non-trivial amount of time in specific DT • Max depth = 3
places of interest, as defined in [8]. This also ensures that we • Min entries = 3
obtain enough representative data entries of a particular FN
location. HMM • Emission probabilities, Transition
The previous data collection experiment is repeated five probabilities, Prior probabilities
times for two different network operators. The obtained data were determined based on their
was further labeled manually in order to tag the trace entries actual corresponding fractions in the
training data
corresponding to the fog node locations, since there is no real • Sequence length = 5
FN deployment in the city at this stage. The resulting data
entries have the following structure: (cell id, signal strength,
FN id), where FN id corresponds to the FN identifier, if any;
otherwise, it is set to “none”. which the user was indeed present at a FN location. It is
The three approaches described in Section III were imple- defined as follows:
mented in Python and their performance was evaluated using
5-fold cross-validation. Different values for the algorithm N b. of successf ul discoveries
parameters were tested and the ones yielding the best results Discovery ratio =
N b. of total discovery opportunities
(i.e. less overfitting) were chosen, as shown in Table II. The
discovery ratio as well as the saving ratio are provided next
As it can be seen in Fig 3, there is only a slight difference
as performance evaluation metrics.
in the performance of the KNN and DT algorithms in terms of
B. Obtained results discovery ratios. They were both able to successfully discover
FNs in ∼85% (resp. ∼76%) of the time for the first (resp.
1) Discovery ratio: An important metric to consider in our
second) operator. It is worth noting that both algorithms were
scenario is the discovery ratio (more generally referred to as
not able to achieve a better performance because of the
true positive rate) and it indicates the algorithm’s ability to
relatively small size of the considered datasets in addition
successfully discover a FN out of the total opportunities in
to the fact that measurements were carried out in nearby
1 https://play.google.com/store/apps/details?id=com.signalmonitoring locations, where it is difficult to make predictions based on
.gsmsignalmonitoring&hl=en limited information (only the serving cell ID and the signal
2018 14th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
thus consuming energy to do so, it is still preferable to keep the
number of unnecessary scans as low as possible. This not only
helps conserving the device’s battery, but it also contributes to
an efficient use of the wireless channel.
C. Discussion
1) User - FN contact ratio: Sections IV-B1 and IV-B2 have
focused on the models’ performance mostly from a statistical
perspective, considering that a prediction is made each time a
cellular measurement is available (i.e. every minute or when
the serving cell changes). However, in certain time steps,
predictions are not required. In fact, if the user has already
detected a FN, then discovery is not needed as long as the
Fig. 3. Discovery ratio user is still connected to it. Therefore, the predictions that
matter the most are the ones leading to an initial successful
discovery of a FN. If such a successful discovery is delayed
strength). Moreover, signal fluctuations also contribute to the due to an incorrect prediction (see first two predictions in the
uncertainty causing erroneous predictions. On the other hand, example in Fig 5), then this may lead to shorter user – FN
HMM achieves a discovery ratio of up to 98 % even in contact duration, which in turn increases the risks of potential
presence of such limited and uncertain information. This is SLA violations due to the inability to use the FNs’ resources
mainly due to the fact that it leverages the whole sequence of earlier2 .
state transitions to make predictions.
2) Saving ratio: Since one of our goals is to reduce the
number of unnecessary scans in order to achieve energy
savings, we consider the saving ratio (more generally referred
to as true negative rate) as a second performance indicator. In
fact, we consider a saving was achieved when the algorithm
predicts “none” in a non-covered location, whereas saving op-
portunities correspond to the number of actual occurrences of
such non-covered locations. The ratio of both values represents
the saving ratio, as defined in the following:
N b. of savings achieved Fig. 5. Illustration of the User-FN contact ratio
Saving ratio =
N b. of actual saving opportunities Consequently, in order to evaluate the impact of using cellu-
lar context-based predictions for scan triggering, we consider
another metric, the “user – FN contact (UFC) ratio”, illustrated
in Fig 5 and defined as the fraction of time the user has spent
connected to a FN after the initial successful discovery divided
by the total time spent within a FN’s coverage. The closer the
UFC ratio to 1, the better it is, in order to fully benefit from
the FN’s resources.
Fig 6 and Fig 7 plot the UFC ratio (shown in percentage)
for the two considered operators and for the three considered
FNs of Fig 2. As it can be seen, all three algorithms guarantee
a minimum UFC ratio of 80% (resp. 92%) per FN (on
average, resp.). It is also worth noting that even using simple
algorithms such as KNN, a maximum of a 100% UFC rate can
be achieved for certain fog nodes (see FNlab in Fig 7), mainly
the ones characterized with relatively stable cellular patterns
Fig. 4. Saving ratio
(as opposed to FNschool where we observed many fluctuations
As depicted in Fig 4 and similarly to the discovery ratio for the second operator).
case, KNN and DT have similar saving patterns of up to 2) Impact of mobile FNs on false triggering: In this section,
62%. Such a behavior can be attributed to the aforementioned we consider the case of mobile FNs (cars, buses, etc. . . ) that
factors affecting the discovery ratio. On the other hand, HMM may show up during the user’s mobility. In fact, such mobile
can achieve considerable savings reaching 96%. Although in FNs might be detected by the discovery scheme in locations
a real-world scenario, the mobile device may be resorting to a
cloud-based execution in the case of a non-covered fog area, 2A cloud-level execution penalty may occur in this case.
2018 14th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
inherent to the model’s functioning. On the other hand, the
use of decision trees and the simpler KNN algorithm achieve
lower performance. We believe this can be overcome by
dynamically learning from new cellular context values on
the go, in addition to increasing the sampling frequency
(compared to the considered 1 sample/minute). We also note
that the proposed context-awareness mechanism is targeted
towards mobile devices having access to an active SIM card.
Alternative context-awareness mechanisms (GPS-based or Wi-
Fi-based) may be envisioned for other devices, especially if
energy consumption is not a concern. Finally, our future work
will consider the fact that mobility might also affect FNs
(e.g. an in-vehicle FN) and as such, we aim to study suitable
Fig. 6. User-Fog contact ratio (Operator 1) mechanisms to maximize the user - FN contact ratio in the
case of such FN mobility.
ACKNOWLEDGMENT
This work was partially supported by the H2020 EU
mF2C Project ref. 730929 and by the Spanish Ministry of
Economy and Competitiveness and by the European Re-
gional Development Fund, under contract TEC2015-66220-R
(MINECO/FEDER).
R EFERENCES
[1] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog computing and its
role in the internet of things,” in Proceedings of the first edition of the
MCC workshop on Mobile cloud computing - MCC ’12, p. 13, ACM,
2012.
[2] X. Masip-Bruin, E. Marı́n-Tordera, G. Tashakor, A. Jukan, and G. J. Ren,
“Foggy clouds and cloudy fogs: A real need for coordinated management
of fog-to-cloud computing systems,” IEEE Wireless Communications,
Fig. 7. User-Fog contact ratio (Operator 2) vol. 23, no. 5, pp. 120–128, 2016.
[3] Z. Rejiba, X. Masip-Bruin, A. Jurnet, E. Marı́n-Tordera, and G.-J. Ren,
“F2C-Aware: Enabling Discovery in Wi-Fi-Powered Fog-to-Cloud (F2C)
Systems,” in 2018 6th IEEE International Conference on Mobile Cloud
where the cellular footprint is not generally representative of Computing, Services, and Engineering (MobileCloud), pp. 113–116,
a FN (i.e. locations where saving opportunities can occur) IEEE, 2018.
and therefore this information will be added to the data to [4] Z. Rejiba, X. Masip-Bruin, and E. Marı́n-Tordera, “Analyzing the
deployment challenges of beacon stuffing as a discovery enabler in
be used for future predictions. However, we note that this is fog-to-cloud systems,” in 2018 European Conference on Networks and
not likely to cause false triggering of scans in such locations, Communications (EuCNC), pp. 1–276, June 2018.
since the proposed methods are based on the use of maximum [5] G. Aloi, G. Caliciuri, V. Loscrı́, and P. Pace, “Accurate and energy-
efficient localization system for smartphones: A feasible implementa-
likelihoods to make predictions, whereas mobile FNs will be tion,” in IEEE International Symposium on Personal, Indoor and Mobile
seen infrequently in the same location by the same user. While Radio Communications, PIMRC, pp. 3478–3481, IEEE, 2013.
this can ensure robustness of the proposed approach to FN [6] M. Ibrahim and M. Youssef, “A hidden markov model for localization
using low-end GSM cell phones,” in Communications (ICC), 2011 IEEE
mobility in terms of false triggering, it does not solve the International Conference on, pp. 1–5, IEEE, 2011.
problem of finding the right context to discover a mobile FN [7] J. Jeong, J. Lee, Y. Kim, J. W. Lee, and S. Chong, “Impact of
because of the uncertainty caused by the FN’s mobility. We surrounding information on Wi-Fi sensing efficiency,” in International
Conference on ICT Convergence, pp. 410–415, IEEE, 2013.
aim to address this specific issue in a future work in order to [8] K. Yadav, V. Naik, A. Kumar, and P. Jassal, “PlaceMap: Discovering
increase the mobile user - mobile FN contact ratio. human places of interest using low-energy location interfaces on mobile
phones,” in ACM DEV 2014 - Proceedings of the 2014 Annual Sympo-
V. C ONCLUSION sium on Computing for Development, vol. 5, pp. 93–102, ACM, 2014.
[9] A. Salem, T. Salonidis, N. Desai, and T. Nadeem, “Kinaara: Distributed
In this paper, we explored the use of surrounding cellular discovery and allocation of mobile edge resources,” in Mobile Ad Hoc
information as a context indicator for Wi-Fi based fog node and Sensor Systems (MASS), 2017 IEEE 14th International Conference
discovery. The goal is to predict the presence or absence of on, pp. 153–161, IEEE, 2017.
[10] A. Zavodovski, N. Mohan, and J. Kangasharju, “eDisco: Discovering
a fog node based on the current cellular footprint in order to Edge Nodes Along the Path,” arXiv preprint arXiv:1805.01725, 2018.
trigger or disable the discovery accordingly. We specifically [11] R. Venanzi, B. Kantarci, L. Foschini, and P. Bellavista, “MQTT-Driven
compared three different prediction methods based on the Node Discovery for Integrated IoT-Fog Settings Revisited: The Impact
of Advertiser Dynamicity,” in Service-Oriented System Engineering
use of KNN, decision trees and HMM. We found that the (SOSE), 2018 IEEE Symposium on, pp. 31–39, IEEE, 2018.
HMM model obtains the highest performance in terms of [12] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, “Classifica-
discovery and saving ratios, at the cost of more complexity tion and regression trees,” 1984.