Fast and Accurate Performance Analysis of LTE Radio Access Networks
problem is not enough. The desire to troubleshoot RANs as fast as possible exposes the inherent trade-off between latency and accuracy that is shared by many ML algorithms.

To illustrate this trade-off, consider the natural solution of building a model on a per-base station basis. On one hand, if we want to troubleshoot quickly, the amount of data collected for a given base station may not be enough to learn an accurate model. On the other hand, if we wait long enough to learn a more accurate model, this comes at the cost of delaying troubleshooting, and the learned model may no longer be valid. Another alternative would be to learn one model over the entire data set. Unfortunately, since base stations can have very different characteristics, using a single model for all of them can also result in low accuracy (§2).

In this paper, we present CellScope, a system that enables fast and accurate RAN performance diagnosis by resolving the latency-accuracy trade-off using two broad techniques: intelligent data grouping and task formulations that leverage domain characteristics. More specifically, CellScope applies Multi-task Learning (MTL) [10, 44], a state-of-the-art machine learning approach, to RAN troubleshooting. In a nutshell, MTL learns multiple related models in parallel by leveraging the commonality between those models. To enable the application of MTL, CellScope uses two techniques. First, it uses feature engineering to identify the relevant features to use for learning. Second, it uses a PCA-based similarity metric to group base stations that share common features, such as interference and load. This is necessary since MTL assumes that the models have some commonality, which is not necessarily the case in our setting; e.g., different base stations might exhibit different features. Note that while PCA has traditionally been used to find network anomalies, CellScope uses it to find common features instead. We note that the goal of CellScope is not to apply specific ML algorithms for systems diagnostics, but to propose approaches to resolve the latency-accuracy trade-off common in many domains.

To this end, CellScope uses MTL to create a hybrid model: an offline base model that captures common features, and an online per-base station model that captures the individual features of each base station. This hybrid approach allows us to incrementally update the online model based on the base model, yielding models that are both accurate and fast to update. Finally, in this approach, finding anomalies is equivalent to detecting concept drift [19].

To demonstrate the effectiveness of our proposal, we have built CellScope on Spark [26, 41, 49]. Our evaluation shows that CellScope is able to achieve accuracy improvements of up to 4.4× without incurring the latency overhead associated with normal approaches (§6). We have also used CellScope to analyze a live LTE network consisting of over 2 million subscribers for a period of over 10 months. Our analysis reveals several interesting insights (§7).

In summary, we make the following contributions:

• We expose the fundamental trade-off between data collection latency and analysis accuracy present in many domains, which impedes the practicality of applying analytics for decision making on data collected in a real-time fashion. We find that this trade-off may be resolved in several domains using two broad approaches: intelligent grouping and domain-specific formulations.

• Based on this insight, we present CellScope, a system that applies a domain-specific formulation and application of Multi-task Learning (MTL) to resolve the latency-accuracy trade-off in RAN performance analysis. It achieves this using three techniques: feature engineering to transform raw data into effective features, a novel PCA-inspired similarity metric to group data from base stations sharing commonalities in performance, and a hybrid online-offline model for efficient model updates (§4).

• We have built CellScope on Apache Spark, a big data framework. Our evaluation shows that CellScope's accuracy improvements range from 2.5× to 4.4× while reducing the model update overhead by up to 4.8× (§6). We have also validated CellScope by using it to analyze an operational LTE network consisting of over 2 million subscribers for a period of over 10 months. Our analysis uncovered insights that were valuable for operators (§7).

2 Background and Motivation

In this section, we briefly discuss cellular networks, focusing on the LTE network architecture, protocol procedures, and measurement data, and then motivate the problem.

2.1 LTE Network Primer

LTE networks provide User Equipments (UEs) such as smartphones with Internet connectivity. When a UE has data to send to or receive from the Internet, it sets up a communication channel between itself and the Packet Data Network Gateway (P-GW). This involves message exchanges between the UE and the Mobility Management Entity (MME). In coordination with the base station (eNodeB), the Serving Gateway (S-GW), and the P-GW, data plane (GTP) tunnels are established between the base station and the S-GW, and between the S-GW and the P-GW. Together with the connection between the UE and the base station, the network establishes a communication channel called an EPS bearer (bearer for short). The entities in the LTE network architecture are shown in Figure 1.

[Figure 1: LTE network architecture — data plane: User Equipment (UE), Base Station (eNodeB), Serving Gateway (S-GW), Packet Gateway (P-GW), Internet; control plane: Mobility Management Entity (MME), Home Subscriber Server (HSS)]

For network access and service, entities in the LTE network exchange control plane messages. A specific sequence
[Figure 2: Resolving the latency-accuracy trade-off requires domain-specific optimizations. (a) Global models are ineffective, while spatial partitioning ignores performance similarity. (b) Lack of data at low latencies indicates the need for grouping/partitioning. (c) Using local models leads to accuracy and/or latency issues.]
of such control plane message exchanges is called a network procedure. For example, when a UE powers up, it initiates an attach procedure with the MME, which consists of establishing a radio connection to the base station, authentication, and resource allocation. Thus, each network procedure involves the exchange of several control plane messages between two or more entities. The specifications for these are defined by various 3GPP Technical Specification Groups (TSG) [42].

Network performance degrades and end-user experience is affected when procedure failures happen. The complex nature of these procedures (due to the multiple underlying message and entity interactions) makes diagnosing problems challenging. Thus, to aid RAN troubleshooting, operators collect extensive measurements from their network. These measurements typically consist of per-procedure information (e.g., attach). To analyze a procedure failure, it is often useful to look at the associated variables. For instance, a failed attach procedure may be diagnosed if the underlying signal strength information was captured¹. Hence, relevant metadata is also captured with the procedure information. Since there are hundreds of procedures in the network and each procedure can have many possible metadata fields, the collected measurement data contains several hundreds of fields².

2.2 RAN Troubleshooting Today

Current RAN network monitoring depends on cell-level aggregate Key Performance Indicators (KPIs). Existing practice is to use performance counters to derive these KPIs. The derived KPIs, aggregated over certain pre-defined time windows, are then monitored by domain experts. Based on domain knowledge and operational experience, these KPIs are used to determine if service level agreements (SLAs) are met. For instance, an operator may have designed the network to have no more than 0.5% call drops in a 10-minute window. When a monitored KPI crosses its threshold, an alarm is raised and a ticket is created. This ticket is then handled by experts who investigate the cause of the problem, often manually. Several commercial solutions exist [3–5, 16] that aid in this troubleshooting procedure by enabling efficient slicing and dicing of data. However, we have learned from domain experts that it is often desirable to apply different models or algorithms on the data for detailed diagnosis. Thus, most RAN trouble tickets end up with experts who work directly on the raw measurement data.

2.3 Need for a Domain-Specific Approach

We now discuss the difficulties in applying machine learning to the RAN performance analysis problem, thereby motivating the need for a new domain-specific solution.

2.3.1 Ineffectiveness of Global Modelling

A common approach to applying ML on a dataset is to consider the dataset as a single entity and build one model over the entire data. However, base stations in a cellular network exhibit different characteristics. This renders the use of a global model ineffective. To illustrate this problem, we conducted an experiment whose goal was to build a model for call drops in the network. We first ran a decision tree algorithm to obtain a single model for the network. The other extreme of this approach is to train a model per base station. Figure 2a shows the results of this experiment, which used data collected over a 1-hour interval to ensure there is enough data for the algorithms to produce statistically significant results. We see that the local model is significantly better, with up to 20% higher accuracy and much lower variance.

2.3.2 Latency/Accuracy Issues with Local Models

It is natural to think of a per-base station model as the final solution to this problem. However, this approach has issues too. Due to the difference in characteristics of the base stations, the amount of data they collect differs. Thus, in small intervals, they may not generate enough data to produce valid results. This is illustrated in Figure 2b, which shows the quartiles, minimum, and maximum amount of data generated and the latency required to collect it.

Additionally, algorithms may produce stale models with increasing latency. To show this, we conduct an experiment with two different algorithms on data collected over varying latencies. The first algorithm (Alg 1) builds a classification model for connection failures, while the second (Alg 2) builds a regression model to predict and explain throughput anomalies. The results of this experiment are given in Figure 2c. The

¹ Some of the key physical layer parameters useful for diagnosis are described in Table 1.
² Our dataset consists of almost 400 fields in the measurement data, with each field possibly having additional nested information.
first algorithm's behavior is obvious; as it gets more data, its accuracy improves due to the slowly varying nature of the underlying causes of failures. After an hour of latency, it is able to reach a respectable accuracy. However, the second algorithm's accuracy improves initially, but falls quickly. This is counterintuitive in normal settings, but the explanation lies in the spatio-temporal characteristics of cellular networks. Many of the performance metrics exhibit high temporal variability, and thus need to be analyzed in smaller intervals. In such cases, local modeling is ineffective.

It is important to note that an obvious, but flawed, conclusion is that models similar to Alg 1 would work once the data collection latency has been incurred once. This is not true due to staleness issues, which we discuss next.

2.3.3 Need for Model Updates

Due to the temporal variations in cellular networks, models need to be updated to retain their performance. To depict this, we repeated the experiment where we built a per-base station decision tree model for call drops. However, instead of training and testing on parts of the same dataset, we train on an hour's worth of data and apply the model to the next hour. Figure 2a shows that the accuracy drops by 12% with a stale model. Thus, it is important to keep the model fresh by incorporating learning from the incoming data while also removing historical learnings. Such sliding updates to ML models are difficult in a general setting due to the overhead of retraining them from scratch. To add to this, cellular networks consist of several thousands of base stations, and this number is on the rise with the increase in user demand and the ease of deployment of small cells. Thus, a per-base station approach requires creating, maintaining, and updating a huge number of models (e.g., our network consisted of over 13,000 base stations). This makes scaling hard.

2.3.4 Why not Spatial/Spatio-Temporal Partitioning?

The above experiments point towards the need for obtaining enough data with low latency. The obvious solution to combating this trade-off is to intelligently combine data from multiple base stations. It is intuitive to think of this as a spatial partitioning problem, since base stations in the real world are geographically separated. Thus, a spatial partitioner which combines data from base stations within a geographical region should be able to give good results. Unfortunately, this is not the case, which we motivate using a simple example. Consider two base stations, one situated at the center of Times Square in New York and the other a mile away in a residential area. Using a spatial partitioning scheme that divides the space into equal-sized planes would likely result in combining data from these base stations. However, this is not desirable because of the difference in the characteristics of these base stations³. We illustrate this using the drop experiment. Figure 2a shows the performance of a spatial model, where we combine data from nearby base stations using a simple grid partitioner. The results show that the spatial partitioner is not much better than the global model. We show comparisons with other, smarter spatial partitioning approaches in §6.

[Figure 3: CellScope system architecture — bearer-level traces feed a RAN performance analyzer (throughput and drop modules) built on feature engineering and domain-specific MTL (gradient boosted trees, PCA-based similarity grouping), running over streaming and ML libraries; outputs feed dashboards and Self-Organizing Networks (SON).]

3 CellScope Overview

CellScope presents a domain-specific formulation and application of Multi-Task Learning (MTL) for RAN performance diagnosis. Here, we provide a brief overview of CellScope to aid the reader in following the rest of this paper.

3.1 Problem Statement

CellScope's ultimate goal is to enable fast and accurate RAN performance diagnosis by resolving the trade-off between data collection latency and the achieved accuracy. The key difficulty arises from the fundamental tension, impossible to resolve in a general setting, between not having enough data to build accurate-enough models in short timespans and waiting to collect enough data, which entails stale results. Additionally, we must support efficient modifications to the learned models to account for the temporal nature of our setting and avoid data staleness.

3.2 Architectural Overview

Figure 3 shows the high-level architecture of CellScope, which has the following key components:

Input data: CellScope uses bearer-level traces that are readily available in cellular networks (§2.1). Base stations collect traces independently and send them to the associated MME. The MME merges records if required (users move, and hence generate traces at multiple base stations) and uploads them to a data center⁴.

Feature engineering: Next, CellScope uses domain knowledge to transform the raw data and construct a set of features amenable to learning (e.g., computing interference ratios) (§4.1). We also leverage protocol details and algorithms (e.g., link adaptation) in the physical layer.

³ In our measurements, a base station in a highly popular spot serves more than 300 UEs and carries multiple times the uplink and downlink traffic of another base station situated just a mile from it that serves only 50 UEs.
⁴ The transfer of traces to a data center is not fundamental. Extending CellScope to do geo-distributed learning is future work.
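CellScope's data partitioner groups base stations by a PCA-derived similarity score (Figure 3; §4.3). The paper does not spell the metric out at this point, so the following is only an illustrative guess at the idea: summarize each base station's feature matrix by its top principal component and treat stations whose components align as candidates for the same group. The data and the cosine-based score are our own simplifications.

```python
# Sketch of a PCA-based similarity score between base stations (assumed form,
# not CellScope's actual metric).
import numpy as np

def top_component(X):
    """First principal component (direction of maximum variance) of a base
    station's feature matrix X (rows = bearers, columns = features)."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered matrix are the principal axes.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

def pca_similarity(Xa, Xb):
    """|cosine| between top principal components; a PC's sign is arbitrary,
    hence the absolute value. Near 1 means shared dominant structure."""
    return abs(float(top_component(Xa) @ top_component(Xb)))

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 4))
# Stations a and b share the same dominant axis of variation (e.g., an
# interference-driven feature); station c's variance is dominated elsewhere.
Xa = base * np.array([5.0, 1.0, 1.0, 1.0])
Xb = base * np.array([4.0, 1.0, 1.0, 1.0]) + 0.1 * rng.normal(size=(200, 4))
Xc = base * np.array([1.0, 1.0, 1.0, 6.0])

# A grouping rule could then threshold the score, e.g. group a with b
# (similarity near 1) but not with c (similarity near 0).
```

In CellScope the score is additionally combined with geographical distance (§4.3), which this sketch omits.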
Domain-specific MTL: CellScope uses a domain-specific formulation and application of MTL that allows it to perform accurate diagnosis while updating models efficiently (§4.2).

Data partitioner: To enable the correct application of MTL, CellScope implements a partitioner based on a similarity score derived from Principal Component Analysis (PCA) and geographical distance (§4.3). The partitioner segregates the data to be analyzed into independent sets and produces a smaller co-located set relevant to the analysis. This minimizes the need to shuffle data during the training process.

RAN performance analyzer: This component binds everything together to build diagnosis modules. It leverages the MTL component and uses appropriate techniques to build call drop and throughput models. We discuss our experience of applying these techniques to a live LTE network in §7.

Output: Finally, CellScope can output analytics results to external modules such as RAN performance dashboards. It can also provide inputs to Self-Organizing Networks (SON).

4 Resolving the Latency-Accuracy Trade-off

In this section, we present how CellScope uses domain-specific machine learning to mitigate the trade-off between latency and accuracy. We first give a high-level overview of the RAN-specific feature engineering that prepares the data for learning (§4.1). Next, we describe CellScope's MTL formulation (§4.2), discussing how it lets us build fast, accurate, and incremental models. Then, we explain how CellScope achieves grouping that captures commonalities among base stations using a novel PCA-based partitioner (§4.3). Finally, we summarize our approach in §4.4.

4.1 Feature Engineering

Feature engineering, the process of transforming the raw input data into a set of features that can be effectively utilized by machine learning algorithms, is a fundamental part of ML applications [51]. Generally carried out by domain experts, it is often the first step in applying learning techniques.

Bearer-level traces contain several hundred fields associated with LTE network procedures. Unfortunately, many of these fields are not suitable for model building as-is. Several fields are collected in a format that utilizes a compact representation. For instance, block error rates need to be computed across multiple records to account for time. Further, these records are not self-contained, and multiple records need to be analyzed to create a feature for a certain procedure. In §7, we describe in detail many of the specific feature engineering steps that helped CellScope uncover issues in the network.

4.2 Multi-Task Learning

The latency-accuracy trade-off makes it hard to achieve both low latency and high accuracy in applied machine learning tasks (§2). The ideal-case scenario for CellScope would be an infinite amount of data available per base station with zero latency: we would then have a learning task for each base station that produces a model with the best achievable accuracy. In reality, our setting has several tasks, each with its own data, but no task has enough data to produce models with acceptable accuracy within a given latency budget. This makes our setting an ideal candidate for multi-task learning (MTL), a cutting-edge research area in machine learning. The key idea behind MTL is to learn from other tasks by weakly coupling their parameters so that the statistical efficiency of many tasks can be boosted [9, 10, 17, 44]. Specifically, if we are interested in building a model of the form

    h(x) = m(f1(x), f2(x), ..., fk(x))    (1)

where m is a model composed of features f1 through fk, then the traditional MTL formulation, given a dataset D = {(xi, yi, bsi) : i = 1, ..., n}, where xi ∈ R^d, yi ∈ R, and bsi denotes the ith base station, is to learn

    h(x) = mbs(f1(x), f2(x), ..., fk(x))    (2)

where mbs is a per-base station model.

In this MTL formulation, the core assumption is a shared structure or dependency across each of the learning problems. Unfortunately, in our setting, the base stations do not share a structure at a global level (§2). Due to their geographic separation and the complexities of wireless signal propagation, the base stations instead share a spatio-temporal structure. Thus, we propose a new domain-specific MTL formulation.

4.2.1 CellScope's MTL Formulation

To address the difficulty in applying MTL caused by the violation of the task dependency assumption in RANs, we can leverage domain-specific characteristics. Although independent learning tasks (learning per base station) are not correlated with each other, they exhibit a specific non-random structure. For example, the performance characteristics of nearby base stations are influenced by similar underlying features. Thus, we propose exploiting this knowledge to segregate learning tasks into groups of dependent tasks on which MTL can be applied. MTL in the face of dependency violation has been studied in the machine learning literature in the recent past [20, 25]. However, these approaches assume that each group has its own set of features. This is not entirely true in our setting, where multiple groups may share most or all features but still need to be treated as separate groups. Furthermore, some of the techniques used for automatic grouping without a priori knowledge are computationally intensive.

Assuming we can club learning tasks into groups, we can rewrite the MTL formulation in eq. (2) to capture this structure as

    h(x) = mg(bs)(f1(x), f2(x), ..., fk(x))    (3)

where mg(bs) is the per-base station model in group g. We describe a simple technique to achieve this grouping based on domain knowledge in §4.3 and experimentally show that grouping by itself can achieve significant gains in §6.

In theory, the MTL formulation in eq. (3) should suffice for our purposes, as it would perform much better by capturing the inter-task dependencies using grouping. However, this formulation still builds an independent model for each base
station. Building and managing a large number of models                   each base station. However, a yes or no answer to such
leads to significant performance overhead and would impede                questions is seldom useful. If there is a sudden increase
our goal of scalability. Scalable application of MTL in a                 in drops, then it is useful to understand if the issue affects a
general setting is an active area of research in machine learn-           complete region, and what its root cause is.
ing [31], so we turn to problem-specific optimizations to                    Our MTL approach and the ability to do fast incremen-
address this challenge.                                                   tal learning enables a better solution for anomaly detection
   The model mg(bs) could be built using any class of learning            and diagnosis. Concept drift is a term used to refer to the phe-
functions. In this paper, we restrict ourselves to functions of           nomenon where the underlying distribution of the training
the form F(x) = w · x, where w is the weight vector associated            data for a machine learning model changes [19]. CellScope
with a set of features x. This simple class of functions gives            leverages this to detect anomalies as concept drifts and pro-
us tremendous leverage in using standard algorithms that can              poses a simple technique for it. Since we process incoming
easily be applied in a distributed setting, thus addressing the           data in mini-batches (§5), each batch can be tested quickly
scalability issue. In addition to scalable model building, we             on the existing model for significant accuracy drops. An
must also be able to update the built models fast. However,               anomaly occurring just at a single base station would be
machine learning models are typically hard to update in real              detected by one model, while one affecting a larger area
time. To address this challenge, we discuss a hybrid approach             would be detected by many. Once an anomaly has been detected,
to building the models in our MTL setting next.                           finding the root cause is as easy as updating the model and
4.2.2   Hybrid Modeling for Fast Model Updates                            comparing it with the old one.
Estimation of the model in eq. (3) could be posed as an ℓ1                4.3     Data Grouping for MTL
regularized loss minimization problem [45]:                               Having discussed CellScope’s MTL formulation, we now
            min ∑ L(h(x : fbs ), y) + λ ||R(x : fbs )||        (4)        turn our focus towards how CellScope achieves efficient
                                                                          grouping of cellular datasets that enables accurate learning.
where L(h(x : fbs ), y) is a non-negative loss function com-              Our data partitioning is based on Principal Component Anal-
posed of parameters for a particular base station, hence cap-             ysis (PCA), a widely used technique in multivariate analy-
turing the error in the prediction for it in the group, and λ > 0         sis [32]. PCA uses an orthogonal coordinate transformation
is a regularization parameter scaling the penalty R(x : fbs ) for         to map a given set of points into a new coordinate space. Each
the base station. However, the temporal and streaming nature              of the new subspaces are commonly referred to as a principal
of the data collected means that the model must be refined                component. Since the coordinate space is equal to or smaller
frequently to minimize staleness.                                          than the original, PCA is used for dimensionality reduction.
   Fortunately, grouping provides us an opportunity to solve                  In their pioneering work, Lakhina et al. [28] showed the
this. Since the base stations are grouped into correlated task            usefulness of PCA for network anomaly detection. They
clusters, we can decompose the features used for each base                observed that it is possible to segregate normal behavior and
station into a shared common set fc and a base station specific           abnormal (anomalous) behavior using PCA—the principal
set fs . Thus, we can modify the eq. (4) as minimizing                    components explain most of the normal behavior while the
                                                                          anomalies are captured by the remaining subspaces. Thus, by
 ∑ ( ∑ L(h(x : fs ), y) + λ ||R(x : fs )|| ) + λ ||R(x : fc )|| (5)             anomalies are captured by the remaining subspaces. Thus, by
                                                                                filtering normal behavior, it is possible to find anomalies that
                                                                          may otherwise be undetected.
where the inner summation is over the dataset specific to each                  While the most common use case for PCA has been dimen-
base station. This separation gives us a powerful advantage.              sionality reduction (in machine learning domains) or anomaly
Since we already grouped base stations, the feature set fs                detection (in networking domain), we use it in a novel way,
is minimal, and in most cases just a weight vector on the                 to enable grouping of datasets for multi-task learning. Due
common feature set rather than a completely new set of features.            to enable grouping of datasets for multi-task learning. Due
   Because the core common features do not change often,                  to the inability to collect a sufficient amount of data
we need to update only the base station-specific parts in the             will not yield results. However, the data would still yield an
model frequently, while the common set can be reused. Thus,               explanation of normal behavior. We use this observation to
we end up with a hybrid offline-online model. Furthermore,                partition the dataset. We describe our notation first.
the choice of our learning functions lets us apply stochastic             4.3.1    Notation
methods [39] which can be efficiently parallelized.
                                                                          Since bearer level traces are collected continuously, we con-
4.2.3   Anomaly Detection Using Concept Drift                             sider a buffer of bearers as a measurement matrix A. Thus,
A common use case of learning tasks for RAN performance                   A consists of m bearer records, each having n observed pa-
analysis is in detecting anomalies. For instance, an operator             rameters making it an m × n time-series matrix. It is to be
may be interested in learning if there is a sudden increase               parameters, making it an m × n time-series matrix. Note
in call drops. At the simplest level, it is easy to answer this           that n is on the order of a few hundred fields, while m can
question by simply monitoring the number of call drops at                 is. We enforce n to be fixed in our setting—every measure-
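The mini-batch drift test of §4.2.3 can be sketched as follows; the model interface, baseline accuracy, and tolerance value are illustrative assumptions rather than CellScope's actual thresholds:

```python
def detect_drift(model, batch_x, batch_y, baseline_acc, tol=0.1):
    """Flag an anomaly (concept drift) when accuracy on a new
    mini-batch drops more than `tol` below the baseline accuracy.

    `model` is any callable x -> predicted label (illustrative
    interface). Returns (drifted?, batch accuracy).
    """
    correct = sum(1 for x, y in zip(batch_x, batch_y) if model(x) == y)
    acc = correct / len(batch_y)
    return acc < baseline_acc - tol, acc
```

A drop flagged by a single base station's model suggests a local issue, while simultaneous drops across many models in a group point to a wider, geographically correlated one.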
ment matrix must contain n columns. To make this matrix                happen in nearby areas, and drops might be concentrated)5 .
amenable to PCA analysis, we adjust the columns to have                However, SFCellScope doesn’t capture this phenomenon be-
zero mean. By applying PCA to any measurement matrix                   cause it only considers similarity in normal behavior. Con-
A, we can obtain a set of k principal components ordered by            sequently, it is possible for anomaly detection algorithms
the amount of data variance they capture.                                  to miss geographically-relevant anomalies. To account for
                                                                       this domain-specific characteristic, we augment our simi-
4.3.2   PCA Similarity                                                 larity metric to also capture the geographical closeness by
                                                                       weighing the metric by geographical distance between the
It is intuitive to see that many measurement matrices may be
                                                                       two measurement matrices. Our final similarity metric is6 :
formed based on different criteria. Suppose we are interested
in finding if two measurement matrices are similar. One way                                                        k   n
to achieve this is to compare the principal components of the                   SFCellScope = wdistance(A,B) × ∑ ∑ |ai j − bi j |
                                                                                                                  i=1 j=1
two matrices. Krzanowski [27] describes such a Similarity
Factor (SF). Consider two matrices A and B having the                  4.3.4    Using Similarity Metric for Partitioning
same number of columns, but not necessarily rows. The similarity factor
between A and B is defined as:                                         With the similarity metric, CellScope can now partition bearer
                                                                       records. We first group the bearers into measurement matri-
                                      k    k                           ces by segregating them based on the cell on which the bearer
            SF = trace(LM′ML′) = ∑ ∑ cos2 θi j                       originated. The grouping is based on our observation that
                                     i=1 j=1
                                                                       the cell is the lowest level at which an anomaly would mani-
where L, M are the first k principal components of A and               fest. We then create a graph G(V, E) where the vertices are
B, and θi j is the angle between the ith component of A and            the individual cell measurement matrices. An edge is drawn
the jth component of B. Thus, the similarity factor considers all       between two matrices if the SFCellScope between them is be-
combinations of k components from both the matrices.                   low a threshold. To compute SFCellScope , we simply use the
                                                                       geographical distance between the cells as the weight. Once
4.3.3   CellScope’s Similarity Metric                                  the graph has been created, we run connected components
                                                                       on this graph to obtain the partitions. The use of connected
Similarity in our setting bears a slightly different notion: we        component algorithm is not fundamental, it is also possible
do not want strict similarity between measurement matrices,            to use a clustering algorithm instead. For instance, a k-means
but only need similarity between corresponding principal               clustering algorithm that could leverage SFCellScope to merge
components. This ensures that algorithms will still capture            clusters would yield similar results.
the underlying major influences and trends in observation
sets that are not exactly similar. Unfortunately, SF does not          4.3.5    Managing Partitions Over Time
fit our requirements; hence, we propose a simpler metric.              One important consideration is managing and handling group
    Consider two measurement matrices A and B as before,               changes over time. To detect group changes, it is necessary
where A is of size mA ×n and B is of size mB ×n. By applying           to establish correspondence between groups across time inter-
PCA on the matrices, we obtain k principal components                   vals. Once this correspondence is established, CellScope’s
via a simple heuristic: we keep the first k components that             hybrid modeling makes it easy to accommodate changes. Due
capture 95% of the variance. From the PCA, we obtain the                to the segregation of our model into common and base station
resulting weight vector, or loading, which is an n × k matrix:           specific components, small changes to the group do not affect
for each principal component in k, the loading describes the           the common model. In these cases, we can simply bootstrap
weight on the original n features. Intuitively, this can be seen       the new base station using the common model, and then start
as a rough measure of the influence of each of the n features          learning specific features. On the other hand, if there are
on the principal components. The Euclidean distance between            significant changes to a group, then the common model may
the corresponding loading matrices gives us similarity:                no longer be valid, which is easy to detect using concept drift.
                                                                       In such cases, the offline model could be rebuilt.
                        k              k   n
        SFCellScope = ∑ d(ai , bi ) = ∑ ∑ |ai j − bi j |               4.4     Summary
                      i=1             i=1 j=1
                                                                       We now summarize how CellScope resolves the fundamental
                                                                       trade-off between latency and accuracy. To cope with the fact
where a and b are the column vectors representing the load-
ings for the corresponding principal components from A and             5 Proposals for conducting geographically weighted PCA (GW-PCA)
B. Thus, SFCellScope captures how closely the underlying               exist [21], but they are not applicable since they assume a smooth
features explain the variation in the data.                            decaying user provided bandwidth function.
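A sketch of SFCellScope: PCA loadings via SVD on the zero-mean-adjusted matrices, an L1 distance between corresponding loading columns, and (anticipating the geographically weighted variant) the distance weight applied as a scalar. Truncating both matrices to the smaller k is our assumption; the paper does not specify how differing component counts are reconciled:

```python
import numpy as np

def loadings(A, var_frac=0.95):
    """PCA loadings of a measurement matrix: the first k right singular
    vectors (as an n x k matrix) capturing `var_frac` of the variance,
    after adjusting columns to zero mean as in §4.3.1."""
    A = A - A.mean(axis=0)
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    var = s ** 2 / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(var), var_frac)) + 1
    return vt[:k].T

def sf_cellscope(A, B, weight=1.0):
    """Distance-weighted L1 distance between corresponding loading
    columns of A and B; `weight` models w_distance(A,B)."""
    la, lb = loadings(A), loadings(B)
    k = min(la.shape[1], lb.shape[1])   # assumption: truncate to smaller k
    return weight * float(np.abs(la[:, :k] - lb[:, :k]).sum())
```

Identical matrices yield a metric of zero; larger values indicate that the underlying features explain the variance differently.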
                                                                       6 A similarity measure for multivariate time series is proposed
   Due to the complex interactions between network compo-              in [48], but it is not applicable due to its stricter form and dependence
nents and the wireless medium, many of the performance is-             on finding the right eigenvector matrices to extend the Frobenius
sues in RANs are geographically tied (e.g., congestion might           norm.
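The partitioning procedure of §4.3.4 — one vertex per cell measurement matrix, an edge whenever SFCellScope falls below a threshold, then connected components — can be sketched as below; the `similarity` callable and threshold value are placeholders:

```python
from collections import defaultdict

def partition_cells(cells, similarity, threshold):
    """Group cells into partitions via connected components.

    `similarity(a, b)` returns SF_CellScope between two cell matrices
    (lower means more similar); an edge is drawn when it is below
    `threshold`. Both are illustrative stand-ins.
    """
    adj = defaultdict(set)
    for i, a in enumerate(cells):
        for b in cells[i + 1:]:
            if similarity(a, b) < threshold:
                adj[a].add(b)
                adj[b].add(a)
    # Iterative DFS to collect connected components.
    seen, groups = set(), []
    for c in cells:
        if c in seen:
            continue
        stack, comp = [c], []
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(adj[v] - seen)
        groups.append(sorted(comp))
    return groups
```

As the text notes, a clustering algorithm such as k-means could replace the connected-components step without changing the interface.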
grouped = DStream.groupBySimilarityAndWindow(                               methods, it is easy to implement our hybrid online-offline
    windowDuration, slideDuration)                                          model; the shared features can be incorporated as a static
reduced = DStream.reduceBySimilarityAndWindow(                              model and the per base station model can be a separate input.
    func, windowDuration, slideDuration)
joined = DStream.joinBySimilarityAndWindow(                                   We modified the MLlib implementation of Gradient Boosted
    windowDuration, slideDuration)                                          Tree (GBT) [18] model, an ensemble of decision trees. This
                                                                      implementation supports both classification and regression,
               Listing 1: CellScope’s Grouping API
                                                                      and internally utilizes stochastic methods. Our modification
                                                                      supports a cached offline model in addition to online models.
that individual base stations cannot produce enough data              To incorporate incremental and window methods, we simply
for learning in a given time budget, CellScope uses MTL.              add more models to the ensemble when new data comes in.
However, our datasets violate the assumption of learning task         This is possible due to the use of stochastic methods. We also
dependencies. As a solution, we proposed a novel way of               support weighing the outcome of the ensemble, so as to give
using PCA to group data into sets with the same underlying            more weights to the latest models.
performance characteristics. Directly applying MTL on these
groups would still be problematic in our setting due to the           6     Evaluation
inefficiencies with model updates. To solve this, we proposed
                                                                      We have evaluated CellScope through a series of experiments
a new formulation for MTL which divides the model into an
                                                                      on real-world cellular traces from a live LTE network from a
offline and online hybrid. On this formulation, we proposed
                                                                      large geographical area. Our results are summarized below:
using simple learning functions that are amenable to incremental
and distributed execution. Finally, CellScope uses a simple               • CellScope’s similarity based grouping provides up to
concept drift detection to find and diagnose anomalies.                     10% improvement in accuracy on its own compared to
                                                                            the best case scenario of space partitioning schemes.
5     Implementation                                                      • With MTL, CellScope’s accuracy improvements range
                                                                            from 2.5× to 4.4× over different collection latencies.
We have implemented CellScope on top of Spark [49], a
                                                                          • Our hybrid online-offline model is able to reduce model
big data cluster computing framework. In this section, we
                                                                              update times up to 4.8× and is able to learn changes in
describe its API that exposes our commonality based group-
                                                                            an online fashion with virtually no loss in accuracy.
ing based on PCA (§5.1), and implementation details on the
hybrid offline-online MTL models (§5.2).                              We discuss these results in detail in the rest of this section.
                                                                      Evaluation Setup: Due to the sensitive nature of our dataset,
5.1   Data Grouping API                                               our evaluation environment is a private cluster consisting of
CellScope’s grouping API is built on Spark Streaming [50],            20 machines. Each machine consists of 4 CPUs, 32GB of
since the data arrives continuously, and we need to operate           memory and a 200GB magnetic hard disk.
on this data in a streaming fashion. Spark Streaming already          Dataset: We collected data from a major metro-area LTE
provides support for windowing functions on streams of data,          network occupying a large geographical area for a time period
thus we extended the windowing functionality with three               of over 10 months. The network serves more than 2 million
APIs in listing 1. In this section, we use the words grouping         active users and carries over 6TB of traffic per hour.
and partitions interchangeably.                                       6.1   Benefits of Similarity Based Grouping
    All three APIs leverage the DStream abstraction provided
by Spark Streaming. The groupBySimilarityAndWindow                    We first attempt to answer the question "How much benefit
takes the buffered data from the last window duration, applies        does similarity based grouping provide?". To answer this
the similarity metric to produce outputs of grouped datasets          question, we conducted two experiments, each with a differ-
(multiple DStreams) every slide duration. The reduceBy-               ent learning algorithm. The first experiment, whose aim is
SimilarityAndWindow allows an additional user defined as-             to detect call drops, uses a classification algorithm while the
sociative reduction operation on the grouped datasets. Finally,       second, whose aim is to predict throughput, uses a regression
it also provides a joinBySimilarityAndWindow which joins              algorithm. We chose these to evaluate the benefits in two
multiple streams using similarity. We found these APIs suffi-         different classes of algorithms. In both these cases, we pick
cient for most of the operations, including group changes.            the data collection latency where the per base station model
                                                                      gives the best accuracy, which was 1 hour for classification
5.2   Hybrid MTL Modeling                                             and 5 minutes for regression. In order to compare the bene-
We use Spark’s machine learning library, MLlib [41] for               fits of our grouping scheme alone, we build a single model
implementing our hybrid MTL model. MLlib contains the                 per group instead of applying MTL. We compare the accu-
implementation of many distributed learning algorithms. To            racy obtained with three different space partitioning schemes.
leverage the many pre-existing algorithms in MLlib, we imple-         and 5 minutes for regression. In order to compare the bene-
mented our multi-task learning hybrid model as an ensemble            fits of our grouping scheme alone, we build a single model
method [13]. By definition, ensemble methods use multiple             space-filling curve based approach [23] that could create dy-
learning algorithms to obtain better performance. Given such          namically size partitions. Finally, the third (Spatial 3) creates
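The ensemble mechanics of §5.2 — a cached offline model plus online models appended per mini-batch, with more recent models weighted higher — can be sketched outside Spark as follows; the geometric decay is an illustrative choice, not MLlib's actual weighting scheme:

```python
import numpy as np

class HybridEnsemble:
    """Sketch of the hybrid offline-online ensemble.

    `models` are callables x -> prediction (placeholders for the
    offline GBT model and the online models added per mini-batch).
    """
    def __init__(self, offline_model, decay=0.8):
        self.models = [offline_model]   # cached offline model
        self.decay = decay              # illustrative recency decay

    def add_online_model(self, model):
        # New mini-batch: append a model rather than retraining.
        self.models.append(model)

    def predict(self, x):
        # Newest model gets weight 1; older ones decay geometrically.
        n = len(self.models)
        weights = [self.decay ** (n - 1 - i) for i in range(n)]
        preds = [m(x) for m in self.models]
        return float(np.dot(weights, preds) / sum(weights))
```

Appending models instead of rebuilding is what makes the fast incremental updates of §6.3 possible.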
Figure 4: CellScope is able to achieve high accuracy while reducing the data collection latency. [Plots omitted.]
(a) CellScope’s partitioning by itself is able to provide significant gains; MTL provides further gains.
(b) Partitioning overheads are minimal.
(c) CellScope achieves up to 2.5× accuracy improvements in drop rate classification.
(d) Improvements in throughput model regression go up to 4.4×.
(e) Hybrid model reduces update time by up to 4.8×.
(f) Online training due to the hybrid model helps incur almost no loss in accuracy due to staleness.
partitions using base stations that are under the same cellular                                                proves compared to the earlier approach of a single model
region. The results are shown in fig. 4a.                                                                      per group. The results are presented in fig. 4a. The ability of
   CellScope’s similarity grouping performs as well as the                                                     MTL to learn and improve models from other similar base
per base station model which gives the highest accuracy. It                                                    stations’ data results in an increase in the accuracy. Over
is interesting to note the performance of spatial partitioning                                                 the benefits of grouping, we see an improvement of 6% in
schemes which ranges from 75% to 80%. None of the spatial                                                      the connection drop diagnosis experiment, and 16.2% in the
schemes come close to the similarity grouping results. This is                                                 case of throughput prediction experiment. The higher bene-
because the drops are few, and concentrated. Spatial schemes                                                   fits in the latter comes from CellScope’s ability to capture
club base stations not based on underlying drop character-                                                     individual characteristics of the base station. This ability is
istics, but only based on spatial proximity. This causes the                                                   not so crucial in the former because of the limited variation in
algorithms to underfit or overfit. Since our similarity based
partitioner groups base stations using their drop characteristics,
it is able to do as much as 17% better than the spatial schemes.
   The benefits are even higher in the regression case. Here, the
per base station model is unable to get enough data to build an
accurate model and hence achieves only around 66% accuracy.
Spatial schemes do slightly better. Our similarity based grouping
emerges as a clear winner in this case with 77.3% accuracy. This
result reflects the highly variable performance characteristics of
the base stations, and the need to capture them for accuracy.
   These benefits do not come at the cost of higher computational
overhead for the grouping. Figure 4b shows the overhead of
performing similarity based grouping on various dataset sizes.
Even very large datasets can be partitioned with very little
overhead.

6.2    Benefits of MTL

Next, we characterize the benefits of CellScope's use of MTL.
To do this, we repeated the previous experiment and applied MTL
to the grouped data to see if the accuracy improves. [...]
individual characteristics over those found by the grouping.

6.3    Combined Benefits of Grouping and MTL

We now evaluate the combined benefits of grouping and MTL
under different data collection latencies. Here, we are interested
in evaluating how CellScope handles the latency-accuracy
trade-off. To do this, we run the same classification and
regression experiments, but across different data collection
latencies instead of one. We show the results from the
classification experiment in fig. 4c and those from the regression
experiment in fig. 4d, which compare CellScope's accuracy
against a per base station model's.
   When the opportunity to collect data at individual base
stations is limited, CellScope is able to leverage our MTL
formulation to combine data from multiple base stations and
build customized models that improve accuracy. The benefits of
CellScope range from 2.5× in the classification experiment to
4.4× in the regression experiment. Lower latencies are
problematic in the classification experiment due to the extremely
low probability of drops, while higher latencies are a problem in
the regression experiment due to temporal changes in
performance characteristics.
6.4    Hybrid Model Benefits

Finally, we evaluate the benefits of our hybrid modeling. Here,
we are interested in how much overhead it reduces during model
updates, and whether it can do online learning.
   To answer the first question, we conducted the following
experiment: we considered three different data collection
latencies: 10 minutes, 1 hour and 1 day. We then learn a decision
tree model on this data in a tumbling window fashion. So for the
10 minute latency, we collect data for 10 minutes, then build a
model, wait another 10 minutes to refine the model, and so on.
We compare our hybrid model strategy to two different
strategies: a naive approach which rebuilds the model from
scratch every time, and a better, strawman approach which reuses
the last model and makes changes to it. Both build a single
model, while CellScope uses our hybrid MTL model and only
updates the online part of the model. The results of this
experiment are shown in fig. 4e.
   The naive approach incurs the highest overhead, which is
expected due to the need to rebuild the entire model from scratch.
The overhead increases with the amount of input data. The
strawman approach, on the other hand, is able to avoid this heavy
overhead. However, it still incurs overhead with larger input
because its single model requires changes to many parts of the
tree. CellScope incurs the least overhead, due to its use of
multiple models: when data accumulates, it only needs to update
a part of an existing tree, or build a new tree. This strategy
results in a reduction of 2.2× to 4.8× in model building time for
CellScope.
   To wrap up, we evaluated the performance of the hybrid
strategy on different data collection intervals. Here we are
interested in seeing whether the hybrid model is able to adapt to
data changes and provide reasonable accuracies. We use the
connection drop experiment again, but in a different way. At
each collection latency, we build the model at the beginning of
the collection and use it for the next interval. Hence, for the
1 minute latency, we build a model using the first minute of data,
and use that model for the second minute (until the whole second
minute has arrived). The results are shown in fig. 4f. We see that
the per base station model suffers an accuracy loss at higher
latencies due to staleness, while CellScope incurs almost zero
loss in accuracy. This is because it does not wait until the end of
the interval, and is able to incorporate data in real time.

7    RAN Performance Analysis Using CellScope

To validate our system in the real world, we now show how
domain experts can use CellScope to build efficient RAN
performance analysis solutions. To analyze RAN performance,
we consider two metrics of significant importance for end-user
experience: throughput and connection drops. Table 1 describes
the key LTE physical layer parameters used in the analysis. Our
findings are summarized below:
   • Our bearer performance analysis reveals interesting findings
     on the inefficiencies of the P-CQI detection mechanism and
     the link adaptation algorithm. Both of these findings were
     previously unknown to the operator.
   • We find that connection drops are mainly due to uplink
     SINR and then downlink SINR, and that RSRQ is more
     reliable than downlink CQI.
   • Our cell performance analysis shows that many unknown
     connection drops can be explained by coverage and uplink
     interference, and that throughput is seriously impacted by an
     inefficient link adaptation algorithm.

                  LTE Physical Layer Parameters
  Name   Description
  RSRP   Reference Signal Received Power: average of reference
         signal power (in watts) across a specified bandwidth.
         Used for cell selection and handoff.
  RSRQ   Reference Signal Received Quality: indicator of
         interference experienced by the UE. Derived from RSRP
         and interference metrics.
  CQI    Channel Quality Indicator: carries information on how
         good or bad the communication channel quality is.
  SINR   Signal to Interference plus Noise Ratio: the ratio of
         signal power to interference power plus background noise.
  BLER   Block Error Ratio/Rate: ratio of the number of erroneous
         blocks received to the total blocks sent.
  PRB    Physical Resource Block: the specific number of
         subcarriers allocated for a predetermined amount of time
         for a user.

  Table 1: A description of key parameters in the LTE physical layer

7.1    Analyzing Call Drop Performance

Operators are constantly striving to reduce the number of drops
in the network. With the move towards carrying voice over the
data network (VoLTE), this metric has gained even more
importance. This section describes how we used CellScope to
analyze drop performance.

7.1.1    Feature Engineering

Call drops are normally attributable to one of three factors:

Coverage It is intuitive that poor coverage leads to dropped
calls. As seen in fig. 5a, areas with RSRP < −130 dBm have a
very high connection drop probability.

Downlink interference For downlink interference, we consider
two metrics: RSRQ and downlink CQI. RSRQ is only reported
when the UE might need to hand off; CQI is available
independent of handoffs. From fig. 5b and fig. 5c, we see that the
two distributions do not match. To reveal the difference between
these two distributions, we converted both to a common SINR.
To convert CQI, we use the CQI to SINR table. For RSRQ, we
use the formula derived in [35], SINR = 1/(1/(12·RSRQ) − ρ),
where ρ depends on subcarrier utilization; for two antennas, it is
between 1/3 and 5/3. For connection failure cases, we show the
empirical distribution of the SINR differences at 0%, 50% and
100% subcarrier utilization in fig. 5d. We see that 10% of cases
have a SINR difference of 10 dB. After revealing our finding to
experts, it was discovered that P-CQI feedback through the
physical uplink control channel is not CRC protected. When
spurious P-CQIs are received, the physical link adaptation
algorithm might choose an incorrect rate, resulting in drops.
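The two SINR conversions above can be sketched as follows. This is a minimal illustration: the CQI-to-SINR table below is a commonly used approximation and stands in for the scheduler's actual (vendor-specific) table, which is an assumption on our part; the RSRQ formula is the one from [35], taking RSRQ in dB.

```python
import math

# Illustrative CQI -> SINR (dB) mapping; the base station scheduler's
# actual table is vendor-specific, so these values are an assumption.
CQI_TO_SINR_DB = {1: -6.7, 2: -4.7, 3: -2.3, 4: 0.2, 5: 2.4,
                  6: 4.3, 7: 5.9, 8: 8.1, 9: 10.3, 10: 11.7,
                  11: 14.1, 12: 16.3, 13: 18.7, 14: 21.0, 15: 22.7}

def rsrq_to_sinr_db(rsrq_db: float, rho: float) -> float:
    """Convert RSRQ (dB) to SINR (dB) via SINR = 1/(1/(12*RSRQ) - rho).

    rho depends on subcarrier utilization; for two antennas it lies
    between 1/3 (0% utilization) and 5/3 (100% utilization).
    """
    rsrq = 10 ** (rsrq_db / 10.0)               # dB -> linear
    sinr = 1.0 / (1.0 / (12.0 * rsrq) - rho)    # formula from [35]
    return 10.0 * math.log10(sinr)              # linear -> dB
```

Comparing `rsrq_to_sinr_db(rsrq, rho)` against `CQI_TO_SINR_DB[cqi]` at ρ = 1/3, 1, and 5/3 corresponds to the 0%, 50% and 100% utilization curves of fig. 5d.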
(a) Coverage (b) Downlink RSRQ (c) Downlink CQI (d) SINR gap
Figure 5: Factors affecting call drops. Although RSRQ and CQI both measure downlink interference, their distributions do not match, indicating an anomaly.
Uplink interference As shown in fig. 6a, the drop probability as
a function of uplink SINR has a rather steep slope and peaks at
−17 dB. The reason is that the scheduler stops allocating grants
at this threshold.

7.1.2    Decision Tree Model for Connection Drops

Based on this feature engineering, we picked features that
accurately characterize call drops. We then used CellScope to
train a decision tree that explains the root causes of connection
drops. One of the learned trees is shown in fig. 6b. As we see,
the tree first classifies drops based on uplink SINR, and then
makes use of RSRQ if available. We confirmed with experts that
the model agrees with their experience. Uplink SINR is more
unpredictable because the interference comes from subscribers
associated with neighboring base stations, whereas downlink
interference comes from the neighboring base stations
themselves. CellScope's models achieved an overall accuracy of
92.1% here, while neither a per base station model nor a global
model was able to accurately identify uplink SINR as the cause,
attaining less than 80% accuracy.

7.1.3    Detecting Cell KPI Change False Positives Using
         Concept Drift and Incremental Learning

An interesting application of CellScope's hybrid model is in
detecting false positives in KPI changes. As explained earlier,
state-of-the-art performance problem detection systems monitor
KPIs and raise alarms when thresholds are crossed. A major
issue with these systems is that alarms get raised even for known
root causes. However, operators cannot confirm this without
manual investigation, resulting in wasted time and effort. This
problem can be solved if known root causes are filtered out
before raising the alarm.
   We illustrate this using drop rate. To do so, we use CellScope
to apply the decision tree in an incremental fashion on a week's
worth of data divided into 10 minute windows. We used this
window length since it closely matches the interval usually used
by operators for monitoring drop rates. In every window, we
predict the number of drops using our technique. The predicted
drops are explainable, because we know precisely why those
drops happened. We use a threshold of 0.5% for the drop rate;
anything above this threshold is marked as an anomaly. The
results from this experiment are depicted in fig. 6c. The
threshold is exceeded at numerous places. Normally, these would
have to be investigated by an expert. However, CellScope
explained them, relieving the burden on the operator.
   To estimate the confidence in our prediction, we analyzed our
results during the occurrence of these anomalies. We consider
each connection drop or completion event as a Bernoulli random
variable with probability p (from the decision tree). A sequence
of n connection events then follows a binomial distribution,
whose 95% confidence interval is approximated by
np ± 2√(np(1 − p)). We determine that an alarm is false if the
observed count X is within this confidence interval. For this
particular experiment, the bound was found to be
(0.7958665, 0.8610155), and thus we conclude that CellScope
was successful.

7.2    Throughput Performance Analysis

Our traces report information that lets us compute RLC
throughput as ground truth. We would like to model how far the
actual RLC throughput is from the throughput predicted using
physical layer and MAC sub-layer information. This helps us
understand the contributing factors of throughput.

7.2.1    Feature Engineering

SINR estimation The base stations have two antennas and are
capable of MIMO spatial multiplexing (two streams) or transmit
diversity. For both transmission modes, each UE reports its two
wideband CQIs. We use the CQI to SINR mapping table used at
the base station scheduler to convert CQI to SINR. For transmit
diversity, we convert the two CQIs to a single SINR as follows:
first convert both CQIs to SINRs, then compute the two spectrum
efficiencies (bits/sec/Hz) using Shannon capacity. We average
the two spectrum efficiencies and convert the result back to an
SINR. We then add a 3 dB transmit diversity gain to obtain the
final SINR. For spatial multiplexing, we convert the two CQIs to
two SINRs.

Account for PRB control overhead and BLER target Each
PRB is 180 kHz, but not all of it is used for data transmission.
For transmit diversity, a 29% overhead is incurred per PRB on
average because of resources allocated to the physical downlink
control channel, broadcast channel and reference signals. The
BLER target is 10%.

Account for MAC sub-layer retransmissions The MAC
sub-layer performs retransmissions. We denote the MAC
efficiency as βMAC, computed as the ratio of total first
transmissions to total transmissions. Our traces provide the
information needed to compute βMAC.
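The feature engineering steps above can be sketched in code. The first helper implements the Shannon-capacity averaging for transmit diversity as described; the second is our own illustrative combination of the stated constants (180 kHz per PRB, 29% control overhead, 10% BLER target) with βMAC into a rough throughput estimate, not the paper's learned model.

```python
import math

def tx_diversity_sinr_db(sinr1_db: float, sinr2_db: float,
                         diversity_gain_db: float = 3.0) -> float:
    """Combine the two per-antenna SINRs (from the two wideband CQIs).

    Convert each SINR to Shannon spectral efficiency log2(1 + SINR),
    average the two efficiencies, invert back to an SINR, and add the
    3 dB transmit diversity gain.
    """
    eff = [math.log2(1.0 + 10 ** (s / 10.0)) for s in (sinr1_db, sinr2_db)]
    avg_eff = sum(eff) / 2.0                 # bits/sec/Hz
    sinr = 2.0 ** avg_eff - 1.0              # invert Shannon capacity
    return 10.0 * math.log10(sinr) + diversity_gain_db

def predicted_throughput_bps(n_prb: int, sinr_db: float, beta_mac: float,
                             prb_hz: float = 180e3,
                             control_overhead: float = 0.29,
                             bler_target: float = 0.10) -> float:
    """Illustrative per-bearer throughput estimate from the features above:
    usable bandwidth x spectral efficiency, discounted by the BLER target
    and MAC retransmission efficiency."""
    spectral_eff = math.log2(1.0 + 10 ** (sinr_db / 10.0))
    return (n_prb * prb_hz * (1.0 - control_overhead)
            * spectral_eff * (1.0 - bler_target) * beta_mac)
```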
[Figure 6 plots. Panel (b) shows a sample learned decision tree: if uplink SINR ≤ −11.75, predict Drop; otherwise, if RSRQ is available, predict Success when RSRQ > −16.5 and Drop otherwise; if RSRQ is unavailable, predict Success when uplink SINR > −5.86, else predict Success when CQI > 5.875 and Drop otherwise.]
(a) Uplink Interference (b) Sample decision tree (c) Detecting false alarms (d) Loss of efficiency
Figure 6: Uplink interference also affects drops. CellScope is able to create local models that accurately detect false alarms.
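The false-alarm test of §7.1.3 reduces to a simple binomial confidence check. A minimal sketch, where p comes from the decision tree and observed is the drop count X seen in the window:

```python
import math

def is_false_alarm(n: int, p: float, observed: int) -> bool:
    """Return True if the observed drop count lies inside the approximate
    95% binomial confidence interval np +/- 2*sqrt(np(1-p)), i.e. the
    raised alarm is already explained by known root causes."""
    mean = n * p
    half_width = 2.0 * math.sqrt(n * p * (1.0 - p))
    return mean - half_width <= observed <= mean + half_width
```

For example, with n = 1000 events and p = 0.01 from the tree, observing 12 drops falls inside the interval and the alarm is suppressed, while 30 drops would stand as a genuine anomaly.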
cause these systems rely on traditional database technologies, it
is hard for them to provide fine-grained predictions based on
per-bearer models. In contrast, CellScope is built on top of an
efficient big data system, Apache Spark. One recent commercial
cellular network analytics system [6] adopted the Hadoop big
data processing framework. Since it is built on top of WNG [3],
it does not have visibility into RANs.
Self-Organizing Networks (SON): The goal of SON [1] is to
make the network capable of self-configuration (e.g. automatic
neighbor list configuration) and self-optimization. CellScope
focuses on understanding RAN performance and assisting
troubleshooting, and thus can be used to assist SON.
Modelling and diagnosis techniques: Diagnosing problems in
cellular networks has been explored in the literature in various
forms [8, 22, 29, 33, 43], where the focus has been either
detecting faults or finding the root cause of failures. A
probabilistic system for auto-diagnosing faults in the RAN is
presented in [8]. It relies on KPIs; however, KPIs are not capable
of providing diagnosis at fine granularity. Moreover, it is unclear
how their proposals capture complex dependencies between
different components in the RAN. An automated approach to
locating anomalous events in hierarchical operational networks
was proposed in [22], based on hierarchical heavy hitter based
anomaly detection. It is unclear how these proposals carry over
to the RAN. Adding autonomous capabilities to alarm based
fault detection is discussed in [29]. While their techniques can
help systems auto-heal faults, correlation based fault detection is
insufficient for fine-grained detection and diagnosis of faults.
[33] looks at detecting call connection faults due to load
imbalances. In [47], a technique to detect and localize anomalies
from an ISP's point of view is proposed. Finally, [43] discusses
the use of ML tools in predicting call drops and their duration.
Multi-Task Learning: MTL builds on the idea that related tasks
can learn from each other to achieve better statistical efficiency
[9, 10, 17, 44]. Since the assumption of task relatedness does not
hold in many scenarios, techniques to automatically cluster tasks
have been explored in the past [20, 25]. However, these
techniques treat tasks as black boxes and hence cannot leverage
domain-specific structure. In contrast, CellScope proposes a
hybrid offline-online MTL formulation on a domain-specific
grouping of tasks based on underlying performance
characteristics.

[...] groups data from geographically nearby base stations
sharing performance commonalities. Finally, it also incorporates
a hybrid online-offline model for efficient model updates. We
have built CellScope on Apache Spark and evaluated it on real
data, showing accuracy improvements ranging from 2.5× to 4.4×
over direct applications of ML. We have also used CellScope to
analyze a live LTE network serving over 2 million subscribers
for a period of over 10 months, where it uncovered several
problems and insights.
   For future work, we wish to explore the applicability of our
techniques to resolving this trade-off in other domains where
similarity based grouping is possible. Further, since our design is
amenable to geo-distributed learning, we wish to investigate the
trade-offs in such settings.

Acknowledgments

We thank all AMPLab members who provided feedback on
earlier drafts of this paper. This research is supported in part by
NSF CISE Expeditions Award CCF-1139158, DOE Award
SN10040 DE-SC0012463, and DARPA XData Award
FA8750-12-2-0331, and gifts from Amazon Web Services,
Google, IBM, SAP, The Thomas and Stacey Siebel Foundation,
Apple Inc., Arimo, Blue Goji, Bosch, Cisco, Cray, Cloudera,
Ericsson, Facebook, Fujitsu, HP, Huawei, Intel, Microsoft,
Pivotal, Samsung, Schlumberger, Splunk, State Farm and
VMware.

References

 [1] 3GPP. Self-Organizing Networks (SON) policy Network
     Resource Model (NRM) Integration Reference Point (IRP).
     http://www.3gpp.org/ftp/Specs/archive/32_series/32.521/.

 [2] Aggarwal, B., Bhagwan, R., Das, T., Eswaran, S.,
     Padmanabhan, V. N., and Voelker, G. M. NetPrints:
     diagnosing home network misconfigurations using shared
     knowledge. In Proceedings of the 6th USENIX Symposium
     on Networked Systems Design and Implementation
     (Berkeley, CA, USA, 2009), NSDI'09, USENIX
     Association, pp. 349–364.

 [3] Alcatel-Lucent. 9900 wireless network guardian.
     http://www.alcatel-lucent.com/products/
     9900-wireless-network-guardian, 2013.

 [4] Alcatel-Lucent. 9959 network performance optimizer.
     http://www.alcatel-lucent.com/products/
     9959-network-performance-optimizer, 2014.
 [8] Barco, R., Wille, V., Díez, L., and Toril, M. Learning of model parameters for fault diagnosis in wireless networks. Wirel. Netw. 16, 1 (Jan. 2010), 255–271.

 [9] Baxter, J. A model of inductive bias learning. J. Artif. Int. Res. 12, 1 (Mar. 2000), 149–198.

[10] Caruana, R. Multitask learning: A knowledge-based source of inductive bias. In Proceedings of the Tenth International Conference on Machine Learning (1993), Morgan Kaufmann, pp. 41–48.

[11] Cohen, I., Goldszmidt, M., Kelly, T., Symons, J., and Chase, J. S. Correlating instrumentation data to system states: A building block for automated diagnosis and control. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6 (Berkeley, CA, USA, 2004), OSDI '04, USENIX Association, pp. 16–16.

[12] Cranor, C., Johnson, T., Spataschek, O., and Shkapenyuk, V. Gigascope: A stream database for network applications. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (New York, NY, USA, 2003), SIGMOD '03, ACM, pp. 647–651.

[13] Dietterich, T. G. Ensemble methods in machine learning. In Multiple Classifier Systems. Springer, 2000, pp. 1–15.

[14] Eljaam, B. Customer satisfaction with cellular network performance: Issues and analysis.

[15] Ericsson. Ericsson RAN analyzer overview. http://www.optxview.com/Optimi_Ericsson/RANAnalyser.pdf, 2012.

[16] Ericsson. Ericsson RAN analyzer. http://www.ericsson.com/ourportfolio/products/ran-analyzer, 2014.

[17] Evgeniou, T., and Pontil, M. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, NY, USA, 2004), KDD '04, ACM, pp. 109–117.

[18] Friedman, J. H. Greedy function approximation: A gradient boosting machine. Annals of Statistics (2001), 1189–1232.

[19] Gama, J. a., Žliobaitė, I., Bifet, A., Pechenizkiy, M., and Bouchachia, A. A survey on concept drift adaptation. ACM Comput. Surv. 46, 4 (Mar. 2014), 44:1–44:37.

[20] Gong, P., Ye, J., and Zhang, C. Robust multi-task feature learning. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, NY, USA, 2012), KDD '12, ACM, pp. 895–903.

[21] Harris, P., Brunsdon, C., and Charlton, M. Geographically weighted principal components analysis. International Journal of Geographical Information Science 25, 10 (2011), 1717–1736.

[22] Hong, C.-Y., Caesar, M., Duffield, N., and Wang, J. Tiresias: Online anomaly detection for hierarchical operational network data. In Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems (Washington, DC, USA, 2012), ICDCS '12, IEEE Computer Society, pp. 173–182.

[23] Iyer, A., Li, L. E., and Stoica, I. CellIQ: Real-time cellular network analytics at scale. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15) (Oakland, CA, May 2015), USENIX Association, pp. 309–322.

[24] Khanna, G., Yu Cheng, M., Varadharajan, P., Bagchi, S., Correia, M. P., and Veríssimo, P. J. Automated rule-based diagnosis through a distributed monitor system. IEEE Trans. Dependable Secur. Comput. 4, 4 (Oct. 2007), 266–279.

[25] Kim, S., and Xing, E. P. Tree-guided group lasso for multi-task regression with structured sparsity. International Conference on Machine Learning (ICML) (2010).

[26] Kraska, T., Talwalkar, A., Duchi, J. C., Griffith, R., Franklin, M. J., and Jordan, M. I. MLbase: A distributed machine-learning system. In CIDR (2013).

[27] Krzanowski, W. Between-groups comparison of principal components. Journal of the American Statistical Association 74, 367 (1979), 703–707.

[28] Lakhina, A., Crovella, M., and Diot, C. Diagnosing network-wide traffic anomalies. In Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (New York, NY, USA, 2004), SIGCOMM '04, ACM, pp. 219–230.

[29] Liu, Y., Zhang, J., Jiang, M., Raymer, D., and Strassner, J. A model-based approach to adding autonomic capabilities to network fault management system. In Network Operations and Management Symposium, 2008. NOMS 2008. IEEE (April 2008), pp. 859–862.

[30] Murphy, K. P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

[31] Pan, Y., Xia, R., Yin, J., and Liu, N. A divide-and-conquer method for scalable robust multitask learning. Neural Networks and Learning Systems, IEEE Transactions on 26, 12 (Dec. 2015), 3163–3175.

[32] Pearson, K. On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2, 6 (1901), 559–572.

[33] Rao, S. Operational fault detection in cellular wireless base-stations. IEEE Trans. on Netw. and Serv. Manag. 3, 2 (Apr. 2006), 1–11.

[34] Safwat, A. M., and Mouftah, H. 4G network technologies for mobile telecommunications. Network, IEEE 19, 5 (2005), 3–4.

[35] Salo, J. Mobility parameter planning for 3GPP LTE: Basic concepts and intra-layer mobility. www.lteexpert.com/lte_mobility_wp1_10June2013.pdf.

[36] Sesia, S., Toufik, I., and Baker, M. LTE: The UMTS Long Term Evolution. Wiley Online Library, 2009.

[37] Shafiq, M. Z., Erman, J., Ji, L., Liu, A. X., Pang, J., and Wang, J. Understanding the impact of network dynamics on mobile video user engagement. In The 2014 ACM International Conference on Measurement and Modeling of Computer Systems (New York, NY, USA, 2014), SIGMETRICS '14, ACM, pp. 367–379.

[38] Shafiq, M. Z., Ji, L., Liu, A. X., Pang, J., Venkataraman, S., and Wang, J. A first look at cellular network performance during crowded events. In Proceedings of the ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems (New York, NY, USA, 2013), SIGMETRICS '13, ACM, pp. 17–28.
[39] Shalev-Shwartz, S., and Tewari, A. Stochastic methods for l1-regularized loss minimization. J. Mach. Learn. Res. 12 (July 2011), 1865–1892.

[40] Smith, C. 3G Wireless Networks. McGraw-Hill, Inc., 2006.

[44] Thrun, S. Is learning the n-th thing any easier than learning the first? In Advances in Neural Information Processing Systems (1996), The MIT Press, pp. 640–646.

[46] Wang, H. J., Platt, J. C., Chen, Y., Zhang, R., and Wang, Y.-M. Automatic misconfiguration troubleshooting with PeerPressure. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6 (Berkeley, CA, USA, 2004), OSDI '04, USENIX Association, pp. 17–17.

[47] Yan, H., Flavel, A., Ge, Z., Gerber, A., Massey, D., Papadopoulos, C., Shah, H., and Yates, J. Argus: End-to-end service anomaly detection and localization from an ISP's point of view. In INFOCOM, 2012 Proceedings IEEE (2012), pp. 2756–2760.

[50] Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., and Stoica, I. Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (New York, NY, USA, 2013), SOSP '13, ACM, pp. 423–438.

[52] Zheng, A. X., Lloyd, J., and Brewer, E. Failure diagnosis using decision trees. In Proceedings of the First International Conference on Autonomic Computing (Washington, DC, USA, 2004), ICAC '04, IEEE Computer Society, pp. 36–43.