MSC Dissertation - Sample 5
Declaration
I hereby declare that except where specific reference is made to the work of others, the
contents of this dissertation are original and have not been submitted in whole or in part
for consideration for any other degree or qualification in this, or any other university.
This dissertation is my own work, except as specified in the text.
The copyright of this dissertation belongs to the author under the terms of the UK Copyright Acts as amended by the University of West London regulations. Due acknowledgement must always be made of the use of any materials contained in, or derived from, this dissertation.
Lakini Senanayaka
Acknowledgement
As an international student, I found it an interesting time and a great opportunity to study for the MSc in Cyber Security at the University of West London. During my period of study, I learnt a lot of new technologies and met a wide variety of interesting people who showed me the meaning of my life, truly believed in me and encouraged me to be my best.
Dr Wei Jie is one of the interesting people I met during my studies, and I am extremely grateful to have had him as my supervisor. I am thankful for his continuous support, invaluable advice and his wonderful energy, which kept me working energetically on my master's thesis during the COVID pandemic. His immense knowledge and experience in the field also encouraged me to do great research.
Furthermore, this thesis is a combination of all the materials I learnt during my Master's
studies. Therefore, I also want to thank all the academic staff including all the lecturers
at the University of West London for the academic years of 2019/2020 and 2020/2021.
I also received amazing support and guidance from all my colleagues at VeeLoop, especially the CEO of the company, Randa Bennet. She always encouraged me to pay more attention to my studies and helped me to balance studying and work life. I am really lucky to work with her.
It is all of their kind help, encouragement and support that have made my study and life
in the UK a wonderful one.
Abstract
At present, cyber attacks receive a great amount of attention from the community due to their highly destructive behaviour, which can cause massive damage to networks and illegal access to sensitive information. Attackers target different systems for political or monetary gain. With the rapid development of technology, cyber attacks including malware, phishing and denial-of-service attacks are becoming more common and harder to detect by the cyber defence systems deployed on networks. Therefore, in order to protect companies and systems from cyber attacks and threats, the defensive mechanisms of those systems should also be updated as new technologies evolve. Some of the major cyber attacks in 2021 can be summarized as follows. CNA Financial, the biggest cyber insurance firm in the USA, was attacked by ransomware, which disrupted its services for three consecutive days. A recent cyber attack on the Florida water system attempted to poison the water supply by increasing its chemical concentrations. Similarly, Microsoft's Exchange Server, which is used by many government bodies and private companies, was exploited by attackers, causing a major global impact (Meharchandani, 2021).
This paper introduces a new prediction framework for cyber attack rates based on a deep learning architecture called BRNN-GRU, a Bidirectional Recurrent Neural Network with Gated Recurrent Units. The framework is optimised and fine-tuned for high-accuracy predictions with minimised error rates. Its prediction capability can guide network defenders to proactively allocate their resources in a cost-effective manner, reducing the severity of the damage caused by incoming attacks or stopping them entirely at the defender level. This paper also directs the reader to other potential approaches for building deep learning models with elevated accuracy rates, such as the use of Generative Adversarial Networks. Furthermore, the BRNN-GRU framework has early-warning prediction capabilities.
Table of Contents
Acknowledgement 4
Abstract 5
Table of Contents 6
List of figures 8
List of Tables 8
List of Algorithms 8
Introduction 10
Literature Review 12
Deception Technologies 12
HoneyPots and HoneyNets 12
Network Telescopes 13
Definitions 15
Definitions for Statistical Approaches 15
Stochastic Process 15
Poisson Distributions 15
Stationarity Process 15
Long Range Dependence (LRD) 16
Autoregressive Moving Average Model (ARMA) 16
Fractional AutoRegressive Integrated Moving Average Model (FARIMA) 16
Generalized AutoRegressive Conditional Heteroskedasticity Model (GARCH) 17
Extreme Value Phenomena 17
Standard Autoregressive Conditional Duration (Standard ACD) 18
Log Autoregressive Conditional Duration (Log ACD) 18
Definitions for Deep Learning Approaches 19
Recurrent Neural Network (RNN) 19
Bi Directional Recurrent Neural Network (BRNN) 20
Long - Short Term Memory Units (LSTM) 21
Definitions for Prediction Accuracy 21
Related Work 22
The research problem and question 37
Aims and Objectives 39
The proposed new framework 39
Scenario 39
Definitions 40
Gated Recurrent Unit(GRU) 40
Optimizers in deep learning networks 41
Hyperparameters 42
Proposed Algorithm 43
Methodology 44
Current Deep Learning Approaches for Cyber Attack Rate Prediction 44
Data acquisition 45
Data preprocessing 46
Selection of Technologies 47
Tensor processing Units - TPU 47
Tensorflow 48
Keras 48
Model Development 48
Model Evaluation 50
Conclusions 52
References 55
Word Count 59
Appendices 60
Appendix 1 60
List of figures
List of Tables
Table 1 : TST based fitting of attack rates per hour using updated Gray Box model 30
List of Algorithms
Algorithm 1 : Pseudocode for predicting cyber attack rates using Gray Box model (FARIMA+ARIMA) 25
Algorithm 3 : Pseudocode for fitting stationary data to M1 in EVT model in updated Gray Box model (FARIMA+GARCH+EVT) 28
Algorithm 5 : Pseudocode for predicting cyber attack rates using updated Gray Box model (FARIMA+GARCH+EVT) 30
Introduction
The Internet has become a necessity in human life. Almost everything in our day-to-day lives involves, or is affected by, the internet. The Internet enables rapid globalization and fast information sharing, and a company can even run fully over the internet without a dedicated office space. The Internet is pervasive. It can provide thousands of different services, such as infrastructure as a service and platform as a service, and it can even communicate with sensors in automobiles and in nuclear and thermal plants. The internet also became even more important during the COVID-19 lockdowns, when most people had to work remotely and students had to learn remotely. All of these opportunities opened up thanks to the internet. Although cyberspace can provide thousands of different services for the community, it is not 100% safe. The implementation of new and evolving technologies opens up more opportunities for attackers to exploit cyberspace. Attackers and intruders can steal user information, exploit large organizations over the internet, cause monetary losses and even damage reputations. Hence all users need to pay extra attention when working on the internet. Some people install anti-virus software on their personal computers to protect themselves from malware and worms, which attackers use to exploit user devices.
Even though cybersecurity is a vast topic, this paper focuses only on cybersecurity at the industrial level. Different companies protect their networks from attackers in different ways. Most companies have separate cybersecurity divisions with system administrators and ethical hackers to protect their systems. Intrusion Detection Systems (IDS), network-based intrusion detection systems, Intrusion Prevention Systems, honeypots and network telescopes also play an important role as deception and defence technologies in cybersecurity. Intrusion detection systems monitor the network for malicious activities or policy violations and trigger alerts for the security division. The system administrators can then analyse the alerts and take the necessary steps to stop the propagation of the attacks or ready the defenders to mitigate them. At present, most of these manual workflows are automated with other technologies. False-positive alarms in IDS are one of the significant issues faced by security teams. In addition, IDS cannot predict future attacks based on patterns, and some IDS cannot identify zero-day attacks or new types of attack.
Therefore researchers have devoted considerable effort to designing and proposing new methodologies to detect attacks correctly, to predict and forecast future attacks, and to provide prevention methods against them. Using these approaches, system administrators can use their limited resources effectively and protect their networks. Honeypots and network telescopes are popular cyber defence instruments installed in networks to observe internet traffic. Honeypots attract attackers by acting as real production servers and collect information on the attacks and attack types by interacting with them (Huang et al., 2019). As the internet and cybersecurity evolve, the attackers' attack patterns also evolve. In his research, Yang Li (Li et al., 2019) explained how attackers identify dynamic honeypots and introduced a distributed honeypot approach to avoid such identification. If attackers can identify a honeypot, they do not try to access it, and the installation of that honeypot becomes useless in the network. Attackers are getting smarter and trying to attack stealthily. These results provide confirmatory evidence of the need for proper detection, prediction and forecasting systems for cyberspace, with high prediction accuracy and low error values.
This paper is divided into five main sections. The first section contains an introduction to this report. The second section explains and analyses the literature related to the current status of prediction frameworks; it also discusses different types of deception technologies, statistical properties of cyberattack data, current research work on cyber attack rate prediction, the identified problem statements, and the aims and objectives of this research. The third section is dedicated to the details of the new deep learning framework called BRNN-GRU, with an in-depth discussion of data preprocessing, analysis, model building and evaluation. Section four contains the conclusions of the report and new paths for future work. The report ends with a reference section.
Literature Review
Deception Technologies
● Medium-Interaction Honeypots
Medium interaction honeypots can capture more details about attacks than low interaction honeypots, but less information than high interaction honeypots. These honeypots do not provide access to the system, yet they are more capable of collecting information during attack attempts.
The data in high interaction honeypots comes with privacy issues, as it contains more valuable information about attackers and also about the vulnerabilities of the network and the services of the company.
Network Telescopes
These are also dynamic, real-time deception tools which passively monitor network traffic. They are built on a portion of routed IP address space in which little or no legitimate traffic exists. Normally, the traffic destined for the unused address space, together with illegitimate traffic, is forwarded to the network telescopes for real-time analysis. Some networks forward the whole of their traffic to the network telescopes without sending any response back to the user (Hunter et al., 2013).
The study of honeypot data to understand, categorise and prioritize attack patterns started a few decades ago. Different researchers have used different methodologies to understand the data. Some have used visualization techniques, for example visualizing the ports observed in honeypots using neural projection techniques (Herrero et al., 2012). Even though there are a few such techniques for understanding honeypot data, the most common approach is statistical analysis. With this, researchers can understand the statistical properties exhibited in the cyberattack data.
Honeypot and network telescope data are massive resources for researchers studying cyber attacks. Based on these datasets, different researchers have proposed and designed approaches to understand anomalous behaviours and probing in networks, to infer denial-of-service activity and to track internet worms. The analysis of Internet of Things (IoT) related botnets and exploited IoT devices is studied in (Torabi et al., 2020) and (Safaei Pour et al., 2020). Denial-of-service attacks during an election are studied by (Lutscher et al., 2019), and darknet traffic stability is studied in (Vichaidis et al., 2018). These examples show the use of honeypot and telescope data to secure systems in different ways. Apart from the above-mentioned uses, honeypots and network telescopes play a major role in understanding and developing cyber attack detection, prediction and forecasting models.
When I reviewed the previous related work on predicting cyberattack rates, I found that the development of such frameworks started from attack graphs, which are considered a primary-level prediction methodology. Researchers have since progressed from that basic level to machine-learning-based frameworks which provide many more useful details and accurate predictions, allowing defence systems to mitigate attacks and protect networks before the systems are exploited. I have conducted a survey of cyber-attack detection, prediction and forecasting methodologies; it explains the history and the development of these methodologies up to the use of machine learning and deep learning (Senanayaka, 2020).
In this paper, I focus only on the major statistical frameworks which were most helpful for understanding the honeypot data that I used as the source of cyber attack data. Before deep-diving into the existing frameworks, I would like to explain some technical terms so that the concepts can be understood clearly.
Definitions
This section describes some statistical preliminaries which will be useful in discussing the concepts in this paper.
Definitions for Statistical Approaches
Stochastic Process
A stochastic (random) process is a statistical phenomenon consisting of a collection of random variables {X_θ} indexed by a parameter θ, where θ belongs to some index set Θ. In most cases Θ represents time (Zhang, 2021). Hence, in the context of cyber attack data, a stochastic (random) process is a collection of random variables ordered in time.
Poisson Distributions
If a statistical distribution describes the number of events that are likely to occur in a given period of time, it is called a Poisson distribution. A Poisson distribution assumes independent events which occur at a constant rate over the given time.
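For reference, the standard Poisson probability mass function (quoted here as a textbook definition rather than from the dissertation's own equations) gives the probability of observing k events in an interval whose mean event count is λ:

P(X = k) = (λ^k e^(−λ)) / k!,  for k = 0, 1, 2, ...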
Stationarity Process
A stationary stochastic process exhibits the same statistical properties at any point in time. A time series is said to be stationary if there is no systematic change in the mean (trend) or the variance, and if it contains no strictly periodic variations. Honeypot cyber attack data are considered to have a stationary distribution.
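In practice, stationarity of an attack-rate series can be checked with a unit-root test. The sketch below applies the augmented Dickey-Fuller test from statsmodels to an hourly attack-rate column; the file and column names are hypothetical and the snippet is only an illustration, not part of the original dissertation code.

import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical hourly attack-rate file with an "Attacks" column.
rates = pd.read_csv("hourly_attack_rates.csv")["Attacks"].dropna()

# Augmented Dickey-Fuller test: the null hypothesis is that the series is non-stationary.
stat, p_value, *_ = adfuller(rates)
print(f"ADF statistic = {stat:.3f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the series looks stationary.")
else:
    print("Cannot reject the null hypothesis: the series may be non-stationary.")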
Long Range Dependence (LRD)
Let {X_t : t ≥ 0} be a stationary time series instantiating a stochastic cyber attack process. The process is said to possess LRD if its autocorrelation function r(h), where h is called the lag, decays slowly as the lag grows (Fang, 2018):

r(h) ~ h^(−β) L(h) as h → ∞, for 0 < β < 1,  [1][2]

where L(·) is a slowly varying function. The degree of LRD is expressed using the Hurst parameter H, which relates to β in [2] as (Zhan et al., 2013)

β = 2 − 2H,  [3]

so that LRD corresponds to 0.5 < H < 1.

[4]

where ε_t is an independent and identically distributed normal random variable with mean 0 and variance σ².
[5]
[6]
[7]
[8]
[9]
Extreme Value Phenomena
The extreme values of the attack-rate series, namely the observations exceeding a high threshold value u, formulate a point process over a state space χ = (0, t_n] × (u, ∞) as

[10]
[11]

where (x)_+ = max{x, 0}, σ > 0 is the scale parameter, and μ and ξ are the location and shape parameters respectively (Peng et al., 2017).
Standard Autoregressive Conditional Duration (Standard ACD)

[12]

where ω, a_j, b_j ≥ 0 and p and q are positive integers indicating the order of the autoregressive terms (Peng et al., 2017).

Log Autoregressive Conditional Duration (Log ACD)

[13]

In the Log ACD model the researchers set p = q = 1, because a higher order does not increase the prediction accuracy (Peng et al., 2017; Bauwens and Giot, 2000; Bauwens et al., 2004).
Definitions for Deep Learning Approaches
Recurrent Neural Network (RNN)
For each timestep t, an RNN cell takes the input x^<t> and the previous activation a^<t−1> and computes

a^<t> = g_1(W_aa a^<t−1> + W_ax x^<t> + b_a)  [14]
y^<t> = g_2(W_ya a^<t> + b_y)  [15]

where W_ax, W_aa, W_ya, b_a and b_y are coefficients that are shared temporally and g_1, g_2 are the activation functions. The following diagram illustrates an RNN cell (Amidi and Amidi, 2021).
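To make the recurrence concrete, the following minimal NumPy sketch computes a few forward steps of the cell defined by [14] and [15]; the dimensions and the tanh/identity choices for g_1 and g_2 are illustrative assumptions rather than settings taken from the dissertation.

import numpy as np

def rnn_step(x_t, a_prev, W_ax, W_aa, W_ya, b_a, b_y):
    # a<t> = g1(Waa a<t-1> + Wax x<t> + ba), with g1 = tanh (assumed)
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)
    # y<t> = g2(Wya a<t> + by), with g2 = identity (assumed, regression output)
    y_t = W_ya @ a_t + b_y
    return a_t, y_t

# Toy sizes: 1 input feature (the attack rate), 8 hidden units, 1 output.
rng = np.random.default_rng(0)
W_ax, W_aa = rng.normal(size=(8, 1)), rng.normal(size=(8, 8))
W_ya, b_a, b_y = rng.normal(size=(1, 8)), np.zeros(8), np.zeros(1)

a, y = np.zeros(8), None
for x in [0.2, 0.5, 0.1]:          # a short toy attack-rate sequence
    a, y = rnn_step(np.array([x]), a, W_ax, W_aa, W_ya, b_a, b_y)
print(y)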
Long - Short Term Memory Units (LSTM)
This is another RNN structure, mainly used to fix the vanishing gradient issue that exists in plain RNNs. An LSTM consists of memory blocks that contain memory cells with self-connections, which store the temporal state of the network. There are also gates that control these memory states, known as the input gate, the forget gate and the output gate. The forget gate decides which information about the previous cell state to keep and which to remove (Fang, 2018).
Definitions for Prediction Accuracy
Let X_m, X_{m+1}, ..., X_z be the observed attack rates for m ≤ t ≤ z and Y_m, Y_{m+1}, ..., Y_z be the corresponding values predicted by the model. Then the prediction error can be defined as e_t = X_t − Y_t for m ≤ t ≤ z.

[16]
The overall underestimation error (UE) can be calculated as follows.

[17]

UE is useful when the defender is willing to over-provision some defence resources to mitigate incoming attacks.
Mean Squared Error (MSE) is another metric for evaluating the accuracy of a model's predictions. The model with the lowest MSE value is considered the best model for prediction. MSE can be calculated as (Fang, 2018)

MSE = (1 / (z − m + 1)) Σ_{t=m}^{z} (X_t − Y_t)²  [18]
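As a quick numerical illustration of the error definitions above, the following sketch computes e_t and the MSE for a toy pair of observed and predicted attack-rate sequences; the numbers are made up for demonstration only.

import numpy as np

observed = np.array([120.0, 95.0, 180.0, 60.0])    # X_t, toy observed attack rates
predicted = np.array([110.0, 100.0, 150.0, 65.0])  # Y_t, toy model predictions

errors = observed - predicted                      # e_t = X_t - Y_t
mse = np.mean(errors ** 2)                         # Mean Squared Error
print("errors:", errors)
print("MSE:", mse)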
Related Work
Cyber attack detection, prediction and forecasting using data science is not a new topic; its history goes back a few decades. Researchers have used different approaches for detection, prediction and forecasting, including discrete models such as graph theory and game theory, continuous models such as time series analysis and grey models, and machine learning approaches such as neural networks and SVMs (Husak et al., 2019). The evolution of these models from attack graphs to deep neural networks is surveyed in (Senanayaka, 2020), which includes the advantages and disadvantages of each type of approach.
As this paper is mainly focused on building a model for predicting cyberattack rates ahead of a given time, the following research outcomes played a major role in understanding the statistical features of cyber attack time series and laid the foundation for the latest prediction models.
The statistical framework introduced by Zhenxin Zhan and his team is a novel framework called the Gray Box model, implemented on the basis of a stochastic cyber attack process formulation (Zhan et al., 2013). This model can predict attack rates one hour ahead of time with an accuracy of 70.2-82.1%. The paramount discovery of this work is that it was the first research to find that long range dependence exists in honeypot-captured cyber attack data. The model also showed that stochastic cyber-attack processes do not follow a Poisson distribution and can instead exhibit the LRD phenomenon, and the authors identified two possible causes for the LRD features in cyberattack data.
With this in mind, the model can be instantiated at different resolutions, such as the network level using IP addresses, the victim level using service types, and the port level using port addresses. The paper calculated the existence of LRD as follows:
● Network-level - 80%
● Victim level - 70%
● Port level - 44.5%
Therefore LRD must be considered when analysing honeypot cyber attack data and training models on it. In addition, this framework has the predictive power to forecast cyberattack rates ahead of time.
In technical terms, this framework consists of ARIMA and FARIMA models. In the black-box setting (ARIMA only), the user feeds the honeypot dataset into the black box, trains it and obtains predictions using only the mathematical model inside the framework; the data are not analysed according to their statistical properties. However, when the statistical properties of the attacks differ, the time series should be analysed differently in order to obtain accurate results. Hence the predictions taken from the black box are less accurate, as proven in that paper. To analyse each dataset according to its own statistical characteristics, the framework provides a gray box model in which different time series are analysed differently. To do this, the model contains both ARIMA and FARIMA models; their definitions can be found in the definitions section. ARIMA models are not capable of accommodating LRD features, hence the ARIMA model is used for attack rates which do not exhibit LRD properties. FARIMA models can handle LRD properties, hence FARIMA is used as the LRD-aware model in this framework.
INPUT : Observed attack rate sequence for the given time period t, {X_1, ..., X_t}; number of hours ahead to predict, h
PROCESS :
1. Repeat
2. Fit {X_1, ..., X_t} to obtain the best model M_t from the Gray Box. Time series with the LRD feature pick FARIMA and time series without the LRD feature pick ARIMA for the prediction. The relevant p, d, q for the best-fitting models are selected using the AIC criterion (Cryer and Kung-Sik, 2008)
3. Use M_t to predict Y_{t+h}, the predicted cyber attack rate at t + h
4. X_{t+1} ← newly observed attack rate at time t + 1, since this framework supports real-time data
5. t ← t + 1, observing more data as t evolves
6. Until no further cyber attack rates need to be predicted
Algorithm 1 : Pseudocode for predicting cyber attack rates using the Gray Box model (FARIMA+ARIMA)
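As a rough illustration of the non-LRD branch of this procedure, the sketch below selects an ARIMA order by the AIC criterion using statsmodels and produces an h-hour-ahead forecast. The FARIMA branch for LRD series is omitted, and the candidate order grid and toy data are assumptions made for the example, not values from the original paper.

import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def fit_best_arima(series, h):
    # Fit ARIMA(p, d, q) over a small candidate grid and return the h-step forecast
    # of the model with the lowest AIC (the ARIMA half of the Gray Box selection).
    best_aic, best_fit = np.inf, None
    for p, d, q in itertools.product(range(3), range(2), range(3)):
        try:
            fit = ARIMA(series, order=(p, d, q)).fit()
        except Exception:
            continue                      # skip orders that fail to converge
        if fit.aic < best_aic:
            best_aic, best_fit = fit.aic, fit
    return best_fit.forecast(steps=h)

# Toy hourly attack-rate series (hypothetical values).
rates = np.array([30, 42, 35, 50, 47, 61, 55, 70, 66, 80], dtype=float)
print(fit_best_arima(rates, h=1))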
INPUT : Observed attack rate sequence {X_1, ..., X_n} for the given time period; number of hours ahead to predict, h; prediction starting point p ∈ (0, 1)
PROCESS :
1. t ← n × p
2. While t ≤ (n − h) do
3. Fit {X_1, ..., X_t} to obtain the best model M_t from the Gray Box. Time series with the LRD feature pick FARIMA and time series without the LRD feature pick ARIMA for the prediction. The relevant p, q, d for the best-fitting models are selected using the AIC criterion (Cryer and Kung-Sik, 2008)
4. Use M_t to predict Y_{t+h}, the predicted cyber attack rate at t + h
5. Compute the prediction error e_{t+h} = X_{t+h} − Y_{t+h}, where Y_{t+h} is calculated as in the algorithm above
6. t ← t + 1, observing more data as t evolves
7. End while
8. Compute PMAD, PMAD', OA, UA
9. Return PMAD, PMAD', OA, UA
Algorithm 2 : Pseudocode for evaluating the prediction accuracy of the Gray Box model (FARIMA+ARIMA)
The dataset used in this research is from the UCSD Network Telescope instrumentation, which captures hourly files of raw IPv4 packets of unsolicited traffic. This traffic contains a wide range of events, including misconfiguration, scanning of the address space by attackers or malware looking for vulnerable targets, backscatter from randomly spoofed denial-of-service attacks, and traffic from the automated spread of malware (CAIDA, 2021).
The data files in the CAIDA network telescope come as pcap files, but in this paper the researchers reassembled the pcap files into flows using COTS devices, which are capable of extracting flows from UDP and TCP traffic. In the data preprocessing, they disregarded attacks against non-production ports, because such connections are often dropped and attackers cannot reach the production servers through non-production ports. Flows without FIN or RST flags are also dropped if the flow timeout exceeds 60 s or the flow lifetime exceeds 300 s.
● Traditional time series models are expensive and require more computational power
● Analysing long-term forecasts and heavy-tailed processes gives poor performance and inaccurate results with ARIMA models (Zhai, 2005)
● ARIMA models tend to be unstable, both with respect to changes in observations and to model specification.
The same team conducted further research in 2015 to find more statistical features exhibited in honeypot and telescope cyberattack data (Zhan et al., 2015). They found that extreme value phenomena also exist in cyber attack data. Accordingly, they upgraded the above Gray Box model to accommodate extreme values. In the new framework, they integrated two complementary statistical approaches, Extreme Value Theory (EVT) and Time Series Theory (TST), to predict cyber attack rates efficiently.
According to the model, EVT can offer long-term predictions 24 h ahead of time, and the gray box TST model can predict attack rates 1 h ahead of time with an accuracy of 86.0-87.9%. To accommodate extreme values in the TST model, they introduced the FARIMA + GARCH time series approach, where GARCH accommodates extreme value phenomena and FARIMA accommodates LRD features. The defender can use the EVT predictions for longer-term planning and then adjust the resources precisely based on the TST model prediction, which arrives 1 h before the attack. In this model, different GARCH variants such as SGARCH and IGARCH, together with the skewed Student-t distribution (SSTD) and the skewed Generalized Error Distribution (SGED), are used to accommodate time series data with different noises.
3. M3: GPD with time-invariant scale parameter σ but time-dependent shape parameter
Data with stationary extreme attack rates use M1; if M1 cannot fit well, the model uses the non-stationary models M2, ..., M4 to fit the remaining non-stationary extreme attack values. Some standard goodness-of-fit statistics and QQ plots are used to evaluate the fit of the data to these models.
PROCESS :
1. Initialize quantileSet, an ordered set of quantiles where the maximum quantile is chosen so that there are at least 30 extreme values above it
2. For q ∈ quantileSet (minimum to maximum) do
3. Use the standard GPD to fit the extreme attack rates that are greater than the threshold quantile q
4. Evaluate the goodness-of-fit statistics CM and AD (Choulakian and Stephens, 2001) and the QQ plot
5. If the fit is good then
6. Estimate the GPD parameters (ξ, σ) together with the extremal index θ
7. Return (q, ξ, σ, θ)
8. End if
9. End for
10. Return -1 when no stationary extreme value fit is found
Algorithm 3 : Pseudocode for fitting stationary data to M1 in the EVT model of the updated Gray Box model (FARIMA+GARCH+EVT)
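As a rough illustration of the GPD-fitting step in this algorithm, the sketch below fits a generalized Pareto distribution to the exceedances above a chosen quantile using scipy; the toy data and the threshold choice are assumptions for the example, and the CM/AD goodness-of-fit checks and QQ plot are not reproduced.

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
attack_rates = rng.gamma(shape=2.0, scale=50.0, size=2000)   # toy hourly attack rates

q = 0.95                                                     # threshold quantile from quantileSet
threshold = np.quantile(attack_rates, q)
exceedances = attack_rates[attack_rates > threshold] - threshold

# Fit the GPD to the exceedances; loc is fixed at 0 because the threshold has been subtracted.
xi, loc, sigma = genpareto.fit(exceedances, floc=0)
print(f"q = {q}, shape xi = {xi:.3f}, scale sigma = {sigma:.3f}, exceedances = {len(exceedances)}")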
INPUT : Attack rate time series which cannot be fitted by the M1 model
PROCESS :
1. Initialize quantileSet
2. For q ∈ quantileSet (minimum to maximum) do
3. Use models M2, M3 and M4 to fit the extreme attack rates that are greater than the threshold quantile q
4. Evaluate the goodness-of-fit statistics via the AIC (Akaike Information Criterion) and the QQ plot
5. If any of the three models fits well then
6. Choose the model with the minimum AIC value, or choose the simplest model whose AIC value is close to the minimum
7. Return (q, AIC value) for the selected model among M2, M3, M4
8. End if
9. End for
10. Return -1 when no extreme value fit is found
Algorithm 4 : Pseudocode for fitting non-stationary data to M2, M3, M4 in the EVT model of the updated Gray Box model (FARIMA+GARCH+EVT)
The EVT model does not need a training phase like other machine learning processes do; it predicts the next attack rates based on the predictions made with the quantile sets.

[19]

Here as well there are a few models to use with different noises in the time series data.
The pseudocode for the TST model is as follows.
INPUT : Observed attack rate sequence for the given time period t, {X_1, ..., X_t}; the FARIMA + GARCH family (M5, M6, M7, M8); number of hours ahead to predict, h; lag value l where 0 < l < 1
PROCESS :
1. For each model M_i in the FARIMA + GARCH family do
2. m ← t × l
3. While m + h ≤ t do
4. Use {X_1, ..., X_m} to obtain the best-fitting parameters of M_i from the Gray Box; the Gray Box contains the FARIMA and GARCH models, which together support LRD features and extreme value theory
5. Use M_i to predict the attack rates {X_{m+1}, ..., X_{m+h}}
6. m ← m + h
7. End while
8. Evaluate the PMAD values and AIC values for the predictions
9. End for
10. Return M ∈ {M5, M6, M7, M8} with the smallest PMAD value
Algorithm 5 : Pseudocode for predicting cyber attack rates using the updated Gray Box model (FARIMA+GARCH+EVT)
Table 1 : TST-based fitting of attack rates per hour using the updated Gray Box model
The researchers also compared this new gray box model with the Hidden Markov model and symbolic dynamics models, which are other approaches to predicting attack rates. The analysis showed that the FARIMA+GARCH model gives more accurate 1 h-ahead prediction results than the Hidden Markov and symbolic dynamics models.
Finally, the researchers calculated prediction results using both the EVT and TST approaches and showed that together the two approaches can provide accurate results 24 h prior to the attacks as well as more precise attack predictions 1 h ahead of time. The following table includes the details of their findings.
Table 2 : Comparison between EVT and TST predictions. Here, an H_a^b value means the predictions correspond to the time interval between the a-th and the b-th hour. Each period has three rows: the first row shows the EVT-based prediction values and the corresponding PMAD value in the 7th column; the second row shows the observed maximum attack rates; and the third row shows the maximum attack rates predicted by the TST model with h = 1 and the corresponding PMAD value.
● Space complexity is also high, as the model keeps a further set of models to support all GARCH variants with different noises and EVT models for stationary and non-stationary data.
● According to Table 3, the EVT-based predictions of the return level are often higher than the observed maximum attack rates, whereas the TST-based predictions of the maximum attack rates are often lower than the observed maximum attack rates.
Influenced by the above research, Peng and his research team developed a framework for predicting extreme cyber attack rates using only an EVT approach (Peng et al., 2017). They introduced a marked point process framework to fit and predict cyber attack rates. They also demonstrated the existence of correlations and interdependencies between extreme values, which is an important aspect to consider when predicting cyber attack rates. They found the following drawbacks in the updated Gray Box approach introduced by (Zhan et al., 2015):
- The updated Gray Box method uses the classic POT (Peaks Over Threshold) method to model the magnitudes of the exceedances without considering the dependencies between inter-exceedance times (the time intervals between extreme values); hence the accuracy of the model is low.
- Classical EVT treats the distribution of extreme values in the cyber attack data as a Poisson distribution, although it is not.
- The GARCH model used in the Gray Box model is not theoretically proven to support the clustering behaviour of extreme values. Furthermore, EVT methods cluster the extreme values via quantiles.
In order to incorporate the dependencies between inter-exceedance times into the classical POT method, the researchers proposed a marked point process approach which models the magnitudes of the extreme values using the POT method and predicts the arrival of extreme attack rates using the Autoregressive Conditional Duration (ACD) and Log ACD models. ACD models effectively accommodate the slow decay of autocorrelation and the bursts of extreme value clusters, which GARCH models do not support theoretically. They can also dynamically adjust the quantile levels in order to predict extreme cyber attack rates accurately. The ACD model consists of a ground process and a marked point distribution.
Value at Risk (VaR) has been used to measure the cyber risk of intensive attacks. It can be described as a probabilistic measure of the severity of the extreme cyber attack rates, and according to these values the network defenders can allocate adequate resources to mitigate the cyber attack.
The following diagram illustrates the flow of the Marked Point process framework.
Furthermore, this research used honeypot data and telescope data captured from the CAIDA network (CAIDA, 2021). As in the previous frameworks, the telescope data were preprocessed according to (Claffy et al., 1995), converted to flows and analysed. For the preprocessing of the honeypot data, they followed (Almotairi et al., 2008), which I also use when introducing a deep learning framework for cyber attack rate prediction in this paper.
[20]

where m is the size of the input, ŷ_i and y_i are the output values from the LSTM layers and the observed values at step i, and W and U are weight matrices. λ is a user-defined penalty parameter used to minimise overfitting.
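A minimal Keras sketch of this kind of penalised objective is shown below: a squared-error loss combined with L2 penalties on the LSTM input and recurrent weight matrices, which roughly play the roles of W and U above. The layer sizes, window length and penalty value are illustrative assumptions, not the original configuration.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

penalty = 0.001  # assumed value for the penalty parameter lambda

model = tf.keras.Sequential([
    layers.Input(shape=(24, 1)),   # 24 hourly attack rates per window (assumed)
    layers.Bidirectional(layers.LSTM(
        64,
        kernel_regularizer=regularizers.l2(penalty),        # penalises the input weight matrix
        recurrent_regularizer=regularizers.l2(penalty))),   # penalises the recurrent weight matrix
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()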
There are two main algorithms used in this model to handle extreme attack rates and normal attack rates; extreme attack rates are identified by fitting the values with a separate algorithm.
INPUT : Historical cyber attack time series data {(t, y_t) | t = 1, ..., m}; iterations b = 10000; penalty parameter λ = 0.001
PROCESS :
1. For r ∈ {20, 30, 40} do
2. Divide the data set into mini-batches of size r
3. For l ∈ {2, 3, 4, 5} do
4. Randomly initialize an l-layer BRNN-LSTM with its parameters saved in Θ
5. j ← 0
6. While j ≤ b do
7. Compute J according to the optimization equation by performing forward propagation
8. Update Θ using the Adam optimizer
9. j ← j + 1
10. End while
11. For each data point at t do
12. Compute ŷ_t by performing forward propagation
13. Fitted value ← ŷ_t
14. End for
15. Return fitted values for the combination (r, l)
16. End for
17. End for
Algorithm 6 : Pseudocode for computing fitted values in the BRNN-LSTM framework (Fang, 2018)
The following algorithm splits the data into mini-batches to pass to the training model in order to increase computing efficiency. After training is complete, the model makes predictions for the testing dataset and outputs the prediction results and the values of the evaluation metrics. Apart from the number of neurons per layer, the other values are hard-coded in the algorithm.
INPUT : Historical time series data with in-sample set {(t, y_t) | t = 1, ..., m} and out-of-sample set {(s, y_s) | s = m + 1, ..., n}; iterations b = 10000; penalty parameter λ = 0.001; (r, l) calculated from Algorithm 6
PROCESS :
1. Split the in-sample set into mini-batches of size r
2. Randomly initialize an l-layer BRNN-LSTM with all the parameters saved in Θ
3. j ← 0
4. While j < b do
5. For each mini-batch from the in-sample set do
6. Compute J by performing forward propagation
7. Update Θ using the Adam optimizer
8. End for
9. j ← j + 1
10. End while
11. Predictions ← null
12. For each data point s in the out-of-sample set do
13. Compute ŷ_s by performing forward propagation
14. Predictions ← ŷ_s
15. End for
16. Return predicted values
Algorithm 7 : Pseudocode for predicting cyber attack rates in the BRNN-LSTM framework (Fang, 2018)
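The sketch below is a rough Keras re-expression of the fitting and prediction procedure in Algorithms 6 and 7: a stacked bidirectional LSTM regressor trained with Adam on mini-batches of sliding windows of past attack rates. The window length, layer widths, number of layers, batch size, epoch count and the synthetic data are assumptions made for illustration; they are not Fang's original hard-coded values.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def make_windows(series, window):
    # Turn a 1-D attack-rate series into (samples, window, 1) inputs and next-step targets.
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    return X[..., np.newaxis], series[window:]

series = (np.sin(np.linspace(0, 40, 1200))
          + np.random.default_rng(0).normal(0, 0.1, 1200)).astype("float32")  # toy data
X, y = make_windows(series, window=24)
split = int(0.8 * len(X))                       # in-sample / out-of-sample split

model = tf.keras.Sequential([
    layers.Input(shape=(24, 1)),
    layers.Bidirectional(layers.LSTM(32, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(32)),      # l = 2 bidirectional LSTM layers (assumed)
    layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
model.fit(X[:split], y[:split], batch_size=32, epochs=10, verbose=0)  # mini-batch size r = 32 (assumed)

predictions = model.predict(X[split:], verbose=0).ravel()
print("out-of-sample MSE:", float(np.mean((predictions - y[split:]) ** 2)))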
This model was compared against ARIMA, ARIMA+GARCH and a hybrid model, and it was evaluated using the MSE, MAD (Mean Absolute Deviation), PMAD and MAPE (Mean Absolute Percentage Error) accuracy metrics. The results showed that the BRNN-LSTM model provides highly accurate results with a minimal error rate.
The research problem and question
According to the gap analysis conducted for this field, different researchers have contributed in different ways to predicting cyberattack rates with high accuracy. Most of the latest research builds on earlier work and fixes many of the drawbacks and gaps in those research models. Moreover, the challenges and the gap analysis conducted in the literature review show that current prediction methodologies need to be updated with high-accuracy prediction models that can warn network defenders about cyber attacks a few hours in advance. This would allow defenders to allocate adequate defence resources purposefully, manage the attacks and protect the whole network (Peng et al., 2016). In addition, many of the statistical features exhibited in cyber attack time series data have not been identified yet. Therefore, developing a model to predict the next cyber attack with 100% accuracy is a very demanding task.
overprovision the defending resources for a particular attack, which is an expensive task.
6. Most of the available models can predict cyberattacks based on honeypot or telescope data only; they cannot predict cyber attacks based on both types of dataset.
7. Feature selection has to be conducted for classical machine learning models.
8. Most of the models are tested only on low interaction honeypot datasets.
Beyond the above-mentioned problems, we also found that statistical approaches are difficult to work with for a person without a proper knowledge of statistics. Hence, in this paper we want to minimise statistical model development and redirect the user towards machine learning approaches for building models to predict cyber attack rates. The main issues with classical machine learning approaches are that the neural networks have only a few layers to train the model, and the features that contribute to cyber attack prediction must be selected and fed into the network. As cyber attack patterns change every day, using the same set of features is not a good solution, and running feature selection every time is a costly task. As a solution, deep neural networks can be used to build models with multiple neural layers. The model also learns representations by itself, so the user does not need to hand-select the input features. Aside from that, deep learning models can accommodate statistical features as well, and they are easy to build and train with less computational power and a smaller code base.
Aims and Objectives
This proposed research is planned to be implemented based on the research carried out by Fang (Fang, 2018), building a deep neural network approach to predict cyberattack rates. The proposed solution will address the following gaps in the current domain.
● Increase the accuracy of the deep neural network by improving its ability to identify long-range and short-range dependence and the high nonlinearity of the data.
● Fine-tune the model by introducing efficient hyperparameters to obtain high accuracy rates.
● Introduce new deep learning approaches for predicting cyber attack rates.
different researchers in order to predict cyber attacks ahead of a given time prior to the attack. The accuracy of the predictions is the most crucial metric when designing a prediction model. The models discussed in the literature review section illustrate the development of prediction models from statistical approaches to deep learning approaches. Researchers are trying to develop hybrid models and new deep learning models such as TimeGAN (Yoon et al., 2019) to work with time series data. But cyber attack rates are not just any time series data, and only a few important features of cyber attack data have been identified in current research. Hence adapting new deep learning models will take more research time.
Despite that, this paper tries to improve the accuracy of the current deep learning model, a Bidirectional Recurrent Neural Network with LSTM, by introducing Gated Recurrent Unit (GRU) cells and tuning the model with many more hyperparameters.
Definitions
Gated Recurrent Unit (GRU)
A GRU consists of only two gates, an update gate and a reset gate, while an LSTM has three gates: an input gate, a forget gate and an output gate. The following diagram illustrates a simple GRU cell.
Figure 7 : Internal structure of a GRU (Yang et al., 2020)
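For completeness, the standard GRU update equations behind this figure can be written as follows; this is the usual textbook formulation (with σ the sigmoid function and ⊙ element-wise multiplication), not notation taken from the dissertation, and some references swap the roles of z_t and 1 − z_t.

z_t = σ(W_z x_t + U_z h_{t−1} + b_z)              (update gate)
r_t = σ(W_r x_t + U_r h_{t−1} + b_r)              (reset gate)
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t−1}) + b_h)   (candidate state)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t             (new hidden state)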
Research has been conducted by (Yang et al., 2020) to compare the performance of LSTM and GRU on a text-based prediction task. According to its conclusions, the GRU is not suitable for analysing very large datasets with many dependent features, although it uses less computational power and produces results in less time than the LSTM. In our scenario of predicting cyber attack rates, however, we can use the GRU instead of the LSTM in the deep learning network, because there are not many dependencies among the features of the time series data.
Optimizers in deep learning networks
The Adaptive Moment Estimation (Adam) optimizer has an adaptive learning rate that is adjusted during the training of the network. Such optimizers are better to use than static ones because the model's accuracy increases over the training cycles while the loss values decrease adaptively. Adam works on first- and second-order moments: it stores an exponentially decaying average of past squared gradients and an exponentially decaying average of past gradients. Therefore the Adam optimizer is fast and converges very rapidly; it also rectifies vanishing learning rates and high variance (Doshi, 2019).
Although the Adam optimizer is a suitable optimizer for our scenario, there are many other optimizers that can be used in different neural networks; a detailed comparison is available in (Doshi, 2019).
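In Keras, selecting Adam for the model amounts to a one-line choice at compile time, as in the sketch below; the small bidirectional GRU model and the learning-rate value are illustrative assumptions, not the tuned values used in the final model.

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(24, 1)),
    layers.Bidirectional(layers.GRU(16)),
    layers.Dense(1),
])
# Adam with an explicit (illustrative) initial learning rate; the adaptive per-parameter
# rates are maintained internally from the first and second moment estimates.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")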
Hyperparameters
The variables which determine the structure of the neural network and how it is trained are called hyperparameters. Model accuracy and performance change with different hyperparameter values. Weight initialization, the activation function, the number of epochs, the learning rate and the choice of optimizer are some of the common hyperparameters defined for neural networks. Hyperparameter tuning can help to increase performance and accuracy and to reduce error rates. Grid search, random search and Bayesian optimization are common methodologies for optimizing hyperparameters in deep neural networks (Loshchilov and Hutter, 2016). More information about hyperparameter tuning can be found in (Loshchilov and Hutter, 2016), (Reimers and Gurevych, 2017) and (Tensorflow, 2021).
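A simple way to explore a few of these hyperparameters is a small manual grid search scored on the validation loss, as sketched below; the candidate values, window length and random toy data are assumptions made for the example (a dedicated tool such as KerasTuner could be used instead).

import itertools
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 24, 1)).astype("float32")    # toy windowed attack-rate inputs
y = rng.normal(size=(500,)).astype("float32")          # toy next-hour targets

best = (np.inf, None)
for units, lr in itertools.product([16, 32], [1e-2, 1e-3]):   # candidate hyperparameter grid
    model = tf.keras.Sequential([
        layers.Input(shape=(24, 1)),
        layers.Bidirectional(layers.GRU(units)),
        layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr), loss="mse")
    history = model.fit(X, y, validation_split=0.2, epochs=5, batch_size=32, verbose=0)
    val_loss = min(history.history["val_loss"])
    if val_loss < best[0]:
        best = (val_loss, (units, lr))

print("best validation MSE:", best[0], "with (units, learning rate) =", best[1])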
Proposed Algorithm
PROCESS :
1. Split the dataset into training, validation and testing sets
2. Create small data frames (j) from the training dataset, in order to minimise overfitting of the model, by passing the start index, end index and number of rows per file
3. For each mini-batch/data frame (j) do
4. Make each data frame compatible with the GRU input shape
5. Shuffle the indices to get a new order of files in the next iteration
6. Return GRU-compatible data frames
7. End for
8. Create small data frames (i) from the testing dataset, in order to minimise overfitting of the model, by passing the start index, end index and number of rows per file
9. For each mini-batch/data frame (i) do
10. Make each data frame compatible with the GRU input shape
11. Return GRU-compatible data frames
12. End for
13. Create the model as a bidirectional RNN with GRU cells, and set dynamically changeable hyperparameters
14. k ← 0
15. While k < b iterations do
16. For each mini-batch from the training set (j) do
17. Compute the loss by performing forward and backward propagation
18. Update the model using the Adam optimizer
19. End for
20. k ← k + 1
21. End while
22. Predictions ← 0
23. For each mini-batch in the testing set (i) do
24. Compute ŷ using the trained model by performing forward propagation
25. Predictions ← ŷ
26. End for
27. Return the predicted values and the MSE
Algorithm 8 : Pseudocode for predicting cyber attack rates using the BRNN-GRU model
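A condensed Keras sketch of Algorithm 8 is given below. It mirrors the earlier BRNN-LSTM sketch but swaps the LSTM cells for GRU cells and reports the test MSE; the file name, window length, layer sizes, batch size and epoch count are illustrative assumptions rather than the exact values used in the accompanying Colab notebook.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def make_windows(series, window=24):
    # Slice an attack-rate series into overlapping windows and next-step targets.
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    return X[..., np.newaxis], series[window:]

series = np.loadtxt("attack_rates.txt").astype("float32")   # hypothetical one-column rate file
X, y = make_windows(series)
n_train, n_val = int(0.7 * len(X)), int(0.85 * len(X))      # train / validation / test split

model = tf.keras.Sequential([
    layers.Input(shape=(X.shape[1], 1)),
    layers.Bidirectional(layers.GRU(32, return_sequences=True)),
    layers.Bidirectional(layers.GRU(32)),    # bidirectional GRU layers replace the LSTM layers
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:n_train], y[:n_train],
          validation_data=(X[n_train:n_val], y[n_train:n_val]),
          batch_size=32, epochs=20, verbose=0)

pred = model.predict(X[n_val:], verbose=0).ravel()
print("test MSE:", float(np.mean((pred - y[n_val:]) ** 2)))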
Methodology
As discussed in the literature review, deep learning prediction models are commonly used in finance, natural language processing and many other fields. While conducting the literature review, however, I found a lack of research on cyber attack rate prediction using deep learning models. Not only attack rate prediction but also categorization and other related cyber attack prediction tasks are still at the research stage.
At present, there are only a few deep neural network approaches available for analysing time series data; recurrent neural networks are one example. In the future, more network types will become available for time series analysis and will support more of the features exhibited in time series data.
The BRNN-LSTM model (Fang et al., 2019) is a prominent framework that uses deep learning to accommodate the statistical features exhibited in cyber attack time series. It also combines these statistical features with artificial intelligence techniques to increase its performance and prediction accuracy. The BRNN-LSTM framework is developed in Python and runs on the CUDA platform (Fang, 2018); this model is used in this paper for comparison purposes.
The model uses many of the latest techniques in deep neural networks to build the prediction model. As there are not many approaches for working with time series data, the practical option is to use the same BRNN-LSTM approach and tune it with different hyperparameters. The model development section covers the ways to improve the current BRNN-LSTM framework and explains the use of other hyperparameters to increase performance and accuracy.
Data acquisition
Most of the major frameworks discussed in the literature review used the CAIDA real-time network telescope dataset (CAIDA, 2021), to which public access is currently restricted. Hence, for the training and evaluation of BRNN-GRU, we use Kyoto University's honeypot dataset (http://www.takakura.com/Kyoto_data/). This dataset contains time series data from 01-11-2006 to 31-12-2015.
The dataset consists of 24 main statistical features, 14 of which were extracted from the raw traffic of the honeypot systems deployed in the Kyoto University networks. These conventional features are as follows.
This data was collected from Nepenthes, a low interaction honeypot service deployed on the Kyoto University networks. More details about the features and the data in the Kyoto dataset can be found in (Song et al., 2021).
Data preprocessing
The dataset consists of daily network connection details, in one .txt file per day, from 01-11-2006 to 31-12-2015. Therefore this model does not need to convert the connections to flows as the other frameworks did; the CAIDA telescope datasets come as .pcap files and need to be converted to flows for analysis.
The dataset has already been preprocessed, with irrelevant details removed, by the research project of (Song et al., 2011). We assume that they followed the methodology of (Claffy et al., 1995) to preprocess the honeypot data, as (Claffy et al., 1995) is considered a major reference for preprocessing honeypot time series data. According to (Song et al., 2021) they used the Bro 2.4 tool (The Bro Project, 2016) to convert the traffic data to session data. Bro is a powerful open source network analysis tool that can be used for network security monitoring, and it is capable of monitoring traffic in very high performance environments (Cloud, 2019).
After completing the preprocessing step, we had to create a separate file containing the date-time and the attack rates; we used a simple Python function to achieve that. Before passing the input data to the model, the input file is further pre-processed to remove null values from the dataset. The final input file has the following columns:
Date | Time | Attacks
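A hedged sketch of this step is shown below: it reads the daily Kyoto session files, counts sessions per hour to form the attack-rate column, and writes out the Date/Time/Attacks file with null rows dropped. The file paths, separator and timestamp column are assumptions for illustration, since the exact Python function used in the project is not reproduced here.

import glob
import pandas as pd

frames = []
for path in sorted(glob.glob("kyoto/2015*.txt")):        # hypothetical daily session files
    df = pd.read_csv(path, sep="\t", header=None)
    # Assumed: the last column holds the session timestamp.
    df["timestamp"] = pd.to_datetime(df.iloc[:, -1], errors="coerce")
    frames.append(df[["timestamp"]])

sessions = pd.concat(frames).dropna()
hourly = sessions.resample("1H", on="timestamp").size().rename("Attacks").reset_index()
hourly["Date"] = hourly["timestamp"].dt.date
hourly["Time"] = hourly["timestamp"].dt.time
hourly[["Date", "Time", "Attacks"]].dropna().to_csv("hourly_attack_rates.csv", index=False)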
Selection of Technologies
We used Google Colaboratory, a free online cloud-based Jupyter (Python) notebook environment that provides features for creating and training machine learning and deep learning models using CPUs, GPUs and TPUs. It supports TensorFlow and many other Python-based machine learning libraries such as Keras, and the documentation and community support for Google Colaboratory, TensorFlow and Keras are extensive. Because Google Colaboratory is cloud based, researchers do not need to use their local computational power and are able to use TPUs with high processing power for free.
Tensor Processing Units - TPU
GPUs are organized around programmable cores, while TPUs use a classic vector processor with a dedicated matrix multiply unit, which makes them excel at the large matrix operations that dominate deep learning workloads.
When the user requests a Cloud TPU v2 in Google Colaboratory, the user gets a virtual machine with a PCI-attached TPU board. This TPU board has four dual-core TPU chips, and each TPU core has a Vector Processing Unit and a 128 x 128 Matrix Multiply Unit, which can provide much more processing power than other hardware accelerators.
Tensorflow
TensorFlow is a very popular, end-to-end open source platform for machine learning tasks. It contains most of the machine learning libraries, tools and community support that are helpful to machine learning researchers, and Google Colaboratory has built-in TensorFlow support.
Keras
Keras is a high-level, Python-based machine learning API running on top of the TensorFlow 2 platform. It provides essential abstractions and building blocks for developing machine learning and deep learning models and shipping them with high iteration velocity (Team, 2021).
Thanks to these abstractions and easy-to-use APIs, researchers do not need to spend much attention and time developing the layers from scratch. Models can be built in three ways with Keras on TensorFlow 2: using sequential models, the functional API, or the model subclassing approach.
Model Development
As discussed in the previous chapters, the model introduced in this paper, the BRNN-GRU framework for cyber attack prediction, is built on TensorFlow using the Keras library. For evaluation purposes, we also built a basic RNN-LSTM model and a BRNN-LSTM model. The original BRNN-LSTM introduced by (Fang, 2018) was developed using the Python NumPy library; the source code for Algorithm 6 is available in Fang's GitHub repository, although there is no source code available for Algorithm 5.
(https://github.com/xingfang912/time-series-analysis/blob/master/time_series_rolling.py). We made some assumptions when transforming the designed model from NumPy to Keras. Some of the hyperparameters were hard-coded in the original codebase, possibly to tune the models to support the CAIDA dataset efficiently. Apart from the number of RNN layers, the other hard-coded hyperparameters mentioned in the GitHub repository are used to develop the model on Keras. The assumptions are commented in the source code of the Google Colaboratory project.
The RNN-LSTM model is a basic LSTM RNN sequential model which has an LSTM
layer and default hyperparameters.
The BRNN-LSTM is taken as the basis for the new prediction model presented in this paper. Hence the new model has the following features.
The pseudocode for the algorithm is given in the Proposed Algorithm section.
The developed model can be found at the following URL.
https://colab.research.google.com/drive/1HAbhZOYi7ePRDE9pKZXODRLgN_M6VWrk?
usp=sharing
Model Evaluation
Model evaluation is done using the Mean Squared Error (MSE) calculated for the trained model on the validation dataset. The model with the lowest MSE is considered the best model.
These values can change according to the time series data streams passed to the model and the availability of the features. The dataset used here, the Kyoto dataset, might not contain extreme values or LRD features; hence the model may give different predictions and accuracy levels for a different dataset.
Fang and his team compared the BRNN-LSTM models with the Gray Box model, FARIMA+GARCH models, Hidden Markov models and the marked point process model, which are the currently available approaches for predicting cyber attack rates. They concluded that the BRNN-LSTM framework has higher accuracy and performance and lower MSE values than the other models. With that in mind, here we use only the RNN-LSTM and BRNN-LSTM models for comparison with the BRNN-GRU model. The MSE values for these models are as follows.
Model MSE
RNN-LSTM 0.40692530679349045
BRNN-LSTM 0.39681923587779341
BRNN-GRU 0.37891233569034562
Table 5 : Calculated MSE values for the RNN-LSTM, BRNN-LSTM and BRNN-GRU models
According to the above results, the BRNN-GRU model provides more accurate predictions than the other models, and the BRNN-LSTM built with Keras provides more accurate results than the basic RNN-LSTM model.
In the BRNN-GRU model, we changed most of the hyperparameters and replaced the LSTM layers with GRU layers; the lower MSE can therefore be attributed either to hyperparameter tuning or to the use of GRUs. GRUs are not well suited to complex and large datasets with strong dependencies, but research has found that a GRU can provide more accurate results with less computational power on small datasets. Since we pass the large dataset to the BRNN-GRU model as small chunks of data frames, the GRU should perform better than the LSTM.
Conclusions
We proposed a BRNN-GRU framework for predicting cyberattack rates ahead of a given time prior to the attack. This framework can accommodate complex phenomena exhibited by cyber attack rate datasets, such as long-range dependence, high nonlinearity and extreme values. The framework is designed to train with less computational power while achieving high accuracy and minimised loss and error values, and it is evaluated on Kyoto University's honeypot dataset.
Moreover, cyber attack prediction using deep learning has not received as much research attention as other industrial fields. The small number of deep learning models available for time series analysis and the limited knowledge of the statistical features present in cyberattack time series lead researchers to pay less attention to cyber attack prediction.
According to the literature review conducted, deep learning models for cyber attack rate prediction are faster, use less computational power and provide high prediction accuracy with minimal error values. Deep learning models can skip the feature selection process required by classical machine learning models: they learn and train by themselves, so they can identify the actual features of the dataset, learn from different sequences and minimise human error. Apart from that, among the currently available deep learning models, only RNNs provide the kind of time series analysis that is suitable for cyber attack rate prediction. Therefore the main enhancement we can make to cyber attack prediction models is to keep the RNN as the base of the model and tune it with the best hyperparameter values. We can also minimise the loss and optimise the model with different approaches, such as introducing mini-batches, backpropagation and working directly with RNN cells.
Although most researchers use RNNs for time series analysis, there is ongoing research into time series analysis using other deep learning frameworks. One of these approaches is TimeGAN (Yoon et al., 2019), which uses Generative Adversarial Networks for its implementation. Recently, many researchers have paid attention to time series prediction using Generative Adversarial Networks, in which two models are trained simultaneously through an adversarial process. However, according to the findings of (Alberto Carrillo Romero, 2018) on time series analysis, the best performers were the shallow LSTM with an accuracy of 74.16%, followed by the GAN with 72.68%, the deep LSTM with 62.85% and ARIMA with 59.57%.
Hybrid model development in deep learning for time series analysis is another strand of ongoing research; CNN-LSTM is one example, in which Convolutional Neural Networks are combined with Recurrent Neural Networks.
The introduction of the attention mechanism for time series data analysis has also been investigated by different researchers. The attention mechanism provides accurate predictions even in the presence of long input sequences, according to (Del Pra, 2020) and (Rémy, 2021).
In the future, researchers will use these new research findings to improve cybersecurity predictions.
To conclude, although many solutions have been introduced in the cybersecurity prediction, detection and forecasting fields, there is still no definitive model that achieves 100% accuracy for all of these tasks. Newly designed time series prediction models should also be tested on cyberattack rate time series, as these comprise different statistical features. As technology evolves, the complexity of cyberspace and attackers' patterns evolve as well. Hence this research area will keep producing new findings and will remain an important open research topic.
References
Alberto Carrillo Romero, R. (2018) Generative Adversarial Network for Stock Market price Prediction.
[Online]. Available at: https://cs230.stanford.edu/projects_fall_2019/reports/26259829.pdf [Accessed: 12
March 2021].
Almotairi, S., Clark, A., Mohay, G. and Zimmermann, J. (2008) Characterization of Attackers' Activities in
Honeypot Traffic Using Principal Component Analysis. 2008 IFIP International Conference on Network
and Parallel Computing, p.147-154. IEEE [Online]. Available at: doi:10.1109/npc.2008.82 [Accessed: 23
January 2021].
Amidi, A. and Amidi, S. (2021) CS 230 - Recurrent Neural Networks Cheatsheet. Stanford.edu. [Online].
Available at:
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks#overview
[Accessed: 28 March 2021].
Amini, A. and Soleimany, A. (2020) MIT Deep Learning 6.S191. MIT Deep Learning 6.S191. [Online].
Available at: http://introtodeeplearning.com/ [Accessed: 20 May 2021].
Bauwens, L. and Giot, P. (2000) The Logarithmic ACD Model: An Application to the Bid-Ask Quote
Process of Three NYSE Stocks. Annales d'Économie et de Statistique, (60), p.117. JSTOR [Online].
Available at: doi:10.2307/20076257 [Accessed: 18 February 2021].
Bauwens, L., Giot, P., Grammig, J. and Veredas, D. (2004) A Comparison of Financial Duration Models
via Density Forecasts. INTERNATIONAL JOURNAL OF FORECASTING, 20 (4), p.589–609. [Online].
Available at: http://hdl.handle.net/1854/LU-8649410 [Accessed: 9 April 2021].
CAIDA(2021) Historical and Near-Real-Time UCSD Network Telescope Traffic Dataset. [Online].
Available at: https://www.caida.org/catalog/datasets/telescope-near-real-time_dataset/ [Accessed: 4 April
2021].
Choulakian, V. and Stephens, M. (2001) Goodness-of-Fit Tests for the Generalized Pareto Distribution.
Technometrics, 43 (4), p.478-484. [Accessed: 19 December 2020].
Claffy, K., Braun, H. and Polyzos, G. (1995) A Parameterizable Methodology for Internet Traffic Flow
Profiling. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATION, 13 (8), p.1481-1494.
[Accessed: 13 February 2021].
Cloud, A. (2019) How to Install Bro IDS on Ubuntu 16.04. Medium. [Online]. Available at:
https://alibaba-cloud.medium.com/how-to-install-bro-ids-on-ubuntu-16-04-ce81d759ce1c [Accessed: 22
November 2020].
Cryer, J. and Kung-Sik, C. (2008) Time Series Analysis With Applications in R. Springer Texts in
Statistics. [Accessed: 14 March 2021].
Del Pra, M. (2020) Time Series Forecasting with Deep Learning and Attention Mechanism. Medium.
[Online]. Available at:
https://towardsdatascience.com/time-series-forecasting-with-deep-learning-and-attention-mechanism-2d001fc871fc [Accessed: 27 April 2021].
Doshi, S. (2019) Various Optimization Algorithms For Training Neural Network. Medium. [Online].
Available at: https://towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6
[Accessed: 25 January 2021].
Fang, X., Xu, M., Xu, S. and Zhao, P. (2019) A deep learning framework for predicting cyber attacks
rates. EURASIP Journal on Information Security, 2019 (1), p.1-11. Springer Science and Business Media
LLC [Online]. Available at: doi:10.1186/s13635-019-0090-6 [Accessed: 25 April 2021].
Görner, M. (2021) Keras and modern convnets, on TPUs | Google Codelabs. Google Codelabs. [Online].
Available at: https://codelabs.developers.google.com/codelabs/keras-flowers-tpu#0 [Accessed: 15 May
2021].
Huang, C., Han, J., Zhang, X. and Liu, J. (2019) Automatic Identification of Honeypot Server Using
Machine Learning Techniques. Security and Communication Networks, 2019, p.1-8. Hindawi Limited
[Online]. Available at: doi:10.1155/2019/2627608 [Accessed: 31 March 2021].
Hunter, S., Irwin, B. and Stalmans, E. (2013) Real-time distributed malicious traffic monitoring for
honeypots and network telescopes. 2013 Information Security for South Africa. IEEE [Online]. Available
at: doi:10.1109/issa.2013.6641050 [Accessed: 5 December 2020].
Husak, M., Komarkova, J., Bou-Harb, E. and Celeda, P. (2019) Survey of Attack Projection, Prediction,
and Forecasting in Cyber Security. IEEE Communications Surveys & Tutorials, 21 (1), p.640-660. Institute
of Electrical and Electronics Engineers (IEEE) [Online]. Available at: doi:10.1109/comst.2018.2871866
[Accessed: 17 March 2021].
J. Shimeall, T. and M. Spring, J. (2014) Introduction to Information Security. Elsevier [Online]. Available
at: doi:10.1016/c2011-0-00135-7 [Accessed: 6 December 2020].
J. Triebe, O., Laptev, N. and Rajagopal, R. (2019) AR-Net: A simple Auto-Regressive Neural Network for
time-series. [Online]. Available at: https://arxiv.org/abs/1911.12436 [Accessed: 30 January 2021].
Justin, L. (2020) 3 Steps to Time Series Forecasting: LSTM with TensorFlow Keras - Just into Data. Just
into Data. [Online]. Available at:
https://www.justintodata.com/forecast-time-series-lstm-with-tensorflow-keras/ [Accessed: 1 May 2021].
Kotenko, I. and Chechulin, A. (2013) A Cyber Attack Modeling and Impact Assessment Framework. In:
5th International Conference on Cyber Conflict. Tallinn: NATO CCD COE Publications. [Accessed: 21
January 2021].
Li, Y., Shi, L. and Feng, H. (2019) A Game-Theoretic Analysis for Distributed Honeypots. Future Internet,
11 (3), p.65. MDPI AG [Online]. Available at: doi:10.3390/fi11030065 [Accessed: 30 December 2020].
Loshchilov, I. and Hutter, F. (2016) CMA-ES For Hyper parameter Optimisation Of Deep Neural Networks.
ICLR 2016. [Accessed: 23 March 2021].
Luk, K. (2019) Downloading Datasets into Google Drive via Google Colab. Medium. [Online]. Available at:
https://towardsdatascience.com/downloading-datasets-into-google-drive-via-google-colab-bcb1b30b0166
[Accessed: 13 April 2021].
Lutscher, P., Weidmann, N., Roberts, M., Jonker, M., King, A. and Dainotti, A. (2019) At Home and
Abroad: The Use of Denial-of-service Attacks during Elections in Nondemocratic Regimes. Journal of
Conflict Resolution, 64 (2-3), p.373-401. SAGE Publications [Online]. Available at:
doi:10.1177/0022002719861676 [Accessed: 18 May 2021].
Meharchandani, D. (2021) 10 Major Cyber Attacks Witnessed Globally in Q1 2021 - Security Boulevard.
Security Boulevard. [Online]. Available at:
https://securityboulevard.com/2021/04/10-major-cyber-attacks-witnessed-globally-in-q1-2021/ [Accessed:
3 June 2021].
Milkovich, D. (2019) 15 Alarming Cyber Security Facts and Stats | Cybint. Cybint. [Online]. Available at:
https://www.cybintsolutions.com/cyber-security-facts-stats/ [Accessed: 4 March 2021].
Muncaster, P. (2020) Cyber-Attacks Up 37% Over Past Month as #COVID19 Bites. Infosecurity
Magazine. [Online]. Available at:
https://www.infosecurity-magazine.com/news/cyberattacks-up-37-over-past-month/ [Accessed: 22 May
2021].
Peng, C., Hu, T., Xu, S. and Xu, M. (2017) Modeling and predicting extreme cyber attack rates via marked
point processes. JOURNAL OF APPLIED STATISTICS, 44 (14), p.2534–2563. [Online]. Available at:
doi:http://dx.doi.org/10.1080/02664763.2016.1257590 [Accessed: 3 February 2021].
Reimers, N. and Gurevych, I. (2017) Optimal Hyperparameters for Deep LSTM-Networks for Sequence
Labeling Tasks. [Online]. Available at: https://arxiv.org/pdf/1707.06799.pdf [Accessed: 29 November
2020].
Safaei Pour, M., Mangino, A., Friday, K., Rathbun, M., Bou-Harb, E., Iqbal, F., Samtani, S., Crichigno, J.
and Ghani, N. (2020) On data-driven curation, learning, and analysis for inferring evolving
internet-of-Things (IoT) botnets in the wild. Computers & Security, 91, p.101707. Elsevier BV [Online].
Available at: doi:10.1016/j.cose.2019.101707 [Accessed: 16 February 2021].
Senanayaka, L. (2020) Survey on Cyber Attack Detection, Prediction and Forecasting methodologies.
[Online]. Available at:
https://docs.google.com/document/d/1j5aFTOgxu2fZC5IWyGlPDM53TUYxgS0Fm6PgKKwNGes/edit?usp
=sharing [Accessed: 30 May 2021].
Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D. and Nakao, K. (2011) Statistical Analysis of
Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation. BADGERS ’11 April 10-13,
2011, Salzburg., p.29-36. [Accessed: 20 May 2021].
Song, J., Okabe, Y. and Takakura, H. (2021) Description of Kyoto University Benchmark Data.
[Accessed: 29 May 2021].
Taylor, S. and Letham, B. (2017) Forecasting at scale. PeerJ [Online]. Available at:
doi:10.7287/peerj.preprints.3190v2 [Accessed: 23 February 2021].
Team, K. (2021) Keras documentation: About Keras. Keras.io. [Online]. Available at:
https://keras.io/about/ [Accessed: 16 April 2021].
TensorFlow, T. (2021) Introduction to the Keras Tuner | TensorFlow Core. TensorFlow. [Online]. Available
at: https://www.tensorflow.org/tutorials/keras/keras_tuner [Accessed: 6 May 2021].
TensorFlow, T. (2021) The Sequential model | TensorFlow Core. TensorFlow. [Online]. Available at:
https://www.tensorflow.org/guide/keras/sequential_model [Accessed: 27 April 2021].
TensorFlow, T. (2021) tf.keras.layers.GRU | TensorFlow Core v2.5.0. TensorFlow. [Online]. Available at:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU [Accessed: 29 April 2021].
The Bro Project, T. (2016) Bro Documentation Release 2.4.1. [Online]. Available at:
http://www.ncsa.illinois.edu/People/jazoff/bro-2.4.1.pdf [Accessed: 25 March 2021].
Torabi, S., Bou-Harb, E., Assi, C. and Debbabi, M. (2020) A Scalable Platform for Enabling the Forensic
Investigation of Exploited IoT Devices and Their Generated Unsolicited Activities. Forensic Science
International: Digital Investigation, 32, p.300922. Elsevier BV [Online]. Available at:
doi:10.1016/j.fsidi.2020.300922 [Accessed: 5 May 2021].
Triebe, O. (2020) Neural Prophet Extendable and Scalable Forecasting. In: 40th International Symposium
on Forecasting, 2020. [Online]. Available at:
https://github.com/ourownstory/neural_prophet/blob/master/notes/Presented_at_International_Symposiu
m_on_Forecasting.pdf [Accessed: 15 January 2021].
Vichaidis, N., Tsunoda, H. and Keeni, G. (2018) Analyzing darknet TCP traffic stability at different
timescales. 2018 International Conference on Information Networking (ICOIN). IEEE [Online]. Available
at: doi:10.1109/icoin.2018.8343098 [Accessed: 18 April 2021].
Yang, S., Yu, X. and Zhou, Y. (2020) LSTM and GRU Neural Network Performance Comparison Study:
Taking Yelp Review Dataset as an Example. 2020 International Workshop on Electronic Communication
and Artificial Intelligence (IWECAI). IEEE [Online]. Available at: doi:10.1109/iwecai50956.2020.00027
[Accessed: 3 January 2021].
Yoon, J., Jarrett, D. and van der Schaar, M. (2019) Time-series Generative Adversarial Networks. 33rd
Conference on Neural Information Processing Systems (NeurIPS 2019). [Accessed: 14 November 2020].
Zhai, Y. (2005) Time Series Forecasting Competition Among Three Sophisticated Paradigms. [Accessed: 6 February 2021].
Zhan, Z., Xu, M. and Xu, S. (2013) Characterizing Honeypot-Captured Cyber Attacks: Statistical
Framework and Case Study. IEEE Transactions on Information Forensics and Security, 8 (11),
p.1775-1789. Institute of Electrical and Electronics Engineers (IEEE) [Online]. Available at:
doi:10.1109/tifs.2013.2279800 [Accessed: 26 February 2021].
Zhan, Z., Xu, M. and Xu, S. (2015) Predicting Cyber Attack Rates With Extreme Values. IEEE
Transactions on Information Forensics and Security, 10 (8), p.1666-1677. Institute of Electrical and
Electronics Engineers (IEEE) [Online]. Available at: doi:10.1109/tifs.2015.2422261[Accessed: 28
February 2021].
Zhang, J. (2021) MA636: Introduction to stochastic processes. Kent.ac.uk. [Online]. Available at:
https://www.kent.ac.uk/smsas/personal/lb209/files/notes1.pdf [Accessed: 22 April 2021].
Word Count
Word count excluding quotations, references, titles for figures and tables,
acknowledgements and appendices is 11387.
Appendices
Appendix 1
Appendix 1: Prediction error analysis for the predictions done by ARIMA and FARIMA models, where p = 0.5 in the Gray model