MSC Dissertation - Sample 5
Declaration
I hereby declare that except where specific reference is made to the work of others, the
contents of this dissertation are original and have not been submitted in whole or in part
for consideration for any other degree or qualification in this, or any other university.
This dissertation is my own work, except as specified in the text.
The copyright of this dissertation belongs to the author under the terms of the UK Copyright Acts as amended by the University of West London regulations. Due acknowledgement must always be made of the use of any materials contained in, or derived from, this dissertation.
Lakini Senanayaka
Acknowledgement
As an international student, I found it an interesting time and a great opportunity to study for the MSc in Cyber Security at the University of West London. During my period of study, I learnt a lot of new technologies and met a wide variety of interesting people who showed me the meaning of my life, truly believed in me and encouraged me to be my best.
Dr Wei Jie is one of the interesting people I met during my studies, and I am extremely grateful to have had him as my supervisor. I am thankful for his continuous support, invaluable advice and his wonderful energy, which kept me working energetically on my master's thesis during the COVID pandemic. His immense knowledge and experience in the field also encouraged me to do great research.
Furthermore, this thesis is a combination of all the materials I learnt during my Master's
studies. Therefore, I also want to thank all the academic staff including all the lecturers
at the University of West London for the academic years of 2019/2020 and 2020/2021.
I also received amazing support and guidance from all my colleagues at VeeLoop, especially the CEO of the company, Randa Bennet. She always encouraged me to pay more attention to my studies and helped me to balance studying and work life. I am really lucky to work with her.
It is all of their kind help, encouragement and support that have made my study and life
in the UK a wonderful one.
Abstract
At present, cyber attacks receive a great amount of attention from the community due to their highly destructive behaviour, which can cause massive damage to networks and illegal access to sensitive information. Attackers target different systems for political or monetary gain. With the rapid development of technology, cyber attacks including malware, phishing and denial-of-service attacks are becoming more common and harder to detect by the cyber defence systems deployed on networks. Therefore, in order to protect companies and systems from cyber attacks and threats, the defensive mechanisms of those systems should also be updated as new technologies evolve. Some of the major cyber attacks in 2021 can be summarized as follows. CNA Financial, the biggest cyber insurance firm in the USA, was attacked by ransomware, which disrupted its services for three consecutive days. A recent cyber attack on the Florida water system attempted to poison the water supply by increasing its chemical concentrations. Similarly, Microsoft's Exchange Server, which is used by many government bodies and private companies, was exploited by attackers, causing a major global impact (Meharchandani, 2021).
This paper introduces a new prediction framework for cyber attack rates based on a deep learning architecture called BRNN-GRU, a Bidirectional Recurrent Neural Network with Gated Recurrent Units. The framework is optimised and fine-tuned for high-accuracy predictions with minimised error rates. Its prediction capability can guide network defenders to proactively allocate their resources in a cost-effective manner, reducing the severity of the damage caused by incoming attacks or stopping them entirely at the defender level. This paper also directs the reader to other potential approaches for building deep learning models with elevated accuracy rates, such as the use of Generative Adversarial Networks. Furthermore, the BRNN-GRU framework has early-warning prediction capabilities.
Table of Contents
Acknowledgement 4
Abstract 5
Table of Contents 6
List of figures 8
List of Tables 8
List of Algorithms 8
Introduction 10
Literature Review 12
Deception Technologies 12
HoneyPots and HoneyNets 12
Network Telescopes 13
Definitions 15
Definitions for Statistical Approaches 15
Stochastic Process 15
Poisson Distributions 15
Stationarity Process 15
Long Range Dependence (LRD) 16
Autoregressive Moving Average Model (ARMA) 16
Fractional AutoRegressive Integrated Moving Average Model (FARIMA) 16
Generalized AutoRegressive Conditional Heteroskedasticity Model (GARCH) 17
Extreme Value Phenomena 17
Standard Autoregressive Conditional Duration (Standard ACD) 18
Log Autoregressive Conditional Duration (Log ACD) 18
Definitions for Deep Learning Approaches 19
Recurrent Neural Network (RNN) 19
Bi Directional Recurrent Neural Network (BRNN) 20
Long - Short Term Memory Units (LSTM) 21
Definitions for Prediction Accuracy 21
Related Work 22
The research problem and question 37
Aims and Objectives 39
The proposed new framework 39
Scenario 39
Definitions 40
Gated Recurrent Unit(GRU) 40
Optimizers in deep learning networks 41
Hyperparameters 42
Proposed Algorithm 43
Methodology 44
Current Deep Learning Approaches for Cyber Attack Rate Prediction 44
Data acquisition 45
Data preprocessing 46
Selection of Technologies 47
Tensor processing Units - TPU 47
Tensorflow 48
Keras 48
Model Development 48
Model Evaluation 50
Conclusions 52
References 55
Word Count 59
Appendices 60
Appendix 1 60
List of figures
List of Tables
Table 1 : TST based fitting of attack rates per hour using updated Gray Box model 30
List of Algorithms
Algorithm 1 : Pseudocode for predicting cyber attack rates using Gray Box model (FARIMA+ARIMA) 25
Algorithm 3 : Pseudocode for fitting stationary data to M1 in EVT model in updated Gray Box model (FARIMA+GARCH+EVT) 28
Algorithm 5 : Pseudocode for predicting cyber attack rates using updated Gray Box model (FARIMA+GARCH+EVT) 30
Introduction
The Internet has become a necessity in human life. Almost everything in our day-to-day lives involves, or is affected by, the internet. The Internet enables rapid globalization and fast information sharing, and a company can even run fully over the internet without a dedicated office space. The Internet is pervasive. It can provide thousands of different services, such as infrastructure as a service and platform as a service, and it can even communicate with sensors in automobiles and in nuclear and thermal plants. The internet also became even more important during the COVID-19 lockdowns, when most people had to work remotely and students had to learn remotely. All of these opportunities opened up thanks to the internet. Although cyberspace can provide thousands of different services for the community, it is not 100% safe. The implementation of new and evolving technologies opens up more opportunities for attackers to exploit cyberspace. Attackers and intruders can steal user information, exploit large organizations over the internet, cause monetary losses and even damage reputations. Hence all users need to pay extra attention when working on the internet. Some people install anti-virus software on their personal computers to protect themselves from malware and worms, which attackers use to exploit user devices.
Even though cybersecurity is a vast topic, this paper focuses only on cybersecurity at the industrial level. Different companies protect their networks from attackers in different ways. Most companies have separate cybersecurity divisions with system administrators and ethical hackers to protect their systems. Intrusion Detection Systems (IDS), network-based intrusion detection systems, Intrusion Prevention Systems, honeypots and network telescopes also play an important role as deception and defence technologies in cybersecurity. Intrusion detection systems monitor the network for malicious activities or policy violations and trigger alerts for the security division. The system administrators can then analyse the alerts and take the necessary steps to stop the propagation of the attacks or ready the defenders to mitigate them. At present, most of these manual workflows are automated with other technologies. False-positive alarms in IDS are one of the significant issues faced by security teams. In addition, IDS cannot predict future attacks based on patterns, and some IDS cannot identify zero-day attacks or new types of attack.
Therefore researchers have devoted considerable effort to designing and proposing new methodologies to detect attacks correctly, to predict and forecast future attacks, and to provide prevention methods against them. Using these approaches, system administrators can use their limited resources effectively and protect their networks. Honeypots and network telescopes are popular cyber defence instruments installed in networks to observe internet traffic. Honeypots attract attackers by acting as real production servers and collect information on the attacks and attack types by interacting with them (Huang et al., 2019). As the internet and cybersecurity evolve, the attackers' attack patterns also evolve. In his research, Yang Li (Li et al., 2019) explained how attackers identify dynamic honeypots and introduced a distributed honeypot approach to avoid such identification. If attackers can identify a honeypot, they do not try to access it, and the installation of that honeypot becomes useless in the network. Attackers are getting smarter and trying to attack stealthily. These results provide confirmatory evidence of the need for proper detection, prediction and forecasting systems for cyberspace, with high prediction accuracy and low error values.
This paper is divided into five main sections. The first section contains an introduction to this report. The second section explains and analyses the literature related to the current status of prediction frameworks; it also discusses different types of deception technologies, statistical properties of cyberattack data, current research work on cyber attack rate prediction, the identified problem statements, and the aims and objectives of this research. The third section is dedicated to the details of the new deep learning framework called BRNN-GRU, with an in-depth discussion of data preprocessing, analysis, model building and evaluation. Section four contains the conclusions of the report and new paths for future work. The report ends with a reference section.
Literature Review
Deception Technologies
● Medium-Interaction Honeypots
Medium interaction honeypots can capture more details about attacks than low interaction honeypots, but less information than high interaction honeypots. These honeypots do not provide access to the system, yet they are more capable of collecting information during attack attempts.
The data in high interaction honeypots comes with privacy issues, as it contains more valuable information about attackers and also about the vulnerabilities of the network and the services of the company.
Network Telescopes
These are also dynamic, real-time deception tools which passively monitor network traffic. They are built on a portion of routed IP address space in which little or no legitimate traffic exists. Normally, the traffic destined for the unused address space, together with illegitimate traffic, is forwarded to the network telescopes for real-time analysis. Some networks forward the whole of their traffic to the network telescopes without sending any response back to the user (Hunter et al., 2013).
The study of honeypot data to understand, categorise and prioritize attack patterns started a few decades ago. Different researchers have used different methodologies to understand the data. Some have used visualization techniques, for example visualizing the ports observed in honeypots using neural projection techniques (Herrero et al., 2012). Even though there are a few such techniques for understanding honeypot data, the most common approach is statistical analysis. With this, researchers can understand the statistical properties exhibited in the cyberattack data.
Honeypot and network telescope data are massive resources for researchers studying cyber attacks. Based on these datasets, different researchers have proposed and designed approaches to understand anomalous behaviours and probing in networks, to infer denial-of-service activity and to track internet worms. The analysis of Internet of Things (IoT) related botnets and exploited IoT devices is studied in (Torabi et al., 2020) and (Safaei Pour et al., 2020). Denial-of-service attacks during an election are studied by (Lutscher et al., 2019), and darknet traffic stability is studied in (Vichaidis et al., 2018). These examples show the use of honeypot and telescope data to secure systems in different ways. Apart from the above-mentioned uses, honeypots and network telescopes play a major role in understanding and developing cyber attack detection, prediction and forecasting models.
When I reviewed the previous related work on predicting cyberattack rates, I found that the development of such frameworks started from attack graphs, which are considered a primary-level prediction methodology. Researchers have since progressed from that basic level to machine-learning-based frameworks which provide many more useful details and accurate predictions, allowing defence systems to mitigate attacks and protect networks before the systems are exploited. I have conducted a survey of cyber-attack detection, prediction and forecasting methodologies; it explains the history and the development of these methodologies up to the use of machine learning and deep learning (Senanayaka, 2020).
In this paper, I focus only on the major statistical frameworks which were most helpful for understanding the honeypot data that I used as the source of cyber attack data. Before deep-diving into the existing frameworks, I would like to explain some technical terms so that the concepts can be understood clearly.
Definitions
This section describes some statistical preliminaries which will be useful in discussing the concepts in this paper.
Definitions for Statistical Approaches
Stochastic Process
A stochastic (random) process is a statistical phenomenon consisting of a collection of random variables {X_θ} indexed by a parameter θ, where θ belongs to some index set Θ. In most cases Θ represents time (Zhang, 2021). Hence, in the context of cyber attack data, a stochastic (random) process is a collection of random variables ordered in time.
Poisson Distributions
If a statistical distribution describes the number of events that are likely to occur in a given period of time, it is called a Poisson distribution. A Poisson distribution assumes independent events which occur at a constant rate over the given time.
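For reference, the standard Poisson probability mass function (quoted here as a textbook definition rather than from the dissertation's own equations) gives the probability of observing k events in an interval whose mean event count is λ:

P(X = k) = (λ^k e^(−λ)) / k!,  for k = 0, 1, 2, ...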
Stationarity Process
A stationary stochastic process exhibits the same statistical properties at any point in time. A time series is said to be stationary if there is no systematic change in the mean (trend) or the variance, and if it contains no strictly periodic variations. Honeypot cyber attack data are considered to have a stationary distribution.
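In practice, stationarity of an attack-rate series can be checked with a unit-root test. The sketch below applies the augmented Dickey-Fuller test from statsmodels to an hourly attack-rate column; the file and column names are hypothetical and the snippet is only an illustration, not part of the original dissertation code.

import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical hourly attack-rate file with an "Attacks" column.
rates = pd.read_csv("hourly_attack_rates.csv")["Attacks"].dropna()

# Augmented Dickey-Fuller test: the null hypothesis is that the series is non-stationary.
stat, p_value, *_ = adfuller(rates)
print(f"ADF statistic = {stat:.3f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the series looks stationary.")
else:
    print("Cannot reject the null hypothesis: the series may be non-stationary.")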
Long Range Dependence (LRD)
Let {X_t : t ≥ 0} be a stationary time series instantiating a stochastic cyber attack process. The process is said to possess LRD if its autocorrelation function r(h), where h is called the lag, decays slowly as the lag grows (Fang, 2018):

r(h) ~ h^(−β) L(h) as h → ∞, for 0 < β < 1,  [1][2]

where L(·) is a slowly varying function. The degree of LRD is expressed using the Hurst parameter H, which relates to β in [2] as (Zhan et al., 2013)

β = 2 − 2H,  [3]

so that LRD corresponds to 0.5 < H < 1.

[4]

where ε_t is an independent and identically distributed normal random variable with mean 0 and variance σ².
[5]
[6]
[7]
[8]
[9]
Extreme Value Phenomena
The extreme values of the attack-rate series, namely the observations exceeding a high threshold value u, formulate a point process over a state space χ = (0, t_n] × (u, ∞) as

[10]
[11]

where (x)_+ = max{x, 0}, σ > 0 is the scale parameter, and μ and ξ are the location and shape parameters respectively (Peng et al., 2017).
Standard Autoregressive Conditional Duration (Standard ACD)

[12]

where ω, a_j, b_j ≥ 0 and p and q are positive integers indicating the order of the autoregressive terms (Peng et al., 2017).

Log Autoregressive Conditional Duration (Log ACD)

[13]

In the Log ACD model the researchers set p = q = 1, because a higher order does not increase the prediction accuracy (Peng et al., 2017; Bauwens and Giot, 2000; Bauwens et al., 2004).
Definitions for Deep Learning Approaches
Recurrent Neural Network (RNN)
For each timestep t, an RNN cell takes the input x^<t> and the previous activation a^<t−1> and computes

a^<t> = g_1(W_aa a^<t−1> + W_ax x^<t> + b_a)  [14]
y^<t> = g_2(W_ya a^<t> + b_y)  [15]

where W_ax, W_aa, W_ya, b_a and b_y are coefficients that are shared temporally and g_1, g_2 are the activation functions. The following diagram illustrates an RNN cell (Amidi and Amidi, 2021).
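To make the recurrence concrete, the following minimal NumPy sketch computes a few forward steps of the cell defined by [14] and [15]; the dimensions and the tanh/identity choices for g_1 and g_2 are illustrative assumptions rather than settings taken from the dissertation.

import numpy as np

def rnn_step(x_t, a_prev, W_ax, W_aa, W_ya, b_a, b_y):
    # a<t> = g1(Waa a<t-1> + Wax x<t> + ba), with g1 = tanh (assumed)
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)
    # y<t> = g2(Wya a<t> + by), with g2 = identity (assumed, regression output)
    y_t = W_ya @ a_t + b_y
    return a_t, y_t

# Toy sizes: 1 input feature (the attack rate), 8 hidden units, 1 output.
rng = np.random.default_rng(0)
W_ax, W_aa = rng.normal(size=(8, 1)), rng.normal(size=(8, 8))
W_ya, b_a, b_y = rng.normal(size=(1, 8)), np.zeros(8), np.zeros(1)

a, y = np.zeros(8), None
for x in [0.2, 0.5, 0.1]:          # a short toy attack-rate sequence
    a, y = rnn_step(np.array([x]), a, W_ax, W_aa, W_ya, b_a, b_y)
print(y)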
Long - Short Term Memory Units (LSTM)
This is another RNN structure, mainly used to fix the vanishing gradient issue that exists in plain RNNs. An LSTM consists of memory blocks that contain memory cells with self-connections, which store the temporal state of the network. There are also gates that control these memory states, known as the input gate, the forget gate and the output gate. The forget gate decides which information about the previous cell state to keep and which to remove (Fang, 2018).
Definitions for Prediction Accuracy
Let X_m, X_{m+1}, ..., X_z be the observed attack rates for m ≤ t ≤ z and Y_m, Y_{m+1}, ..., Y_z be the corresponding values predicted by the model. Then the prediction error can be defined as e_t = X_t − Y_t for m ≤ t ≤ z.

[16]
The overall underestimation error (UE) can be calculated as follows.

[17]

UE is useful when the defender is willing to over-provision some defence resources to mitigate incoming attacks.
Mean Squared Error (MSE) is another metric for evaluating the accuracy of a model's predictions. The model with the lowest MSE value is considered the best model for prediction. MSE can be calculated as (Fang, 2018)

MSE = (1 / (z − m + 1)) Σ_{t=m}^{z} (X_t − Y_t)²  [18]
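As a quick numerical illustration of the error definitions above, the following sketch computes e_t and the MSE for a toy pair of observed and predicted attack-rate sequences; the numbers are made up for demonstration only.

import numpy as np

observed = np.array([120.0, 95.0, 180.0, 60.0])    # X_t, toy observed attack rates
predicted = np.array([110.0, 100.0, 150.0, 65.0])  # Y_t, toy model predictions

errors = observed - predicted                      # e_t = X_t - Y_t
mse = np.mean(errors ** 2)                         # Mean Squared Error
print("errors:", errors)
print("MSE:", mse)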
Related Work
Cyber attack detection, prediction and forecasting using data science is not a new topic; its history goes back a few decades. Researchers have used different approaches for detection, prediction and forecasting, including discrete models such as graph theory and game theory, continuous models such as time series analysis and grey models, and machine learning approaches such as neural networks and SVMs (Husak et al., 2019). The evolution of these models from attack graphs to deep neural networks is surveyed in (Senanayaka, 2020), which includes the advantages and disadvantages of each type of approach.
As this paper is mainly focused on building a model for predicting cyberattack rates ahead of a given time, the following research outcomes played a major role in understanding the statistical features of cyber attack time series and laid the foundation for the latest prediction models.
The statistical framework introduced by Zhenxin Zhan and his team is a novel framework called the Gray Box model, implemented on the basis of a stochastic cyber attack process formulation (Zhan et al., 2013). This model can predict attack rates one hour ahead of time with an accuracy of 70.2-82.1%. The paramount discovery of this work is that it was the first research to find that long range dependence exists in honeypot-captured cyber attack data. The model also showed that stochastic cyber-attack processes do not follow a Poisson distribution and can instead exhibit the LRD phenomenon, and the authors identified two possible causes for the LRD features in cyberattack data.
With this in mind, the model can be instantiated at different resolutions, such as the network level using IP addresses, the victim level using service types, and the port level using port addresses. The paper calculated the existence of LRD as follows:
● Network-level - 80%
● Victim level - 70%
● Port level - 44.5%
Therefore LRD must be considered when analysing honeypot cyber attack data and training models on it. In addition, this framework has the predictive power to forecast cyberattack rates ahead of time.
In technical terms, this framework consists of ARIMA and FARIMA models. In the black-box setting (ARIMA only), the user feeds the honeypot dataset into the black box, trains it and obtains predictions using only the mathematical model inside the framework; the data are not analysed according to their statistical properties. However, when the statistical properties of the attacks differ, the time series should be analysed differently in order to obtain accurate results. Hence the predictions taken from the black box are less accurate, as proven in that paper. To analyse each dataset according to its own statistical characteristics, the framework provides a gray box model in which different time series are analysed differently. To do this, the model contains both ARIMA and FARIMA models; their definitions can be found in the definitions section. ARIMA models are not capable of accommodating LRD features, hence the ARIMA model is used for attack rates which do not exhibit LRD properties. FARIMA models can handle LRD properties, hence FARIMA is used as the LRD-aware model in this framework.
INPUT : Observed attack rate sequence for the given time period t, {X_1, ..., X_t}; number of hours ahead to predict, h
PROCESS :
1. Repeat
2. Fit {X_1, ..., X_t} to obtain the best model M_t from the Gray Box. Time series with the LRD feature pick FARIMA and time series without the LRD feature pick ARIMA for the prediction. The relevant p, d, q for the best-fitting models are selected using the AIC criterion (Cryer and Kung-Sik, 2008)
3. Use M_t to predict Y_{t+h}, the predicted cyber attack rate at t + h
4. X_{t+1} ← newly observed attack rate at time t + 1, since this framework supports real-time data
5. t ← t + 1, observing more data as t evolves
6. Until no further cyber attack rates need to be predicted
Algorithm 1 : Pseudocode for predicting cyber attack rates using the Gray Box model (FARIMA+ARIMA)
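As a rough illustration of the non-LRD branch of this procedure, the sketch below selects an ARIMA order by the AIC criterion using statsmodels and produces an h-hour-ahead forecast. The FARIMA branch for LRD series is omitted, and the candidate order grid and toy data are assumptions made for the example, not values from the original paper.

import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def fit_best_arima(series, h):
    # Fit ARIMA(p, d, q) over a small candidate grid and return the h-step forecast
    # of the model with the lowest AIC (the ARIMA half of the Gray Box selection).
    best_aic, best_fit = np.inf, None
    for p, d, q in itertools.product(range(3), range(2), range(3)):
        try:
            fit = ARIMA(series, order=(p, d, q)).fit()
        except Exception:
            continue                      # skip orders that fail to converge
        if fit.aic < best_aic:
            best_aic, best_fit = fit.aic, fit
    return best_fit.forecast(steps=h)

# Toy hourly attack-rate series (hypothetical values).
rates = np.array([30, 42, 35, 50, 47, 61, 55, 70, 66, 80], dtype=float)
print(fit_best_arima(rates, h=1))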
INPUT : Observed attack rate sequence {X_1, ..., X_n} for the given time period; number of hours ahead to predict, h; prediction starting point p ∈ (0, 1)
PROCESS :
1. t ← n × p
2. While t ≤ (n − h) do
3. Fit {X_1, ..., X_t} to obtain the best model M_t from the Gray Box. Time series with the LRD feature pick FARIMA and time series without the LRD feature pick ARIMA for the prediction. The relevant p, q, d for the best-fitting models are selected using the AIC criterion (Cryer and Kung-Sik, 2008)
4. Use M_t to predict Y_{t+h}, the predicted cyber attack rate at t + h
5. Compute the prediction error e_{t+h} = X_{t+h} − Y_{t+h}, where Y_{t+h} is calculated as in the algorithm above
6. t ← t + 1, observing more data as t evolves
7. End while
8. Compute PMAD, PMAD', OA, UA
9. Return PMAD, PMAD', OA, UA
Algorithm 2 : Pseudocode for evaluating the prediction accuracy of the Gray Box model (FARIMA+ARIMA)
The dataset used in this research is from the UCSD Network Telescope instrumentation, which captures hourly files of raw IPv4 packets of unsolicited traffic. This traffic contains a wide range of events, including misconfiguration, scanning of the address space by attackers or malware looking for vulnerable targets, backscatter from randomly spoofed denial-of-service attacks, and traffic from the automated spread of malware (CAIDA, 2021).
The data files in the CAIDA network telescope come as pcap files, but in this paper the researchers reassembled the pcap files into flows using COTS devices, which are capable of extracting flows from UDP and TCP traffic. In the data preprocessing, they disregarded attacks against non-production ports, because such connections are often dropped and attackers cannot reach the production servers through non-production ports. Flows without FIN or RST flags are also dropped if the flow timeout exceeds 60 s or the flow lifetime exceeds 300 s.
● Traditional time series models are expensive and require more computational power
● Analysing long-term forecasts and heavy-tailed processes gives poor performance and inaccurate results with ARIMA models (Zhai, 2005)
● ARIMA models tend to be unstable, both with respect to changes in observations and to model specification.
The same team conducted further research in 2015 to find more statistical features exhibited in honeypot and telescope cyberattack data (Zhan et al., 2015). They found that extreme value phenomena also exist in cyber attack data. Accordingly, they upgraded the above Gray Box model to accommodate extreme values. In the new framework, they integrated two complementary statistical approaches, Extreme Value Theory (EVT) and Time Series Theory (TST), to predict cyber attack rates efficiently.
According to the model, EVT can offer long-term predictions 24 h ahead of time, and the gray box TST model can predict attack rates 1 h ahead of time with an accuracy of 86.0-87.9%. To accommodate extreme values in the TST model, they introduced the FARIMA + GARCH time series approach, where GARCH accommodates extreme value phenomena and FARIMA accommodates LRD features. The defender can use the EVT predictions for longer-term planning and then adjust the resources precisely based on the TST model prediction, which arrives 1 h before the attack. In this model, different GARCH variants such as SGARCH and IGARCH, together with the skewed Student-t distribution (SSTD) and the skewed Generalized Error Distribution (SGED), are used to accommodate time series data with different noises.
3. M3: GPD with time-invariant scale parameter σ but time-dependent shape parameter
Data with stationary extreme attack rates use M1; if M1 cannot fit well, the model uses the non-stationary models M2, ..., M4 to fit the remaining non-stationary extreme attack values. Some standard goodness-of-fit statistics and QQ plots are used to evaluate the fit of the data to these models.
PROCESS :
1. Initialize quantileSet, an ordered set of quantiles where the maximum quantile is chosen so that there are at least 30 extreme values above it
2. For q ∈ quantileSet (minimum to maximum) do
3. Use the standard GPD to fit the extreme attack rates that are greater than the threshold quantile q
4. Evaluate the goodness-of-fit statistics CM and AD (Choulakian and Stephens, 2001) and the QQ plot
5. If the fit is good then
6. Estimate the GPD parameters (ξ, σ) together with the extremal index θ
7. Return (q, ξ, σ, θ)
8. End if
9. End for
10. Return -1 when no stationary extreme value fit is found
Algorithm 3 : Pseudocode for fitting stationary data to M1 in the EVT model of the updated Gray Box model (FARIMA+GARCH+EVT)
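As a rough illustration of the GPD-fitting step in this algorithm, the sketch below fits a generalized Pareto distribution to the exceedances above a chosen quantile using scipy; the toy data and the threshold choice are assumptions for the example, and the CM/AD goodness-of-fit checks and QQ plot are not reproduced.

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
attack_rates = rng.gamma(shape=2.0, scale=50.0, size=2000)   # toy hourly attack rates

q = 0.95                                                     # threshold quantile from quantileSet
threshold = np.quantile(attack_rates, q)
exceedances = attack_rates[attack_rates > threshold] - threshold

# Fit the GPD to the exceedances; loc is fixed at 0 because the threshold has been subtracted.
xi, loc, sigma = genpareto.fit(exceedances, floc=0)
print(f"q = {q}, shape xi = {xi:.3f}, scale sigma = {sigma:.3f}, exceedances = {len(exceedances)}")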
INPUT : Attack rate time series which cannot be fitted by the M1 model
PROCESS :
1. Initialize quantileSet
2. For q ∈ quantileSet (minimum to maximum) do
3. Use models M2, M3 and M4 to fit the extreme attack rates that are greater than the threshold quantile q
4. Evaluate the goodness-of-fit statistics via the AIC (Akaike Information Criterion) and the QQ plot
5. If any of the three models fits well then
6. Choose the model with the minimum AIC value, or choose the simplest model whose AIC value is close to the minimum
7. Return (q, AIC value) for the selected model among M2, M3, M4
8. End if
9. End for
10. Return -1 when no extreme value fit is found
Algorithm 4 : Pseudocode for fitting non-stationary data to M2, M3, M4 in the EVT model of the updated Gray Box model (FARIMA+GARCH+EVT)
The EVT model does not need a training phase like other machine learning processes do; it predicts the next attack rates based on the predictions made with the quantile sets.

[19]

Here as well there are a few models to use with different noises in the time series data.
The pseudocode for the TST model is as follows.
INPUT : Observed attack rate sequence for the given time period t, {X_1, ..., X_t}; the FARIMA + GARCH family (M5, M6, M7, M8); number of hours ahead to predict, h; lag value l where 0 < l < 1
PROCESS :
1. For each model M_i in the FARIMA + GARCH family do
2. m ← t × l
3. While m + h ≤ t do
4. Use {X_1, ..., X_m} to obtain the best-fitting parameters of M_i from the Gray Box; the Gray Box contains the FARIMA and GARCH models, which together support LRD features and extreme value theory
5. Use M_i to predict the attack rates {X_{m+1}, ..., X_{m+h}}
6. m ← m + h
7. End while
8. Evaluate the PMAD values and AIC values for the predictions
9. End for
10. Return M ∈ {M5, M6, M7, M8} with the smallest PMAD value
Algorithm 5 : Pseudocode for predicting cyber attack rates using the updated Gray Box model (FARIMA+GARCH+EVT)
Table 1 : TST-based fitting of attack rates per hour using the updated Gray Box model
The researchers also compared this new gray box model with the Hidden Markov model and symbolic dynamics models, which are other approaches to predicting attack rates. The analysis showed that the FARIMA+GARCH model gives more accurate 1 h-ahead prediction results than the Hidden Markov and symbolic dynamics models.
Finally, the researchers calculated prediction results using both the EVT and TST approaches and showed that together the two approaches can provide accurate results 24 h prior to the attacks as well as more precise attack predictions 1 h ahead of time. The following table includes the details of their findings.
Table 2 : Comparison between EVT and TST predictions. Here, an H_a^b value means the predictions correspond to the time interval between the a-th and the b-th hour. Each period has three rows: the first row shows the EVT-based prediction values and the corresponding PMAD value in the 7th column; the second row shows the observed maximum attack rates; and the third row shows the maximum attack rates predicted by the TST model with h = 1 and the corresponding PMAD value.
● Space complexity is also high, as the model keeps a further set of models to support all GARCH variants with different noises and EVT models for stationary and non-stationary data.
● According to Table 3, the EVT-based predictions of the return level are often higher than the observed maximum attack rates, whereas the TST-based predictions of the maximum attack rates are often lower than the observed maximum attack rates.
Influenced by the above research, Peng and his research team developed a framework for predicting extreme cyber attack rates using only an EVT approach (Peng et al., 2017). They introduced a marked point process framework to fit and predict cyber attack rates. They also demonstrated the existence of correlations and interdependencies between extreme values, which is an important aspect to consider when predicting cyber attack rates. They found the following drawbacks in the updated Gray Box approach introduced by (Zhan et al., 2015):
- The updated Gray Box method uses the classic POT (Peaks Over Threshold) method to model the magnitudes of the exceedances without considering the dependencies between inter-exceedance times (the time intervals between extreme values); hence the accuracy of the model is low.
- Classical EVT treats the distribution of extreme values in the cyber attack data as a Poisson distribution, although it is not.
- The GARCH model used in the Gray Box model is not theoretically proven to support the clustering behaviour of extreme values. Furthermore, EVT methods cluster the extreme values via quantiles.
In order to incorporate the dependencies between inter-exceedance times into the classical POT method, the researchers proposed a marked point process approach which models the magnitudes of the extreme values using the POT method and predicts the arrival of extreme attack rates using the Autoregressive Conditional Duration (ACD) and Log ACD models. ACD models effectively accommodate the slow decay of autocorrelation and the bursts of extreme value clusters, which GARCH models do not support theoretically. They can also dynamically adjust the quantile levels in order to predict extreme cyber attack rates accurately. The ACD model consists of a ground process and a marked point distribution.
Value at Risk (VaR) has been used to measure the cyber risk of intensive attacks. It can be described as a probabilistic measure of the severity of the extreme cyber attack rates, and according to these values the network defenders can allocate adequate resources to mitigate the cyber attack.
The following diagram illustrates the flow of the Marked Point process framework.
Furthermore, this research used honeypot data and telescope data captured from the CAIDA network (CAIDA, 2021). As in the previous frameworks, the telescope data were preprocessed according to (Claffy et al., 1995), converted to flows and analysed. For the preprocessing of the honeypot data, they followed (Almotairi et al., 2008), which I also use when introducing a deep learning framework for cyber attack rate prediction in this paper.
[20]

where m is the size of the input, ŷ_i and y_i are the output values from the LSTM layers and the observed values at step i, and W and U are weight matrices. λ is a user-defined penalty parameter used to minimise overfitting.
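A minimal Keras sketch of this kind of penalised objective is shown below: a squared-error loss combined with L2 penalties on the LSTM input and recurrent weight matrices, which roughly play the roles of W and U above. The layer sizes, window length and penalty value are illustrative assumptions, not the original configuration.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

penalty = 0.001  # assumed value for the penalty parameter lambda

model = tf.keras.Sequential([
    layers.Input(shape=(24, 1)),   # 24 hourly attack rates per window (assumed)
    layers.Bidirectional(layers.LSTM(
        64,
        kernel_regularizer=regularizers.l2(penalty),        # penalises the input weight matrix
        recurrent_regularizer=regularizers.l2(penalty))),   # penalises the recurrent weight matrix
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()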
There are two main algorithms used in this model to handle extreme attack rates and normal attack rates; extreme attack rates are identified by fitting the values with a separate algorithm.
INPUT : Historical cyber attack time series data {(t, y_t) | t = 1, ..., m}; iterations b = 10000; penalty parameter λ = 0.001
PROCESS :
1. For r ∈ {20, 30, 40} do
2. Divide the data set into mini-batches of size r
3. For l ∈ {2, 3, 4, 5} do
4. Randomly initialize an l-layer BRNN-LSTM with its parameters saved in Θ
5. j ← 0
6. While j ≤ b do
7. Compute J according to the optimization equation by performing forward propagation
8. Update Θ using the Adam optimizer
9. j ← j + 1
10. End while
11. For each data point at t do
12. Compute ŷ_t by performing forward propagation
13. Fitted value ← ŷ_t
14. End for
15. Return fitted values for the combination (r, l)
16. End for
17. End for
Algorithm 6 : Pseudocode for computing fitted values in the BRNN-LSTM framework (Fang, 2018)
The following algorithm splits the data into mini-batches to pass to the training model in order to increase computing efficiency. After training is complete, the model makes predictions for the testing dataset and outputs the prediction results and the values of the evaluation metrics. Apart from the number of neurons per layer, the other values are hard-coded in the algorithm.
INPUT : Historical time series data with in-sample set {(t, y_t) | t = 1, ..., m} and out-of-sample set {(s, y_s) | s = m + 1, ..., n}; iterations b = 10000; penalty parameter λ = 0.001; (r, l) calculated from Algorithm 6
PROCESS :
1. Split the in-sample set into mini-batches of size r
2. Randomly initialize an l-layer BRNN-LSTM with all the parameters saved in Θ
3. j ← 0
4. While j < b do
5. For each mini-batch from the in-sample set do
6. Compute J by performing forward propagation
7. Update Θ using the Adam optimizer
8. End for
9. j ← j + 1
10. End while
11. Predictions ← null
12. For each data point s in the out-of-sample set do
13. Compute ŷ_s by performing forward propagation
14. Predictions ← ŷ_s
15. End for
16. Return predicted values
Algorithm 7 : Pseudocode for predicting cyber attack rates in the BRNN-LSTM framework (Fang, 2018)
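The sketch below is a rough Keras re-expression of the fitting and prediction procedure in Algorithms 6 and 7: a stacked bidirectional LSTM regressor trained with Adam on mini-batches of sliding windows of past attack rates. The window length, layer widths, number of layers, batch size, epoch count and the synthetic data are assumptions made for illustration; they are not Fang's original hard-coded values.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def make_windows(series, window):
    # Turn a 1-D attack-rate series into (samples, window, 1) inputs and next-step targets.
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    return X[..., np.newaxis], series[window:]

series = (np.sin(np.linspace(0, 40, 1200))
          + np.random.default_rng(0).normal(0, 0.1, 1200)).astype("float32")  # toy data
X, y = make_windows(series, window=24)
split = int(0.8 * len(X))                       # in-sample / out-of-sample split

model = tf.keras.Sequential([
    layers.Input(shape=(24, 1)),
    layers.Bidirectional(layers.LSTM(32, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(32)),      # l = 2 bidirectional LSTM layers (assumed)
    layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
model.fit(X[:split], y[:split], batch_size=32, epochs=10, verbose=0)  # mini-batch size r = 32 (assumed)

predictions = model.predict(X[split:], verbose=0).ravel()
print("out-of-sample MSE:", float(np.mean((predictions - y[split:]) ** 2)))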
This model was compared against ARIMA, ARIMA+GARCH and a hybrid model, and it was evaluated using the MSE, MAD (Mean Absolute Deviation), PMAD and MAPE (Mean Absolute Percentage Error) accuracy metrics. The results showed that the BRNN-LSTM model provides highly accurate results with a minimal error rate.
The research problem and question
According to the gap analysis conducted for this field, different researchers have contributed in different ways to predicting cyberattack rates with high accuracy. Most of the latest research builds on earlier work and fixes many of the drawbacks and gaps in those research models. Moreover, the challenges and the gap analysis conducted in the literature review show that current prediction methodologies need to be updated with high-accuracy prediction models that can warn network defenders about cyber attacks a few hours in advance. This would allow defenders to allocate adequate defence resources purposefully, manage the attacks and protect the whole network (Peng et al., 2016). In addition, many of the statistical features exhibited in cyber attack time series data have not been identified yet. Therefore, developing a model to predict the next cyber attack with 100% accuracy is a very demanding task.
overprovision the defending resources for a particular attack, which is an expensive task.
6. Most of the available models can predict cyberattacks based on honeypot or telescope data only; they cannot predict cyber attacks based on both types of dataset.
7. Feature selection has to be conducted for classical machine learning models.
8. Most of the models are tested only on low interaction honeypot datasets.
Beyond the above-mentioned problems, we also found that statistical approaches are difficult to work with for a person without a proper knowledge of statistics. Hence, in this paper we want to minimise statistical model development and redirect the user towards machine learning approaches for building models to predict cyber attack rates. The main issues with classical machine learning approaches are that the neural networks have only a few layers to train the model, and the features that contribute to cyber attack prediction must be selected and fed into the network. As cyber attack patterns change every day, using the same set of features is not a good solution, and running feature selection every time is a costly task. As a solution, deep neural networks can be used to build models with multiple neural layers. The model also learns representations by itself, so the user does not need to hand-select the input features. Aside from that, deep learning models can accommodate statistical features as well, and they are easy to build and train with less computational power and a smaller code base.
Aims and Objectives
This proposed research is planned to be implemented based on the research carried out by Fang (Fang, 2018), building a deep neural network approach to predict cyberattack rates. The proposed solution will address the following gaps in the current domain.
● Increase the accuracy of the deep neural network by improving its ability to identify long-range and short-range dependence and the high nonlinearity of the data.
● Fine-tune the model by introducing efficient hyperparameters to obtain high accuracy rates.
● Introduce new deep learning approaches for predicting cyber attack rates.
different researchers in order to predict cyber attacks ahead of a given time prior to the attack. The accuracy of the predictions is the most crucial metric when designing a prediction model. The models discussed in the literature review section illustrate the development of prediction models from statistical approaches to deep learning approaches. Researchers are trying to develop hybrid models and new deep learning models such as TimeGAN (Yoon et al., 2019) to work with time series data. But cyber attack rates are not just any time series data, and only a few important features of cyber attack data have been identified in current research. Hence adapting new deep learning models will take more research time.
Despite that, this paper tries to improve the accuracy of the current deep learning model, a Bidirectional Recurrent Neural Network with LSTM, by introducing Gated Recurrent Unit (GRU) cells and tuning the model with many more hyperparameters.
Definitions
Gated Recurrent Unit (GRU)
A GRU consists of only two gates, an update gate and a reset gate, while an LSTM has three gates: an input gate, a forget gate and an output gate. The following diagram illustrates a simple GRU cell.
Figure 7 : Internal structure of a GRU (Yang et al., 2020)
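For completeness, the standard GRU update equations behind this figure can be written as follows; this is the usual textbook formulation (with σ the sigmoid function and ⊙ element-wise multiplication), not notation taken from the dissertation, and some references swap the roles of z_t and 1 − z_t.

z_t = σ(W_z x_t + U_z h_{t−1} + b_z)              (update gate)
r_t = σ(W_r x_t + U_r h_{t−1} + b_r)              (reset gate)
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t−1}) + b_h)   (candidate state)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t             (new hidden state)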
Research has been conducted by (Yang et al., 2020) to compare the performance of LSTM and GRU on a text-based prediction task. According to its conclusions, the GRU is not suitable for analysing very large datasets with many dependent features, although it uses less computational power and produces results in less time than the LSTM. In our scenario of predicting cyber attack rates, however, we can use the GRU instead of the LSTM in the deep learning network, because there are not many dependencies among the features of the time series data.
Optimizers in deep learning networks
The Adaptive Moment Estimation (Adam) optimizer has an adaptive learning rate that is adjusted during the training of the network. Such optimizers are better to use than static ones because the model's accuracy increases over the training cycles while the loss values decrease adaptively. Adam works on first- and second-order moments: it stores an exponentially decaying average of past squared gradients and an exponentially decaying average of past gradients. Therefore the Adam optimizer is fast and converges very rapidly; it also rectifies vanishing learning rates and high variance (Doshi, 2019).
Although the Adam optimizer is a suitable optimizer for our scenario, there are many other optimizers that can be used in different neural networks; a detailed comparison is available in (Doshi, 2019).
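In Keras, selecting Adam for the model amounts to a one-line choice at compile time, as in the sketch below; the small bidirectional GRU model and the learning-rate value are illustrative assumptions, not the tuned values used in the final model.

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(24, 1)),
    layers.Bidirectional(layers.GRU(16)),
    layers.Dense(1),
])
# Adam with an explicit (illustrative) initial learning rate; the adaptive per-parameter
# rates are maintained internally from the first and second moment estimates.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")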
Hyperparameters
The variables which determine the structure of the neural network and how it is trained are called hyperparameters. Model accuracy and performance change with different hyperparameter values. Weight initialization, the activation function, the number of epochs, the learning rate and the choice of optimizer are some of the common hyperparameters defined for neural networks. Hyperparameter tuning can help to increase performance and accuracy and to reduce error rates. Grid search, random search and Bayesian optimization are common methodologies for optimizing hyperparameters in deep neural networks (Loshchilov and Hutter, 2016). More information about hyperparameter tuning can be found in (Loshchilov and Hutter, 2016), (Reimers and Gurevych, 2017) and (Tensorflow, 2021).
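A simple way to explore a few of these hyperparameters is a small manual grid search scored on the validation loss, as sketched below; the candidate values, window length and random toy data are assumptions made for the example (a dedicated tool such as KerasTuner could be used instead).

import itertools
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 24, 1)).astype("float32")    # toy windowed attack-rate inputs
y = rng.normal(size=(500,)).astype("float32")          # toy next-hour targets

best = (np.inf, None)
for units, lr in itertools.product([16, 32], [1e-2, 1e-3]):   # candidate hyperparameter grid
    model = tf.keras.Sequential([
        layers.Input(shape=(24, 1)),
        layers.Bidirectional(layers.GRU(units)),
        layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr), loss="mse")
    history = model.fit(X, y, validation_split=0.2, epochs=5, batch_size=32, verbose=0)
    val_loss = min(history.history["val_loss"])
    if val_loss < best[0]:
        best = (val_loss, (units, lr))

print("best validation MSE:", best[0], "with (units, learning rate) =", best[1])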
Proposed Algorithm
PROCESS :
1. Split the dataset into training, validation and testing sets
2. Create small data frames (j) from the training dataset, in order to minimise overfitting of the model, by passing the start index, end index and number of rows per file
3. For each mini-batch/data frame (j) do
4. Make each data frame compatible with the GRU input shape
5. Shuffle the indices to get a new order of files in the next iteration
6. Return GRU-compatible data frames
7. End for
8. Create small data frames (i) from the testing dataset, in order to minimise overfitting of the model, by passing the start index, end index and number of rows per file
9. For each mini-batch/data frame (i) do
10. Make each data frame compatible with the GRU input shape
11. Return GRU-compatible data frames
12. End for
13. Create the model as a bidirectional RNN with GRU cells, and set dynamically changeable hyperparameters
14. k ← 0
15. While k < b iterations do
16. For each mini-batch from the training set (j) do
17. Compute the loss by performing forward and backward propagation
18. Update the model using the Adam optimizer
19. End for
20. k ← k + 1
21. End while
22. Predictions ← 0
23. For each mini-batch in the testing set (i) do
24. Compute ŷ using the trained model by performing forward propagation
25. Predictions ← ŷ
26. End for
27. Return the predicted values and the MSE
Algorithm 8 : Pseudocode for predicting cyber attack rates using the BRNN-GRU model
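A condensed Keras sketch of Algorithm 8 is given below. It mirrors the earlier BRNN-LSTM sketch but swaps the LSTM cells for GRU cells and reports the test MSE; the file name, window length, layer sizes, batch size and epoch count are illustrative assumptions rather than the exact values used in the accompanying Colab notebook.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def make_windows(series, window=24):
    # Slice an attack-rate series into overlapping windows and next-step targets.
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    return X[..., np.newaxis], series[window:]

series = np.loadtxt("attack_rates.txt").astype("float32")   # hypothetical one-column rate file
X, y = make_windows(series)
n_train, n_val = int(0.7 * len(X)), int(0.85 * len(X))      # train / validation / test split

model = tf.keras.Sequential([
    layers.Input(shape=(X.shape[1], 1)),
    layers.Bidirectional(layers.GRU(32, return_sequences=True)),
    layers.Bidirectional(layers.GRU(32)),    # bidirectional GRU layers replace the LSTM layers
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:n_train], y[:n_train],
          validation_data=(X[n_train:n_val], y[n_train:n_val]),
          batch_size=32, epochs=20, verbose=0)

pred = model.predict(X[n_val:], verbose=0).ravel()
print("test MSE:", float(np.mean((pred - y[n_val:]) ** 2)))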
Methodology
As discussed in the literature review, deep learning prediction models are commonly used in finance, natural language processing and many other fields. While conducting the literature review, however, I found a lack of research on cyber attack rate prediction using deep learning models. Not only attack rate prediction but also categorization and other related cyber attack prediction tasks are still at the research stage.
At present, there are only a few deep neural network approaches available for analysing time series data; recurrent neural networks are one example. In the future, more network types will become available for time series analysis and will support more of the features exhibited in time series data.
The BRNN-LSTM model (Fang et al., 2019) is a prominent framework that uses deep learning to accommodate the statistical features exhibited in cyber attack time series. It also combines these statistical features with artificial intelligence techniques to increase its performance and prediction accuracy. The BRNN-LSTM framework is developed in Python and runs on the CUDA platform (Fang, 2018); this model is used in this paper for comparison purposes.
The model uses many of the latest techniques in deep neural networks to build the prediction model. As there are not many approaches for working with time series data, the practical option is to use the same BRNN-LSTM approach and tune it with different hyperparameters. The model development section covers the ways to improve the current BRNN-LSTM framework and explains the use of other hyperparameters to increase performance and accuracy.
Data acquisition
Most of the major frameworks discussed in the literature review used the CAIDA real-time network telescope dataset (CAIDA, 2021), to which public access is currently restricted. Hence, for the training and evaluation of BRNN-GRU, we use Kyoto University's honeypot dataset (http://www.takakura.com/Kyoto_data/). This dataset contains time series data from 01-11-2006 to 31-12-2015.
The dataset consists of 24 main statistical features, 14 of which were extracted from the raw traffic of the honeypot systems deployed in the Kyoto University networks. These conventional features are as follows.
This data was collected from Nepenthes, a low interaction honeypot service deployed on the Kyoto University networks. More details about the features and the data in the Kyoto dataset can be found in (Song et al., 2021).
Data preprocessing
The dataset consists of daily network connection details, in one .txt file per day, from 01-11-2006 to 31-12-2015. Therefore this model does not need to convert the connections to flows as the other frameworks did; the CAIDA telescope datasets come as .pcap files and need to be converted to flows for analysis.
The dataset has already been preprocessed, with irrelevant details removed, by the research project of (Song et al., 2011). We assume that they followed the methodology of (Claffy et al., 1995) to preprocess the honeypot data, as (Claffy et al., 1995) is considered a major reference for preprocessing honeypot time series data. According to (Song et al., 2021) they used the Bro 2.4 tool (The Bro Project, 2016) to convert the traffic data to session data. Bro is a powerful open source network analysis tool that can be used for network security monitoring, and it is capable of monitoring traffic in very high performance environments (Cloud, 2019).
After completing the preprocessing step, we had to create a separate file containing the date-time and the attack rates; we used a simple Python function to achieve that. Before passing the input data to the model, the input file is further pre-processed to remove null values from the dataset. The final input file has the following columns:
Date | Time | Attacks
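A hedged sketch of this step is shown below: it reads the daily Kyoto session files, counts sessions per hour to form the attack-rate column, and writes out the Date/Time/Attacks file with null rows dropped. The file paths, separator and timestamp column are assumptions for illustration, since the exact Python function used in the project is not reproduced here.

import glob
import pandas as pd

frames = []
for path in sorted(glob.glob("kyoto/2015*.txt")):        # hypothetical daily session files
    df = pd.read_csv(path, sep="\t", header=None)
    # Assumed: the last column holds the session timestamp.
    df["timestamp"] = pd.to_datetime(df.iloc[:, -1], errors="coerce")
    frames.append(df[["timestamp"]])

sessions = pd.concat(frames).dropna()
hourly = sessions.resample("1H", on="timestamp").size().rename("Attacks").reset_index()
hourly["Date"] = hourly["timestamp"].dt.date
hourly["Time"] = hourly["timestamp"].dt.time
hourly[["Date", "Time", "Attacks"]].dropna().to_csv("hourly_attack_rates.csv", index=False)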
Selection of Technologies
We used Google Colaboratory, a free online cloud-based Jupyter (Python) notebook environment that provides features for creating and training machine learning and deep learning models using CPUs, GPUs and TPUs. It supports TensorFlow and many other Python-based machine learning libraries such as Keras, and the documentation and community support for Google Colaboratory, TensorFlow and Keras are extensive. Because Google Colaboratory is cloud based, researchers do not need to use their local computational power and are able to use TPUs with high processing power for free.
Tensor Processing Units - TPU
GPUs are organized around programmable cores, while TPUs use a classic vector processor with a dedicated matrix multiply unit, which makes them excel at the large matrix operations that dominate deep learning workloads.
When the user requests a Cloud TPU v2 in Google Colaboratory, the user gets a virtual machine with a PCI-attached TPU board. This TPU board has four dual-core TPU chips, and each TPU core has a Vector Processing Unit and a 128 x 128 Matrix Multiply Unit, which can provide much more processing power than other hardware accelerators.
Tensorflow
TensorFlow is a very popular, end-to-end open source platform for machine learning tasks. It contains most of the machine learning libraries, tools and community support that are helpful to machine learning researchers, and Google Colaboratory has built-in TensorFlow support.
Keras
Keras is a high-level, Python-based machine learning API running on top of the TensorFlow 2 platform. It provides essential abstractions and building blocks for developing machine learning and deep learning models and shipping them with high iteration velocity (Team, 2021).
Thanks to these abstractions and easy-to-use APIs, researchers do not need to spend much attention and time developing the layers from scratch. Models can be built in three ways with Keras on TensorFlow 2: using sequential models, the functional API, or the model subclassing approach.
Model Development
As discussed in the previous chapters, the model introduced in this paper, the BRNN-GRU framework for cyber attack prediction, is built on TensorFlow using the Keras library. For evaluation purposes, we also built a basic RNN-LSTM model and a BRNN-LSTM model. The original BRNN-LSTM introduced by (Fang, 2018) was developed using the Python NumPy library; the source code for Algorithm 6 is available in Fang's GitHub repository, although there is no source code available for Algorithm 5.
(https://github.com/xingfang912/time-series-analysis/blob/master/time_series_rolling.py). We made some assumptions when transforming the designed model from NumPy to Keras. Some of the hyperparameters were hard-coded in the original codebase, possibly to tune the models to support the CAIDA dataset efficiently. Apart from the number of RNN layers, the other hard-coded hyperparameters mentioned in the GitHub repository are used to develop the model on Keras. The assumptions are commented in the source code of the Google Colaboratory project.
The RNN-LSTM model is a basic LSTM RNN sequential model which has an LSTM
layer and default hyperparameters.
The BRNN-LSTM is taken as the basis for the new prediction model presented in this paper. Hence the new model has the following features.
The pseudocode for the algorithm is given in the Proposed Algorithm section.
The developed model can be found at the following URL.
https://colab.research.google.com/drive/1HAbhZOYi7ePRDE9pKZXODRLgN_M6VWrk?
usp=sharing
Model Evaluation
Model evaluation is done using the Mean Squared Error (MSE) calculated for the trained model on the validation dataset. The model with the lowest MSE is considered the best model.
These values can change according to the time series data streams passed to the model and the availability of the features. The dataset used here, the Kyoto dataset, might not contain extreme values or LRD features; hence the model may give different predictions and accuracy levels for a different dataset.
Fang and his team compared the BRNN-LSTM models with the Gray Box model, FARIMA+GARCH models, Hidden Markov models and the marked point process model, which are the currently available approaches for predicting cyber attack rates. They concluded that the BRNN-LSTM framework has higher accuracy and performance and lower MSE values than the other models. With that in mind, here we use only the RNN-LSTM and BRNN-LSTM models for comparison with the BRNN-GRU model. The MSE values for these models are as follows.
Model MSE
RNN-LSTM 0.40692530679349045
BRNN-LSTM 0.39681923587779341
BRNN-GRU 0.37891233569034562
Table 5 : Calculated MSE values for the RNN-LSTM, BRNN-LSTM and BRNN-GRU models
According to the above results, the BRNN-GRU model provides more accurate predictions than the other models, and the BRNN-LSTM built with Keras provides more accurate results than the basic RNN-LSTM model.
In the BRNN-GRU model, we changed most of the hyperparameters and replaced the LSTM layers with GRU layers; the lower MSE can therefore be attributed either to hyperparameter tuning or to the use of GRUs. GRUs are not well suited to complex and large datasets with strong dependencies, but research has found that a GRU can provide more accurate results with less computational power on small datasets. Since we pass the large dataset to the BRNN-GRU model as small chunks of data frames, the GRU should perform better than the LSTM.
Conclusions
We proposed a BRNN-GRU framework for predicting cyberattack rates ahead of a given time prior to the attack. This framework can accommodate complex phenomena exhibited by cyber attack rate datasets, such as long-range dependence, high nonlinearity and extreme values. The framework is designed to train with less computational power while achieving high accuracy and minimised loss and error values, and it is evaluated on Kyoto University's honeypot dataset.
Moreover, cyber attack prediction using deep learning has not received as much research attention as other industrial fields. The small number of deep learning models available for time series analysis and the limited knowledge of the statistical features present in cyberattack time series lead researchers to pay less attention to cyber attack prediction.
According to the literature review conducted, deep learning models for cyber attack rate prediction are faster, use less computational power and provide high prediction accuracy with minimal error values. Deep learning models can skip the feature selection process required by classical machine learning models: they learn and train by themselves, so they can identify the actual features of the dataset, learn from different sequences and minimise human error. Apart from that, among the currently available deep learning models, only RNNs provide the kind of time series analysis that is suitable for cyber attack rate prediction. Therefore the main enhancement we can make to cyber attack prediction models is to keep the RNN as the base of the model and tune it with the best hyperparameter values. We can also minimise the loss and optimise the model with different approaches, such as introducing mini-batches, backpropagation and working directly with RNN cells.
Although most researchers use RNNs for time series analysis, there is ongoing research into time series analysis using other deep learning frameworks. One of these approaches is TimeGAN (Yoon et al., 2019), which uses Generative Adversarial Networks for its implementation. Recently, many researchers have paid attention to time series prediction using Generative Adversarial Networks, in which two models are trained simultaneously through an adversarial process. However, according to the findings of (Alberto Carrillo Romero, 2018) on time series analysis, the best performers were the shallow LSTM with an accuracy of 74.16%, followed by the GAN with 72.68%, the deep LSTM with 62.85% and ARIMA with 59.57%.
Hybrid model development in deep learning for time series analysis is another strand of ongoing research; CNN-LSTM is one example, in which Convolutional Neural Networks are combined with Recurrent Neural Networks.
The introduction of the attention mechanism for time series data analysis has also been investigated by different researchers. The attention mechanism provides accurate predictions even in the presence of long input sequences, according to (Del Pra, 2020) and (Rémy, 2021).
In the future, researchers will use these new research findings to improve cybersecurity predictions.
To conclude, although many solutions have been introduced in the cybersecurity prediction, detection and forecasting fields, there is still no definitive model that achieves 100% accuracy for all of these tasks. Newly designed time series prediction models should also be tested on cyberattack rate time series, as these comprise different statistical features. As technology evolves, the complexity of cyberspace and attackers' patterns evolve as well. Hence this research area will keep producing new findings and will remain an important open research topic.
References
Alberto Carrillo Romero, R. (2018) Generative Adversarial Network for Stock Market price Prediction.
[Online]. Available at: https://cs230.stanford.edu/projects_fall_2019/reports/26259829.pdf [Accessed: 12
March 2021].
Almotairi, S., Clark, A., Mohay, G. and Zimmermann, J. (2008) Characterization of Attackers' Activities in
Honeypot Traffic Using Principal Component Analysis. 2008 IFIP International Conference on Network
and Parallel Computing, p.147-154. IEEE [Online]. Available at: doi:10.1109/npc.2008.82 [Accessed: 23
January 2021].
Amidi, A. and Amidi, S. (2021) CS 230 - Recurrent Neural Networks Cheatsheet. Stanford.edu. [Online].
Available at:
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks#overview
[Accessed: 28 March 2021].
Amini, A. and Soleimany, A. (2020) MIT Deep Learning 6.S191. MIT Deep Learning 6.S191. [Online].
Available at: http://introtodeeplearning.com/ [Accessed: 20 May 2021].
Bauwens, L. and Giot, P. (2000) The Logarithmic ACD Model: An Application to the Bid-Ask Quote
Process of Three NYSE Stocks. Annales d'Économie et de Statistique, (60), p.117. JSTOR [Online].
Available at: doi:10.2307/20076257 [Accessed: 18 February 2021].
Bauwens, L., Giot, P., Grammig, J. and Veredas, D. (2004) A Comparison of Financial Duration Models
via Density Forecasts. INTERNATIONAL JOURNAL OF FORECASTING, 20 (4), p.589–609. [Online].
Available at: http://hdl.handle.net/1854/LU-8649410 [Accessed: 9 April 2021].
CAIDA(2021) Historical and Near-Real-Time UCSD Network Telescope Traffic Dataset. [Online].
Available at: https://www.caida.org/catalog/datasets/telescope-near-real-time_dataset/ [Accessed: 4 April
2021].
Choulakian, V. and Stephens, M. (2001) Goodness-of-Fit Tests for the Generalized Pareto Distribution.
Technometrics, 43 (4), p.478-484. [Accessed: 19 December 2020].
Claffy, K., Braun, H. and Polyzos, G. (1995) A Parameterizable Methodology for Internet Traffic Flow
Profiling. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATION, 13 (8), p.1481-1494.
[Accessed: 13 February 2021].
Cloud, A. (2019) How to Install Bro IDS on Ubuntu 16.04. Medium. [Online]. Available at:
https://alibaba-cloud.medium.com/how-to-install-bro-ids-on-ubuntu-16-04-ce81d759ce1c [Accessed: 22
November 2020].
Cryer, J. and Kung-Sik, C. (2008) Time Series Analysis With Applications in R. Springer Texts in
Statistics. [Accessed: 14 March 2021].
Del Pra, M. (2020) Time Series Forecasting with Deep Learning and Attention Mechanism. Medium.
[Online]. Available at:
https://towardsdatascience.com/time-series-forecasting-with-deep-learning-and-attention-mechanism-2d001fc871fc [Accessed: 27 April 2021].
Doshi, S. (2019) Various Optimization Algorithms For Training Neural Network. Medium. [Online].
Available at: https://towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6
[Accessed: 25 January 2021].
Fang, X., Xu, M., Xu, S. and Zhao, P. (2019) A deep learning framework for predicting cyber attacks
rates. EURASIP Journal on Information Security, 2019 (1), p.1-11. Springer Science and Business Media
LLC [Online]. Available at: doi:10.1186/s13635-019-0090-6 [Accessed: 25 April 2021].
Görner, M. (2021) Keras and modern convnets, on TPUs | Google Codelabs. Google Codelabs. [Online].
Available at: https://codelabs.developers.google.com/codelabs/keras-flowers-tpu#0 [Accessed: 15 May
2021].
Huang, C., Han, J., Zhang, X. and Liu, J. (2019) Automatic Identification of Honeypot Server Using
Machine Learning Techniques. Security and Communication Networks, 2019, p.1-8. Hindawi Limited
[Online]. Available at: doi:10.1155/2019/2627608 [Accessed: 31 March 2021].
Hunter, S., Irwin, B. and Stalmans, E. (2013) Real-time distributed malicious traffic monitoring for
honeypots and network telescopes. 2013 Information Security for South Africa. IEEE [Online]. Available
at: doi:10.1109/issa.2013.6641050 [Accessed: 5 December 2020].
Husak, M., Komarkova, J., Bou-Harb, E. and Celeda, P. (2019) Survey of Attack Projection, Prediction,
and Forecasting in Cyber Security. IEEE Communications Surveys & Tutorials, 21 (1), p.640-660. Institute
of Electrical and Electronics Engineers (IEEE) [Online]. Available at: doi:10.1109/comst.2018.2871866
[Accessed: 17 March 2021].
J. Shimeall, T. and M. Spring, J. (2014) Introduction to Information Security. Elsevier [Online]. Available
at: doi:10.1016/c2011-0-00135-7 [Accessed: 6 December 2020].
J. Triebe, O., Laptev, N. and Rajagopal, R. (2019) AR-Net: A simple Auto-Regressive Neural Network for
time-series. [Online]. Available at: https://arxiv.org/abs/1911.12436 [Accessed: 30 January 2021].
Justin, L. (2020) 3 Steps to Time Series Forecasting: LSTM with TensorFlow Keras - Just into Data. Just
into Data. [Online]. Available at:
https://www.justintodata.com/forecast-time-series-lstm-with-tensorflow-keras/ [Accessed: 1 May 2021].
Kotenko, I. and Chechulin, A. (2013) A Cyber Attack Modeling and Impact Assessment Framework. In:
5th International Conference on Cyber Conflict. Tallinn: NATO CCD COE Publications. [Accessed: 21
January 2021].
Li, Y., Shi, L. and Feng, H. (2019) A Game-Theoretic Analysis for Distributed Honeypots. Future Internet,
11 (3), p.65. MDPI AG [Online]. Available at: doi:10.3390/fi11030065 [Accessed: 30 December 2020].
Loshchilov, I. and Hutter, F. (2016) CMA-ES For Hyper parameter Optimisation Of Deep Neural Networks.
ICLR 2016. [Accessed: 23 March 2021].
Luk, K. (2019) Downloading Datasets into Google Drive via Google Colab. Medium. [Online]. Available at:
https://towardsdatascience.com/downloading-datasets-into-google-drive-via-google-colab-bcb1b30b0166
[Accessed: 13 April 2021].
Lutscher, P., Weidmann, N., Roberts, M., Jonker, M., King, A. and Dainotti, A. (2019) At Home and
Abroad: The Use of Denial-of-service Attacks during Elections in Nondemocratic Regimes. Journal of
Conflict Resolution, 64 (2-3), p.373-401. SAGE Publications [Online]. Available at:
doi:10.1177/0022002719861676 [Accessed: 18 May 2021].
Meharchandani, D. (2021) 10 Major Cyber Attacks Witnessed Globally in Q1 2021 - Security Boulevard.
Security Boulevard. [Online]. Available at:
https://securityboulevard.com/2021/04/10-major-cyber-attacks-witnessed-globally-in-q1-2021/ [Accessed:
3 June 2021].
Milkovich, D. (2019) 15 Alarming Cyber Security Facts and Stats | Cybint. Cybint. [Online]. Available at:
https://www.cybintsolutions.com/cyber-security-facts-stats/ [Accessed: 4 March 2021].
Muncaster, P. (2020) Cyber-Attacks Up 37% Over Past Month as #COVID19 Bites. Infosecurity
Magazine. [Online]. Available at:
https://www.infosecurity-magazine.com/news/cyberattacks-up-37-over-past-month/ [Accessed: 22 May
2021].
Peng, C., Hu, T., Xu, S. and Xu, M. (2017) Modeling and predicting extreme cyber attack rates via marked
point processes. JOURNAL OF APPLIED STATISTICS, 44 (14), p.2534–2563. [Online]. Available at:
doi:http://dx.doi.org/10.1080/02664763.2016.1257590 [Accessed: 3 February 2021].
Reimers, N. and Gurevych, I. (2017) Optimal Hyperparameters for Deep LSTM-Networks for Sequence
Labeling Tasks. [Online]. Available at: https://arxiv.org/pdf/1707.06799.pdf [Accessed: 29 November
2020].
Safaei Pour, M., Mangino, A., Friday, K., Rathbun, M., Bou-Harb, E., Iqbal, F., Samtani, S., Crichigno, J.
and Ghani, N. (2020) On data-driven curation, learning, and analysis for inferring evolving
internet-of-Things (IoT) botnets in the wild. Computers & Security, 91, p.101707. Elsevier BV [Online].
Available at: doi:10.1016/j.cose.2019.101707 [Accessed: 16 February 2021].
Senanayaka, L. (2020) Survey on Cyber Attack Detection, Prediction and Forecasting methodologies.
[Online]. Available at:
https://docs.google.com/document/d/1j5aFTOgxu2fZC5IWyGlPDM53TUYxgS0Fm6PgKKwNGes/edit?usp
=sharing [Accessed: 30 May 2021].
Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D. and Nakao, K. (2011) Statistical Analysis of
Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation. BADGERS ’11 April 10-13,
2011, Salzburg., p.29-36. [Accessed: 20 May 2021].
Song, J., Okabe, Y. and Takakura, H. (2021) Description of Kyoto University Benchmark Data.
[Accessed: 29 May 2021].
Taylor, S. and Letham, B. (2017) Forecasting at scale. PeerJ [Online]. Available at:
doi:10.7287/peerj.preprints.3190v2 [Accessed: 23 February 2021].
Team, K. (2021) Keras documentation: About Keras. Keras.io. [Online]. Available at:
https://keras.io/about/ [Accessed: 16 April 2021].
TensorFlow, T. (2021) Introduction to the Keras Tuner | TensorFlow Core. TensorFlow. [Online]. Available
at: https://www.tensorflow.org/tutorials/keras/keras_tuner [Accessed: 6 May 2021].
TensorFlow, T. (2021) The Sequential model | TensorFlow Core. TensorFlow. [Online]. Available at:
https://www.tensorflow.org/guide/keras/sequential_model [Accessed: 27 April 2021].
TensorFlow, T. (2021) tf.keras.layers.GRU | TensorFlow Core v2.5.0. TensorFlow. [Online]. Available at:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU [Accessed: 29 April 2021].
The Bro Project, T. (2016) Bro Documentation Release 2.4.1. [Online]. Available at:
http://www.ncsa.illinois.edu/People/jazoff/bro-2.4.1.pdf [Accessed: 25 March 2021].
Torabi, S., Bou-Harb, E., Assi, C. and Debbabi, M. (2020) A Scalable Platform for Enabling the Forensic
Investigation of Exploited IoT Devices and Their Generated Unsolicited Activities. Forensic Science
International: Digital Investigation, 32, p.300922. Elsevier BV [Online]. Available at:
doi:10.1016/j.fsidi.2020.300922 [Accessed: 5 May 2021].
Triebe, O. (2020) Neural Prophet Extendable and Scalable Forecasting. In: 40th International Symposium
on Forecasting, 2020. [Online]. Available at:
https://github.com/ourownstory/neural_prophet/blob/master/notes/Presented_at_International_Symposiu
m_on_Forecasting.pdf [Accessed: 15 January 2021].
Vichaidis, N., Tsunoda, H. and Keeni, G. (2018) Analyzing darknet TCP traffic stability at different
timescales. 2018 International Conference on Information Networking (ICOIN). IEEE [Online]. Available
at: doi:10.1109/icoin.2018.8343098 [Accessed: 18 April 2021].
Yang, S., Yu, X. and Zhou, Y. (2020) LSTM and GRU Neural Network Performance Comparison Study:
Taking Yelp Review Dataset as an Example. 2020 International Workshop on Electronic Communication
and Artificial Intelligence (IWECAI). IEEE [Online]. Available at: doi:10.1109/iwecai50956.2020.00027
[Accessed: 3 January 2021].
Yoon, J., Jarrett, D. and van der Schaar, M. (2019) Time-series Generative Adversarial Networks. 33rd
Conference on Neural Information Processing Systems (NeurIPS 2019). [Accessed: 14 November 2020].
Zhai, Y. (2005) Time Series Forecasting Competition Among Three Sophisticated Paradigms. [Accessed: 6 February 2021].
Zhan, Z., Xu, M. and Xu, S. (2013) Characterizing Honeypot-Captured Cyber Attacks: Statistical
Framework and Case Study. IEEE Transactions on Information Forensics and Security, 8 (11),
p.1775-1789. Institute of Electrical and Electronics Engineers (IEEE) [Online]. Available at:
doi:10.1109/tifs.2013.2279800 [Accessed: 26 February 2021].
Zhan, Z., Xu, M. and Xu, S. (2015) Predicting Cyber Attack Rates With Extreme Values. IEEE
Transactions on Information Forensics and Security, 10 (8), p.1666-1677. Institute of Electrical and
Electronics Engineers (IEEE) [Online]. Available at: doi:10.1109/tifs.2015.2422261[Accessed: 28
February 2021].
Zhang, J. (2021) MA636: Introduction to stochastic processes. Kent.ac.uk. [Online]. Available at:
https://www.kent.ac.uk/smsas/personal/lb209/files/notes1.pdf [Accessed: 22 April 2021].
Word Count
Word count excluding quotations, references, titles for figures and tables,
acknowledgements and appendices is 11387.
Appendices
Appendix 1
Appendix 1: Prediction error analysis for the predictions done by ARIMA and FARIMA models, where p = 0.5 in the Gray model