Dos Attack Detection Using Machine Learning and Neural Network

This document discusses detecting denial of service (DoS) attacks using machine learning and neural network algorithms. Specifically, it aims to detect application layer DoS attacks from the CIC IDS 2017 dataset using random forest (RF) and multi-layer perceptron (MLP) classifiers. The paper experiments with different partitions of the dataset to determine the best performing algorithms and analyzes the results, finding RF provides better accuracy than MLP for this task.

DoS Attack Detection using Machine Learning and Neural Network

Shreekhand Wankhede
Department of Computer Engineering and IT
College of Engineering Pune, India
Email: wankhedesm16.is@coep.ac.in

Deepak Kshirsagar
Department of Computer Engineering and IT
College of Engineering Pune, India
Email: ddk.comp@coep.ac.in

Abstract—The internet is now used almost everywhere in the digital world. As its usage increases, the number of threats increases as well. One such threat is the Denial of Service (DoS) attack, which uses seemingly reasonable service requests to consume excessive computing and network resources, so that legitimate users can no longer access them. A DoS attack can happen at different layers of the OSI model, such as the network, transport and application layers. The aim of this paper is to detect DoS attacks effectively using Machine Learning (ML) and Neural Network (NN) algorithms. The detection focuses specifically on application layer DoS attacks rather than on transport and network layer DoS attacks. The recent CIC IDS 2017 DoS attack dataset is used in the experiment. The experimentation divides the dataset into different splits and finds the best split for each algorithm, i.e. Random Forest (RF) and Multi-Layer Perceptron (MLP). The results of RF and MLP are compared, and it is shown that RF provides better results than MLP.

Keywords - Denial of Service (DoS); Machine Learning (ML); Neural Network (NN); Intrusion Detection System (IDS)

I. INTRODUCTION
Networking has advanced enormously over the past decade. Along with these advancements, threats to the network have also increased. Nowadays the biggest danger to the internet world is the Denial of Service (DoS) attack. A DoS attack can originate at any layer of the attacker's system, from the network layer to the application layer. The VeriSign distributed denial of service trends report [19] of 2016 shows a continued increase in DoS attacks in terms of size, complexity, and frequency. DoS attacks make resources inaccessible and may even cause the targeted system to fail. Therefore, there is always a need to develop Intrusion Detection Systems (IDS) that tackle DoS attacks and preserve confidentiality and integrity. There are many types of DoS attacks; the most common [13] are ICMP, UDP, SYN, and HTTP floods.
A UDP flood is a DoS attack that overwhelms a target with User Datagram Protocol (UDP) packets. The attacker's aim is to flood random ports on a remote host, which makes the host repeatedly check for an application listening on each port and can ultimately lead to inaccessibility. Similar in nature to the UDP flood, an ICMP flood exhausts system resources by sending ICMP Echo-Request packets without waiting for replies. This consumes both outgoing and incoming bandwidth and results in a significant slowdown of the overall system. A SYN flood attack sends a tremendous number of SYN requests to the targeted system, so that the server has no free resources left, which ultimately makes the system unresponsive to normal traffic. An HTTP flood is a DoS attack that manipulates HTTP protocol packets and sends unnecessary requests to the web server with the aim of making the remote server unresponsive.
Therefore, the detection of DoS attacks has long attracted researchers, and many detection approaches have been proposed. In this paper, a DoS/DDoS attack dataset is used for the experiment, and machine learning algorithms such as Random Forest (RF) and Multi-Layer Perceptron (MLP) are used to detect the attacks. We ran the experiment on different partitions of the dataset and analyzed the results, which differ from partition to partition. This paper deals with the effective detection of application layer DoS attacks based on machine learning and neural network algorithms. The main objective of this paper is to use the RF and MLP algorithms to classify the CIC IDS 2017 dataset into two classes. This work makes the following contributions.
1. The approach provides efficient classification of DoS attacks with the help of the CIC IDS 2017 dataset.
2. The proposed approach reduces the number of features and provides higher accuracy as compared to current state-of-the-art systems.
3. The experimentation shows the selection of the training dataset and its accuracy, as well as the performance of the RF and MLP algorithms in terms of accuracy.

II. CIC IDS 2017 DATASET
The CIC IDS 2017 dataset [15] consists of labeled data of current attacks such as DoS and DDoS, along with full packet information in pcap format. The dataset is provided to researchers by the Canadian Institute for Cybersecurity. This flow-based dataset includes the results of the network traffic analysis performed with CICFlowMeter, based on the source IP, destination IP, and time stamp.
The dataset was generated from the abstract behavior of twenty-five users based on various application layer protocols such as HTTP, FTP, HTTPS and SSH, using different available tools over a period of 5 days. Initially, normal network traffic was captured on Monday and labeled Benign. From Tuesday to Friday, intrusion traffic was captured for various types of

978-1-5386-5257-2/18/$31.00 2018 IEEE


attacks such as DoS/DDoS, Brute Force, and Web attacks. In this paper, we mainly focus on the DoS/DDoS traffic that was generated with the Hulk, GoldenEye and Slow HTTP test tools on Wednesday. This dataset mainly consists of DoS/DDoS attacks such as Heartbleed, Slowloris, Slow HTTP and HTTP flood attacks. Table I lists the 84 features that are present in the dataset.

Table I

Attribute Name              Attribute Name
FlowID                      BwdPackets/s
SourceIP                    MinPacketLength
SourcePort                  MaxPacketLength
DestinationIP               PacketLengthMean
DestinationPort             PacketLengthStd
Protocol                    PacketLengthVariance
Timestamp                   FINFlagCount
FlowDuration                SYNFlagCount
TotalFwdPackets             RSTFlagCount
TotalBackwardPackets        PSHFlagCount
TotalLengthofFwdPackets     ACKFlagCount
TotalLengthofBwdPackets     URGFlagCount
FwdPacketLengthMax          CWEFlagCount
FwdPacketLengthMin          ECEFlagCount
FwdPacketLengthMean         Down/UpRatio
FwdPacketLengthStd          AveragePacketSize
BwdPacketLengthMax          AvgFwdSegmentSize
BwdPacketLengthMin          AvgBwdSegmentSize
BwdPacketLengthMean         FwdHeaderLength
BwdPacketLengthStd          FwdAvgBytes/Bulk
FlowBytes/s                 FwdAvgPackets/Bulk
FlowPackets/s               FwdAvgBulkRate
FlowIATMean                 BwdAvgBytes/Bulk
FlowIATStd                  BwdAvgPackets/Bulk
FlowIATMax                  BwdAvgBulkRate
FlowIATMin                  SubflowFwdPackets
FwdIATTotal                 SubflowFwdBytes
FwdIATMean                  SubflowBwdPackets
FwdIATStd                   SubflowBwdBytes
FwdIATMax                   InitWinbytesforward
FwdIATMin                   InitWinbytesbackward
BwdIATTotal                 actdatapktfwd
BwdIATMean                  minsegsizeforward
BwdIATStd                   ActiveMean
BwdIATMax                   ActiveStd
BwdIATMin                   ActiveMax
FwdPSHFlags                 ActiveMin
BwdPSHFlags                 IdleMean
FwdURGFlags                 IdleStd
BwdURGFlags                 IdleMax
BwdHeaderLength             IdleMin
FwdPackets/s                Label

III. LITERATURE REVIEW
G. Tsang et al. [1] used Support Vector Machine (SVM) and Radial Basis Function Neural Network (RBFNN) for the detection of UDP floods. In this work, a randomly selected 50% of the dataset is used for training and the rest for testing. The system is tested with the Defense Advanced Research Projects Agency (DARPA) dataset. It is observed that SVM takes more time than RBFNN in the testing phase for new unidentified patterns. When accuracy is the main factor and some misclassifications are acceptable, SVM is suggested; when classification time is the crucial factor, RBFNN is suggested.
S. Seufert and D. O'Brien [2] proposed a system that uses a Neural Network (NN) to detect Distributed Denial of Service attacks. They filtered the data at different layers and found that, with the NN approach, the server can achieve a low response time after the attack has happened.
T. Subbulakshmi et al. [4] performed the detection of DoS attacks using Modified Support Vector Machines. The supervised learning algorithm EMCSVM was evaluated with linear, radial basis and polynomial kernel functions. The proposed system used its own dataset for the experimentation on DoS attacks at the network, transport and application layers. EMCSVM with the radial basis kernel gives the best classification. Different kernel functions and parameter values are considered to calculate the performance of EMCSVM.
S. Umarani and D. Sharmila [3] use the access matrix concept from HTTP traces. The system can classify traffic flow packets as either normal or DDoS attack. The Naïve Bayes and K-Nearest Neighbor classifiers are used in the research. It is observed that attribute selection done by PCA improves the average detection rate and the average False Positive Rate (FPR) by 0.9% and 4.11% respectively.
Z. Tan et al. [5] proposed a system for DoS attack detection at the network and transport layers using computer vision techniques. The system uses multivariate correlation analysis to convert network traffic records into images. The images are treated as observed objects for detecting DoS attacks, based on a measure called Earth Mover's Distance (EMD). The proposed detection system is evaluated with ten-fold cross-validation on the ISCX 2012 IDS dataset and the KDD Cup 99 dataset. The experimentation shows detection rates of 99.95% for KDD Cup 99 and 90.12% for ISCX 2012 IDS.
E. Nosrati et al. [6] focused on the detection of DoS attacks in the IP Multimedia Subsystem. The attacks target the activity that devices perform to authenticate themselves and report their location within the network to a control entity through the Session Initiation Protocol (SIP). The paper proposed a variant of Cumulative Sum (CUSUM) called adaptive z-score CUSUM; this variant adapts itself to changes that can occur in the network traffic. The approach resulted in a low detection time and a false rate that can be adapted to different application cases.
C.M. Bao [7] proposed an architecture for detecting DDoS attacks at the network and transport layers. The data is collected with the help of the Simple Network Management Protocol (SNMP) instead of using raw network packets. Traffic classification is performed via an ensemble of SVMs to detect whether an attack has happened, along with the specific attack type. This proposed approach

2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA)
gives a detection rate of 99.27% with a 1.9% false positive rate and a 0.73% false negative rate.
M. Alkasassbeh et al. [8] proposed a system that uses data mining (DM) techniques for the detection of HTTP and UDP floods. The proposed system was simulated with the network simulator NS2 because of its high confidence in producing valid results. ML algorithms such as Random Forest, MLP, and Naïve Bayes were applied to the dataset to classify the DoS attack types, namely HTTP-Flood, Smurf, UDP-Flood, and SIDDOS. The experimentation shows that the MLP classifier achieved the highest accuracy rate.
D. Kshirsagar et al. [16] proposed a signature or pattern based intrusion detection system for network and transport layer DoS attacks such as TCP SYN, UDP and ICMP floods. The system proposed in [17] detects TCP SYN flood attacks based on patterns and reduces the load on the CPU. The DIDS proposed in [14] also uses a signature based approach for the detection of DoS attacks such as TCP and UDP floods in a distributed environment. That system uses server and client IDS, which extends the scalability of the IDS.

IV. EXPERIMENTATION
This section describes the proposed method and the algorithms used for DoS attack detection on the CIC IDS 2017 dataset.

A. Proposed Method
The proposed method classifies DoS attacks in the following steps.
Step 1: The system takes the CIC IDS 2017 Wednesday dataset with all attributes as input.
Step 2: The popular machine learning tool Weka is used for simulation.
Step 3: The proposed system uses machine learning algorithms to classify the traffic into Benign and DoS attack.
Step 4: In the preprocessing phase, a certain percentage of the data is selected to train the system.
Step 5: Finally, the machine learning and neural network classifiers, i.e. RF and MLP, are simulated to classify the dataset into Benign and DoS attack.

B. Weka
Waikato Environment for Knowledge Analysis (Weka) is a free and open source tool for machine learning and data mining algorithms. Weka has a vast collection of visualization tools and ML algorithms for exploring data and predictive modeling, along with a Graphical User Interface (GUI). The data files supported by Weka are CSV (comma-separated values) files and ARFF (Attribute-Relation File Format) files. For the simulation, the input is taken in .arff file format with all attributes present in the dataset.

C. Machine Learning Algorithm-Random Forest (RF)
Random Forest uses tree classifiers to predict new unlabeled data on the basis of a number of trees. The selection of features is performed randomly. Each group of trees serves as a single forest, and for new unlabeled data each forest yields a predicted class [9]. Random Forest can be used not only for regression tasks but also for classification tasks. The classifier generates many classification trees, and every tree is built from a different part of the general dataset. After each tree has classified an unlabeled record, it casts a vote for its decision; the class with the maximum number of votes is declared the winner.
The Random Forest algorithm is as follows:

1. Divide the dataset into x samples, such that x = total count of trees.
2. Each dataset sample has to be classified for each entry.
3. Predict the unlabeled class based on the number of predictions.

The Random Forest algorithm does not require tree pruning, and in the training phase it is resistant to overfitting. Hence, Random Forest is considered highly suitable for classification tasks in various applications.

The error rate and accuracy rate of the Random Forest (RF) classifier are calculated by breaking the dataset into training and testing sets. The accuracy rate is calculated from the correctly classified labels versus the incorrectly classified labels. Another technique for finding the error rate that occurs in the training phase is Out of Bag (OOB). The count of trees and the descriptor parameters should be adjusted carefully to improve the accuracy rate while keeping the false detection rate low. The research in [11] reports that 500 trees are sufficient and that increasing the count of trees further does not improve the accuracy rate, while it lengthens training time and increases resource usage.

D. Neural Network Algorithm-Multi-Layer Perceptron (MLP)
A Multilayer Perceptron (MLP) neural network has one or more hidden layers, i.e. layers of processing elements (PEs) that are not connected directly to the output links.

Figure 1. One hidden layer Multilayer Perceptron [18]

Figure 1 shows a single hidden layer MLP with p inputs, q hidden PEs and r outputs

(MLP(p-q-r)). In terms of discriminant functions, it is possible to find the extra processing power that a layer of nonlinear PEs provides [18]. The non-linear activation function gives the multilayer perceptron its power. Most non-linear functions are suitable for this purpose, excluding polynomial functions.

V. RESULT ANALYSIS AND DISCUSSION
In this work, we have applied the machine learning algorithm RF as well as the neural network algorithm MLP to detect DoS attacks. The proposed system classifies each packet as benign or DoS attack. Tables II and III show the accuracy obtained with different training partitions of the dataset using 80 features: Table II shows how the MLP performs, and Table III shows how the RF algorithm performs on the same dataset. The dataset had multiple classes of DoS attacks, but we have transformed it into a binary classification, i.e. DoS or benign packet. The accuracy [12] of the proposed system is calculated using Accuracy = (X + O) / (M + N).

Where,
X = True Positive,
O = True Negative,
M = Condition Positive = X + V,
N = Condition Negative = O + U,
U = False Positive and
V = False Negative

X = True Positive occurs when a positive labeled record is classified as a positive record.
O = True Negative occurs when a negative labeled record is classified as a negative record.
M = Condition Positive is the total number of positive labeled records, i.e. True Positives (X) plus False Negatives (V).
U = False Positive occurs when a negative labeled record is classified as a positive record.
V = False Negative occurs when a positive labeled record is classified as a negative record.

A. Result Analysis for MLP
At the beginning of the experiment, we used 20% of the dataset for training, i.e. 40959 packets, and subsequently 61439, 81918, 102398, 122878, 143357 and 163837 packets. There is no big difference in detection rate between the 30% and 70% learning results, as shown in Table II: the accuracy with the 30% learning dataset (98.8783%) is already close to the highest accuracy in the table (98.8956%, reached with 70% and 80%). It is also observed that the accuracy generally increases with the number of packets used in the training phase.

Table II
ACCURACY OF MLP ON CIC IDS 2017 DATASET

Sr.No   # Training Records   # Records Tested   Accuracy
1       40959 (20%)          77744              98.3691%
2       61439 (30%)          67979              98.8783%
3       81918 (40%)          58181              98.5099%
4       102398 (50%)         48535              98.8760%
5       122878 (60%)         38844              98.8818%
6       143357 (70%)         29152              98.8956%
7       163837 (80%)         19429              98.8956%

B. Result Analysis for Random Forest Algorithm
We have performed the experimentation using 20% to 80% training records out of the 204796 records in the dataset.
As shown in Table III, the complete implementation is done on 204796 packets, and for the Random Forest machine learning algorithm the detection rate increases as more records are used in training. Initially, for 40959 packets the detection rate is 99.9194%, and for 163837 packets the detection rate is 99.9563%.

Table III
ACCURACY OF RF ON CIC IDS 2017 DATASET

Sr.No   # Training Records   # Records Tested   Accuracy
1       40959 (20%)          77744              99.9194%
2       61439 (30%)          67979              99.9268%
3       81918 (40%)          58181              99.9308%
4       102398 (50%)         48535              99.9502%
5       122878 (60%)         38844              99.9475%
6       143357 (70%)         29152              99.9512%
7       163837 (80%)         19429              99.9563%

C. Discussion
We have chosen the CIC IDS 2017 Wednesday dataset, which consists only of DoS attack packets and normal packets. The experiment is carried out using two techniques, i.e. the Random Forest and Multilayer Perceptron algorithms. To find the best split between the training and test phases, the system is tested using different learning and testing splits, i.e. different numbers of records. It has been observed that a higher number of learning packets generally leads to a higher detection rate. The experimental results show that the MLP reaches an accuracy of 98.87% with 30% training records, whereas the RF algorithm reaches an accuracy of 99.95% with 50% training records. The proposed system shows that the RF algorithm provides higher accuracy than the MLP for the effective detection of DoS attacks at the application layer.

VI. CONCLUSION
This paper has proposed and implemented machine learning and neural network algorithms, namely Random Forest and MLP respectively, for the detection of DoS attacks. These algorithms efficiently detected application layer DoS attacks. The results show that the Random Forest algorithm provides higher accuracy than the MLP algorithm.
However, this detection system only classifies the CIC IDS 2017 dataset into Benign and DoS attack. The proposed system does not perform multi-class classification into attacks such as Heartbleed, slowhttptest, slowloris and HTTP flood. The further task is to reduce the number of features and to test the system for multi-classification of DoS attacks.
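
The accuracy measure used in Section V can be illustrated with a short Python sketch. It is not part of the Weka-based experiments; the function names, the "DoS"/"Benign" label strings and the six sample records are hypothetical and chosen only for illustration. The sketch assumes the standard confusion-matrix definitions, in which the condition positive count is true positives plus false negatives and the condition negative count is true negatives plus false positives.

```python
def confusion_counts(actual, predicted, positive="DoS"):
    """Return (X, O, U, V) = (TP, TN, FP, FN) for binary labels."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # positive record classified as positive
        elif a != positive and p != positive:
            tn += 1  # negative record classified as negative
        elif a != positive and p == positive:
            fp += 1  # negative record classified as positive
        else:
            fn += 1  # positive record classified as negative
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Accuracy = (X + O) / (M + N)."""
    condition_positive = tp + fn  # M: all records labeled positive
    condition_negative = tn + fp  # N: all records labeled negative
    total = condition_positive + condition_negative
    if total == 0:
        raise ValueError("no records to evaluate")
    return (tp + tn) / total


# Hypothetical labels, not taken from the CIC IDS 2017 experiments.
actual = ["DoS", "DoS", "Benign", "Benign", "Benign", "DoS"]
predicted = ["DoS", "Benign", "Benign", "DoS", "Benign", "DoS"]

counts = confusion_counts(actual, predicted)
print(counts, f"{accuracy(*counts):.4f}")
```

Because M + N always equals the total number of tested records, the measure is simply the fraction of records whose predicted class matches the label.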

REFERENCES
[1] G.C. Tsang, P.P. Chan, D.S. Yeung, and E.C.C. Tsang, "Denial of service detection by support vector machines and radial-basis function neural network," In Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on, vol. 7, pp. 4263-4268. IEEE, 2004.
[2] S. Seufert, and D. O'Brien, "Machine learning for automatic defence against distributed denial of service attacks," In Communications, 2007. ICC'07. IEEE International Conference on, pp. 1217-1222. IEEE, 2007.
[3] S. Umarani, and D. Sharmila, "Predicting application layer DDoS attacks using machine learning algorithms," International Journal of Computer, control Quantum and information Engineering 8, no. 10, 2014.
[4] T. Subbulakshmi, K. BalaKrishnan, S. Mercy Shalinie, D. AnandKumar, V. GanapathiSubramanian, and K. Kannathal, "Detection of DDoS attacks using Enhanced Support Vector Machines with real time generated dataset," In Advanced Computing (ICoAC), 2011 Third International Conference on, pp. 17-22. IEEE, 2011.
[5] Z. Tan, A. Jamdagni, X. He, P. Nanda, R.P. Liu, and J. Hu, "Detection of denial-of-service attacks based on computer vision techniques," IEEE transactions on computers 64, no. 9 (2015): 2519-2533.
[6] E. Nosrati, A.S. Kashi, Y. Darabian, and S.N.H. Tonekaboni, "Register flooding attacks detection in IP multimedia subsystems by using adaptive z-score CUSUM algorithm," In Information Technology and Multimedia (ICIM), 2011 International Conference on, pp. 1-4. IEEE, 2011.
[7] C.M. Bao, "Intrusion detection based on one-class svm and snmp mib data," In Information Assurance and Security, 2009. IAS'09. Fifth International Conference on, vol. 2, pp. 346-349. IEEE, 2009.
[8] M. Alkasassbeh, G. Al-Naymat, A.B. Hassanat, and M. Almseidin, "Detecting distributed denial of service attacks using data mining techniques," International Journal of Advanced Computer Science and Applications 7, no. 1, 2016.
[9] S. Apale, R. Kamble, M. Ghodekar, H. Nemade, and R. Waghmode, "Defense mechanism for DDoS attack through machine learning," International Journal of Research in Engineering and Technology 3, no. 10: 291-294, 2014.
[10] A. Araar, and R. Bouslama, "A comparative study of classification models for detection in IP networks intrusions," Journal of Theoretical Applied Information Technology 64, no. 1, 2014.
[11] M. A. M. Hasan, M. Nasser, B. Pal, and S. Ahmad, "Support vector machine and random forest modeling for intrusion detection system (IDS)," Journal of Intelligent Learning Systems and Applications 6, no. 01, 2014.
[12] D.M. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," 2011.
[13] M. Khandelwal, D.K. Gupta, and P. Bhale, "DoS attack detection technique using back propagation neural network," In Advances in Computing, Communications and Informatics (ICACCI), 2016 International Conference on, pp. 1064-1068. IEEE, 2016.
[14] D. Kshirsagar, S. Sawant, R. Wadje, and P. Gayal, "Distributed intrusion detection system for TCP flood attack," In Proceeding of International Conference on Intelligent Communication, Control and Devices, pp. 951-958. Springer, Singapore, 2017.
[15] I. Sharafaldin, A.H. Lashkari, and A.A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," In Information Systems Security and Privacy. ICISSP, 2018.
[16] D.D. Kshirsagar, S.S. Sale, D.K. Tagad, and G. Khandagale, "Network Intrusion Detection based on attack pattern," In Electronics Computer Technology (ICECT), 2011 3rd International Conference on, vol. 5, pp. 283-286. IEEE, 2011.
[17] D. Kshirsagar, S. Sawant, A. Rathod, and S. Wathore, "CPU load analysis minimization for TCP SYN flood detection," Procedia Computer Science 85: 626-633, 2016.
[18] J.C. Principe, N.R. Euliano, and W.C. Lefebvre, "Neural and adaptive systems: fundamentals through simulations," Vol. 672. New York: Wiley, 2000.
[19] "Distributed Denial of Service Trends Report," 2017, [online]. Available: https://www.verisign.com/enIN/security-services/ddosprotection/ddosreport/index.xhtml [Accessed 01-Jan-2018].
