Mushkan Report
Malware Detection in Network traffic using Machine
Learning
Submitted by
MUSKAN YADAV
Supervisor
Professor
I would like to express my sincere gratitude to all those who have contributed to the
successful completion of this individual project conducted at Jawaharlal Nehru
University, Delhi.
First and foremost, I extend my heartfelt thanks to Dr. Manju Khari, my project
supervisor from JNU, for their invaluable guidance, continuous support, and
insightful feedback throughout the duration of the project. Their expertise and
mentorship played a crucial role in shaping the direction of this work.
I am also thankful for the resources, facilities, and infrastructure provided by
Jawaharlal Nehru University, which were essential for the successful execution of
this individual project.
Furthermore, I am grateful to Ayush Verma for their efforts, encouragement,
understanding, and moral support throughout the project journey.
Lastly, I would like to thank Banasthali Vidyapith for giving me the opportunity
to undertake this project.
This project would not have been possible without the collective efforts and support
of everyone mentioned above. I am truly thankful for the collaborative spirit and
encouragement that surrounded this individual endeavour.
TABLE OF CONTENTS
• Introduction
• Literature Review
• Methodology
• Results and Discussion
• Conclusion
• Appendices
• References
1. INTRODUCTION
Malware, short for malicious software, is designed to infiltrate, damage, or disable computers,
networks, or devices without the user's informed consent. The impact of malware is extensive,
ranging from data theft, financial loss, and privacy breaches to complete system shutdowns.
Cybercriminals use malware to gain unauthorized access to systems, steal sensitive information,
disrupt services, and extort money through tactics such as ransomware attacks. The sheer volume
and sophistication of modern malware present significant challenges to traditional security
measures. The term malware encompasses a variety of hostile or intrusive software, including viruses,
worms, Trojan horses, ransomware, spyware, adware, and other malicious programs.
Malware is typically covert, and many types can replicate and spread across systems
and networks. It can be embedded in seemingly legitimate software, attached to emails, or planted
on websites. The impact of malware can range from minor annoyances, such as unwanted
advertisements (adware), to severe consequences, including financial loss, identity theft, and
significant disruptions to critical infrastructure.
Types of Malware
1. Viruses
A virus is a type of malware that attaches itself to a legitimate program or file and replicates
when the infected program is executed. Viruses can corrupt or delete data, use an infected computer to
spread to other systems, and cause system crashes.
2. Worms
Unlike viruses, worms are standalone software that can self-replicate and spread independently. They
exploit vulnerabilities in operating systems or applications to propagate through networks without
needing to attach themselves to a host program. Worms can consume bandwidth and overload servers,
leading to network disruptions.
3. Trojan Horses
Trojan horses, or Trojans, are malicious programs that disguise themselves as benign or useful
software. Once installed, they can create backdoors for unauthorized access, steal data, or install
additional malware. Trojans often appear as email attachments, downloads, or even legitimate-looking
apps.
4. Ransomware
Ransomware is a type of malware that encrypts the victim's files, making them inaccessible. The
attacker then demands a ransom payment, usually in cryptocurrency, for the decryption key.
Ransomware can paralyze businesses, hospitals, and government agencies by locking crucial data and
systems.
5. Spyware
Spyware is designed to secretly monitor and collect information about a user's activities without their
knowledge or consent. This can include keystrokes, passwords, credit card numbers, and other
sensitive data. Spyware is often used for identity theft or corporate espionage.
6. Adware
Adware automatically delivers advertisements to a user’s system. While not always malicious, adware
can track a user’s browsing habits and generate revenue through forced ad views or clicks. It often
comes bundled with free software and can degrade system performance and user experience.
7. Rootkits
Rootkits are sophisticated malware designed to provide ongoing privileged access to a computer while
actively hiding their presence. They can modify the operating system and intercept system calls to
conceal their activities. Rootkits are particularly dangerous because they can be extremely difficult to
detect and remove.
8. Botnets
A botnet is a network of infected computers, or "bots," controlled remotely by an attacker. Botnets are
used to carry out large-scale attacks, such as distributed denial-of-service (DDoS) attacks, which
overwhelm a target with traffic to render it inoperable. Botnets can also be used to send spam or
spread other types of malware.
9. Keyloggers
Keyloggers are a type of spyware that records every keystroke made on a computer. They are often
used to steal sensitive information, such as usernames, passwords, and credit card details. Keyloggers
can be software-based or hardware-based and are commonly used in targeted attacks.
When it comes to detecting malware, machine learning can be a powerful tool for
analyzing and identifying malicious software based on the characteristics and
behavior exhibited by different types of malware.
There are several key concepts and components that are fundamental to understanding
machine learning:
1. Data: Data plays a crucial role in machine learning, as algorithms learn from
large amounts of data to identify patterns and relationships. The quality and
quantity of data used for training machine learning models have a significant
impact on their performance and generalization capabilities.
2. Features: Features are the attributes or characteristics extracted from the
data that are used as inputs to machine learning algorithms. Features help
the algorithm differentiate between different classes or categories and make
accurate predictions or decisions.
3. Algorithms: Machine learning algorithms are mathematical models and
techniques that learn patterns or relationships from data to perform specific
tasks. There are various types of machine learning algorithms, including
supervised learning, unsupervised learning, reinforcement learning, and deep
learning, each suited for different types of tasks and data.
4. Training: Training refers to the process of feeding data into a machine
learning algorithm to enable it to learn from the patterns and relationships
present in the data. During training, the algorithm adjusts its parameters
based on the input data to minimize errors and improve performance.
5. Evaluation: Evaluating the performance of a machine learning model is crucial
to assess its accuracy, generalization capabilities, and ability to make
predictions on new, unseen data. Common evaluation metrics include
accuracy, precision, recall, F1 score, ROC curve analysis, and confusion
matrices.
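The evaluation metrics listed above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, assuming a two-class task where 1 marks a malicious flow and 0 a benign one; the function names and the sample labels are illustrative and do not come from the project's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = malicious flow, 0 = benign flow.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In malware detection the classes are usually imbalanced, so precision and recall tend to be more informative than accuracy alone.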
2. LITERATURE REVIEW
1.
Introduction: The paper focuses on detecting IoT malware through forensic analysis of
network traffic features using Machine Learning models. The increasing use of IoT devices
has led to a rise in cybercrimes, making the identification of malware in IoT crucial. The
research proposes a model that achieved almost 100% detection accuracy during the
experimental phase.
Research Challenges: The challenges highlighted in the paper include the need for real-time
detection of IoT malware, forensic analysis of IoT malware features, detecting malware at
the initial intrusion stage, and analyzing Opcode for detecting cloned malware.
Objectives: The main objective of the research is to design a high-accuracy model to protect
IoT devices from malware intrusions at the initial stage. The study aims to comprehensively
study vulnerable factors in IoT devices, methods of attacks, and commonly used techniques
for compromising IoT devices.
Dataset Description: The paper uses the Malware on IoT Dataset and IoT-23 IoT Malware
Datasets, which contain labeled datasets of IoT malware and benign IoT traffic. The datasets
consist of various scenarios capturing different types of IoT malware attacks.
Algorithms Used: The research utilizes Machine Learning algorithms such as Random Forest,
Naive Bayes, Hoeffding Tree, Random Tree, and REPTree for detecting IoT malware based on
network traffic features.
Results of the Algorithms Obtained: The Random Forest classifier achieved the highest
accuracy of almost 100%, outperforming other machine learning algorithms. The results
showed accuracy ranging from 76% to 99% for different classifiers, with Random Forest being
the most accurate.
Future Work Proposed: Future work proposed in the paper includes enhancing the model by
considering fog level implementation at the IoT layer, integrating image visualization
techniques, and examining different attack types at various layers to improve model
efficiency. Additionally, the study suggests further research on the development of the
model for better detection and classification of IoT malware.
2.
Objectives: The main objective of the paper is to demonstrate the feasibility and benefits of
converting a traditional malware network classification model to a tiny machine learning
model for edge devices. The study aims to show that edge computing can provide faster and
more efficient network traffic classification compared to cloud computing.
Dataset Description: The research uses the USTC-TFC2016 dataset for training and testing the
network traffic classification model. This dataset contains session data with every layer,
providing raw network traffic data for analysis. The dataset is preprocessed to emulate real-
time raw network traffic for edge device testing.
Algorithms Used: The study employs a convolutional neural network (CNN) model trained
using the USTC-TFC2016 dataset. TensorFlow Lite is used to convert the traditional CNN
model into a tiny machine learning model suitable for edge devices. The model architecture
includes convolutional layers, max pool layers, fully connected layers, and a sigmoid activation
function.
Results of the Algorithms Obtained: The TensorFlow Lite model running on an edge device
showed faster execution times and reduced latency compared to the cloud computing
counterpart. Both models achieved 100% accuracy in classification, with the edge device
outperforming the cloud architecture in terms of speed at every input size.
Future Work Proposed: Future work proposed in the paper includes exploring the
implementation of the edge-computing architecture in practice, conducting network traffic
classification on unknown traffic, and investigating the feasibility of moving other
cybersecurity procedures to the edge. The study aims to further research the benefits and
applications of edge computing in cybersecurity systems.
3.
Introduction: The paper introduces the concept of using Generative Adversarial Networks
(GANs) for anomaly detection in network traffic. It addresses the increasing complexity of
network attacks and the need for effective anomaly detection methods. The proposed
approach combines GANs with an encoder-decoder-encoder (EDE) framework to enhance
anomaly detection in network traffic.
Research Challenges: The challenges highlighted in the paper include the need for
continuous adaptation to evolving network attack strategies, the labor-intensive feature
design required by traditional machine learning methods, and the scalability issues faced by
deep learning models in handling large volumes of network data.
Objectives: The main objective of the paper is to propose an adversarial learning-aided
malicious network traffic detection (AMD) method based on the EDE framework. The AMD
method aims to detect unknown anomalies in a semi-supervised form, convert network traffic
data into images for processing, and achieve robust detection performance in complex
network environments.
Dataset Description: The research utilizes the USTC-TFC2016 dataset, which contains a large
number of network traffic samples for training and testing the anomaly detection model. The
dataset provides a diverse range of network traffic scenarios, enabling the evaluation of the
proposed AMD method's performance.
Algorithms Used: The study employs a model based on the EDE framework, consisting of two
encoders, a decoder, and a discriminator. The model utilizes adversarial training to analyze
data distribution in the potential image space. It focuses on capturing the differences
between normal and abnormal samples through potential vector distribution.
Results of the Algorithms Obtained: Experimental results demonstrate that the proposed AMD
method outperforms traditional machine learning methods, showing performance
improvements of up to 40%. The model effectively detects anomalies in targeted spaces and
exhibits robustness in noisy network environments. The AMD method achieves better
anomaly detection accuracy compared to conventional approaches.
Future Work Proposed: Future work proposed in the paper includes optimizing the AMD
model with updated GAN technology to enhance its performance further. The study aims to
continue improving the adversarial training aspect of the model and explore the application
of deep learning in network traffic anomaly detection tasks.
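The step of converting network traffic data into images, mentioned in the objectives above, can be sketched as follows. This is only an illustration under stated assumptions: the 28 x 28 image size, the zero padding, and the function name session_to_image are assumptions of this sketch, since the review does not record the exact preprocessing used in the paper.

```python
from typing import List


def session_to_image(payload: bytes, side: int = 28) -> List[List[int]]:
    """Map raw session bytes to a side x side grid of grayscale values (0-255).

    Assumption: sessions longer than side*side bytes are truncated and
    shorter ones are padded with zero bytes.
    """
    size = side * side
    trimmed = payload[:size]
    padded = trimmed + bytes(size - len(trimmed))  # bytes(n) is n zero bytes
    return [list(padded[row * side:(row + 1) * side]) for row in range(side)]


# Hypothetical 512-byte payload standing in for one captured session.
sample = bytes(range(256)) * 2
image = session_to_image(sample)
print(len(image), len(image[0]))
```

Each byte becomes one pixel, so an image-based model can be trained on raw traffic without hand-designed features.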
4.
Introduction: The paper discusses the application of edge computing for network traffic
classification in cybersecurity, focusing on detecting malware. It introduces the concept of
using tiny machine learning on edge devices to address latency issues faced by cloud
computing due to the increasing volume of network data.
Research Challenges: The challenges highlighted include the need for real-time processing of
network traffic data, limitations of traditional cloud computing in handling growing data
volumes, and potential delays in critical cybersecurity systems like autonomous vehicles.
Objectives: The main objective is to demonstrate the feasibility and benefits of converting a
traditional malware network classification model to a tiny machine learning model for edge
devices. The study aims to show that edge computing can provide faster and more efficient
network traffic classification compared to cloud computing.
Dataset Description: The research utilizes the USTC-TFC2016 dataset, containing session data
with every layer to provide raw network traffic data for analysis. The dataset is preprocessed
to simulate real-time raw network traffic for testing on edge devices.
Algorithms Used: The study employs a convolutional neural network (CNN) model trained
using the USTC-TFC2016 dataset. TensorFlow Lite is used to convert the CNN model into a tiny
machine learning model suitable for edge devices. The model architecture includes
convolutional layers, max pool layers, fully connected layers, and a sigmoid activation
function.
Results of the Algorithms Obtained: The TensorFlow Lite model running on an edge device
demonstrated faster execution times and reduced latency compared to the cloud computing
model. Both models achieved 100% accuracy in classification, with the edge device
outperforming the cloud architecture in terms of speed at every input size.
Future Work Proposed: Future work includes implementing the edge-computing architecture
in practice, exploring network traffic classification on unknown traffic, and investigating
moving other cybersecurity procedures to the edge. The study aims to further research the
benefits and applications of edge computing in cybersecurity systems.
5.
Introduction: The paper introduces the use of Generative Adversarial Networks (GANs) for
anomaly detection in network traffic. It aims to address the complexity of network attacks by
proposing a method that combines GANs with the encoder-decoder-encoder (EDE)
framework to enhance anomaly detection in network traffic.
Research Challenges: The challenges highlighted include the need for continuous adaptation
to evolving network attack strategies, labor-intensive feature design required by traditional
machine learning methods, and scalability issues faced by deep learning models in handling
large volumes of network data.
Objectives: The main objective is to propose an adversarial learning aided malicious network
traffic detection (AMD) method based on the EDE framework. The AMD method aims to
detect unknown anomalies in a semi-supervised form, convert network traffic data into
images for processing, and achieve robust detection performance in complex network
environments.
Dataset Description: The research utilizes the USTC-TFC2016 dataset, which provides a
diverse range of network traffic samples for training and testing the anomaly detection
model. The dataset enables the evaluation of the proposed AMD method's performance in
various network traffic scenarios.
Algorithms Used: The study employs a model based on the EDE framework, consisting of two
encoders, a decoder, and a discriminator. The model utilizes adversarial training to analyze
data distribution in the potential image space and focuses on capturing differences between
normal and abnormal samples.
Results of the Algorithms Obtained: Experimental results show that the proposed AMD
method outperforms traditional machine learning methods, achieving performance
improvements of up to 40%. The model effectively detects anomalies in targeted spaces and
demonstrates robustness in noisy network environments, showcasing better anomaly
detection accuracy compared to conventional approaches.
Future Work Proposed: Future work proposed includes optimizing the AMD model with
updated GAN technology to enhance performance further. The study aims to improve the
adversarial training aspect of the model, explore deep learning applications in network traffic
anomaly detection, and continue enhancing the model's robustness and accuracy in
detecting anomalies.
6.
7.
Introduction: The paper introduces MateGraph, a novel approach for mobile malware
detection and classification using traffic behavior graphs and Graph Convolution Networks
(GCNs). With the exponential growth of mobile devices and the increasing threat of mobile
malware, the study aims to enhance cyberspace security by leveraging rich communication
patterns in network traffic for malware detection.
Research Challenges: The challenges highlighted include the complexity of detecting mobile
malware in encrypted traffic, the need to differentiate between benign and malicious apps
with shared endpoints, and the limitations of existing methods in capturing diverse
communication behavior patterns of mobile applications.
Objectives: The primary objective is to propose MateGraph, a traffic behavior graph-based
approach for mobile malware detection and classification. The study aims to construct traffic
behavior graphs from network traffic data, utilize graph convolution network models to learn
graph topologies, and differentiate between benign and malicious applications effectively.
Dataset Description: The research evaluates MateGraph using the CICAndMal2017 dataset,
which contains 2,338 candidate apps across five categories. This dataset provides a diverse
set of network traffic samples for training and testing the proposed approach, enabling the
assessment of detection performance against state-of-the-art methods.
Algorithms Used: The study implements a traffic behavior graph construction method,
stacked clustering for edge establishment, and an enhanced Graph Convolution Network
(GCN) for learning behavior representations of network traffic. These algorithms work
together to detect and classify mobile malware based on communication patterns extracted
from traffic behavior graphs.
Results of the Algorithms Obtained: Experimental results show that MateGraph outperforms
several state-of-the-art methods, achieving an F1 score of 96.57% and increasing accuracy by
more than 7%. The proposed approach demonstrates superior performance in mobile
malware detection and classification tasks compared to existing techniques.
Future Work Proposed: Future work proposed includes further optimizing MateGraph for
enhanced detection performance, exploring the application of graph convolution networks in
analyzing diverse traffic behavior patterns, and investigating the scalability and robustness of
the approach in real-world network environments. The study aims to continue advancing
mobile malware detection methods using traffic behavior graphs and GCNs.
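To make the idea of a graph convolution concrete, the sketch below implements one standard GCN propagation step, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), in plain Python. It is the generic layer, not the enhanced GCN used in MateGraph, and the three-node graph, features, and weights are hypothetical values chosen for illustration.

```python
import math


def normalized_adjacency(adj):
    """Add self-loops, then apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate neighbours, transform, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Hypothetical traffic graph: three flows connected in a chain 0 - 1 - 2.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
node_features = [[1.0], [0.0], [0.0]]  # one feature per node
layer_weights = [[1.0]]                # 1 x 1 weight matrix
output = gcn_layer(adjacency, node_features, layer_weights)
print(output)
```

After one layer, node 1 receives part of node 0's feature while node 2, two hops away, still receives nothing; stacking layers widens the neighbourhood each node can see.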
8.
Introduction: The paper introduces a novel approach to malware detection based on DNS
packet analysis over real network traffic. It focuses on the effectiveness of DNS-based
malware detection techniques when applied to entire network traffic generated by infected
terminals. The study aims to assess the performance of neural network-based DNS packet
analysis in detecting malware from real network traffic and identifying optimal detection
conditions.
Research Challenges: The challenges highlighted include the need to differentiate between
legitimate and malicious domain names, the limitations of existing malware detection
techniques when applied to real network traffic, and the complexity of analyzing DNS
packets to identify malware patterns accurately.
Objectives: The primary objective is to evaluate the effectiveness of DNS-based malware
detection techniques on real network traffic generated by infected terminals. The study aims
to identify the best parameters and configurations for neural network-based DNS packet
analysis to achieve optimal malware detection performance under specific conditions.
Dataset Description: The research utilizes a test dataset consisting of real network traffic,
including numerous DNS queries. Unlike previous works that focus on individual domain
names, this study evaluates the overall malevolence of network terminals based on their
traffic within specific time intervals. The dataset is labeled based on assumptions like
legitimate domains never appearing in certain DNS queries.
Algorithms Used: The study employs Long Short-Term Memory (LSTM) neural networks for
DNS packet analysis to estimate the probability that a domain name was generated by a
Domain Generation Algorithm (DGA). Different training strategies are considered, such as
training with whole domain names, without certain eTLDs, or without the root domain.
Results of the Algorithms Obtained: Experimental results demonstrate the effectiveness of
the neural network-based DNS packet analysis in detecting malware from real network
traffic. The study identifies optimal parameters and configurations that lead to improved
malware detection performance, showcasing the potential of DNS-based techniques in
identifying malicious network behavior.
Future Work Proposed: Future work proposed includes further optimizing the neural
network models for enhanced malware detection accuracy, exploring additional metrics and
methods for analyzing network traffic behavior, and investigating the scalability and
robustness of DNS-based malware detection techniques in diverse network environments.
The study aims to continue advancing malware detection approaches using neural networks
and DNS packet analysis.
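The training strategies described above differ in how much of the domain name is shown to the LSTM. The sketch below illustrates one plausible preprocessing path: optionally stripping the suffix, then encoding characters as integer indices for a sequence model. The suffix list, alphabet, maximum length, and function names are all assumptions of this sketch; a real system would rely on the full Public Suffix List.

```python
# Hypothetical, deliberately tiny suffix list used only for illustration.
KNOWN_SUFFIXES = ("co.uk", "com", "net", "org")

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-."
PAD_INDEX = 0                       # reserved for padding
UNKNOWN_INDEX = len(ALPHABET) + 1   # any character outside ALPHABET
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def strip_suffix(domain):
    """Remove a known suffix (longest match first) from a domain name."""
    domain = domain.lower().rstrip(".")
    for suffix in sorted(KNOWN_SUFFIXES, key=len, reverse=True):
        if domain.endswith("." + suffix):
            return domain[: -(len(suffix) + 1)]
    return domain


def encode(domain, max_len=16):
    """Turn a domain into a fixed-length list of character indices."""
    chars = domain.lower()[:max_len]
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in chars]
    return indices + [PAD_INDEX] * (max_len - len(indices))


whole = encode("example.co.uk")                   # whole-name strategy
stripped = encode(strip_suffix("example.co.uk"))  # suffix removed first
print(whole)
print(stripped)
```

The resulting index sequences would then be fed to an embedding layer followed by the LSTM, which outputs the probability that the name was algorithmically generated.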
9.
Introduction: The paper introduces a novel approach for mobile malware detection and
classification using traffic behavior graphs and Graph Convolution Networks (GCNs). With the
increasing threat of mobile malware and the proliferation of mobile devices, the study aims
to enhance cyberspace security by leveraging rich communication patterns in network traffic
for malware detection.
Research Challenges: The challenges highlighted include the complexity of detecting mobile
malware in encrypted traffic, the need to differentiate between benign and malicious apps
with shared endpoints, and the limitations of existing methods in capturing diverse
communication behavior patterns of mobile applications.
Objectives: The primary objective is to propose MateGraph, a traffic behavior graph-based
approach for mobile malware detection and classification. The study aims to construct traffic
behavior graphs from network traffic data, utilize graph convolution network models to learn
graph topologies, and effectively differentiate between benign and malicious applications.
Dataset Description: The research evaluates MateGraph using the CICAndMal2017 dataset,
containing 2,338 candidate apps across five categories. This dataset provides a diverse set of
network traffic samples for training and testing the proposed approach, enabling the
assessment of detection performance against state-of-the-art methods.
Algorithms Used: The study implements a traffic behavior graph construction method,
stacked clustering for edge establishment, and an enhanced Graph Convolution Network
(GCN) for learning behavior representations of network traffic. These algorithms work
together to detect and classify mobile malware based on communication patterns extracted
from traffic behavior graphs.
Results of the Algorithms Obtained: Experimental results show that MateGraph outperforms
several state-of-the-art methods, achieving an F1 score of 96.57% and increasing accuracy by
more than 7%. The proposed approach demonstrates superior performance in mobile
malware detection and classification tasks compared to existing techniques.
Future Work Proposed: Future work proposed includes further optimizing MateGraph for
enhanced detection performance, exploring the application of graph convolution networks in
analyzing diverse traffic behavior patterns, and investigating the scalability and robustness of
the approach in real-world network environments. The study aims to continue advancing
mobile malware detection methods using traffic behavior graphs and GCNs.
10.
Introduction: The paper presents a system for detecting malware by analyzing network traffic.
It utilizes supervised learning methods and extracts behavioral features across different
protocols and network layers to identify malicious activities.
Research Challenges: The challenges highlighted in the paper include the difficulty in
detecting modern malware that can evade traditional anti-malware software and the need
for passive systems to detect malicious activities without accessing the targeted machines
directly.
Objectives: The main objectives of the paper are to detect malware incidents, attribute them
to known malware families, discover new threats, and outperform existing rule-based
systems like Snort and Suricata in terms of detection accuracy and timeliness.
Dataset Description: The dataset used in the study includes network traffic captures from
various sources, including sandbox environments, real enterprise networks, and publicly
available databases. The dataset consists of both benign and malicious traffic instances
labeled using different methods.
Algorithms Used: The paper employs machine learning algorithms such as Naïve Bayes,
decision tree (J48), and Random Forest for classification tasks. Feature selection is done
using the Correlation Feature Selection (CFS) algorithm, and the Weka library is utilized for
implementation.
Results: The algorithms demonstrated high accuracy in distinguishing between benign and
malicious traffic, as well as classifying malware into known families. The Random Forest
algorithm performed particularly well in detecting unknown malware families,
outperforming Naïve Bayes and J48 in most cases.
Future Work: Future work proposed in the paper includes exploring transfer learning
techniques to improve detection in untrained network environments, evaluating the system
on mobile network traffic, clustering malware families, and adapting the method for online
detection in high-bandwidth networks.
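The Correlation Feature Selection step mentioned above scores a candidate feature subset with a merit heuristic: merit = k * r_cf / sqrt(k + k(k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The sketch below computes that score from correlations assumed to be precomputed; it is not Weka's implementation, and the example values are hypothetical.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a feature subset.

    feature_class_corr: one correlation per feature with the class label.
    feature_feature_corr: one correlation per pair of features in the subset.
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    mean_cf = sum(abs(c) for c in feature_class_corr) / k
    if feature_feature_corr:
        mean_ff = sum(abs(c) for c in feature_feature_corr) / len(feature_feature_corr)
    else:
        mean_ff = 0.0  # a single feature has no pairwise redundancy
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


# Hypothetical correlations for illustration only.
single = cfs_merit([0.8], [])
complementary = cfs_merit([0.8, 0.6], [0.5])  # moderately related features
redundant = cfs_merit([0.8, 0.6], [0.9])      # nearly duplicate features
print(single, complementary, redundant)
```

In this example, adding a second feature improves the merit only when it is not strongly correlated with the first, which is how CFS favours subsets that are predictive but not redundant.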
11.
Introduction: The paper introduces a novel system for detecting malware in network traffic
based on statistical characteristics of HTTP requests. Traditional signature-based methods are
becoming less effective due to increasing threats using evasion techniques. The proposed
system aims to overcome these limitations by focusing on statistical features of HTTP
requests to identify security threats.
Research Challenges: The challenges addressed in the paper include the difficulty in detecting
new and polymorphic malware threats, the limitations of traditional signature-based
detection methods, and the need for more robust and generalizable detection techniques to
combat evolving cyber threats.
Objectives: The main objectives of the paper are to develop a system that can accurately
detect security threats in network traffic based on statistical characteristics of HTTP requests.
The system aims to achieve high precision and recall rates in identifying malicious flows and
to provide a more effective method for discovering network threats in the future.
Dataset Description: The dataset used in the study consists of millions of live traffic flows,
including both benign and malicious instances. Malware samples from various botnet
families were collected and analyzed, along with real data from volunteer networks. The
dataset was used for training and evaluating the proposed detection system.
Algorithms Used: The paper utilizes machine learning algorithms such as Random Forest and
XGBoost for classification tasks. Feature extraction is performed on HTTP traffic data,
including URL statistical features, HTTP header fields, and HTTP header sequences. These
features are used to train the classification models for detecting malicious traffic.
Results: The algorithms achieved high precision and recall rates in detecting malicious HTTP
traffic, with the XGBoost model outperforming the Random Forest model. The inclusion of
HTTP header sequences as features improved the precision rate to 98.32%, with a low false
positive rate. The models demonstrated effectiveness in identifying malware traffic in both
training and real-world network environments.
Future Work: Future work proposed in the paper includes addressing evasion attacks by
malware, improving detection of encrypted traffic using HTTPS, and exploring additional
methods to enhance the overall detection performance. The authors also suggest
investigating transfer learning techniques, evaluating the system on mobile network traffic,
and clustering malware families for more comprehensive threat analysis.
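As an illustration of what URL statistical features might look like, the sketch below derives a few simple statistics from a request URL using only the Python standard library. The specific features (length, path depth, parameter count, digit ratio, path entropy) and the example URL are assumptions of this sketch; the review does not list the exact feature set used in the paper.

```python
import math
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def shannon_entropy(text):
    """Shannon entropy (bits per character) of a string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def url_features(url):
    """Extract simple statistical features from a single URL."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    params = parse_qsl(parts.query, keep_blank_values=True)
    digits = sum(ch.isdigit() for ch in url)
    return {
        "url_length": len(url),
        "path_depth": len(segments),
        "num_params": len(params),
        "digit_ratio": digits / len(url) if url else 0.0,
        "path_entropy": shannon_entropy(parts.path),
    }


# Hypothetical request URL for illustration.
features = url_features("http://example.com/a/b/index.php?id=42&x=")
print(features)
```

A feature dictionary like this would form one row of the training matrix passed to a classifier such as Random Forest or XGBoost.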
12.
Introduction: The paper introduces a system for detecting malware in network traffic by
analyzing packet sequences. It aims to address the limitations of traditional signature-based
methods and improve the accuracy of malware detection by leveraging machine learning
techniques on packet sequences.
Research Challenges: The challenges highlighted in the paper include the increasing
complexity and diversity of malware, the need for real-time detection capabilities, the
limitations of signature-based detection systems, and the requirement for efficient feature
extraction methods from packet sequences.
Objectives: The main objectives of the paper are to develop a system that can accurately
detect malware in network traffic based on packet sequences, improve the efficiency and
timeliness of malware detection, and provide a more robust and adaptive approach to
combating evolving cyber threats.
Dataset Description: The dataset used in the study consists of network traffic captures from
various sources, including both benign and malicious traffic instances. The dataset is labeled
with ground truth information to facilitate supervised learning tasks and model evaluation. It
includes packet sequences from different network protocols for training and testing the
detection system.
Algorithms Used: The paper employs machine learning algorithms such as Long Short-Term
Memory (LSTM) networks and Convolutional Neural Networks (CNNs) for analyzing packet
sequences and detecting malware. These algorithms are trained on the dataset to learn
patterns indicative of malicious behavior in network traffic.
Results: The algorithms demonstrated high accuracy in detecting malware instances from
packet sequences, with the LSTM network achieving better performance compared to CNNs
in most cases. The system showed promising results in identifying various types of malware
and distinguishing them from benign traffic, showcasing the effectiveness of the proposed
approach.
Future Work: Future work proposed in the paper includes exploring ensemble learning
techniques to further improve detection accuracy, enhancing the system's scalability for
large-scale network environments, investigating the impact of different network protocols on
detection performance, and adapting the approach for real-time detection in dynamic
network settings. Additionally, the authors suggest evaluating the system's performance on
encrypted traffic and exploring the use of deep learning models for feature extraction from
packet sequences.
13.
Introduction: The paper introduces a novel approach for analyzing network traffic to detect
malware using machine learning algorithms. The focus is on enhancing the accuracy and
efficiency of malware detection in network traffic by leveraging advanced techniques for
feature extraction and classification.
Research Challenges: The challenges outlined in the paper include the increasing
sophistication of malware, the limitations of traditional signature-based detection methods,
the need for real-time detection capabilities, and the complexity of analyzing large volumes
of network traffic data efficiently.
Objectives: The primary objectives of the paper are to develop a system that can effectively
detect malware in network traffic, improve the speed and accuracy of detection, and provide
a more robust defense mechanism against evolving cyber threats. The aim is to enhance
network security by leveraging machine learning for malware detection.
Dataset Description: The dataset utilized in the study comprises a diverse collection of
network traffic samples, including both benign and malicious instances. The dataset is
labelled to facilitate supervised learning tasks and model evaluation. It includes various
features extracted from network traffic data to train and test the detection system.
Algorithms Used: The paper employs a combination of machine learning algorithms, such as
Random Forest and Support Vector Machines (SVM), for analyzing network traffic data and
detecting malware. These algorithms are trained on the dataset to learn patterns indicative
of malicious behaviour and classify network traffic accurately.
Results: The algorithms demonstrated promising results in detecting malware in network
traffic, achieving high accuracy rates in distinguishing between benign and malicious traffic.
The models showed effectiveness in identifying various types of malware and exhibited
robust performance in real-world scenarios, showcasing the potential of the proposed
approach.
Future Work: Future work proposed in the paper includes exploring deep learning
techniques for enhanced feature extraction, investigating the use of anomaly detection
methods for identifying zero-day threats, optimizing the system for real-time detection, and
expanding the dataset to include a wider range of network traffic scenarios. Additionally, the
authors suggest evaluating the system's performance on encrypted traffic and incorporating
feedback mechanisms for continuous improvement.
14.
15.
environments.
8. “Malware Network Traffic Classification on the Edge”
Authors: Eric Chen, Computer and Information Science and Engineering, University of Florida, Gainesville, USA
Objectives: Evaluate the effectiveness of DNS-based malware detection techniques on real network traffic generated by infected terminals.
Future Work: Optimizing the neural network models for enhanced malware detection accuracy, and exploring additional metrics and methods for analyzing network traffic behavior.
9. “Universal Network Traffic Analysis for Malicious Traffic Detection using RappNet: A Privacy-Preserving Approach”
Authors: Onur Barut, Network and Edge Group, Intel Corporation, Berlin, MA
Objectives: The study aims to construct traffic behavior graphs from network traffic data, utilize graph convolution network models to learn graph topologies, and effectively differentiate between benign and malicious applications.
Future Work: Optimizing MateGraph for enhanced detection performance, and exploring the application of graph convolution networks.
10. “Unknown Malware Detection Using Network Traffic Classification”
Authors: Dmitri Bekerman, Bracha Shapira, Lior Rokach, Ariel Bar
Objectives: Detect malware incidents, attribute them to known malware families, discover new threats, and outperform existing rule-based systems like Snort and Suricata in terms of detection accuracy and timeliness.
Future Work: Exploring transfer learning techniques to improve detection in untrained network environments, and evaluating the system on mobile network traffic.
11. “Malware Detection Using Network Traffic Analysis in Android Based Mobile Devices”
Authors: Anshul Arora, Department of Computer Science and Engineering, Indian Institute of Technology
Objectives: Develop a system that can accurately detect security threats in network traffic based on statistical characteristics of HTTP requests.
Future Work: Addressing evasion attacks by malware, improving detection of encrypted traffic using HTTPS, and exploring additional methods to enhance the overall detection performance.
12. “A Method Based on Statistical Characteristics for Detection Malware Requests in Network Traffic”
Authors: Ke Li, Rongliang Chen, Liang Gu, Chaoge Liu, Jie Yin
Objectives: Develop a system that can accurately detect malware in network traffic based on packet sequences, improve the efficiency and timeliness of malware detection, and provide a more robust and adaptive approach to combating evolving cyber threats.
Future Work: Exploring ensemble learning techniques to further improve detection accuracy, enhancing the system's scalability for large-scale network environments, investigating the impact of different network protocols on detection performance, and adapting the approach for real-time detection in dynamic network settings.
13. “Profiling Network Traffic Behavior for the purpose of Anomaly-based Intrusion Detection”
Authors: Manmeet Singh Gill, Dale Lindskog, Pavol Zavarsky, Department of Information Systems Security and Assurance Management
Objectives: Develop a system that can effectively detect malware in network traffic, improve the speed and accuracy of detection, and provide a more robust defense mechanism against evolving cyber threats.
Future Work: Exploring deep learning techniques for enhanced feature extraction, and investigating the use of anomaly detection methods for identifying zero-day threats.
14. “End-To-End Android Malware Classification Based On Pure Traffic Images”
Authors: Peng Yujie, Niu Weina, Zhang Xiaosong, Zhou Jie, Wu Hao, Chen Ruidong
Objectives: Develop a robust system for detecting malware in network traffic, enhance the speed and accuracy of detection, and provide a proactive defense mechanism against cyber threats.
Future Work: Deep learning methods for feature extraction, investigating anomaly detection techniques for zero-day threat identification, and optimizing the system for real-time detection.
15. “Android malware classification based on network traffic analysis”
Authors: Dmitri Bekerman, Bracha Shapira, Lior Rokach, Ariel Bar
Objectives: Develop a model for Android malware classification based on network traffic analysis and deep learning.
Future Work: Exploring advanced deep learning techniques for feature extraction, and applying the model to other traffic-related classification tasks.
3. RESEARCH METHODOLOGY
This section describes the systematic and structured approach used to conduct this
study. It is divided into four sub-sections: Pre-processing, Data description, Proposed
framework, and Model training. Pre-processing involves cleaning and transforming
data to prepare it for analysis. Data description includes statistical summaries and
exploration to understand the properties of the data. The proposed framework
describes the setup and methods used to achieve the objectives of the research.
Model training entails putting the selected framework into practice and tuning
model parameters so that the model represents data patterns accurately and
improves analytical or predictive performance.
Machine learning has become a significant competitive differentiator for many companies.
Classical machine learning is often categorized by how an algorithm learns to become more
accurate in its predictions.
There are four basic approaches: supervised learning, unsupervised learning,
semi-supervised learning, and reinforcement learning. The type of algorithm a data
scientist chooses depends on the type of data available and on what is to be
predicted. In machine learning, tasks are generally classified into broad categories.
These categories are based on how learning is received or how feedback on the learning
is given to the system developed.
Two of the most widely adopted machine learning methods are:
• Supervised learning which trains algorithms based on example input and output
data that is labelled by humans, and
• Unsupervised learning which provides the algorithm with no labelled data in order
to allow it to find structure within its input data.
These are just a few examples of the diverse range of machine learning algorithms
available, each with its strengths, weaknesses, and suitable applications. The choice
of algorithm depends on the nature of the data, the task to be performed, and the
specific requirements of the problem at hand. Machine learning practitioners often
experiment with different algorithms and techniques to find the most effective
solution for a given problem.
Classification is a data mining technique that assigns objects in a collection to target
categories or classes. The goal of classification is to accurately predict the target
class for every case in the data. Classification is a two-step process. The first step is
learning, in which the classification algorithm analyses the training data. The second
step is classification, in which test data is used to estimate the accuracy of the model.
Classification predicts the result for a specified input: the classification algorithm
determines which class an item belongs to based on the training dataset. Various
classification techniques are available; Naive Bayes and SVM are among the
most widely used.
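The two-step process described above (learning on training data, then classifying held-out test data to estimate accuracy) can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data; the data, the 80/20 split, and the `results` dictionary are assumptions made for the example and are not the configuration used in this report.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for a labelled dataset (hypothetical; not the report's data).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = {}
models = {"naive_bayes": GaussianNB(), "svm": SVC(kernel="linear")}
for name, model in models.items():
    model.fit(X_train, y_train)        # step 1: learning from the training data
    y_pred = model.predict(X_test)     # step 2: classifying unseen test data
    results[name] = accuracy_score(y_test, y_pred)

print(results)
```

Keeping the test data out of the learning step is what makes the reported accuracy an estimate of performance on unseen inputs.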
Support Vector Machine (SVM) is a powerful supervised learning algorithm used for
both classification and regression tasks. SVM aims to find the optimal hyperplane
that best separates data points into different classes by maximizing the margin
between the classes. Here are some key points about the SVM algorithm:
Overall, Naive Bayes is a versatile and efficient algorithm that is widely used for text
classification and other machine learning tasks. Despite its simplifying assumptions,
Naive Bayes can often achieve competitive performance and serves as a strong
baseline model for many classification problems.
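To make the "simplifying assumptions" concrete, the sketch below implements a small Gaussian Naive Bayes classifier with NumPy. It is an illustration only: the function names `fit_gaussian_nb` and `predict_gaussian_nb` and the synthetic two-cluster data are hypothetical, and the models in this report use scikit-learn's `GaussianNB` instead.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature mean and per-feature variance of each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # A small constant keeps the variance strictly positive.
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def predict_gaussian_nb(params, X):
    """Pick the class with the highest log-posterior for each row of X."""
    classes = list(params)
    scores = []
    for c in classes:
        prior, mean, var = params[c]
        # Independence assumption: the log-likelihood is a sum over features.
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)
        scores.append(np.log(prior) + log_lik)
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]

# Two well-separated synthetic clusters (made-up data for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), rng.normal(4.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

params = fit_gaussian_nb(X, y)
pred = predict_gaussian_nb(params, X)
print((pred == y).mean())
```

Because each feature is modelled separately, training reduces to computing means and variances, which is why the algorithm is fast even on large datasets.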
• Loading Packages:
• Finished Installation:
• Launching Jupyter:
PYTHON LIBRARIES REQUIRED
Python offers many libraries. The libraries used in this project are as follows:
• NUMPY LIBRARY
• PANDAS LIBRARY
• MATPLOTLIB LIBRARY
• SEABORN LIBRARY
• TENSORFLOW LIBRARY
• KERAS LIBRARY
NUMPY LIBRARY
Description: NumPy, which stands for Numerical Python, is a powerful numerical
computing library in Python. It provides support for large, multi-dimensional arrays
and matrices, along with a collection of mathematical functions to operate on these
arrays.
Key Features:
• Efficient and fast array operations.
• Mathematical functions for linear algebra, Fourier analysis, random number
generation, etc.
• Integration with other libraries and languages.
Import Code:
import numpy as np
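As a brief illustration of the array operations mentioned above, the following sketch computes per-connection totals without an explicit loop. The byte counts are made-up values, not data from this project.

```python
import numpy as np

# Hypothetical byte counts: one row per connection, columns = (sent, received).
byte_counts = np.array([[120, 300], [0, 0], [5000, 40]], dtype=float)

# Total bytes per connection, computed for all rows at once.
totals = byte_counts.sum(axis=1)

# Share of bytes sent by the originator; np.maximum avoids division by zero.
ratio = byte_counts[:, 0] / np.maximum(totals, 1.0)

print(totals)
print(ratio)
```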
PANDAS LIBRARY
Description: Pandas is a data manipulation and analysis library for Python. It
provides data structures like DataFrame for efficient data manipulation with
built-in methods for reshaping, merging, grouping, and aggregating data.
Key Features:
• DataFrame for handling structured data.
• Data cleaning, filtering, and manipulation.
• Integration with databases and Excel.
Import Code:
import pandas as pd
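A short sketch of the DataFrame features listed above, counting missing values and aggregating by class. The column names follow those that appear later in this report, while the rows and the label strings are hypothetical.

```python
import pandas as pd

# Tiny made-up table; the values are for illustration only.
df_demo = pd.DataFrame({
    "proto": ["tcp", "udp", "tcp", "tcp"],
    "orig_bytes": [120.0, None, 5000.0, 40.0],
    "label": ["Benign", "Malicious", "Malicious", "Benign"],
})

# Number of missing values in each column.
null_counts = df_demo.isnull().sum()

# Mean bytes sent per class; missing values are skipped by default.
mean_bytes = df_demo.groupby("label")["orig_bytes"].mean()

print(null_counts)
print(mean_bytes)
```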
MATPLOTLIB LIBRARY
Description: Matplotlib is a 2D plotting library for Python that produces
high-quality static, animated, and interactive visualizations. It can be used for
creating a wide variety of plots, charts, and figures.
Key Features:
• Line plots, scatter plots, bar plots, histograms, etc.
• Customization of plots with labels, titles, colors, and styles.
• Support for LaTeX for mathematical expressions.
Import Code:
import matplotlib.pyplot as plt
SEABORN LIBRARY
Description: Seaborn is a statistical data visualization library based on Matplotlib.
It provides a high-level interface for drawing attractive and informative statistical
graphics.
Key Features:
• Simplifies complex visualizations.
• Integration with Pandas DataFrames.
• Support for statistical estimation and data aggregation.
Import Code:
import seaborn as sns
TENSORFLOW LIBRARY
Description: TensorFlow is an open-source machine learning library developed by
Google. It provides a comprehensive ecosystem of tools, libraries, and community
resources for building and deploying machine learning models.
Key Features:
• Deep learning and neural network development.
• Flexibility for deployment on various platforms.
• Support for both CPU and GPU acceleration.
Import Code:
import tensorflow as tf
KERAS LIBRARY
Description: Keras is an open-source high-level neural networks API written in
Python and designed to be user-friendly and modular. It is often used as a high-level
interface for TensorFlow.
Key Features:
• Simplified API for building and training deep learning models.
• Easy prototyping and experimentation.
• Compatible with various backends, including TensorFlow.
Import Code:
from tensorflow import keras
3.1 PREPROCESSING
Data pre-processing is a data mining method used to convert raw data into a
more efficient and usable format. Data cleaning is required before analysis so
that patterns in the data can be identified reliably.
• Data Cleaning: In the initial dataset, there can be issues like missing data,
inaccuracies, or noise. Data cleaning is the process of addressing these
concerns, which involves handling inaccurate data, noisy data, and other
related issues to ensure data quality.
• Data Reduction: This aims to reduce the volume or complexity of the dataset
by summarizing, selecting, or transforming features, resulting in a more
manageable yet representative dataset for analysis or modeling. These
processes are crucial for efficient data handling and meaningful insights
extraction.
• Data Splitting: Dividing the dataset into training, validation, and test sets for
model training, tuning, and evaluation.
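The three steps above can be sketched in a few lines of pandas and NumPy. The helper name `clean_and_split`, the 50% missing-value threshold, and the 70/15/15 split are assumptions chosen for illustration; they are not the settings used in this project.

```python
import numpy as np
import pandas as pd

def clean_and_split(df, max_null_fraction=0.5, seed=0):
    """Drop mostly-empty columns, drop incomplete rows, then split 70/15/15."""
    # Data reduction: discard columns whose share of missing values is too high.
    reduced = df.loc[:, df.isnull().mean() <= max_null_fraction]
    # Data cleaning: remove rows that still contain missing values.
    cleaned = reduced.dropna().reset_index(drop=True)
    # Data splitting: shuffle the row positions and cut them into three parts.
    order = np.random.default_rng(seed).permutation(len(cleaned))
    n_train = len(cleaned) * 70 // 100
    n_val = len(cleaned) * 15 // 100
    train = cleaned.iloc[order[:n_train]]
    val = cleaned.iloc[order[n_train:n_train + n_val]]
    test = cleaned.iloc[order[n_train + n_val:]]
    return train, val, test

# Made-up frame: two complete columns and one column that is entirely missing.
demo = pd.DataFrame({
    "orig_pkts": range(100),
    "resp_pkts": range(100, 200),
    "tunnel_parents": [None] * 100,
})
train, val, test = clean_and_split(demo)
print(len(train), len(val), len(test))
```

Shuffling before splitting avoids putting time-ordered or class-ordered rows into a single partition.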
Image data augmentation is a method for generating new images from existing ones. This
can be achieved by making minor adjustments to an image, such as altering its
brightness, rotating it, or shifting the subject horizontally or vertically. Using image
augmentation techniques, it is possible to artificially expand the size of the training
dataset and give the model more data to work with. As a result, the model is better
able to recognize novel variations of the training data, which can improve its accuracy.
With the objective of mitigating the challenges associated with limited
datasets, data augmentation involves applying diverse transformations to existing
images. These transformations include rotation, flipping, zooming, translation,
brightness and contrast adjustments, color jittering, and noise addition. By
introducing these variations, the dataset becomes more comprehensive and
representative of the potential real-world scenarios the model might encounter. This
approach serves as a form of regularization, preventing overfitting by exposing the
model to a broader range of data during training. The augmented dataset not only
aids in creating a more robust model but also improves its adaptability to unforeseen
conditions, making it especially valuable in tasks such as image classification, object
detection, and segmentation.
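A minimal sketch of three of the transformations named above (flipping, rotation, and brightness adjustment), written with NumPy only. The `augment` function, the 20% brightness range, and the random 32x32 test image are hypothetical choices for illustration.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped, rotated and brightness-shifted copy of an image."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = np.fliplr(out)                        # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # rotate by 0, 90, 180 or 270 degrees
    out = out * rng.uniform(0.8, 1.2)               # brightness change of up to 20%
    return np.clip(out, 0, 255).astype(np.uint8)    # back to valid pixel values

rng = np.random.default_rng(0)
# Made-up square RGB image; a real pipeline would load images from disk.
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = [augment(image, rng) for _ in range(4)]
print(len(augmented), augmented[0].shape)
```

Each call produces a different variant of the same source image, so the training set can be expanded without collecting new data.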
The following types of data are commonly used to describe samples in malware detection:
1. File Characteristics:
o File size: The size of the file can be indicative of potential malware, as
some malware variants may have unusually large or small file sizes.
o File type: Different file types (e.g., executable files, scripts, documents)
can be analyzed to detect suspicious or malicious behavior.
o File extension: Unusual or suspicious file extensions can be an indicator
of malware, such as executables posing as other file types
(e.g., .txt.exe).
o File entropy: Measures the randomness or complexity of the file's
binary data, which can help identify obfuscated or encrypted malware.
2. Code Analysis:
3. Behavioral Analysis:
6. Contextual Data:
o Source of the file: Information about where the file was obtained (e.g.,
email attachments, downloads from suspicious websites) can provide
contextual cues for malware classification.
o User behavior: Analyzing user activities and interactions related to the
file (e.g., opening, execution) can provide insights into potential
malware infections.
o Environmental data: Information about the system environment,
network configuration, and security settings can influence malware
behavior and detection.
7. Network Data:
o Logs: Analyzing system logs (e.g., Windows Event Logs, firewall logs,
antivirus logs) for malware-related events and activities.
o Anomalies: Detecting anomalous behavior such as sudden spikes in
network traffic, unusual system calls, or unauthorized access attempts.
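Of the features listed above, file entropy is the most directly computable. The sketch below calculates Shannon entropy in bits per byte using only the standard library; the function name `shannon_entropy` and the sample inputs are illustrative assumptions.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, up to 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1 / p) over every byte value that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

low = shannon_entropy(b"AAAAAAAA")           # a single repeated byte
high = shannon_entropy(bytes(range(256)))    # every byte value exactly once
print(low, high)
```

Values close to 8 bits per byte are typical of compressed or encrypted content, which is why high entropy is often treated as a hint of packing or obfuscation rather than as proof of malice.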
# Create a copy of the randomly sampled DataFrame to avoid modifying the original data.
df = random_sample_df.copy()
# file_path = 'Malware_Detect_Data.csv'
# original_df = pd.read_csv(file_path, delimiter='|')
# df = original_df.copy()
New dataset creation
ts 0
uid 0
id.orig_h 0
id.orig_p 0
id.resp_h 0
id.resp_p 0
proto 0
service 199352
duration 157977
orig_bytes 157977
resp_bytes 157977
conn_state 0
local_orig 200000
local_resp 200000
missed_bytes 0
history 3434
orig_pkts 0
orig_ip_bytes 0
resp_pkts 0
resp_ip_bytes 0
tunnel_parents 200000
label 0
detailed-label 93057
dtype: int64
Bar chart
Heat map
# Generate a heatmap of null values
plt.figure(figsize=(12, 8))
sns.heatmap(df.isnull(), cmap='viridis', cbar=False)
plt.xlabel('Columns')
plt.ylabel('Rows')
plt.title('Heatmap of Null Values in Malware_Detect_Data.csv')
plt.show()
Calculating the null values as a percentage
null_values = df.isnull().sum()
null_percentage = null_values / len(df) * 100  # share of missing values per column, in percent
plt.figure(figsize=(10, 8))
df.drop(['duration'],axis=1,inplace=True)
#removing null values
df.dropna(subset=['history'], inplace=True)
plt.figure(figsize=(12, 8))
sns.heatmap(df.isnull(), cmap='viridis', cbar=False)
plt.xlabel('Columns')
plt.ylabel('Rows')
plt.title('Heatmap of Null Values in Malware_Detect_Data.csv')
plt.show()
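from sklearn.model_selection import train_test_split
from sklearn.preprocessing import Normalizer
X = df.drop('label', axis=1)
y = df['label']
# Hold out 20% of the rows for testing (assumed split; it matches the 39,314 test samples reported below)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scaler = Normalizer()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)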
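Note that scikit-learn's `Normalizer` rescales each sample (row) independently rather than standardizing each feature column; with its default `norm='l2'` setting, every row is divided by its Euclidean length. The NumPy sketch below reproduces that row-wise operation on made-up values; the helper `l2_normalize_rows` is hypothetical and only illustrates the behaviour.

```python
import numpy as np

def l2_normalize_rows(X):
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0          # avoid dividing an all-zero row by zero
    return X / norms

rows = np.array([[3.0, 4.0], [0.0, 0.0], [10.0, 0.0]])
normalized = l2_normalize_rows(rows)
print(normalized)
```

Because the operation uses no statistics learned from the training set, calling `fit_transform` on the training data and `transform` on the test data applies exactly the same per-row rule to both.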
Model Training
SVM Model
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score
svm_model = SVC(kernel='linear', C=0.001, gamma='scale')
svm_model.fit(X_train_scaled,y_train)
SVC
SVC(C=0.001, kernel='linear')
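Making predictions
y_pred = svm_model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(f"Classification Report:\n{report}")
clf_svm = svm_model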
Naive Bayes
from sklearn.naive_bayes import GaussianNB
nb_model = GaussianNB()
Training
nb_model.fit(X_train, y_train)
GaussianNB
GaussianNB()
Making predictions
y_pred_nb = nb_model.predict(X_test)
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
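The precision and F1-score definitions above, together with the standard definitions of recall (TP / (TP + FN)) and accuracy ((TP + TN) / total), can be computed directly from confusion-matrix counts. The sketch below uses a hypothetical helper `classification_metrics` and made-up counts; it is not the evaluation code used in this report.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up counts for illustration only.
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=80)
print(metrics)
```

The guards against empty denominators matter in practice: a model that never predicts the positive class has TP + FP = 0, and precision is then undefined.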
4.2 EVALUATION METRICS
This subsection presents the results obtained for the evaluation metrics defined
in the previous subsection. Each per-class metric is calculated from the confusion
matrix shown in Figure 5. The evaluation metric results corresponding to the
confusion matrix are displayed in Table 1.
Accuracy: 0.8019026301063235
This table presents the precision, recall, F1-score, and support values for each class (0 and
1), followed by accuracy, macro-average, and weighted-average scores for the malware
detection model evaluation.
              Precision   Recall   F1-score   Support
0                  0.80     0.96       0.87     17918
1                  0.96     0.79       0.87     21396
Accuracy                               0.87     39314
Macro avg          0.88     0.88       0.87     39314
Weighted avg       0.88     0.87       0.87     39314
This table presents the precision, recall, F1-score, and support values for each class (0 and
1), followed by accuracy, macro-average, and weighted-average scores for the evaluation of
the Naive Bayes malware detection model.
Accuracy (Naive Bayes): 0.8689016635295315
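As a consistency check, overall accuracy equals the support-weighted average of the per-class recall values, so it can be approximately recovered from the table above. The sketch below uses the rounded two-decimal figures from that table, which means the result only approximates the reported accuracy of 0.8689.

```python
# Per-class values copied from the classification report (rounded to two decimals).
precision = {0: 0.80, 1: 0.96}
recall = {0: 0.96, 1: 0.79}
support = {0: 17918, 1: 21396}

total = sum(support.values())

# Correctly classified samples per class = recall * support.
accuracy = sum(recall[c] * support[c] for c in support) / total

# The macro average is the unweighted mean over the classes.
macro_precision = sum(precision.values()) / len(precision)

print(total, round(accuracy, 3), round(macro_precision, 2))
```

The small gap between the recomputed and reported accuracy comes from rounding the recall values before multiplying them by the class supports.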
To evaluate and compare the two classification models based on the metrics
provided in the classification reports, we can consider the following aspects:
Analyzing network traffic data is pivotal for maintaining network security and
performance. The evaluation of the classification models showed that the Naive
Bayes model outperformed the SVM model in accuracy (0.869 versus 0.802), as
well as in precision, recall, and F1-score. Its ability to predict network traffic
patterns, anomalies, or threats was notably robust in this study.
Future Scope:
By pursuing these future scopes in malware detection and network traffic analysis,
organizations can bolster their cybersecurity defenses, enhance threat detection
capabilities, and optimize network performance for more secure and efficient
operations.
6. APPENDICES
Signature-based detection is one of the most traditional and widely used methods for identifying malware. It
involves the use of known patterns or signatures of malicious software to detect and prevent infections. Here's a
deeper look into how it works, its advantages and disadvantages, and some practical considerations.
How It Works
1. Signature Creation:
o Identification: Security researchers and antivirus companies identify a new piece of malware.
o Analysis: The malware is analyzed to extract unique patterns, such as specific sequences of
bytes, known as signatures.
o Database Update: These signatures are then added to a database that antivirus and other
security tools use to identify malware.
2. Detection Process:
o Scanning: The antivirus software scans files, memory, and network traffic for these known
signatures.
o Comparison: The scanned data is compared against the signature database.
o Alert and Action: If a match is found, the system alerts the user or administrator and takes
predefined actions, such as quarantining or deleting the infected file.
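The scanning and comparison steps above can be sketched as follows. This is a simplified illustration: the signature names, the byte pattern, and the sample content are all hypothetical, and real antivirus engines use far larger databases and more efficient matching.

```python
import hashlib

# Hypothetical signature database: one byte pattern and one whole-file hash.
BYTE_SIGNATURES = {"demo-dropper": b"\xde\xad\xbe\xef\x13\x37"}
HASH_SIGNATURES = {
    hashlib.sha256(b"known malicious sample").hexdigest(): "demo-sample",
}

def scan(data: bytes):
    """Return the names of all signatures that match the given bytes."""
    matches = []
    digest = hashlib.sha256(data).hexdigest()
    if digest in HASH_SIGNATURES:                    # whole-file hash comparison
        matches.append(HASH_SIGNATURES[digest])
    for name, pattern in BYTE_SIGNATURES.items():    # byte-sequence comparison
        if pattern in data:
            matches.append(name)
    return matches

print(scan(b"header" + b"\xde\xad\xbe\xef\x13\x37" + b"trailer"))
print(scan(b"harmless content"))
```

A sample that differs from a known one by even a single byte produces a different hash, which illustrates why exact matching cannot catch modified or previously unseen malware.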
Advantages
1. Speed and Efficiency: Signature-based detection is fast because it relies on straightforward pattern
matching.
2. Low False Positive Rate: Since it looks for specific, known patterns, the chances of false positives are
relatively low.
3. Simplicity: This method is easy to understand and implement, making it a standard feature in many
security tools.
Disadvantages
1. Inability to Detect Zero-Day Threats: It cannot detect new, unknown malware for which no signature
exists. This makes it ineffective against zero-day attacks.
2. High Maintenance: Signature databases need constant updates to include new malware. This requires
a significant amount of ongoing effort from security researchers.
3. Polymorphic and Metamorphic Malware: Some malware can change its code (polymorphism) or
rewrite itself (metamorphism) to evade signature-based detection.
Practical Considerations
1. Regular Updates: Ensuring that the signature database is regularly updated is crucial. Most antivirus
software has automatic update features to keep the database current.
2. Combining with Other Methods: Given its limitations, signature-based detection is
often used in conjunction with other detection methods, such as heuristic or anomaly-
based detection, to provide a more comprehensive defense.
3. Performance Impact: While generally efficient, the performance impact of signature
scanning should be monitored, especially in environments with limited resources.
4. Scope of Scanning: Defining the scope (files, memory, network traffic) and
frequency of scanning is essential to balance security and performance.
While signature-based detection remains a vital tool in the cybersecurity arsenal, its
limitations mean that it cannot be relied upon exclusively. The future likely involves more
sophisticated and integrated approaches, leveraging machine learning and behavioral analysis
to complement signature-based methods. This hybrid approach aims to provide more robust
protection against both known and unknown threats.
How Anomaly-Based Detection Works
1. Baseline Establishment:
o Data Collection: Collect data on normal operations of the network, systems,
or user behaviors over a period.
o Profiling: Create profiles of typical behavior based on the collected data. This
can include network traffic patterns, system performance metrics, user login
patterns, and more.
2. Detection Process:
o Monitoring: Continuously monitor real-time data from the network or system.
o Comparison: Compare the current data against the established baseline to
identify any deviations.
o Alert and Action: If significant deviations are detected, the system generates
an alert and may initiate predefined responses such as blocking traffic,
isolating systems, or notifying administrators.
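A minimal sketch of the baseline-and-compare process described above, using a z-score test on a single metric. The function names, the threshold of three standard deviations, and the traffic counts are assumptions for illustration; production systems profile many metrics at once and use more robust statistics.

```python
import statistics

def build_baseline(samples):
    """Profile normal behaviour as the mean and standard deviation of a metric."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the baseline mean."""
    mean, stdev = baseline
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Made-up requests-per-minute counts observed during normal operation.
normal_traffic = [98, 102, 101, 97, 100, 103, 99, 100]
baseline = build_baseline(normal_traffic)

print(is_anomalous(101, baseline))
print(is_anomalous(450, baseline))
```

Lowering the threshold catches smaller deviations but also flags more legitimate activity, so the threshold is a trade-off between missed detections and false alarms.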
Disadvantages
1. High False Positive Rate: Normal but unusual activities can be flagged as anomalies,
leading to false alarms.
2. Complexity: Establishing accurate baselines and maintaining them can be complex
and resource-intensive.
3. Training Period: Requires a learning period to establish what constitutes normal
behavior, which can be lengthy and require a large amount of data.
Practical Considerations
7. REFERENCES
[1] Matrosov, Aleksandr; Rodionov, Eugene; Harley, David; Malcho, Juraj, "Stuxnet Under
the Microscope," ESET LLC, September 2010.
[2] Symantec, "Malware Targeting Windows 8 Uses Google
Docs":http://www.symantec.com/connect/blogs/malware-targetingwindows-8-uses-google-
docs.
[3] C. Rossow, C. J. Dietrich, H. Bos, L. Cavallaro, M. Van Steen,F. C. Freiling and N.
Pohlmann, "Sandnet: Network Traffic Analysis of Malicious Software," in Workshop on
Building Analysis Datasets and Gathering Experience Returns for
Security, Salzburg, Austria, 2011.
[4] N. Stakhanova, M. Couture and A. A. Ghorbani, "Exploring network-based malware
classification," in 6th International
Conference on Malicious and Unwanted Software(MALWARE), Fajardo, Puerto Rico, 2011.
[5] G. Xie, Q. Li, Y. Jiang, T. Dai, G. Shen, R. Li, R. Sinnott, and S. Xia, “Sam: Self-
attention based deep learning method for online traffic classification,” in Proceedings of the
Workshop on Network Meets AI & ML, ser. NetAI ’20. New York, NY, USA: Association for
Computing Machinery, 2020, p. 14–20. [Online].
Available:https://doi.org/10.1145/3405671.3405811
[6] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani,“Toward generating a new intrusion
detection dataset and intrusion traffic characterization,” in International Conference on
Information Systems Security and Privacy, 2018. [Online]. Available:
https://api.semanticscholar.org/CorpusID:4707749
[7] “Stratosphere research laboratory protecting the civil society through high quality
research.” [Online]. Available:
https://www.stratosphereips.org/datasets-overview
[8] Zhou, Y., and Jiang, X., “Dissecting Android Malware: Characterization and Evolution,”
In Proceedings of the 33rd IEEE Symposium on Security and Privacy (2012), IEEE Oakland
’12.
[9] Garg, S., Sarje, A.K. and Peddoju, S.K., “Improved Detection of P2P Botnets through
Network Behavior Analysis” In Recent Trends in
Computer Networks and Distributed Systems Security, Springer (2014), pp. 334-345.
[10] Sharifnya R, Abadi M. A novel reputation system to detect DGA-based botnets [A]. //
Proceedings of the 3rd International Conference on Computer and Knowledge Engineering
(ICCKE 2013) [C], Piscataway, NJ: IEEE, 2013: 417-423.
[11] Schiavoni S, Maggi F, Cavallaro L, et al. Phoenix: DGA-based botnet tracking and
intelligence [A]. // Proceedings of the 2014 International
Conference on Detection of Intrusions and Malware, and Vulnerability Assessment [C],
Berlin: Springer, 2014: 192-211.
[12] Gu G, Perdisci R, Zhang J, et al. BotMiner: Clustering Analysis of Network Traffic for
Protocol-and Structure-Independent Botnet
Detection[C]//USENIX security symposium. 2008, 5(2): 139-154.
[13] Lu C, Brooks R. Botnet traffic detection using hidden markov models [A]. //
Proceedings of the Seventh Annual Workshop on Cyber Security
and Information Intelligence Research [C], New York: ACM Press, 2011.
[14] Grill M, Rehák M. Malware detection using HTTP user-agent discrepancy identification
[A]. // Proceedings of the 2014 IEEE International Workshop on Information Forensics and
Security (WIFS) [C], Piscataway, NJ: IEEE, 2014: 221-226.
[15] A. H. Lashkari, A. F. A. Kadir, L. Taheri, and A. A. Ghorbani, “Toward Developing a
Systematic Approach to Generate Benchmark Android Malware Datasets and Classification,”
Proc. - Int. Carnahan Conf. Secur. Technol., vol. 2018-Octob, no. Cic, pp. 1–7, 2018.