Information 15 00046
Article
A Holistic Approach to Ransomware Classification: Leveraging
Static and Dynamic Analysis with Visualization
Bahaa Yamany 1 , Mahmoud Said Elsayed 2, * , Anca D. Jurcut 2 , Nashwa Abdelbaki 1 and Marianne A. Azer 1,3
1 School of Information Technology and Computer Science, Nile University, Cairo 12566, Egypt;
b.yamany@nu.edu.eg (B.Y.); nabdelbaki@nu.edu.eg (N.A.)
2 School of Computer Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland;
anca.jurcut@ucd.ie
3 Computers and Systems Department, National Telecommunication Institute, Cairo 11768, Egypt;
mazer@nu.edu.eg
* Correspondence: mahmoud.abdallah@ucdconnect.ie
Abstract: Ransomware is a type of malicious software that encrypts a victim’s files and demands
payment in exchange for the decryption key. It is a rapidly growing and evolving threat that has
caused significant damage and disruption to individuals and organizations around the world. In this
paper, we propose a comprehensive ransomware classification approach based on the comparison of
similarity matrices derived from static, dynamic analysis, and visualization. Our approach involves
the use of multiple analysis techniques to extract features from ransomware samples and to generate
similarity matrices based on these features. These matrices are then compared using a variety of
comparison algorithms to identify similarities and differences between the samples. The resulting
similarity scores are then used to classify the samples into different categories, such as families,
variants, and versions. We evaluate our approach using a dataset of ransomware samples and
demonstrate that it can accurately classify the samples with a high degree of accuracy. One advantage
of our approach is the use of visualization, which allows us to classify and cluster large datasets of
ransomware in a more intuitive and effective way. In addition, static analysis has the advantage of
being fast and accurate, while dynamic analysis allows us to classify and cluster packed ransomware
samples. We also compare our approach to other classification approaches based on single analysis
techniques and show that our approach outperforms these approaches in terms of classification
accuracy. Overall, our study demonstrates the potential of using a comprehensive approach based
on the comparison of multiple analysis techniques, including static analysis, dynamic analysis,
and visualization, for the accurate and efficient classification of ransomware. It also highlights the
importance of considering multiple analysis techniques in the development of effective ransomware
classification methods, especially when dealing with large datasets and packed samples.
Citation: Yamany, B.; Elsayed, M.S.; Jurcut, A.D.; Abdelbaki, N.; Azer, M.A. A Holistic Approach to Ransomware Classification: Leveraging Static and Dynamic Analysis with Visualization. Information 2024, 15, 46. https://doi.org/10.3390/info15010046
Keywords: dynamic analysis; encryption; honeypot; Jaccard index; malware; machine learning;
ransomware; similarity matrix; shared code analysis; static analysis
Academic Editor: Ruggero Lanotte
Received: 11 December 2023
Revised: 30 December 2023
Accepted: 5 January 2024
Published: 14 January 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Malware analysis is the act of finding, comprehending, and minimizing the potential
damage caused by malicious software, such as ransomware in ref. [1]. It is a crucial component
of cybersecurity since it enables organizations and individuals to defend themselves
against the numerous types of malware that might infect their systems and data. Malware
analysis employs a variety of tools and methodologies, including static analysis, dynamic
analysis, sandbox analysis, and reverse engineering. These methods can be used to analyze
the code and behavior of malware and to identify indicators of compromise (IOCs) that
can be used to detect and categorize malware. Malware analysis is an important part
of defending against ransomware as it allows organizations and individuals to identify
and mitigate the potential harm caused by ransomware before it can cause significant
Paper Contribution
In this paper, our primary focus was a meticulous examination of a substantial dataset
containing ransomware samples, embedded within a broader corpus. This extensive analy-
sis led to the identification of a subset of ransomware samples sharing notable similarities.
Subsequently, we conducted a rigorous assessment using a similarity matrix-based analysis,
incorporating both static and dynamic features, with the overarching goal of offering a
comprehensive evaluation that highlights the respective merits and limitations of each
analytical approach.
Beyond our innovative methodological approach, we conducted a thorough survey
and comparative examination of established ransomware detection methodologies. Our
study presents an expansive exploration of the ransomware detection ecosystem, cover-
ing various dimensions, including the detection environment, data analysis techniques,
machine learning methodologies, outcomes, evaluation criteria, and a range of available
detection tools.
Within the context of this research, our contributions encompass a multifaceted explo-
ration of the following key aspects:
• Comparative analysis of infection behaviors across various ransomware families.
• Utilization of data visualization methods for the identification of similar ransomware
samples within extensive datasets.
• Employing a similarity matrix approach for the analysis of static and dynamic features
in ransomware samples.
• Assessment of the merits and limitations associated with static and dynamic feature
analysis.
• Comprehensive survey and comparative evaluation of varied ransomware detection
approaches, alongside an in-depth exploration of the ransomware detection ecosystem.
• Development and proposal of an automated methodology for extracting diverse
feature sets from ransomware samples.
The remainder of this paper is organized as follows. In Section 2, we present an
overview of the efforts that have been made in the literature to develop ransomware
detection approaches. We survey the different techniques that have been proposed and
analyze the criteria, parameters, and tools used in the ransomware detection ecosystem. In
Section 3, we provide a background on the different static and dynamic features that have
been used in ransomware tracking systems as well as the visualization techniques that have
been proposed for ransomware classification. In Section 4, we describe our system setup
and present the results and analysis of our proposed approach for extracting the malware’s
static features and classifying ransomware samples. We also compare our results to those
of other approaches proposed in the literature. Finally, in Section 5, we provide conclusions
and discuss future work in the field of ransomware detection. We outline the challenges
and opportunities that exist for improving the accuracy and effectiveness of ransomware
detection and highlight the potential impacts of these advances on cybersecurity.
2. Related Work
In this section, we aim to delve deeper into the related work, refine the problem
statement by addressing its limitations, and provide additional context regarding the
categorization of ransomware. Within the scope of this paper, our objective is to conduct a
comprehensive survey of the diverse spectrum of ransomware detection methodologies
and techniques as delineated in the existing literature. Beyond this survey, we undertake
a meticulous analysis of the varied criteria, parameters, and tools employed within the
broader ransomware detection ecosystem. The overarching goal is to furnish readers
with an in-depth understanding of the contemporary landscape in ransomware detection,
including both its advancements and inherent challenges. However, it is imperative to
acknowledge certain limitations in this pursuit. Firstly, the rapidly evolving nature of
ransomware demands a continuous update of detection methodologies, and as such, some
state-of-the-art techniques may not be covered if they have emerged after our knowledge
be the most effective approach. Identifying the malware family to which a new sample
belongs is a common necessity in malware analysis. One prevalent approach involves
subjecting the sample to a multi-engine antivirus scanner, such as VirusTotal. However,
outcomes from these scanners can sometimes lack clarity and accuracy as malware is often
tagged with generic labels like “generic”, offering little meaningful information. Addi-
tionally, malware creators may actively monitor the VirusTotal database, modifying their
code or functions to evade detection. An alternative method for malware analysis involves
executing the sample within a controlled sandbox environment, such as CuckooBox, to
gather insights into the malware’s behavior and communication with callback servers.
While this approach can yield valuable insights, it can be time-consuming and less efficient
when dealing with extensive malware datasets. A distinctive and automated approach
to malware analysis, as introduced in ref. [10], is shared code analysis or similarity check
analysis. This technique compares two malware samples by quantifying the proportion
of the recompiled source code they share. Unlike shared attribute analysis, which relies
on external characteristics, shared code analysis swiftly and accurately classifies malware,
particularly within large datasets. Nevertheless, it is crucial to assess the limitations of
this method and utilize it in conjunction with other analysis techniques as needed. In the
context of malware analysis and ransomware, ref. [11] offers a comparative exploration of
various analysis approaches and ransomware typologies, shedding light on their respective
behaviors and characteristics; Section 2.1, “Ransomware Detection Approaches”, outlines
various techniques. Machine learning leverages known ransomware datasets for classifi-
cation. Behavioral analysis observes malware execution, analyzing network activity, file
operations, and system resource usage.
2.1.2. Honeypots
Honeypots are valuable tools for gathering information about attacks, including the
identification of users and the extent of their activities, aiding in informed decision-making
for defense strategies. The primary objective of deploying honeypots is to acquire insights
into ongoing attacks and utilize that intelligence to bolster security measures. To enhance
user awareness, email notifications are sent, occasionally advising users to disconnect
network cables as a precautionary measure. This user training aspect adds an extra layer of
security awareness, making honeypots an effective means to detect ransomware attacks. In
ref. [16], the authors employed a combination of methods, including machine learning for
grouping cases and honeypots to capture potentially malicious packages. Classification
tasks utilized Decision Trees and Support Vector Machines (SVMs). The study suggests the
potential of architectural solutions for malware detection. Ref. [17] introduced an Intrusion
Detection Honeypot (IDH), comprising Honeyfolder, Audit Watch, and Complex Event
Processing (CEP). IDH is designed to mimic vulnerability while also functioning as an early
warning system, notifying users of suspicious file activity. Ref. [18] presented a deception
method involving Honeyfiles and Honeytokens, designed to access compromised private
files and detect hacking or ransomware attempts. The hypothesis explores the use of
honeypots combined with machine learning for malware detection. In ref. [19], data from
an Internet of Things (IoT) honeypot were effectively employed to train a dynamic machine
learning model. This highlights the dynamic nature of honeypot-driven machine-learning
techniques. Ref. [20] suggested a framework utilizing an Intrusion Prevention System
(IPS) gateway, an analytical system, and honeypots to detect and identify ransomware.
The framework encompasses six elements: IPS, gateway, static detector, dynamic detector,
honeynet, and a notification component, collectively contributing to effective ransomware
detection and user notification. These studies underscore the versatility and potential
of honeypot-driven approaches, often combined with machine learning techniques, for
enhancing ransomware detection and overall cybersecurity.
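The honeyfile idea discussed above can be illustrated with a minimal sketch. It is not the implementation of any of the cited systems: the decoy file names, the use of SHA-256 digests, and the helper names `snapshot` and `changed_decoys` are assumptions made for illustration. The sketch records a baseline of decoy files and reports any decoy that is later modified, renamed, deleted, or added, which is the kind of early warning signal a honeyfolder is meant to provide.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(decoy_dir: Path) -> dict[str, str]:
    """Map each decoy file name to the SHA-256 digest of its contents."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(decoy_dir.iterdir())
        if p.is_file()
    }


def changed_decoys(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return decoy names whose digest differs, including deleted and new files."""
    names = set(baseline) | set(current)
    return sorted(n for n in names if baseline.get(n) != current.get(n))


def demo() -> list[str]:
    """Create hypothetical decoys, simulate an encryption event, and report changes."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # Hypothetical decoy files; real deployments would choose tempting names.
        (folder / "passwords.txt").write_text("decoy credentials")
        (folder / "invoice.docx").write_bytes(b"decoy document")
        baseline = snapshot(folder)

        # Simulated ransomware behavior: overwrite a decoy, then rename it.
        victim = folder / "invoice.docx"
        victim.write_bytes(b"\x8f\x02\xa7ciphertext")
        victim.rename(folder / "invoice.docx.locked")

        return changed_decoys(baseline, snapshot(folder))


if __name__ == "__main__":
    print(demo())
```

In this sketch any non-empty result would trigger the notification step. A production monitor would more likely subscribe to file system events than poll digests, so the polling design here is only a simplification.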
2.1.3. Statistics
Statistical analysis can be employed to better understand the characteristics of ransomware.
One prominent detection method uses statistical analyses to identify unpredictable behavior
and to flag the presence of encryption. Based on the frequency of opcodes in the portable
executable file, the authors in ref. [21] proposed an approach for detecting malware. The study
used a machine learning system and evaluated it in terms of false positives, false negatives,
true positives, and true negatives. The authors in ref. [22] also proposed a method for finding
malware. This research employed a machine learning algorithm to identify malware with varying
degrees of accuracy. The method of malware detection was developed by the authors using a
similarity measurement algorithm. The proposed method was meant to improve malware
detection time and throughput. This methodology has various advantages over others,
including increased speed by using opcodes directly and improved detection outcomes
from being immune to obfuscation and disassembly methods in ref. [23]. Another approach
for malware classification was presented in ref. [24]. Inspired by the visual similarities
across viruses in the same family, this work proposes binary texture analysis over greyscale
images generated directly from malware executables. This technique provides second-order
statistical texture features over the graphical representation of malware. This
strategy cannot be fooled by common methods of concealment (e.g., packing, code relocation,
and encryption). Five malware detection metrics were assessed in the absence of
ground truth, a real-world scenario that poses various technical challenges. The end goal
was to develop fully automated, principled methods to assess these metrics with the
highest possible precision. Estimators of statistical significance were provided for the five
metrics used to identify malware. These statistical estimators were shown to be accurate
by comparison against known ground truth and synthetic data. The estimators were then
applied to a large dataset obtained from VirusTotal to measure and quantify the five metrics in
ref. [25]. Several methods proposed in the literature make use of multiple strategies. The
benefits and drawbacks of various ransomware detection strategies are summarized in
Table 1.
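The greyscale representation mentioned for ref. [24] can be sketched as follows. This is a minimal illustration and not the cited authors' code: it assumes one byte maps to one pixel intensity, a fixed row width of 256 pixels (an arbitrary default chosen here), and zero padding of the last row. The function names `bytes_to_greyscale` and `to_pgm` are hypothetical.

```python
def bytes_to_greyscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay raw bytes out as rows of `width` pixels; each byte is one 0-255 intensity."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # zero-pad the last, possibly short, row
        rows.append(row)
    return rows


def to_pgm(pixels: list[list[int]]) -> bytes:
    """Encode the pixel grid as a binary PGM (P5) greyscale image."""
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(value for row in pixels for value in row)


# Hypothetical input: a short byte string stands in for a real executable.
image = bytes_to_greyscale(b"MZ\x90\x00\x03\x00\x00", width=4)
pgm_bytes = to_pgm(image)
```

Texture features would then be computed over the resulting grid. That step is omitted because the source does not specify which texture descriptors were used.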
Ransomware Detection Approach | Ref. | Description | Advantages | Disadvantages
Machine Learning | [12–15] | The most used machine learning techniques in ransomware detection include supervised learning, unsupervised learning, and semi-supervised learning. Supervised learning involves training a model on labeled data, where the input and output are both known. This allows the model to make predictions based on the relationships learned from the training data. Unsupervised learning involves training a model on data where the output is not known, and the model must find patterns and relationships within the data on its own. Semi-supervised learning is a combination of supervised and unsupervised learning, where the model is trained on a mix of labeled and unlabeled data. | One of the main advantages of using machine learning for ransomware detection is that it allows for the automatic identification of patterns and relationships within large datasets. This can be particularly useful for identifying new and emerging threats, as the model can learn from past data to identify patterns and make predictions about future threats. Machine learning algorithms can also be trained on a wide variety of data types, including text, images, and audio, which makes them useful for detecting ransomware in different formats. | Machine learning algorithms can be vulnerable to bias and can produce inaccurate results if the training data are not representative of the real-world data. They also require frequent retraining to ensure that they continue to perform well as the data distribution changes.
Honeypot | [16–20] | Honeypots are a type of decoy system that is designed to attract and detect malware or cyber-attacks. They are used to lure attackers into a controlled and isolated environment, where their actions can be observed and studied. By setting up a honeypot, it is possible to monitor and track ransomware activity and identify new strains or variants of the malware. | One advantage of using a honeypot is that it allows researchers to gather valuable data and intelligence about the tactics, techniques, and procedures (TTPs) used by attackers. This information can be used to improve the effectiveness of ransomware detection and prevention measures. Additionally, honeypots can help mitigate the impact of ransomware attacks by preventing the malware from reaching the target system or data. | There are also some disadvantages to using honeypots. One potential issue is the risk of false positives, where legitimate activity is mistaken for malicious activity. Another issue is the cost and resources required to maintain and operate a honeypot, as well as the potential legal and ethical considerations. Additionally, honeypots may not be suitable for all types of environments or organizations and may not provide comprehensive protection against all types of ransomware attacks.
Statistical | [21–25] | The statistical analysis approach involves collecting and analyzing data about ransomware behavior to identify patterns and trends. This can be done through various methods, such as collecting data about the frequency and types of ransom demands, the types of files targeted, and the tactics used by ransomware operators. | The advantage of using statistical analysis is that it allows researchers to gain a deeper understanding of ransomware behavior and identify key trends that can inform prevention and detection efforts. | The disadvantage of this approach is that it relies on the availability of accurate and comprehensive data, which may be difficult to obtain in some cases. Additionally, statistical analysis may not be able to identify specific instances of ransomware in real time, making it less effective for immediate detection and response.
3. Background
In this section, we define and present the features that affect ransomware tracking and
introduce the different static and dynamic features that have been used for ransomware
tracking. In Section 3.1, we introduce the different types of ransomware and provide
a brief history of ransomware. We also compare the key features, spreading techniques,
exploitation, and ransomware families of different ransomware types, such as crypto worm,
Ransomware-as-a-Service (RaaS), and Automated Active Adversary ransomware. We also
discuss the role of APT attacks, such as the Shamoon data wiper malware, in ransomware
infections. In Section 3.2, we discuss visualization techniques that are used to represent
and analyze data in a visual form. In the context of ransomware classification, visualiza-
tion techniques can be utilized to graphically represent the relationships and similarities
between different ransomware samples. These techniques can provide a more intuitive
and comprehensive understanding of the data, allowing analysts to identify patterns and
trends that may not be immediately apparent through traditional methods of analysis.
Some common visualization techniques that may be used in ransomware classification
include scatter plots, heat maps, and network graphs. By using these techniques, analysts
can effectively classify and cluster ransomware samples based on their features and
characteristics, enabling more accurate and efficient detection and analysis of these threats.
Finally, in Section 3.3, we discuss the use of static and dynamic features in ransomware
tracking systems and the challenges and opportunities that these features present. Overall,
this section provides a comprehensive overview of the key features and techniques that are
used in ransomware tracking and classification as well as the challenges and opportunities
that these approaches present.
3.1. Ransomware Types and History
Ransomware, classified as a type of malware, operates by encrypting a victim’s files
and subsequently demanding a ransom in exchange for restoring access to these files in
ref. [26]. Notably, various categories of ransomware exist, each with unique characteristics.
These categories encompass crypto worms in ref. [27], Human-operated Ransomware in
ref. [28], Ransomware-as-a-Service (RaaS) in ref. [29], and Automated Active Adversary
ransomware in ref. [30]. Table 2 encapsulates the essential features, propagation methods,
exploitation strategies, and ransomware families associated with these diverse ransomware
types. A specific subtype within the RaaS ransomware category is Advanced Persistent
Threat (APT) attacks, typified by instances like the Shamoon data wiper malware in ref. [31].
APT-33, for instance, has employed such attacks in the Middle East and Europe, often
driven by commercial or military motives. Notably, ransomware infections can originate
from various sources in ref. [32], with the distribution percentages elucidated in ref. [33].
Figure 2 visually represents the primary sources of infection for most ransomware, which
may include phishing emails, APT attacks, system vulnerabilities, drive-by downloads, and
exploit kits. An in-depth exploration of the history of ransomware has been undertaken by
the authors in ref. [34]. In Table 3, a chronological account of significant ransomware attacks
is summarized, including the attack date, the responsible ransomware family, and the
resultant damage. Broadly, ransomware can be categorized into two principal subgroups:
locker ransomware in ref. [35] and crypto ransomware in ref. [36]. Locker ransomware
restricts access to a device, often by imposing an additional password requirement to access
the device. In contrast, crypto ransomware identifies and encrypts valuable data located
on the victim’s device.
Table 2. Comparison between ransomware malware behavior types.
 | Crypto Worm | Human-Operated Ransomware | Ransomware-as-a-Service (RaaS) | Automated Active Adversary
Key Features | Self-propagating | Targeted attacks | Ransomware-as-a-Service model | Advanced evasion tactics
Spreading techniques | Wormhole | Social engineering | Email attachments, web links | Customized attack vectors
Exploitation techniques | Vulnerabilities in systems | Targeted vulnerabilities | Vulnerabilities in systems | Customized exploits
Detection modules | Antivirus | User awareness, network monitoring | Antivirus, network monitoring | Network monitoring, user awareness
Ransomware Family Example | WannaCry | Ryuk | REvil | SolarWinds
Figure 3. Shared features between two malware samples [45].
The Jaccard index is a measure of the similarity between two sets of data in ref. [46].
It is calculated by dividing the size of the intersection of the two sets by the size of the
union of the two sets. The Jaccard index is often used in cybersecurity to measure the
similarity between different malware samples, and it can be particularly useful for tracking
the evolution of different ransomware families over time. By calculating the Jaccard index
for different pairs of ransomware samples, analysts can identify how similar or dissimilar
they are and can use this information to better understand the TTPs of the different families.
The Jaccard index has emerged as the most generally adopted measure, and with good reason. It
quantifies the degree of overlap between two sets of malware features simply and sensibly,
providing us with the percentage of unique features common to both sets normalized by
the percentage of unique features in each group in ref. [47] (JI = intersection length/union
length). The Jaccard Index explanation is shown in Figure 4.
Figure 4. Jaccard Index between two malware samples.
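The Jaccard index described above (intersection size divided by union size) can be written in a few lines. The sketch below is illustrative only: the function name `jaccard_index`, the choice to return 0.0 when both sets are empty, and the two example feature sets are assumptions and are not taken from the paper's dataset.

```python
def jaccard_index(features_a: set, features_b: set) -> float:
    """Size of the intersection divided by size of the union.

    Returns 0.0 when both sets are empty (a convention assumed here).
    """
    union = features_a | features_b
    if not union:
        return 0.0
    return len(features_a & features_b) / len(union)


# Hypothetical feature sets, e.g. imported API names seen in two samples.
sample_a = {"CryptEncrypt", "CreateFileW", "DeleteFileW", "InternetOpenA"}
sample_b = {"CryptEncrypt", "CreateFileW", "WriteFile"}

score = jaccard_index(sample_a, sample_b)  # 2 shared features out of 5 unique ones
```

Because the result always lies between 0 and 1, scores from different sample pairs can be compared on the same scale.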
Using N-grams to track the evolution of ransomware families can be a powerful tool
for cybersecurity professionals in ref. [48]. By extracting subsequences of specific lengths
from sequential data and comparing them using a similarity function, it is possible to
determine the level of code commonality between two malware samples. This can be
especially useful for identifying patterns and trends in the TTPs of different ransomware
families and can help analysts develop more effective defense and response strategies.
The similarity function used in this process should have certain properties to ensure
accurate and reliable results. It should produce a normalized value that allows all similarity
comparisons to be made on the same scale, and it should be able to accurately estimate
the amount of code sharing between two samples. Additionally, it should be able to
provide insight into why it performs well in modeling code similarities. Overall, the use
of N-grams and a similarity function can be an effective way to track the evolution of
ransomware families and better understand their TTPs. By extracting and comparing
subsequences of specific lengths, analysts can identify common patterns and trends and can
use this information to develop more effective defense and response strategies in ref. [49].
We employ a similarity function with these properties to determine the level of
code commonality between malware samples, as shown in Figure 5. In the provided
figure, each number corresponds to a distinct malware sample included in the analysis.
The purpose of these numbers is to uniquely identify and label each malware instance
for clarity. The arrows in the figure represent the presence of similar n-gram features
between different malware samples. Specifically, the direction of the arrows indicates
the connection from the source malware sample to the target sample, demonstrating a
shared set of n-gram features. This visual representation highlights the commonalities
in the n-gram patterns found in the corresponding malware instances. By examining the
arrows and associated numbers, one can gain insights into the relationships and similarities
among the various malware samples based on their n-gram features. This analysis aids in
understanding the potential connections and patterns within the dataset, contributing to a
more comprehensive comprehension of the malware landscape under investigation.
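The n-gram comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this work: the opcode sequences, the choice of n = 2, and the helper names `ngrams` and `jaccard` are assumptions made for the example. The Jaccard index is used because it is normalized to the range [0, 1], which matches the first property required of the similarity function.

```python
from typing import Sequence, Set, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous length-n subsequences of tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard index |A & B| / |A | B|, a value in [0, 1]."""
    union = a | b
    # Two empty sets share nothing measurable; 0.0 is a convention chosen here.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical opcode sequences standing in for two disassembled samples.
sample_a = ["push", "mov", "call", "xor", "ret"]
sample_b = ["push", "mov", "call", "add", "ret"]

grams_a = ngrams(sample_a, 2)
grams_b = ngrams(sample_b, 2)
similarity = jaccard(grams_a, grams_b)  # 2 shared bigrams out of 6 distinct ones
print(f"{similarity:.2f}")
```

In this toy case the two sequences share two of six distinct bigrams, so the score is one third. Longer n-grams make the comparison stricter, since a single differing token invalidates up to n overlapping subsequences.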
Figure 5. N-gram extracted from ransomware samples.
4. Experimental Work and Detection Scheme
In this section, we present the experimental work done to study ransomware visualization
techniques and shared static and dynamic features between different ransomware samples.
Ransomware visualization techniques are presented in Section 4.1, while shared static and
dynamic features are presented in Sections 4.2 and 4.3, respectively. In Section 4.4, our
lab setup is presented. Time complexity is presented in Section 4.5. Finally, in
Section 4.6, we present the results from static and dynamic analyzers.
4.1. Visualization Techniques
In our approach to using visualization techniques to classify and analyze ransomware
samples, we started by selecting a dataset of ransomware samples (most matched ones)
and then applied a similarity matrix using a static and dynamic analyzer to find a fast
and suitable way to use it in our final approach. To identify the most similar samples, we
used a cluster engine to analyze the data and report the samples with the highest level of
similarity. We then used static and dynamic analysis techniques to generate a similarity
matrix for each group of samples. This matrix allowed us to visualize the relationships
between the different samples and identify patterns and trends in the data. Once we had
generated the similarity matrix, we used it to validate the query-sample similarity with
the matched samples. This helped us to confirm that the samples in the first group were
indeed the most similar ones and allowed us to identify any discrepancies or errors in the
data. Constructing nodes and connections between them helps to view and graph the data’s
connections. In other words, each sample is a node, and we connect two nodes and consider
the samples comparable if they have similar DLL imports.
• The cluster engine reported the most similar samples from the set.
• There is a need to validate the query-sample similarity with the matched samples.
• It is also important to reveal intelligence from the data and discover the patterns.
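The node-and-connection view described above can be sketched as follows. This is an illustrative assumption rather than the cluster engine used in this work: the sample names, the DLL sets, the threshold of 0.5, and the function `build_similarity_graph` are all hypothetical. Each sample is a node, and an edge is added when the Jaccard index of two samples' DLL imports reaches the threshold.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_similarity_graph(
    imports: Dict[str, Set[str]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return edges (a, b, score) for sample pairs whose DLL import sets
    have a Jaccard index of at least `threshold`."""
    edges = []
    for a, b in combinations(sorted(imports), 2):
        union = imports[a] | imports[b]
        score = len(imports[a] & imports[b]) / len(union) if union else 0.0
        if score >= threshold:
            edges.append((a, b, score))
    return edges


# Hypothetical samples and DLL imports, for illustration only.
imports = {
    "s1": {"kernel32.dll", "advapi32.dll", "crypt32.dll"},
    "s2": {"kernel32.dll", "advapi32.dll", "crypt32.dll", "user32.dll"},
    "s3": {"ws2_32.dll", "wininet.dll"},
}

edges = build_similarity_graph(imports, threshold=0.5)
for a, b, score in edges:
    print(a, b, score)
```

Here only s1 and s2 are connected, because they share three of four distinct DLLs, while s3 shares none with either. The resulting edge list can be passed to any graph drawing tool to obtain the kind of view discussed in this section.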
Figure 6 shows the vendor detections for a collection of ransomware samples. Not all
security vendors detected every sample within this dataset. This observation underscores
the variability in ransomware detection rates across different security solutions and the
need for robust and comprehensive cybersecurity strategies. In the ensuing discussion, we
will delve deeper into the implications of these detections. Samples with a consistent
segment count are indicative of non-packed samples, reflecting their unaltered and original
nature within the dataset. This differentiation is instrumental in our analysis of the
dataset’s composition and assists in identifying potential trends or variations among the
samples. The numerical results for visualization techniques can be found in Table 4.
Figure 7 shows the sizes of the samples used in this study. The plot makes evident that
most of the collected samples are uniform in size.
[Plot: Number of Detections (y-axis) versus Sample ID (x-axis)]
Figure 8, which illustrates the counts of the Import Address Table (IAT) across various
samples, unequivocally highlights a remarkable consistency in these counts across all
the analyzed samples. This compelling consistency within the IAT counts underscores
the robustness of this static feature as a key determinant for effectively classifying and
clustering ransomware samples within our laboratory experiments.
Figure 9 provides an insightful depiction of the counts of internal functions across
the examined samples. Notably, a striking similarity becomes evident as one observes
the distribution of these internal function counts. This remarkable uniformity among
the samples in terms of internal function counts further solidifies the findings from our
laboratory experiments, affirming the robustness of our research results.
In Figure 10, we observe the counts of segments within portable executable (PE) files.
This analysis allows us to discern between packed and non-packed samples in our dataset.
[Plot: Sample Size in Bytes (y-axis) versus Sample ID (x-axis)]
[Plot: Number of Imports (y-axis) versus Sample ID (x-axis)]
[Plot: Number of Internal Functions (y-axis) versus Sample ID (x-axis)]
[Plot: Number of Segments (y-axis) versus Sample ID (x-axis)]
controller uses to classify the binaries. The disassembler is an important part of the analyzer
server, as it is responsible for breaking down the samples into their constituent parts and
extracting the relevant features. MongoDB is used to index all the extracted features so
that they can be queried and analyzed using the Jaccard Index similarity function. This
allows analysts to identify common patterns and trends quickly and easily in the TTPs of
different ransomware families and to better understand the relationships between different
samples. VirusTotal is another important tool in our lab setup, as it provides a range of
clustering and similarity-matching capabilities that allow analysts to group and classify
different samples based on their features and attributes. It also includes a comprehensive
graph view that enables analysts to visualize the relationships between different malware
objects and to better understand how they are associated with specific campaigns. Overall,
the lab setup for our proposed malware indexing system is designed to provide analysts
with the tools and resources they need to effectively analyze and classify different types of
malware, including ransomware. It includes a range of machines and tools that support the
disassembly and analysis of executable binaries, as well as powerful indexing and querying
capabilities that enable analysts to identify common patterns and trends in the TTPs of
different malware families. The full-matched ransomware classification and detection
system diagram is illustrated in Figure 13.
Table 5. Cont.
4.6. Results
The results of our analysis show that the use of minhash and Jaccard index for feature
comparison is an effective method for accurately estimating the degree of code sharing
between different ransomware samples. By applying minhash to the strings, Import
Address Table, and API call features extracted from our ransomware samples, we were
able to identify highly similar samples with a high degree of accuracy. This approach
allowed us to cluster the samples into distinct groups, enabling us to identify relationships
between different ransomware families and variants more easily. In addition to the minhash
and Jaccard index, we also employed other visualization techniques, such as the use of
graph networks and dendrograms, to further aid in the analysis and interpretation of the
data. These techniques allowed us to visually explore the relationships between different
malware samples and identify patterns and trends that would have been difficult to discern
using other methods. In our proposed approach, the first step is to store the ransomware
samples in a database or repository. This can be done by manually collecting the samples
or using an automated tool to gather them from various sources such as online scanners or
honeypots. Next, the samples are indexed using a variety of features such as strings, Import
Address Table, or API calls. These features are extracted from the samples using static or
dynamic analysis techniques and stored in the database for later use. Once the samples are
indexed, analysts can search for specific samples or groups of samples using various search
criteria such as ransomware family, encryption algorithm, or date of discovery. Finally,
the similarity between the samples can be visualized using various techniques such as
clustering or similarity matrices. These visualizations can help analysts quickly understand
the relationships between different ransomware samples and identify patterns or trends in
the data.
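To illustrate how minhash approximates the Jaccard index, the following sketch builds fixed-length signatures and compares them position by position. It is a generic textbook construction under stated assumptions, not the code used in our experiments: the salted SHA-256 hash family, the signature length of 256, and the feature names are all chosen for the example.

```python
import hashlib
from typing import Iterable, List


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item`, salted with `seed`."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(features: Iterable[str], num_hashes: int = 256) -> List[int]:
    """For each of `num_hashes` salted hash functions, keep the minimum hash
    value observed over the feature set."""
    items = set(features)
    if not items:
        raise ValueError("cannot build a signature for an empty feature set")
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of signature positions that agree; this estimates the
    Jaccard index of the underlying sets."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical feature sets (e.g., API call names) for two samples.
calls_a = {f"api_{i}" for i in range(0, 30)}
calls_b = {f"api_{i}" for i in range(10, 40)}

sig_a = minhash_signature(calls_a)
sig_b = minhash_signature(calls_b)
estimate = estimate_jaccard(sig_a, sig_b)
exact = len(calls_a & calls_b) / len(calls_a | calls_b)  # 20 / 40 = 0.5
print(f"exact={exact:.2f} estimate={estimate:.2f}")
```

The signatures have a fixed size regardless of how many features a sample has, which is what makes them convenient to store in an index. The estimate becomes tighter as the number of hash functions grows, at the cost of more hashing per sample.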
i. Strings-Based Similarity
We propose a method for identifying the similarity between different ransomware
samples using strings as a feature. By extracting all contiguous printable sequences of char-
acters from the samples and generating the Jaccard index between all pairs of ransomware
samples based on their common string relationships, we can compute the strings-based
ransomware similarity. Strings taken from a binary tend to be format strings established
by the programmer, which compilers in general do not transform, regardless of which
compilers the ransomware authors use or what parameters they provide the compilers.
This strategy allows us to bypass the compiler difficulty and accurately identify similarities
between different ransomware samples. The similarity matrix generated using extracted
static strings as a feature is illustrated in Figure 14.
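A minimal sketch of string extraction and comparison is given below. The minimum run length of four bytes, the restriction to printable ASCII, and the two byte blobs are assumptions made for illustration; real samples would be read from disk, and the extraction settings used in our experiments may differ.

```python
import re
from typing import Set


def extract_strings(data: bytes, min_len: int = 4) -> Set[bytes]:
    """Return all runs of at least `min_len` printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}")
    return set(pattern.findall(data))


def jaccard(a: Set[bytes], b: Set[bytes]) -> float:
    """Jaccard index of two string sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical byte blobs standing in for two binaries.
blob_a = b"\x00\x01ransom_note.txt\x00\x02Your files are encrypted\x00%s.locked\xff"
blob_b = b"\x7fransom_note.txt\x00\x90\x90Pay %d BTC\x00%s.locked\x00"

strings_a = extract_strings(blob_a)
strings_b = extract_strings(blob_b)
score = jaccard(strings_a, strings_b)  # 2 shared strings out of 4 distinct ones
print(score)
```

The two blobs share the note file name and the format string for the encrypted-file extension, which is the kind of programmer-chosen text that survives compilation unchanged.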
In our static analysis, the time required per sample is consistently 5 s, which reflects
the efficiency of our static analyzer. The Jaccard index for similarity among the samples
is 0.3. String-based analysis is therefore fast; however, it is important to note that a
Jaccard index of 0.3 signifies a lower level of accuracy in capturing similarities between
the samples. This trade-off between speed and accuracy is a key consideration in our
approach, which aims to strike a balance that aligns with the requirements of timely
detection.
Figure 14. The similarity matrix generated using string features.
ii. Import Address Table–Based Similarity
Ransomware analysts and reverse engineers can use the Import Address Table (IAT)
feature to identify the shared code between different ransomware samples. By comparing
the IAT of two samples, analysts can determine the extent to which the samples use the
same imported DLLs and functions. This information can be useful in identifying the
relationships between different ransomware families and in understanding the evolution of
individual families over time. To generate the IAT-based similarity matrix, analysts can
extract the IAT from each sample and compute the Jaccard index between all pairs of
samples based on their common IAT entries. The resulting matrix can then be visualized
using a variety of techniques, such as clustering or network analysis, to identify patterns
and trends within the data. By using the IAT feature in combination with other static and
dynamic analysis techniques, analysts can gain a more comprehensive understanding of the
relationships between different ransomware samples and can more effectively classify and
cluster them for further analysis. Overall, the use of the IAT feature in ransomware
analysis can greatly improve the efficiency and accuracy of malware classification and
clustering efforts. The similarity matrix generated using the extracted static Import
Address Table as a feature is illustrated in Figure 15.
In our Import Address Table (IAT) analysis, the absolute time required for processing
each sample falls within the range of 5 to 10 s. This indicates the efficiency of our IAT
analysis, striking a balance between speed and comprehensive examination. Notably, the
absolute Jaccard index for similarity among samples in the context of IAT analysis is 0.86.
This high Jaccard index value attests to the accuracy of our IAT analysis, showcasing its
effectiveness in capturing similarities between samples. This combination of relatively
fast processing time and a high Jaccard index underlines the efficacy of our approach in
achieving both speed and accuracy in Import Address Table analysis.
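The pairwise computation behind an IAT-based similarity matrix can be sketched as follows. The IAT entries are written here as hypothetical "dll!function" strings and the three samples are invented for the example; extracting the real import table from a PE file is outside the scope of this sketch.

```python
from typing import Dict, List, Set


def similarity_matrix(features: Dict[str, Set[str]]) -> List[List[float]]:
    """Return a square matrix of Jaccard indices, with rows and columns
    ordered by sorted sample name."""
    names = sorted(features)
    matrix = []
    for a in names:
        row = []
        for b in names:
            union = features[a] | features[b]
            row.append(len(features[a] & features[b]) / len(union) if union else 0.0)
        matrix.append(row)
    return matrix


# Hypothetical IAT entries for three samples.
iat = {
    "sample_1": {
        "kernel32.dll!CreateFileW",
        "kernel32.dll!WriteFile",
        "advapi32.dll!CryptEncrypt",
    },
    "sample_2": {
        "kernel32.dll!CreateFileW",
        "kernel32.dll!WriteFile",
        "advapi32.dll!CryptEncrypt",
        "advapi32.dll!CryptGenKey",
    },
    "sample_3": {
        "kernel32.dll!CreateFileW",
        "user32.dll!MessageBoxW",
    },
}

matrix = similarity_matrix(iat)
for name, row in zip(sorted(iat), matrix):
    print(name, [round(value, 2) for value in row])
```

The matrix is symmetric with ones on the diagonal, so in practice only the upper triangle needs to be computed and stored. A heat map of such a matrix gives the kind of view shown in the similarity matrix figures.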
Figure 15. The similarity matrix generated using the Import Address Table (IAT) feature.
Ransomware clustering is useful for grouping a large set of samples into a known or
unknown number of groups or clusters, with objects in each cluster having a high degree
of similarity and objects in other clusters being dissimilar. We proposed an efficient
malware indexing system that provides search functionalities, similarity checking, and
sample classification and clustering. The system mainly targets native binary files. The
indexing engine depends on hybrid data from static feature extraction, comparing different
ransomware families to find the similarity matrix between those samples. We compared
different static features by checking the similarity matrix for different ransomware
families. Among these, the Import Address Table (IAT) proved to be the best feature for
finding similar ransomware samples. The main limitation in finding similarities between
ransomware samples is the classification and clustering of packed samples. Therefore, we
focused on using a dynamic analyzer integrated with sandboxing to extract dynamic features
like API calls. Using dynamic analyzer and static analyzer features and comparing different
feature-based similarity matrices will help in clustering and classifying packed and
unpacked ransomware samples.
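One simple way to group samples into an unknown number of clusters is to link every pair whose similarity reaches a threshold and take the connected components. The sketch below uses this approach as a stand-in, since the clustering algorithm of our indexing system is not detailed here; the threshold of 0.7, the sample names, and the API call sets are assumptions for illustration.

```python
from typing import Dict, List, Set


def cluster_by_threshold(
    features: Dict[str, Set[str]], threshold: float
) -> List[List[str]]:
    """Group samples into connected components, linking two samples when the
    Jaccard index of their feature sets is at least `threshold`."""
    names = sorted(features)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            union = features[a] | features[b]
            score = len(features[a] & features[b]) / len(union) if union else 0.0
            if score >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical runtime API call sets; "a2_packed" stands for a packed variant
# whose behavior in the sandbox still resembles "a1".
calls = {
    "a1": {"CreateFileW", "ReadFile", "CryptEncrypt", "WriteFile"},
    "a2_packed": {"CreateFileW", "ReadFile", "CryptEncrypt", "WriteFile", "VirtualAlloc"},
    "b1": {"socket", "connect", "send"},
    "b2": {"socket", "connect", "send", "recv"},
}

clusters = cluster_by_threshold(calls, threshold=0.7)
print(clusters)
```

With these inputs the packed variant lands in the same cluster as its unpacked counterpart, because the comparison is made on runtime behavior rather than on the static import table. The number of clusters is not fixed in advance; it follows from the threshold.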
iii. API Calls–Based Similarity
To find similarities between ransomware samples, we utilized API calls as a dynamic
feature. By analyzing the API calls made by a sample during runtime through sandboxing,
we were able to extract valuable information about the sample’s behavior and use it to
compare with other samples. This method proved particularly effective in identifying
packed samples, which can often be difficult to classify using static features alone. Using
API calls as a dynamic feature allowed us to accurately cluster and classify a large dataset
of ransomware samples, including both packed and unpacked samples. By comparing the
API call similarity matrix between different ransomware families, we were able to identify
shared behavior and characteristics that helped us better understand the relationships
between different samples. The similarity matrix generated using extracted dynamic API
calls as a feature is illustrated in Figure 16. In our dynamic analysis of API calls,
the absolute time required for processing each sample typically ranges from 30 s to 60 s,
contingent upon the complexity of the sample. Despite the relatively longer processing
time, this method is designed to provide a thorough and detailed analysis of the dynamic
behavior of samples.
Remarkably, the absolute Jaccard index for similarity among samples in the context
of dynamic analysis of API calls is 1. This perfect matching Jaccard index signifies
full similarity, indicating that the dynamic analysis method precisely identifies identical
patterns across samples. While the method requires more time for analysis, the perfect
matching Jaccard index underscores its high accuracy in capturing similarities between
samples, making it a robust tool for comprehensive dynamic analysis.
Figure 16. The similarity matrix generated using the API Call feature.
To provide a concise and comprehensive overview of our ransomware classification
system, we present a detailed comparison of key features, time complexities, and analysis
methods in the form of a diagram and table. The diagram illustrated in Figure 17 visually
encapsulates the essential characteristics of our approach, highlighting the distinct time
complexities and trade-offs associated with each analyzed feature: static strings, static
Import Address Table (IAT), and dynamic API calls.
A comparative analysis of ransomware classification features is described in Table 6,
with a numerical comparison between static and dynamic analyzers.
Author Contributions: Conceptualization, B.Y., N.A. and M.A.A.; methodology, B.Y. and M.A.A.;
software, B.Y.; validation, B.Y., N.A. and M.A.A.; formal analysis, B.Y., N.A. and M.A.A.; investigation,
B.Y., M.S.E. and M.A.A.; resources, B.Y.; data curation, B.Y. and M.A.A.; writing—original draft
preparation, B.Y., A.D.J., N.A. and M.A.A.; writing—review and editing, B.Y., M.S.E., A.D.J., N.A. and
M.A.A.; visualization, B.Y., M.S.E., A.D.J., N.A. and M.A.A.; supervision, A.D.J., M.S.E. and M.A.A.;
project administration, B.Y., M.S.E., A.D.J., N.A. and M.A.A.; funding acquisition, A.D.J. All authors
have read and agreed to the published version of the manuscript.
Funding: This research was funded by the University College Dublin (UCD), School of Computer
Science, Dublin, Ireland, grant number 13/RC/2077.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: This article does not contain any studies with human participants or
animals performed by any of the authors.
Data Availability Statement: Data in this research paper will be shared upon request made to the
corresponding author.
Conflicts of Interest: All authors declare that they have no conflict of interest for the presented work.
References
1. Gopinath, M.; Sethuraman, S.C. A comprehensive survey on deep learning based malware detection techniques. Comput. Sci. Rev.
2023, 47, 100529.
2. Brown, A.; Gupta, M.; Abdelsalam, M. Automated machine learning for deep learning based malware detection. Comput. Secur.
2024, 137, 103582.
3. Kok, S.; Abdullah, A.; Jhanjhi, N.; Supramaniam, M. Ransomware, threat and detection techniques: A review. Int. J. Comput. Sci.
Netw. Secur. 2019, 19, 136.
4. Yadav, C.S.; Singh, J.; Yadav, A.; Pattanayak, H.S.; Kumar, R.; Khan, A.A.; Haq, M.A.; Alhussen, A.; Alharby, S. Malware analysis
in iot & android systems with defensive mechanism. Electronics 2022, 11, 2354.
5. Rey, V.; Sánchez, M.S.; Celdrán, A.H.; Bovet, G. Federated learning for malware detection in IoT devices. Comput. Netw. 2022,
204, 108693. [CrossRef]
6. Johnson, S.; Gowtham, R.; Nair, A.R. Ensemble Model Ransomware Classification: A Static Analysis-based Approach. In Inventive
Computation and Information Technologies: Proceedings of ICICIT 2021; Springer Nature: Singapore, 2022; pp. 153–167.
7. Al-rimy, B.A.S.; Maarof, M.A.; Shaid, S.Z.M. Ransomware threat success factors, taxonomy, and countermeasures: A survey and
research directions. Comput. Secur. 2018, 74, 144–166. [CrossRef]
8. Akhtar, Z. Malware detection and analysis: Challenges and research opportunities. arXiv 2021, arXiv:2101.08429.
9. Tahir, R. A study on malware and malware detection techniques. Int. J. Educ. Manag. Eng. 2018, 8, 20. [CrossRef]
10. Yamany, B.; Elsayed, M.S.; Jurcut, A.D.; Abdelbaki, N.; Azer, M.A. A New Scheme for Ransomware Classification and Clustering
Using Static Features. Electronics 2022, 11, 3307. [CrossRef]
11. Yamany, B.E.M.; Azer, M.A. SALAM Ransomware Behavior Analysis Challenges and Decryption. In Proceedings of the 2021
Tenth International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 5–7 December 2021;
pp. 273–277.
12. Fernando, D.W.; Komninos, N.; Chen, T. A study on the evolution of ransomware detection using machine learning and deep
learning techniques. IoT 2020, 1, 551–604. [CrossRef]
13. Khan, F.; Ncube, C.; Ramasamy, L.K.; Kadry, S.; Nam, Y. A digital DNA sequencing engine for ransomware detection using
machine learning. IEEE Access 2020, 8, 119710–119719. [CrossRef]
14. Liu, K.; Xu, S.; Xu, G.; Zhang, M.; Sun, D.; Liu, H. A review of android malware detection approaches based on machine learning.
IEEE Access 2020, 8, 124579–124607. [CrossRef]
15. Bae, S.I.; Lee, G.B.; Im, E.G. Ransomware detection using machine learning algorithms. Concurr. Comput. Pract. Exp. 2020, 32,
e5422. [CrossRef]
16. Chakkaravarthy, S.S.; Sangeetha, D.; Cruz, M.V.; Vaidehi, V.; Raman, B. Design of intrusion detection honeypot using social
leopard algorithm to detect IoT ransomware attacks. IEEE Access 2020, 8, 169944–169956. [CrossRef]
17. El-Kosairy, A.; Azer, M.A. Intrusion and ransomware detection system. In Proceedings of the 2018 1st International Conference
on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia, 4–6 April 2018; pp. 1–7.
18. Vishwakarma, R.; Jain, A.K. A honeypot with machine learning based detection framework for defending IoT based botnet DDoS
attacks. In Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli,
India, 23–25 April 2019; pp. 1019–1024.
19. Keong Ng, C.; Rajasegarar, S.; Pan, L.; Jiang, F.; Zhang, L.Y. VoterChoice: A ransomware detection honeypot with multiple voting
framework. Concurr. Comput. Pract. Exp. 2020, 32, e5726. [CrossRef]
20. Pont, J.; Arief, B.; Hernandez-Castro, J. Why current statistical approaches to ransomware detection fail. In Proceedings of the
International Conference on Information Security, Bali, Indonesia, 16–18 December 2020; Springer International Publishing:
Cham, Switzerland, 2020; pp. 199–216.
21. Yewale, A.; Singh, M. Malware detection based on opcode frequency. In Proceedings of the 2016 International Conference
on Advanced Communication Control and Computing Technologies (ICACCCT), Ramanathapuram, India, 25–27 May 2016;
pp. 646–649.
22. Rezaei, S.; Afraz, A.; Rezaei, F.; Shamani, M.R. Malware detection using opcodes statistical features. In Proceedings of the 2016
8th International Symposium On Telecommunications (IST), Tehran, Iran, 27–28 September 2016; pp. 151–155.
23. Verma, V.; Muttoo, S.K.; Singh, V.B. Multiclass malware classification via first-and second-order texture statistics. Comput. Secur.
2020, 97, 101895. [CrossRef]
24. Du, P.; Sun, Z.; Chen, H.; Cho, J.H.; Xu, S. Statistical estimation of malware detection metrics in the absence of ground truth. IEEE
Trans. Inf. Forensics Secur. 2018, 13, 2965–2980. [CrossRef]
25. Bijitha, C.V.; Sukumaran, R.; Nath, H.V. A survey on ransomware detection techniques. In Secure Knowledge Management in
Artificial Intelligence Era: 8th International Conference, SKM 2019, Goa, India, 21–22 December 2019; Proceedings 8; Springer: Singapore,
2020; pp. 55–68.
26. Bello, A.; Maurushat, A. Synthesis of Evidence on Existing and Emerging Social Engineering Ransomware Attack Vectors. In
Cybersecurity Issues, Challenges, and Solutions in the Business World; IGI Global: Hershey, PA, USA, 2023; pp. 234–254.
27. Cai, C.X.; Zhao, R. Salience theory and cryptocurrency returns. J. Bank. Financ. 2024, 159, 107052. [CrossRef]
28. Oz, H.; Aris, A.; Levi, A.; Uluagac, A.S. A survey on ransomware: Evolution, taxonomy, and defense solutions. ACM Comput.
Surv. (CSUR) 2022, 54, 1–37. [CrossRef]
29. Alzahrani, S.; Xiao, Y.; Sun, W. An analysis of conti ransomware leaked source codes. IEEE Access 2022, 10, 100178–100193.
[CrossRef]
30. Shu, R.; Xia, T.; Williams, L.; Menzies, T. Omni: Automated ensemble with unexpected models against adversarial evasion attack.
Empir. Softw. Eng. 2022, 27, 26. [CrossRef]
31. Alagappan, A.; Venkatachary, S.K.; Andrews, L.J.B. Augmenting Zero Trust Network Architecture to enhance security in virtual
power plants. Energy Rep. 2022, 8, 1309–1320. [CrossRef]
32. Whyte, C.; Mazanec, B. Understanding Cyber-Warfare: Politics, Policy and Strategy; Routledge: Oxford, UK, 2023.
33. Berrueta, E.; Morato, D.; Magaña, E.; Izal, M. A survey on detection techniques for cryptographic ransomware. IEEE Access 2019,
7, 144925–144944. [CrossRef]
34. Kara, I.; Aydos, M. The rise of ransomware: Forensic analysis for windows based ransomware attacks. Expert Syst. Appl. 2022,
190, 116198. [CrossRef]
35. Gómez-Hernández, J.A.; Sánchez-Fernández, R.; García-Teodoro, P. Inhibiting crypto-ransomware on windows platforms through
a honeyfile-based approach with R-Locker. IET Inf. Secur. 2022, 16, 64–74. [CrossRef]
36. Almomani, I.; Alkhayer, A.; El-Shafai, W. A crypto-steganography approach for hiding ransomware within HEVC streams in
android IoT devices. Sensors 2022, 22, 2281. [CrossRef]
37. Ahmed, M.; Afreen, N.; Ahmed, M.; Sameer, M.; Ahamed, J. An inception V3 approach for malware classification using machine
learning and transfer learning. Int. J. Intell. Netw. 2023, 4, 11–18. [CrossRef]
38. Chaganti, R.; Ravi, V.; Pham, T.D. A multi-view feature fusion approach for effective malware classification using Deep Learning.
J. Inf. Secur. Appl. 2023, 72, 103402. [CrossRef]
39. Eren, M.E.; Bhattarai, M.; Rasmussen, K.; Alexandrov, B.S.; Nicholas, C. MalwareDNA: Simultaneous Classification of Malware,
Malware Families, and Novel Malware. In Proceedings of the 2023 IEEE International Conference on Intelligence and Security
Informatics (ISI), Charlotte, NC, USA, 2–3 October 2023; pp. 1–3.
40. Marques, A.B.; Branco, V.; Costa, R.; Costa, N. Data Visualization in Hybrid Space—Constraints and Opportunities for Design.
In Proceedings of the International Conference on Design and Digital Communication, Barcelos, Portugal, 3–5 October 2022;
Springer Nature: Cham, Switzerland, 2022; pp. 3–15.
41. Rimon, S.I.; Haque, M.M. Malware Detection and Classification Using Hybrid Machine Learning Algorithm. In Proceedings
of the International Conference on Intelligent Computing & Optimization, Hua Hin, Thailand, 27–28 October 2022; Springer
International Publishing: Cham, Switzerland, 2022; pp. 419–428.
42. Mallik, A.; Khetarpal, A.; Kumar, S. ConRec: Malware classification using convolutional recurrence. J. Comput. Virol. Hacking Tech.
2022, 18, 297–313. [CrossRef]
43. Abbasi, M.S.; Al-Sahaf, H.; Mansoori, M.; Welch, I. Behavior-based ransomware classification: A particle swarm optimization
wrapper-based approach for feature selection. Appl. Soft Comput. 2022, 121, 108744. [CrossRef]
44. Kim, J.; Lee, S. Malware Visualization and Similarity via Tracking Binary Execution Path. Teh. Vjesn. 2022, 29, 221–230.
45. Saxe, J.; Sanders, H. Malware Data Science: Attack Detection and Attribution; No Starch Press: San Francisco, CA, USA, 2018.
46. Kong, K.; Zhang, Z.; Guo, C.; Han, J.; Long, G. PMMSA: Security analysis system for android wearable applications based on
permission matching and malware similarity analysis. Future Gener. Comput. Syst. 2022, 137, 349–362. [CrossRef]
47. Mudgil, P.; Gupta, P.; Mathur, I.; Joshi, N. A novel similarity measure for context-based search engine. In Proceedings of the
International Conference on Innovative Computing and Communications: Proceedings of ICICC 2022; Springer Nature: Singapore, 2022;
Volume 2, pp. 791–808.
48. Abbas, A.R.; Mahdi, B.S.; Fadhil, O.Y. Breast and lung anticancer peptides classification using N-Grams and ensemble learning
techniques. Big Data Cogn. Comput. 2022, 6, 40. [CrossRef]
49. Cucchiarelli, A.; Morbidoni, C.; Spalazzi, L.; Baldi, M. Algorithmically generated malicious domain names detection based on
n-grams features. Expert Syst. Appl. 2021, 170, 114551. [CrossRef]
50. Di Mauro, M.; Galatro, G.; Liotta, A. Experimental review of neural-based approaches for network intrusion management. IEEE
Trans. Netw. Serv. Manag. 2020, 17, 2480–2495. [CrossRef]
51. Dong, S.; Xia, Y.; Peng, T. Network abnormal traffic detection model based on semi-supervised deep reinforcement learning. IEEE
Trans. Netw. Serv. Manag. 2021, 18, 4197–4212. [CrossRef]
52. Pelletier, C.; Webb, G.I.; Petitjean, F. Deep learning for the classification of Sentinel-2 image time series. In Proceedings of the
IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019;
pp. 461–464.