information

Article
A Holistic Approach to Ransomware Classification: Leveraging
Static and Dynamic Analysis with Visualization
Bahaa Yamany 1, Mahmoud Said Elsayed 2,*, Anca D. Jurcut 2, Nashwa Abdelbaki 1 and Marianne A. Azer 1,3

1 School of Information Technology and Computer Science, Nile University, Cairo 12566, Egypt;
b.yamany@nu.edu.eg (B.Y.); nabdelbaki@nu.edu.eg (N.A.)
2 School of Computer Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland;
anca.jurcut@ucd.ie
3 Computers and Systems Department, National Telecommunication Institute, Cairo 11768, Egypt;
mazer@nu.edu.eg
* Correspondence: mahmoud.abdallah@ucdconnect.ie

Abstract: Ransomware is a type of malicious software that encrypts a victim’s files and demands
payment in exchange for the decryption key. It is a rapidly growing and evolving threat that has
caused significant damage and disruption to individuals and organizations around the world. In this
paper, we propose a comprehensive ransomware classification approach based on the comparison of
similarity matrices derived from static analysis, dynamic analysis, and visualization. Our approach
uses multiple analysis techniques to extract features from ransomware samples and to generate
similarity matrices based on these features. These matrices are then compared using a variety of
comparison algorithms to identify similarities and differences between the samples. The resulting
similarity scores are then used to classify the samples into different categories, such as families,
variants, and versions. We evaluate our approach using a dataset of ransomware samples and
demonstrate that it can classify the samples with a high degree of accuracy. One advantage
of our approach is the use of visualization, which allows us to classify and cluster large datasets of
ransomware in a more intuitive and effective way. In addition, static analysis has the advantage of
being fast and accurate, while dynamic analysis allows us to classify and cluster packed ransomware
samples. We also compare our approach to other classification approaches based on single analysis
techniques and show that our approach outperforms these approaches in terms of classification
accuracy. Overall, our study demonstrates the potential of using a comprehensive approach based
on the comparison of multiple analysis techniques, including static analysis, dynamic analysis,
and visualization, for the accurate and efficient classification of ransomware. It also highlights the
importance of considering multiple analysis techniques in the development of effective ransomware
classification methods, especially when dealing with large datasets and packed samples.

Keywords: dynamic analysis; encryption; honeypot; Jaccard index; malware; machine learning;
ransomware; similarity matrix; shared code analysis; static analysis

Citation: Yamany, B.; Elsayed, M.S.; Jurcut, A.D.; Abdelbaki, N.; Azer, M.A. A Holistic Approach to
Ransomware Classification: Leveraging Static and Dynamic Analysis with Visualization.
Information 2024, 15, 46. https://doi.org/10.3390/info15010046
Academic Editor: Ruggero Lanotte
Received: 11 December 2023
Revised: 30 December 2023
Accepted: 5 January 2024
Published: 14 January 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
Information 2024, 15, 46. https://doi.org/10.3390/info15010046 https://www.mdpi.com/journal/information

1. Introduction
Malware analysis is the act of finding, comprehending, and minimizing the potential
damage caused by malicious software, such as ransomware [1]. It is a crucial component
of cybersecurity since it enables organizations and individuals to defend themselves
against the numerous types of malware that might infect their systems and data. Malware
analysis employs a variety of tools and methodologies, including static analysis, dynamic
analysis, sandbox analysis, and reverse engineering. These methods can be used to analyze
the code and behavior of malware and to identify indicators of compromise (IOCs) that
can be used to detect and categorize malware. Malware analysis is an important part
of defending against ransomware as it allows organizations and individuals to identify
and mitigate the potential harm caused by ransomware before it can cause significant
damage or disruption. It can also help to identify and track the activities of ransomware
operators, which can provide valuable intelligence for law enforcement and other
cybersecurity professionals. In addition to traditional malware analysis techniques, there are
also automated malware analysis tools and platforms that can be used to automate and
streamline the analysis process. These tools can help to reduce the time and resources
required for manual analysis, as well as increase the speed and accuracy of the analysis
process [2]. However, it is important to carefully consider the benefits and limitations
of automated malware analysis as it may not always provide the same level of depth and
detail as manual analysis. Static analysis and dynamic analysis are two approaches that
can be used to analyze and classify ransomware [3]. Both approaches have their own
benefits and limitations, and they can be used in combination or separately depending on
the specific needs of the analysis.
Ransomware represents a form of malicious software that encrypts a victim’s files
and subsequently demands a ransom in exchange for the decryption key. This perilous
threat is marked by its rapid proliferation and constant evolution, resulting in significant
harm and disruption to individuals and entities worldwide [4]. Ransomware deployment
encompasses various techniques, including exploit kits, drive-by downloads, and social
engineering strategies. Common vectors for its transmission include email attachments,
compromised websites, and software vulnerabilities. Upon infiltration, ransomware
typically encrypts a wide array of files, ranging from documents to images, holding them
hostage. Subsequently, victims are confronted with ransom demands, often presented
through on-screen messages or concealed notes within their systems. These demands
typically include a stipulated payment deadline and a menacing ultimatum to delete the
victim’s data should the ransom go unpaid. The repercussions of a ransomware attack can
be profound, resulting in operational disruption, critical data loss, and substantial financial
setbacks. Victims facing such attacks may find themselves at a crossroads, compelled to
either pay the ransom for data recovery or explore alternative avenues, such as data
restoration from backups or decryption techniques. Importantly, even when a ransom is paid,
there is no guarantee that the ransomware operators will honor their promise to provide the
decryption key [5]. The escalating prevalence and sophistication of ransomware assaults
pose a global threat to both individuals and businesses. Being prepared to respond to and
recover from such attacks, as well as proactively recognizing the threat and implementing
precautionary measures, assumes paramount importance for safeguarding against this
formidable adversary [6]. Figure 1 offers a comprehensive overview of the various phases
involved in a ransomware attack, spanning from its inception to the extortion phase.

Figure 1. Ransomware lifecycle from creation to extortion.

Paper Contribution
In this paper, our primary focus was a meticulous examination of a substantial dataset
containing ransomware samples, embedded within a broader corpus. This extensive analy-
sis led to the identification of a subset of ransomware samples sharing notable similarities.
Subsequently, we conducted a rigorous assessment using a similarity matrix-based analysis,
incorporating both static and dynamic features, with the overarching goal of offering a
comprehensive evaluation that highlights the respective merits and limitations of each
analytical approach.
Beyond our innovative methodological approach, we conducted a thorough survey
and comparative examination of established ransomware detection methodologies. Our
study presents an expansive exploration of the ransomware detection ecosystem, cover-
ing various dimensions, including the detection environment, data analysis techniques,
machine learning methodologies, outcomes, evaluation criteria, and a range of available
detection tools.
Within the context of this research, our contributions encompass a multifaceted explo-
ration of the following key aspects:
• Comparative analysis of infection behaviors across various ransomware families.
• Utilization of data visualization methods for the identification of similar ransomware
samples within extensive datasets.
• Application of a similarity matrix approach to the analysis of static and dynamic features
in ransomware samples.
• Assessment of the merits and limitations associated with static and dynamic feature
analysis.
• Comprehensive survey and comparative evaluation of varied ransomware detection
approaches, alongside an in-depth exploration of the ransomware detection ecosystem.
• Development and proposal of an automated methodology for extracting diverse
feature sets from ransomware samples.
The remainder of this paper is organized as follows. In Section 2, we present an
overview of the efforts that have been made in the literature to develop ransomware
detection approaches. We survey the different techniques that have been proposed and
analyze the criteria, parameters, and tools used in the ransomware detection ecosystem. In
Section 3, we provide a background on the different static and dynamic features that have
been used in ransomware tracking systems as well as the visualization techniques that have
been proposed for ransomware classification. In Section 4, we describe our system setup
and present the results and analysis of our proposed approach for extracting the malware’s
static features and classifying ransomware samples. We also compare our results to those
of other approaches proposed in the literature. Finally, in Section 5, we provide conclusions
and discuss future work in the field of ransomware detection. We outline the challenges
and opportunities that exist for improving the accuracy and effectiveness of ransomware
detection and highlight the potential impacts of these advances on cybersecurity.

2. Related Work
In this section, we aim to delve deeper into the related work, refine the problem
statement by addressing its limitations, and provide additional context regarding the
categorization of ransomware. Within the scope of this paper, our objective is to conduct a
comprehensive survey of the diverse spectrum of ransomware detection methodologies
and techniques as delineated in the existing literature. Beyond this survey, we undertake
a meticulous analysis of the varied criteria, parameters, and tools employed within the
broader ransomware detection ecosystem. The overarching goal is to furnish readers
with an in-depth understanding of the contemporary landscape in ransomware detection,
including both its advancements and inherent challenges. However, it is imperative to
acknowledge certain limitations in this pursuit. Firstly, the rapidly evolving nature of
ransomware demands a continuous update of detection methodologies, and as such, some
state-of-the-art techniques may not be covered if they have emerged after our knowledge
cutoff date. Secondly, the effectiveness of ransomware detection can be context-dependent,
varying based on factors such as the specific ransomware variant, its obfuscation techniques,
and the target environment. These contextual variations pose challenges in proposing a
one-size-fits-all solution.
Categorizing ransomware is a crucial aspect of understanding the threat landscape.
Ransomware can be classified into various categories based on its characteristics, propaga-
tion methods, and behavior. Some common categories include:
• Encrypting Ransomware: This category involves ransomware that encrypts files on
the victim’s system, rendering them inaccessible until a ransom is paid.
• Locker Ransomware: Locker ransomware locks the victim out of their entire system,
preventing access until a ransom is provided.
• Doxware or Leakware: This type threatens to release sensitive information unless a
ransom is paid, often compromising privacy.
• Scareware: Scareware displays false warnings or claims of malware infections, extort-
ing money for their removal.
• Mobile Ransomware: Designed for mobile devices, this category targets smartphones
and tablets, encrypting data or locking the device.
• Ransomware-as-a-Service (RaaS): RaaS platforms allow cybercriminals to easily create
and distribute ransomware, contributing to its proliferation.
• Targeted Ransomware: Some ransomware attacks are highly targeted, focusing on
specific organizations or individuals, often with higher ransom demands.
• Cryptojacking: While it is not traditional ransomware, cryptojacking malware hijacks
computer resources to mine cryptocurrencies, often without the victim’s consent.
The proliferation of computers, the Internet, and applications has introduced threats,
including malicious software or malware. The study in ref. [7] focused on ransomware, a
type of malware that encrypts user files and demands a ransom for their release. Despite
advisories against paying ransoms, victims commonly resort to this measure. That study
emphasized the need for advanced protection measures against ransomware, highlighting
the importance of understanding its nature for effective defense. It noted that, while existing
surveys touch on technical aspects, there is a dearth of comprehensive reviews dedicated to
exploring ransomware research, and it sought to fill this gap by providing a detailed survey
and introducing a new ransomware taxonomy. The survey covers ransomware threat
factors, taxonomy, and existing research, offering valuable insights for future endeavors in
this domain.
Categorization aids in understanding the modus operandi of different ransomware
variants and tailoring detection and mitigation strategies accordingly. As we proceed with
our analysis, it is essential to consider these categories and their implications on detection
and prevention strategies. Furthermore, we acknowledge that the ransomware landscape
is dynamic, and new categories or variants may emerge over time, necessitating ongoing
research and adaptation of security measures.
Ransomware detection through reverse engineering entails shared code analysis to
identify analyzable sample groupings, aiding in developer attribution and variant
identification. Shared code analysis enables swift determination of code commonality between
new and previously analyzed samples. In a standard malware detection system, the primary
components encompass feature extraction, feature selection, classification/clustering,
and decision-making. Raw data are processed through feature extraction, yielding relevant
features. Feature selection reduces complexity by identifying feature correlations. The
resulting feature vector undergoes classification or clustering, with the decision module
distinguishing between malicious and benign samples [8].
In ref. [9], the authors conducted a comparative study between static and dynamic malware
analysis techniques. Both static and dynamic analysis approaches hold significant value in
the realm of ransomware analysis and classification. The choice between these methods
hinges on the specific requirements, available resources, and characteristics of the
ransomware samples under scrutiny. Often, a synergistic combination of static and dynamic
analysis proves to be the most effective approach.
Identifying the malware family to which a new sample belongs is a common necessity in
malware analysis. One prevalent approach involves subjecting the sample to a multi-engine
antivirus scanner, such as VirusTotal. However, outcomes from these scanners can lack
clarity and accuracy, as malware is often tagged with uninformative labels such as “generic”.
Additionally, malware creators may actively monitor the VirusTotal database, modifying their
code or functions to evade detection. An alternative method for malware analysis involves
executing the sample within a controlled sandbox environment, such as CuckooBox, to
gather insights into the malware’s behavior and communication with callback servers.
While this approach can yield valuable insights, it can be time-consuming and inefficient
when dealing with extensive malware datasets.
A distinctive and automated approach to malware analysis, as introduced in ref. [10], is
shared code analysis or similarity check analysis. This technique compares two malware
samples by quantifying the proportion of the precompiled source code they share. Unlike
shared attribute analysis, which relies on external characteristics, shared code analysis
swiftly and accurately classifies malware, particularly within large datasets. Nevertheless,
it is crucial to assess the limitations of this method and utilize it in conjunction with other
analysis techniques as needed. In the context of malware analysis and ransomware, ref. [11]
offers a comparative exploration of various analysis approaches and ransomware typologies,
shedding light on their respective behaviors and characteristics. Section 2.1 outlines the
main detection approaches and techniques, including machine learning, which leverages
known ransomware datasets for classification, and behavioral analysis, which observes
malware execution by analyzing network activity, file operations, and system resource usage.
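The similarity-based comparison described above can be illustrated with a small sketch. It is not the implementation used in this paper: it assumes that each sample has already been reduced to a set of features, and it uses the Jaccard index (listed among the keywords) as the similarity measure. The sample names and imported function names are hypothetical placeholders, not measurements.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(features: dict) -> dict:
    """Return {(name_i, name_j): score} for every unordered pair of samples."""
    return {
        (x, y): jaccard(features[x], features[y])
        for x, y in combinations(sorted(features), 2)
    }


# Hypothetical feature sets (illustrative imported function names only).
samples = {
    "sample_a": {"CryptEncrypt", "CreateFileW", "DeleteFileW", "InternetOpenA"},
    "sample_b": {"CryptEncrypt", "CreateFileW", "DeleteFileW", "WriteFile"},
    "sample_c": {"MessageBoxA", "GetTickCount"},
}

matrix = similarity_matrix(samples)
for pair, score in matrix.items():
    print(pair, round(score, 2))
```

Pairs with a high score are candidates for belonging to the same family; where to place the threshold is a design choice that would have to be tuned on labeled data.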

2.1. Ransomware Detection Approaches and Techniques

In the Machine Learning approach, machine learning algorithms analyze and categorize
ransomware behavior. Trained on datasets of both known ransomware and benign
samples, these algorithms identify new ransomware based on learned characteristics.
Machine learning techniques, such as Decision Trees, Support Vector Machines, and Artificial
Neural Networks, are applied. Advantages include adaptability to new ransomware
variations and scalability for handling large datasets. However, accuracy hinges on dataset
quality, diversity, and algorithm complexity.
The Honeypot approach entails establishing networks or systems designed to attract and
ensnare ransomware. These systems simulate vulnerability to lure ransomware attackers
and monitor their activities. Benefits encompass real-time collection and analysis of new
ransomware samples and the ability to discern attacker behavior trends and patterns.
Nonetheless, Honeypots require substantial resources and maintenance and may not detect
all ransomware types.
The Statistical Analysis approach scrutinizes the statistical attributes of ransomware
samples to uncover common patterns and features. Techniques like frequency analysis,
entropy analysis, and n-gram analysis are employed. Advantages include rapid analysis of
large datasets and the identification of shared patterns across diverse ransomware types.
However, it may struggle with sophisticated or novel ransomware and could yield false
positives if benign samples exhibit similar statistical characteristics.
Each approach possesses its own merits and drawbacks, making them suitable for specific
ransomware detection scenarios. The choice of approach should align with the requirements
and constraints of the detection system. Careful consideration is vital when selecting the
appropriate methodology.
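As a minimal sketch of two of the statistical techniques named above, the following code computes the Shannon entropy of a byte sequence and counts overlapping byte n-grams. The function names and the toy inputs are assumptions made for illustration; they are not taken from any of the surveyed systems. High entropy (close to 8 bits per byte) is commonly associated with encrypted or compressed content, but on its own it is not proof of ransomware activity.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams; inputs shorter than n yield no n-grams."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Toy inputs: a low-entropy buffer and one containing every byte value once.
plain = b"AAAAAAAABBBBBBBB"
uniform = bytes(range(256))

entropy_plain = shannon_entropy(plain)
entropy_uniform = shannon_entropy(uniform)
bigrams = byte_ngrams(b"abcabc", 2)

print(entropy_plain, entropy_uniform)
print(bigrams.most_common(2))
```

In practice, such values would be computed over files or memory buffers before and after a suspicious write, and the n-gram counts would feed a frequency-based feature vector.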

2.1.1. Machine Learning

Machine Learning leverages algorithms grouped into categories like Bayesian, deci-
sion tree, dimension reduction, instance-based, clustering, deep learning, ensemble, neural
network, regularization, rule system, and regression. These algorithms are utilized for
ransomware detection by analyzing and classifying behaviors. Bayesian algorithms, rooted
in Bayesian statistics, employ probabilistic models for event likelihood predictions, com-
monly applied in spam filters and malware detection systems. Decision tree algorithms
employ tree-like structures to make decisions based on predefined conditions or rules,
often used for classifying malware. Dimension reduction reduces dataset features for
easier analysis, aiding in identifying malware patterns and characteristics. Instance-based
algorithms make predictions based on stored instances or examples, useful in recognizing
malware patterns. Clustering algorithms group similar data points, employed to identify
malware features. Deep learning utilizes artificial neural networks for pattern recogni-
tion. Ensemble algorithms combine multiple models to enhance accuracy, while neural
network algorithms employ artificial neural networks for pattern detection. Regularization
algorithms prevent overfitting in complex models. In ref. [12], a machine learning-based
model distinguished ransomware from normal files and other malware, with an automatic
detection model enabling the identification of new ransomware samples. Ref. [13] explored
research projects employing machine learning and deep learning for ransomware detection.
Ref. [14] utilized a digital DNA sequencing engine and AI machine learning network to
classify ransomware into distinct families based on their “digital genomes”. Researchers
in ref. [15] employed hybrid multi-level profiling for a comprehensive forensic investigation
of crypto ransomware. They introduced the concept of “behavioral chaining” and employed
tools for mining association rules and AI. Profiling ransomware behavior based on its chain
ratio offers a novel approach to creating unique ransomware signatures.
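To make the instance-based category above concrete, the sketch below implements a small k-nearest-neighbour classifier over sets of behavioral features, using Jaccard distance. It is an illustrative toy rather than any of the cited models: the behavior names, labels, and the choice of k = 3 are all assumptions.

```python
from collections import Counter


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - Jaccard index; two empty sets have distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_predict(train, query, k=3):
    """Return the majority label among the k stored instances nearest to query."""
    nearest = sorted(train, key=lambda item: jaccard_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled instances (feature names are placeholders).
train = [
    (frozenset({"encrypts_files", "drops_ransom_note", "deletes_shadow_copies"}), "ransomware"),
    (frozenset({"encrypts_files", "drops_ransom_note"}), "ransomware"),
    (frozenset({"deletes_shadow_copies", "encrypts_files", "contacts_c2"}), "ransomware"),
    (frozenset({"reads_config", "writes_log"}), "benign"),
    (frozenset({"opens_document", "writes_log"}), "benign"),
    (frozenset({"reads_config", "opens_document"}), "benign"),
]

suspicious = frozenset({"encrypts_files", "drops_ransom_note", "writes_log"})
ordinary = frozenset({"reads_config", "writes_log", "opens_document"})

label_suspicious = knn_predict(train, suspicious)
label_ordinary = knn_predict(train, ordinary)
print(label_suspicious, label_ordinary)
```

The accuracy of such a classifier depends entirely on the quality and diversity of the stored instances, which mirrors the dataset caveat raised for machine learning approaches in general.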

2.1.2. Honeypots
Honeypots are valuable tools for gathering information about attacks, including the
identification of users and the extent of their activities, aiding in informed decision-making
for defense strategies. The primary objective of deploying honeypots is to acquire insights
into ongoing attacks and utilize that intelligence to bolster security measures. To enhance
user awareness, email notifications are sent, occasionally advising users to disconnect
network cables as a precautionary measure. This user training aspect adds an extra layer of
security awareness, making honeypots an effective means to detect ransomware attacks. In
ref. [16], the authors employed a combination of methods, including machine learning for
grouping cases and honeypots to capture potentially malicious packages. The classification
tasks used Decision Trees and Support Vector Machines (SVMs). The study suggests the
potential of architectural solutions for malware detection. Ref. [17] introduced an Intrusion
Detection Honeypot (IDH), comprising Honeyfolder, Audit Watch, and Complex Event
Processing (CEP). IDH is designed to mimic vulnerability while also functioning as an early
warning system, notifying users of suspicious file activity. Ref. [18] presented a deception
method involving Honeyfiles and Honeytokens, designed to access compromised private
files and detect hacking or ransomware attempts. The hypothesis explores the use of
honeypots combined with machine learning for malware detection. In ref. [19], data from
an Internet of Things (IoT) honeypot were effectively employed to train a dynamic machine
learning model. This highlights the dynamic nature of honeypot-driven machine-learning
techniques. Ref. [20] suggested a framework utilizing an Intrusion Prevention System
(IPS) gateway, an analytical system, and honeypots to detect and identify ransomware.
The framework encompasses six elements: IPS, gateway, static detector, dynamic detector,
honeynet, and a notification component, collectively contributing to effective ransomware
detection and user notification. These studies underscore the versatility and potential
of honeypot-driven approaches, often combined with machine learning techniques, for
enhancing ransomware detection and overall cybersecurity.
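The honeyfile idea mentioned above can be sketched as follows. This is a simplified polling illustration, not the Honeyfolder, Audit Watch, or CEP components of ref. [17]: it assumes that decoy files are placed in a directory, that their SHA-256 digests are recorded as a baseline, and that any later change or deletion is treated as an alert. The file names and helper names are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """SHA-256 digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def deploy_honeyfiles(directory: str, names) -> dict:
    """Create decoy files and return {path: baseline digest}."""
    baseline = {}
    for name in names:
        path = os.path.join(directory, name)
        with open(path, "wb") as handle:
            handle.write(b"decoy content " + name.encode())
        baseline[path] = file_digest(path)
    return baseline


def check_honeyfiles(baseline: dict) -> list:
    """Return the sorted decoy paths that were deleted or whose contents changed."""
    alerts = []
    for path, digest in baseline.items():
        if not os.path.exists(path) or file_digest(path) != digest:
            alerts.append(path)
    return sorted(alerts)


with tempfile.TemporaryDirectory() as workdir:
    baseline = deploy_honeyfiles(workdir, ["passwords.txt", "invoice.docx"])
    untouched = check_honeyfiles(baseline)  # no decoy has been modified yet

    # Simulate ransomware-like activity by overwriting one decoy with random bytes.
    with open(os.path.join(workdir, "invoice.docx"), "wb") as handle:
        handle.write(os.urandom(64))

    alert_names = [os.path.basename(p) for p in check_honeyfiles(baseline)]

print(untouched, alert_names)
```

A production system would rely on file-system event notifications rather than periodic hashing, and it would need a policy for legitimate access to the decoys in order to limit false positives.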

2.1.3. Statistics
To better understand the characteristics of ransomware, statistical analysis can be employed.
One prominent way of detecting ransomware is to use statistical analyses, which can identify
unpredictable behavior and flag the presence of encryption. The authors in ref. [21] proposed
an approach for detecting malware based on the frequency of opcodes in the portable
executable file. The study used a machine learning system and reported its results in terms
of false positives, false negatives, true positives, and true negatives. The authors in ref. [22]
also proposed a machine learning-based method that identifies malware with varying degrees
of accuracy. In ref. [23], the authors developed a malware detection method using a
similarity measurement algorithm, intended to improve malware detection time and
throughput. Its advantages over other methods include increased speed from using opcodes
directly and improved detection outcomes from being immune to obfuscation and
disassembly methods.
Another approach to malware classification was presented in ref. [24]. Inspired by the visual
similarities among malware samples in the same family, this work proposes binary texture
analysis over greyscale images generated directly from malware executables. This technique
provides second-order statistical texture features over the graphical representation of
malware, and it is not fooled by common methods of concealment (e.g., packing, code
relocation, and encryption).
In ref. [25], five malware detection metrics were assessed in the absence of ground truth, a
real-world scenario that poses various technical challenges. The end goal was to develop
fully automated, principled methods to assess these metrics with the highest possible
precision. Statistical estimators were provided for the five metrics and were shown to be
accurate by comparison against known ground truth and synthetic data. The estimators
were then applied to a large dataset obtained from VirusTotal to measure and quantify the
five metrics.
Several methods proposed in the literature make use of multiple strategies. The benefits and
drawbacks of various ransomware detection strategies are summarized in Table 1.
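The greyscale representation used by image-based approaches such as ref. [24] can be sketched in a few lines. The code below is only an assumption-laden illustration of the general idea (one byte per pixel, fixed image width, zero padding of the last row); the width, the padding rule, and the function names are choices made here, and the texture features of the cited work are not reproduced. The output is written in the plain-text PGM (P2) format so that it can be opened by common image viewers.

```python
def bytes_to_greyscale(data: bytes, width: int = 16) -> list:
    """Map each byte to one pixel (0-255) and return rows of a fixed width.

    The last row is padded with zeros so that every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def to_pgm(rows: list) -> str:
    """Serialize pixel rows as a plain-text PGM (P2) image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    body = "\n".join(" ".join(str(pixel) for pixel in row) for row in rows)
    return f"P2\n{width} {height}\n255\n{body}\n"


# Toy input standing in for the raw bytes of an executable.
image = bytes_to_greyscale(bytes([0, 64, 128, 255, 10]), width=4)
pgm = to_pgm(image)
print(pgm)
```

For a real sample, the bytes would be read from the executable file and the width chosen according to file size; similar byte layouts across samples then show up as visually similar textures.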

Table 1. Comparison between ransomware detection approaches.

Columns: Ransomware Detection Approach; Ref.; Description; Advantages; Disadvantages.

Ransomware Detection Approach: Machine Learning
Ref.: [12–15]
Description: The most used machine learning techniques in ransomware detection include
supervised learning, unsupervised learning, and semi-supervised learning. Supervised learning
involves training a model on labeled data, where the input and output are both known. This
allows the model to make predictions based on the relationships learned from the training data.
Unsupervised learning involves training a model on data where the output is not known, and the
model must find patterns and relationships within the data on its own. Semi-supervised learning
is a combination of supervised and unsupervised learning, where the model is trained on a mix
of labeled and unlabeled data.
Advantages: One of the main advantages of using machine learning for ransomware detection is
that it allows for the automatic identification of patterns and relationships within large
datasets. This can be particularly useful for identifying new and emerging threats, as the model
can learn from past data to identify patterns and make predictions about future threats. Machine
learning algorithms can also be trained on a wide variety of data types, including text, images,
and audio, which makes them useful for detecting ransomware in different formats.
Disadvantages: Machine learning algorithms can be vulnerable to bias and can produce inaccurate
results if the training data are not representative of the real-world data. They also require
frequent retraining to ensure that they continue to perform well as the data distribution changes.

Ransomware Detection Approach: Honeypot
Ref.: [16–20]
Description: Honeypots are a type of decoy system that is designed to attract and detect malware
or cyber-attacks. They are used to lure attackers into a controlled and isolated environment,
where their actions can be observed and studied. By setting up a honeypot, it is possible to
monitor and track ransomware activity and identify new strains or variants of the malware.
Advantages: One advantage of using a honeypot is that it allows researchers to gather valuable
data and intelligence about the tactics, techniques, and procedures (TTPs) used by attackers.
This information can be used to improve the effectiveness of ransomware detection and prevention
measures. Additionally, honeypots can help mitigate the impact of ransomware attacks by
preventing the malware from reaching the target system or data.
Disadvantages: There are also some disadvantages to using honeypots. One potential issue is the
risk of false positives, where legitimate activity is mistaken for malicious activity. Another
issue is the cost and resources required to maintain and operate a honeypot, as well as the
potential legal and ethical considerations. Additionally, honeypots may not be suitable for all
types of environments or organizations and may not provide comprehensive protection against all
types of ransomware attacks.

The disadvantage of this
The statistical analysis approach
approach is that it relies on the
involves collecting and
The advantage of using availability of accurate and
analyzing data about
statistical analysis is that it comprehensive data, which
ransomware behavior to identify
allows researchers to gain a may be difficult to obtain in
patterns and trends. This can be
deeper understanding of some cases. Additionally,
Statistical [21–25] done through various methods,
ransomware behavior and statistical analysis may not be
such as collecting data about the
identify key trends that can able to identify specific
frequency and types of ransom
inform prevention and instances of ransomware in
demands, the types of files
detection efforts. real time, making it less
targeted, and the tactics used by
effective for immediate
ransomware operators.
detection and response.

3. Background
In this section, we define and present the features that affect ransomware tracking and
introduce the different static and dynamic features that have been used for ransomware
tracking. In Section 3.1, we introduce the different types of ransomware and provide
a brief history of ransomware. We also compare the key features, spreading techniques,
exploitation, and ransomware families of different ransomware types, such as crypto worm,
Ransomware-as-a-Service (RaaS), and Automated Active Adversary ransomware. We also
discuss the role of APT attacks, such as the Shamoon data wiper malware, in ransomware
infections. In Section 3.2, we discuss visualization techniques that are used to represent
and analyze data in a visual form. In the context of ransomware classification, visualiza-
tion techniques can be utilized to graphically represent the relationships and similarities
between different ransomware samples. These techniques can provide a more intuitive
and comprehensive understanding of the data, allowing analysts to identify patterns and
trends that may not be immediately apparent through traditional methods of analysis.
Some common visualization techniques that may be used in ransomware classification
include scatter plots, heat maps, and network graphs. By using these techniques, analysts
can effectively classify and cluster ransomware samples based on their features and characteristics, enabling more accurate and efficient detection and analysis of these threats.
Finally, in Section 3.3, we discuss the use of static and dynamic features in ransomware
tracking systems and the challenges and opportunities that these features present. Overall,
this section provides a comprehensive overview of the key features and techniques that are
used in ransomware tracking and classification as well as the challenges and opportunities
that these approaches present.

3.1. Ransomware Types and History

Ransomware, classified as a type of malware, operates by encrypting a victim’s files
and subsequently demanding a ransom in exchange for restoring access to these files in
ref. [26]. Notably, various categories of ransomware exist, each with unique characteristics.
These categories encompass crypto worms in ref. [27], Human-operated Ransomware in
ref. [28], Ransomware-as-a-Service (RaaS) in ref. [29], and Automated Active Adversary
ransomware in ref. [30]. Table 2 encapsulates the essential features, propagation methods,
exploitation strategies, and ransomware families associated with these diverse ransomware
types. A specific subtype within the RaaS ransomware category is Advanced Persistent
Threat (APT) attacks, typified by instances like the Shamoon data wiper malware in ref. [31].
APT-33, for instance, has employed such attacks in the Middle East and Europe, often
driven by commercial or military motives. Notably, ransomware infections can originate
from various sources in ref. [32], with the distribution percentages elucidated in ref. [33].
Figure 2 visually represents the primary sources of infection for most ransomware, which
may include phishing emails, APT attacks, system vulnerabilities, drive-by downloads, and
exploit kits. An in-depth exploration of the history of ransomware has been undertaken by
the authors in ref. [34]. In Table 3, a chronological account of significant ransomware attacks
is summarized, including the attack date, the responsible ransomware family, and the
resultant damage. Broadly, ransomware can be categorized into two principal subgroups:
locker ransomware in ref. [35] and crypto ransomware in ref. [36]. Locker ransomware
restricts access to a device, often by imposing an additional password requirement to access
the device. In contrast, crypto ransomware identifies and encrypts valuable data located
on the victim’s device.

Table 2. Comparison between ransomware malware behavior types.
| | Crypto Worm | Human-Operated Ransomware | Ransomware-as-a-Service (RaaS) | Automated Active Adversary |
|---|---|---|---|---|
| Key Features | Self-propagating | Targeted attacks | Ransomware-as-a-Service model | Advanced evasion tactics |
| Spreading techniques | Wormhole | Social engineering | Email attachments, web links | Customized attack vectors |
| Exploitation techniques | Vulnerabilities in systems | Targeted vulnerabilities | Vulnerabilities in systems | Customized exploits |
| Detection modules | Antivirus | User awareness, network monitoring | Antivirus, network monitoring | Network monitoring, user awareness |
| Ransomware Family Example | WannaCry | Ryuk | REvil | SolarWinds |

Figure 2. Ransomware infection vectors.


Table 3. Ransomware history timeline.

| Date | Ransomware Family | Event Description |
|---|---|---|
| 1989 | AIDS | First ransomware, called “AIDS” or “PC Cyborg”, is released. |
| 1991 | PC Cyborg | It displays a message on the infected computer’s screen demanding payment. |
| 2005 | Gpcode | Gpcode ransomware uses strong encryption to lock users’ files, demanding payment to decrypt them. |
| 2013 | Cryptolocker | Cryptolocker ransomware is released, using encryption to hold users’ files hostage and demanding payment for the decryption key. |
| 2014 | Cryptowall | Cryptowall ransomware is released, using encryption to hold users’ files hostage and demanding payment for the decryption key. |
| 2015 | TeslaCrypt | TeslaCrypt ransomware has been released, targeting video game files and demanding payment for the decryption key. |
| 2016 | Locky | Locky ransomware is released, using encryption to hold users’ files hostage and demanding payment for the decryption key. |
| 2017 | NotPetya | NotPetya ransomware attack causes widespread damage, affecting thousands of computers and causing disruptions to various industries. |
| 2018 | LockerGoga | LockerGoga ransomware attack targets industrial control systems, causing disruptions to manufacturing and other industries. |
| 2019 | Ryuk | Ryuk ransomware targets government and healthcare organizations for large ransoms, causing widespread damage. |
| 2020 | REvil (Sodinokibi) | REvil (also known as Sodinokibi) ransomware attack causes widespread damage, affecting thousands of users and organizations. |
| 2021 | Babuk | Babuk ransomware attack targets government agencies and high-profile companies, threatening to release stolen data if a ransom is not paid. |
| 2022 | Egregor | Egregor ransomware attack causes widespread damage, affecting thousands of users and organizations. |
| 2023 | Black Cat | BlackCat ransomware is a type of malicious software that encrypts a victim’s files and demands a ransom for the decryption key. |

3.2. Ransomware Classification with Visualization Techniques

Visualization techniques play a pivotal role in the realm of cybersecurity, offering valu-
able support in the classification and analysis of ransomware. Ransomware classification
entails identifying and categorizing diverse ransomware types, relying on their distinctive
characteristics and behaviors. Visualization methods, in this context, emerge as powerful
tools for rendering large datasets of ransomware samples in a manner that is both intuitive
and highly effective in ref. [37]. There are several different visualization techniques that can
be used for ransomware classification, including scatter plots, heat maps, and treemaps.
Scatter plots are a type of graph that plots data points on a two-dimensional grid, with one
variable on the x-axis and the other on the y-axis. Scatter plots can be used to visualize the
relationships between different features of ransomware samples, such as their encryption
algorithms and file types, and can help analysts identify patterns and trends in the data.
Heat maps are another type of visualization that uses color-coded scales to represent data
values, with higher values being represented by warmer colors and lower values being
represented by cooler colors in ref. [38]. Heat maps can be used to visualize the distribution
of different features of ransomware samples and can help analysts identify clusters or
outliers in the data. Treemaps are a type of visualization that uses nested rectangles to
represent data values, with the size of the rectangles representing the value and the color
representing the category in ref. [39]. Treemaps can be used to visualize the relationships
between different categories of ransomware samples and can help analysts identify patterns
and trends in the data. Visualization techniques are particularly useful for ransomware
classification because they allow analysts to identify patterns and trends quickly and easily
in large datasets and can help them identify similarities and differences between different
ransomware families. By visualizing the data in this way, analysts can more easily identify
clusters and outliers, and can use these insights to better understand the TTPs in ref. [40].

3.3. Ransomware’s Features Tracking System

Our proposed ransomware classification, clustering, and detection system aims to pro-
vide a comprehensive approach to analyzing and classifying different types of ransomware.
By using a combination of static analysis, dynamic analysis, and visualization techniques,
our system can extract a wide range of features from ransomware samples and generate
similarity matrices based on these features. These matrices can then be compared using
a variety of comparison algorithms to identify similarities and differences between the
samples, and the resulting similarity scores can be used to classify the samples into different
categories, such as families, variants, and versions. One of the key features of our system is
the ability to identify the amount of code shared by two malicious ransomware binaries be-
fore they are assembled. This can be especially useful for ransomware analysts and reverse
engineers as it can help them better understand the TTPs of different ransomware families
and identify common patterns and trends in ref. [41]. By providing a joint collaborative
analysis platform, our system allows analysts to avoid having to redo tedious tasks that
have already been done by others and enables them to work together more efficiently and
effectively in ref. [42]. Overall, our proposed ransomware classification, clustering, and de-
tection system offers a powerful and comprehensive approach to analyzing and classifying
different types of ransomware. By providing a joint collaborative analysis platform and
the ability to identify the amount of code shared by two malicious ransomware binaries, it
helps analysts and reverse-engineers better understand the TTPs of different ransomware
families and develop more effective defense and response strategies. Ransomware features
tracking refers to the process of identifying and tracking the characteristics and behavior
of different types of ransomware over time. This is an important task for cybersecurity
professionals as it allows them to better understand the TTPs of different ransomware
families and to develop more effective defense and response strategies in ref. [43]. There are
several different approaches that can be used for ransomware feature tracking, including
the Jaccard index, N-grams, and shared feature analysis. We classified malware samples
into “bags of features” before comparing them; features could be strings, hashes, export
and import tables, and API calls. Shared features between two malware samples are shown
in Figure 3. Shared feature analysis involves identifying common characteristics or
behaviors shared by different malware samples. This can be done using techniques such
as static analysis, dynamic analysis, and machine learning, and can be particularly useful
for tracking the evolution of ransomware families by analyzing the shared features of
different ransomware samples in ref. [44].

Figure 3. Shared features between two malware samples [45].
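
The "bags of features" idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `printable_strings` and `shared_features`, the minimum string length of 4, and the two toy samples are all assumptions made for illustration. Each sample is represented as a dictionary of feature sets, and the shared features are the per-type set intersections.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> set:
    """Return the set of printable ASCII strings of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return {match.decode("ascii") for match in re.findall(pattern, data)}


def shared_features(bag_a: dict, bag_b: dict) -> dict:
    """For each feature type present in both bags, return the common features."""
    return {kind: bag_a[kind] & bag_b[kind] for kind in bag_a.keys() & bag_b.keys()}


# Hypothetical samples: toy byte blobs and made-up import lists.
sample_a = {
    "strings": printable_strings(b"\x00\x01CryptEncrypt\x00readme.txt\x00\x02ab\x00"),
    "imports": {"kernel32.dll!CreateFileW", "advapi32.dll!CryptEncrypt"},
}
sample_b = {
    "strings": printable_strings(b"\x7fCryptEncrypt\x00\x00ransom_note.html\x00"),
    "imports": {"kernel32.dll!CreateFileW", "ws2_32.dll!connect"},
}

common = shared_features(sample_a, sample_b)
print(common)
```

Keeping each feature type in its own set makes it possible to report which strings or imports are shared, not only how many.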

The Jaccard index is a measure of the similarity between two sets of data in ref. [46].
It is calculated by dividing the size of the intersection of the two sets by the size of the
union of the two sets. The Jaccard index is often used in cybersecurity to measure the
similarity between different malware samples, and it can be particularly useful for tracking
the evolution of different ransomware families over time. By calculating the Jaccard index
for different pairs of ransomware samples, analysts can identify how similar or dissimilar
they are and can use this information to better understand the TTPs of the different families.
The Jaccard index has emerged as the most generally adopted measure, and with good reason. It
quantifies the degree of overlap between two sets of malware features simply and sensibly,
giving the number of unique features common to both sets normalized by the number of
unique features present in either set in ref. [47] (JI = intersection length/union
length, i.e., $JI(A, B) = |A \cap B| / |A \cup B|$ for feature sets $A$ and $B$). The Jaccard Index explanation is shown in Figure 4.

Figure 4. Jaccard Index between two malware samples.
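
As a small worked example of the Jaccard index defined above, the following Python sketch computes it for two feature sets. The function name `jaccard_index`, the convention of returning 0.0 for two empty sets, and the API-name features are illustrative assumptions, not taken from the paper.

```python
def jaccard_index(features_a: set, features_b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    union = features_a | features_b
    if not union:
        return 0.0
    return len(features_a & features_b) / len(union)


# Hypothetical feature sets (imported API names) for two samples.
a = {"CreateFileW", "CryptEncrypt", "DeleteFileW", "FindFirstFileW"}
b = {"CreateFileW", "CryptEncrypt", "WriteFile"}

score = jaccard_index(a, b)  # 2 shared features out of 5 distinct features
print(score)
```

The result always lies in [0, 1], so scores from different sample pairs can be compared on the same scale.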

Using N-grams to track the evolution of ransomware families can be a powerful tool
for cybersecurity professionals in ref. [48]. By extracting subsequences of specific lengths
from sequential data and comparing them using a similarity function, it is possible to
determine the level of code commonality between two malware samples. This can be
especially useful for identifying patterns and trends in the TTPs of different ransomware
families and can help analysts develop more effective defense and response strategies.
The similarity function used in this process should have certain properties to ensure
accurate and reliable results. It should produce a normalized value that allows all similarity
comparisons to be made on the same scale, and it should be able to accurately estimate
the amount of code sharing between two samples. Additionally, it should be able to
provide insight into why it performs well in modeling code similarities. Overall, the use
of N-grams and a similarity function can be an effective way to track the evolution of
ransomware families and better understand their TTPs. By extracting and comparing
subsequences of specific lengths, analysts can identify common patterns and trends and can
use this information to develop more effective defense and response strategies in ref. [49].
We employ a similarity function with these properties to determine the level of
code commonality between two malware samples, as shown in Figure 5. In that
figure, each number corresponds to a distinct malware sample included in the analysis.
The purpose of these numbers is to uniquely identify and label each malware instance
for clarity. The arrows in the figure represent the presence of similar n-gram features
between different malware samples. Specifically, the direction of the arrows indicates
the connection from the source malware sample to the target sample, demonstrating a
shared set of n-gram features. This visual representation highlights the commonalities
in the n-gram patterns found in the corresponding malware instances. By examining the
arrows and associated numbers, one can gain insights into the relationships and similarities
among the various malware samples based on their n-gram features. This analysis aids in
understanding the potential connections and patterns within the dataset, contributing to a
more complete picture of the malware landscape under investigation.

Figure 5. N-gram extracted from ransomware samples.
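
The N-gram comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: byte-level 4-grams are used, and the Jaccard index is assumed as the normalized similarity function, although the text does not fix either choice. The function names and the two short byte sequences are hypothetical.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of distinct length-n byte subsequences of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def ngram_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two n-gram sets, normalized to [0, 1]."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


# Hypothetical code fragments that differ in a single byte.
code_a = bytes.fromhex("558bec83ec10535657")
code_b = bytes.fromhex("558bec83ec20535657")

similarity = ngram_similarity(code_a, code_b)
print(similarity)
```

A single changed byte alters every n-gram that overlaps it, so larger values of n make the measure more sensitive to small edits.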

4. Experimental Work and Detection Scheme
In this section, we present the experimental work done to study ransomware
visualization techniques and shared static and dynamic features between different ransomware
samples. Ransomware visualization techniques are presented in Section 4.1, while shared
static and dynamic features are presented in Sections 4.2 and 4.3 respectively. In Section
4.4, our lab setup is presented. Time complexity is presented in Section 4.5. Finally, in
Section 4.6, we present the results from static and dynamic analyzers.
4.1. Visualization Techniques
In our approach to using visualization techniques to classify and analyze ransomware
samples, we started by selecting a dataset of ransomware samples (most matched ones)
and then applied a similarity matrix using a static and dynamic analyzer to find a fast
and suitable way to use it in our final approach. To identify the most similar samples, we
used a cluster engine to analyze the data and report the samples with the highest level of
similarity. We then used static and dynamic analysis techniques to generate a similarity
matrix for each group of samples. This matrix allowed us to visualize the relationships
between the different samples and identify patterns and trends in the data. Once we had
generated the similarity matrix, we used it to validate the query-sample similarity with the
matched samples. This helped us to confirm that the samples in the first group were indeed
the most similar ones and allowed us to identify any discrepancies or errors in the data.
Constructing nodes and connections between them helps to view and graph the relationships
in the data. In other words, each sample is a node, and two nodes are connected, and the
samples treated as comparable, if they have similar DLL imports.
• The cluster engine reported the most similar samples from the set.
• There is a need to validate the query-sample similarity with the matched samples.
• It is also important to reveal intelligence from the data and discover the patterns.
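
The node-and-connection view described above can be sketched as a pairwise similarity matrix over DLL imports, with an edge wherever the similarity reaches a threshold. This is an illustrative sketch only: the Jaccard index as the similarity measure, the 0.5 threshold, the function names, and the four toy samples are assumptions and do not reproduce the cluster engine used in the experiments.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(imports: dict) -> dict:
    """Map each unordered pair of sample IDs to the similarity of their imports."""
    return {
        (x, y): jaccard(imports[x], imports[y])
        for x, y in combinations(sorted(imports), 2)
    }


def similarity_edges(matrix: dict, threshold: float) -> list:
    """Return the sample pairs whose similarity is at least the threshold."""
    return sorted(pair for pair, score in matrix.items() if score >= threshold)


# Hypothetical DLL import sets for four samples.
imports = {
    "s1": {"kernel32.dll", "advapi32.dll", "user32.dll"},
    "s2": {"kernel32.dll", "advapi32.dll", "user32.dll"},
    "s3": {"kernel32.dll", "advapi32.dll", "ws2_32.dll"},
    "s4": {"msvcrt.dll"},
}

matrix = similarity_matrix(imports)
edges = similarity_edges(matrix, threshold=0.5)  # threshold is an assumed value
print(edges)
```

Samples left without any edge, such as `s4` here, are candidates for outliers or for a different family.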
Figure 6 shows the vendor detections for a collection of ransomware samples. Not all
security vendors detected every sample in this dataset. This variability in detection rates
across security solutions underscores the need for robust and comprehensive cybersecurity
strategies. The implications of these detections are discussed below. Samples with a
consistent segment count are indicative of non-packed samples, reflecting their unaltered
and original nature within the dataset. This differentiation is instrumental in our analysis
of the dataset’s composition and assists in identifying potential trends or variations among
the samples. The numerical results for visualization techniques can be found in Table 4.
The data depicted in Figure 7 reveals a noteworthy observation concerning the sample
sizes utilized within the context of this study. It is evident from the graphical represen-
tation that a predominant portion of the collected samples exhibited uniformity in their
respective sizes.

Figure 6. Vendors Performance for Samples Detection (plot title: “Vendors Performance Disparities in Family Sample Detection”; x-axis: Sample ID; y-axis: Number of Detections).

Table 4. Visualization of numerical results.

Number of Samples: 80 samples
Vendor Detection: 100% (40+ vendors) detectability
Sample Size: 57 samples with 55,200 bytes; 23 samples with 55,300 bytes
IAT Count: 111 imports
Internal Disassembled Functions Count: 185 functions
Sample Segments: 5 sections

Figure 8 shows the Import Address Table (IAT) counts across the samples. The counts
are consistent across all the analyzed samples, which supports the IAT count as a robust
static feature for classifying and clustering ransomware samples in our laboratory
experiments.
Figure 9 shows the counts of internal functions across the examined samples. The
distribution of these counts is likewise uniform across the samples, which further supports
the findings from our laboratory experiments.
In Figure 10, we observe the counts of segments within portable executable (PE) files.
This analysis allows us to discern between packed and non-packed samples in our dataset.
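
One way to act on these static counts is to flag samples whose value for a feature departs from the value shared by most of the set. The Python sketch below is a hedged illustration: the helper names are hypothetical, the first three records reuse the counts reported in Table 4, and the fourth record is invented to show a deviating sample. Treating a deviating section count as a hint of packing is an assumption drawn from the text's remark that a consistent segment count indicates non-packed samples.

```python
from collections import Counter


def majority_value(values):
    """Return the most frequent value in an iterable."""
    return Counter(values).most_common(1)[0][0]


def deviating_samples(samples: dict, feature: str) -> list:
    """Return sorted IDs of samples whose feature differs from the majority value."""
    expected = majority_value(record[feature] for record in samples.values())
    return sorted(sid for sid, record in samples.items() if record[feature] != expected)


# Hypothetical records; "s4" is an invented outlier for illustration.
samples = {
    "s1": {"iat_count": 111, "function_count": 185, "section_count": 5},
    "s2": {"iat_count": 111, "function_count": 185, "section_count": 5},
    "s3": {"iat_count": 111, "function_count": 185, "section_count": 5},
    "s4": {"iat_count": 12, "function_count": 9, "section_count": 3},
}

possibly_packed = deviating_samples(samples, "section_count")
print(possibly_packed)
```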

Figure 7. Sample Size Distribution (x-axis: Sample ID; y-axis: Sample Size in Bytes).

Figure 8. IAT Count Distribution (x-axis: Sample ID; y-axis: Number of Imports).

Figure 9. Internal Disassembled-Functions Count Distribution (x-axis: Sample ID; y-axis: Number of Internal Functions).

Figure 10. Sample Segments Count Distribution (x-axis: Sample ID; y-axis: Number of Segments).

Visualization comparison is a powerful tool for analyzing and classifying different types of
ransomware. By visualizing the relationships between different samples, it is possible to
identify common patterns and trends and to better understand the TTPs of different
ransomware families. In our research, we found that the Import Address Table and the
Internal Function count were the most effective features for finding the similarity between
different ransomware samples. The Import Address Table is a data structure in a Windows
executable file that contains the addresses of functions imported from other dynamic
link libraries (DLLs). The Internal Function count is the number of functions defined in
the sample itself. We also found that obfuscated or packed samples could be identified by
comparing the PE file segments of different samples, that is, the parts of a Windows
executable file that contain code, data, and other information. By comparing the segment
count of different samples against the typical default of around five, we were able to
identify those that had been packed or obfuscated.
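
As a rough illustration of the segment-count heuristic described above, the following sketch flags samples whose PE segment count departs from the typical value of five. The `segment_counts` mapping, the sample names, and the `tolerance` parameter are hypothetical and are not taken from our system; in practice the counts would come from the disassembler output.

```python
from typing import Dict, List

TYPICAL_SEGMENT_COUNT = 5  # the "around five" default noted in the text


def flag_possibly_packed(segment_counts: Dict[str, int],
                         typical: int = TYPICAL_SEGMENT_COUNT,
                         tolerance: int = 1) -> List[str]:
    """Return sample IDs whose segment count differs from `typical` by more than `tolerance`."""
    return sorted(
        sample_id
        for sample_id, count in segment_counts.items()
        if abs(count - typical) > tolerance
    )


# Hypothetical segment counts for four samples (illustrative values only).
segment_counts = {"sample_01": 5, "sample_02": 5, "sample_03": 2, "sample_04": 6}
suspects = flag_possibly_packed(segment_counts)
print(suspects)
```

A deviation of this kind is only a hint that a sample may be packed or obfuscated; it would normally be combined with the other features before drawing a conclusion.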

4.2. Static Ransomware Classification System

The ransomware classification and detection system that we have proposed is designed
to analyze and classify different types of ransomware using a combination of static and
dynamic analysis techniques. Samples are submitted to the system through a Python API;
the system then disassembles the binaries, extracts various features, and generates
mnemonic N-grams. Next, it calculates the Jaccard similarity between different samples and
clusters the samples into classified groups. This allows analysts to identify common
patterns and trends in the TTPs of different ransomware families and to better understand
the relationships between different samples. The proposed static ransomware classification
and detection system includes a System Controller with APIs for submitting ransomware
samples to the static analysis server and for querying the MongoDB NoSQL database for
various properties. The Analyzer server, running on Windows, retrieves the static
characteristics and attributes of the given samples through the disassembler process.
Overall, the system provides a comprehensive approach to analyzing and classifying
ransomware, and the resulting clusters help analysts to develop more effective defense and
response strategies. The proposed static ransomware classification and detection system
diagram is illustrated in Figure 11.
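
The pipeline described above (mnemonic N-grams, Jaccard similarity, clustering) can be sketched as follows. This is a minimal illustration under stated assumptions rather than our implementation: the N-gram size of 3, the 0.5 threshold, the mnemonic sequences, and the choice of grouping samples by connected components are all hypothetical, since the text does not fix the clustering algorithm.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def mnemonic_ngrams(mnemonics: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Slide a window of size n over the mnemonic sequence and collect unique n-grams."""
    return {tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(features: Dict[str, Set], threshold: float = 0.5) -> List[List[str]]:
    """Group samples into connected components of the 'similarity >= threshold' graph."""
    parent = {name: name for name in features}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical mnemonic sequences, as a disassembler might emit them.
samples = {
    "a": ["push", "mov", "call", "xor", "ret"],
    "b": ["push", "mov", "call", "xor", "jmp"],
    "c": ["nop", "add", "sub", "inc", "dec"],
}
features = {name: mnemonic_ngrams(seq) for name, seq in samples.items()}
clusters = cluster(features)
print(clusters)
```

Samples `a` and `b` share two of their four distinct 3-grams, so they meet the assumed threshold and fall into one cluster, while `c` stays on its own.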

4.3. Dynamic Ransomware Classification System

In this paper, we have presented a novel and efficient malware indexing system that
provides a range of search and analysis capabilities for ransomware analysts and reverse
engineers. The system is designed to analyze native binaries and to identify common
patterns and trends in the TTPs of different ransomware families. One of the key features
of the system is its ability to perform similarity checks between different samples and
to classify and cluster them based on their features and attributes. This allows analysts
to identify commonalities and differences between different ransomware families and
to better understand the relationships between different samples. One limitation of the
system is that it is mainly designed to analyze native binaries and may not be as effective at
analyzing packed or obfuscated samples. Many malware authors use packing techniques
to hide and obscure their code, making it more difficult to analyze and classify. However,
the system is still able to provide useful insights and intelligence for analysts working
with packed or obfuscated samples, as it relies on hybrid data from static and dynamic
analysis to identify common patterns and trends in the TTPs of different ransomware
families. Overall, the proposed malware indexing system is a valuable tool for analysts and
reverse engineers working with ransomware samples. By providing a range of search and
analysis capabilities, it helps analysts to identify common patterns and trends in the TTPs of
different ransomware families and to better understand the relationships between different
samples. This enables them to develop more effective defense and response strategies and
to more effectively mitigate the threat posed by ransomware. The proposed dynamic
ransomware classification and detection system diagram is illustrated in Figure 12.

Figure 11. Static ransomware classification system block diagram.

Figure 12. Dynamic ransomware classification system block diagram.

4.4. Lab Setup

The lab setup for our proposed malware indexing system includes a range of machines
and tools, presented in Table 5, that are designed to support the analysis and classification
of different types of malware, including ransomware. The controller is the central
component that interacts with all the analysis engines and performs database queries to
retrieve relevant samples in response to analyst requests. It also submits samples to the
analyzer server for analysis. The analyzer server is responsible for disassembling executable
binaries into a set of static features and extracting pertinent properties and features that the
controller uses to classify the binaries. The disassembler is an important part of the analyzer
server, as it is responsible for breaking down the samples into their constituent parts and
extracting the relevant features. MongoDB is used to index all the extracted features so
that they can be queried and analyzed using the Jaccard Index similarity function. This
allows analysts to identify common patterns and trends quickly and easily in the TTPs of
different ransomware families and to better understand the relationships between different
samples. VirusTotal is another important tool in our lab setup, as it provides a range of
clustering and similarity-matching capabilities that allow analysts to group and classify
different samples based on their features and attributes. It also includes a comprehensive
graph view that enables analysts to visualize the relationships between different malware
objects and to better understand how they are associated with specific campaigns. Overall,
the lab setup for our proposed malware indexing system is designed to provide analysts
with the tools and resources they need to effectively analyze and classify different types of
malware, including ransomware. It includes a range of machines and tools that support the
disassembly and analysis of executable binaries, as well as powerful indexing and querying
capabilities that enable analysts to identify common patterns and trends in the TTPs of
different malware families. The full-matched ransomware classification and detection
system diagram is illustrated in Figure 13.

Table 5. Machines and tools used in dynamic analyzer lab setup.

Component: System Controller
Purpose: The main component that interacts with all the analysis engines and performs
database queries to retrieve relevant samples in response to analyst requests is the
controller. This component is the central hub of the malware indexing system and is
responsible for coordinating the analysis and classification of different types of malware,
including ransomware. It receives requests from analysts and communicates with the various
analysis engines to gather the necessary data and information. The controller also interacts
with the database to retrieve relevant samples based on the analyst’s requests, ensuring that
the analyst has access to the most up-to-date and relevant data. Overall, the controller plays
a critical role in the operation of the malware indexing system, enabling analysts to access
the data and information quickly and easily in order to effectively analyze and classify
different types of malware.
Technology: APIs modules
Output: Hybrid attributes

Component: Analyzer Engine
Purpose: The analyzer server is responsible for converting executable binaries into a set of
static features through disassembly to extract relevant static attributes that can be used by
the controller to classify the binaries. This process involves breaking down the samples into
their constituent parts and extracting the relevant features, such as function names, imported
libraries, and other characteristics. The analyzer server uses a disassembler tool to perform
this process, which is an important part of the overall malware indexing system. In addition
to extracting static features, the analyzer server is also responsible for converting the set of
binaries into dynamic attributes through sandboxing. Sandboxing involves running the
samples in a controlled environment and observing their behavior to extract dynamic
attributes, such as network traffic, file system changes, and other activities. These dynamic
attributes can be used to supplement the static features to classify and analyze the samples
more accurately. Overall, the analyzer server plays a critical role in the operation of the
malware indexing system, providing the necessary data and information that enables analysts
to effectively analyze and classify different types of malware.
Technology: Disassembling; Decompiling
Output: Static features: function names; imported libraries; strings and string patterns; file
size and metadata. Dynamic features: network traffic and connection information; API calls;
file system changes; process and thread behavior; memory modifications; registry changes;
system and library calls.

Component: MongoDB
Purpose: MongoDB is a document-based store for all the attributes extracted by the analysis
engines. It is a NoSQL database that is designed to handle large amounts of data and to
support flexible and scalable data models. In the context of the malware indexing system,
MongoDB is used to store all the extracted static and dynamic features from the analyzer
server, as well as any additional metadata or information about the samples. These data are
then used by the controller to perform various analysis and classification tasks, such as
calculating the Jaccard Index similarity between different samples or clustering the samples
into different categories. By providing a centralized repository for all the extracted features,
MongoDB enables analysts to access and analyze the data more easily, and to perform
complex queries and searches across the entire dataset.
Technology: Receive controller queries for binaries features and attributes.
Output: Hybrid attributes

Component: Disassembler (Ghidra 11.0)
Purpose: Ghidra is used to extract static features from malware samples. Ghidra is a powerful
and feature-rich tool that is developed and maintained by the National Security Agency
(NSA). It provides a wide range of features and functionality that are useful in the analysis
of malware, including the ability to disassemble code, view and edit assembly instructions,
and perform various other tasks. In the context of the malware indexing system, Ghidra
could be used to disassemble the samples to extract static features such as function names,
imported libraries, and other characteristics. These features could then be stored in the
MongoDB database for use in various analysis and classification tasks.
Technology: Disassembling
Output: Static features of the samples

Component: Sandbox (Cuckoo)
Purpose: Cuckoo Sandbox is a powerful and widely used tool that is used to extract dynamic
features from malware samples. It is an open-source sandboxing platform that allows users to
analyze the behavior of malware in a controlled environment. By analyzing the behavior of
malware in a sandbox, analysts can extract various dynamic features such as network traffic,
file system changes, and other characteristics that may not be visible through static analysis
alone. In the context of the malware indexing system, Cuckoo Sandbox could be used to
extract dynamic features from the samples and store them in the MongoDB database. These
dynamic features could be used in conjunction with the static features extracted through
disassembly to provide a more complete picture of the malware’s behavior and capabilities.
Technology: Sandboxing
Output: Dynamic features of the samples

Component: Python PyCharm (2023.3.2)
Purpose: Python PyCharm is a powerful integrated development environment (IDE) that is
often used in the development of Python programs. It is a popular choice among developers
due to its feature-rich set of tools and capabilities, including code completion, debugging,
testing, and deployment. In the context of the malware indexing system, Python PyCharm
could be used to develop and maintain the various components of the system, including the
analysis engines, the controller, and the database.
Technology: Python IDE
Output: System scripts; database connections script; sample submissions script

Component: VirusTotal
Purpose: VirusTotal is a powerful tool that allows users to scan files and URLs against a vast
array of antivirus engines and other security tools. This enables analysts to identify malicious
content and track the evolution of malware over time quickly and easily. In our study, we
used VirusTotal to collect a large dataset of ransomware samples, which we then analyzed
using various techniques such as static analysis, dynamic analysis, and visualization. By
leveraging the power of VirusTotal, we were able to gather a large and diverse set of
ransomware samples quickly and efficiently, which allowed us to more accurately and
effectively classify and cluster the samples. Overall, VirusTotal proved to be a valuable
resource in our study, and it is a powerful tool that is widely used in the field of malware
analysis.
Technology: VirusTotal Hunting feature
Output: Collected ransomware samples

Figure 13. Full-matched ransomware classification system block diagram.

4.5. Time Complexity

In the realm of malware detection and classification, the consideration of time complexity
is paramount, as the potential damages inflicted by threats may occur before a detection or
classification system has the chance to identify them. Understanding the time efficiency of
our system is crucial for ensuring timely responses to potential threats [50–52].

4.5.1. Static vs. Dynamic Analysis

Our system incorporates both static and dynamic analysis approaches to strike a
balance between speed and accuracy. Static analysis, while fast, may exhibit reduced
accuracy on certain samples. On the other hand, dynamic analysis, although accurate,
tends to be slower in terms of analysis time.

4.5.2. Hybrid Approach for Optimal Time Efficiency

To address this trade-off, we have implemented a hybrid approach that combines
the strengths of both static and dynamic analyses. This hybridization aims to achieve an
optimal average time complexity for our system.
Static analysis time: approximately 5–10 s per sample.
Dynamic analysis time: approximately 30–60 s per sample.
By integrating static and dynamic analyses, we have achieved an average time
complexity of 2–5 s per sample. Additionally, samples that have been previously analyzed and
added to our database incur zero seconds of analysis time during subsequent evaluations.
This strategic combination enables us to deliver efficient and accurate results within a
reasonable time frame.

4.5.3. Continuous Database Augmentation

To further enhance the time efficiency of our system, we encourage users to
continuously contribute samples to our database. By doing so, the likelihood of encountering
previously analyzed samples increases. Consequently, the analysis time for these samples
becomes virtually instantaneous, offering an additional layer of efficiency.

4.5.4. Proactive Sample Analysis

Users are also encouraged to submit suspicious samples to our system for analysis
before execution. This proactive approach enables preemptive identification of potential
threats, contributing to an overall improvement in system responsiveness.
In conclusion, our approach to time complexity involves a thoughtful integration
of static and dynamic analyses, coupled with continuous database augmentation and
proactive sample analysis. This multifaceted strategy ensures a swift and accurate response
to emerging threats in the ever-evolving landscape of malware detection.
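
The effect of database augmentation on analysis time can be illustrated with a small lookup-before-analysis sketch. It is a simplified stand-in, not our implementation: an in-memory dictionary replaces the MongoDB store, SHA-256 is an assumed choice of sample key, and `AnalysisCache` and `fake_analyze` are hypothetical names introduced only for this example.

```python
import hashlib
from typing import Callable, Dict


class AnalysisCache:
    """Store analysis reports keyed by the SHA-256 digest of the sample bytes."""

    def __init__(self, analyze: Callable[[bytes], dict]) -> None:
        self._analyze = analyze          # the (slow) static/dynamic analysis routine
        self._reports: Dict[str, dict] = {}
        self.analysis_runs = 0           # how many times the slow path was taken

    def get_report(self, sample: bytes) -> dict:
        digest = hashlib.sha256(sample).hexdigest()
        if digest not in self._reports:  # unseen sample: run the full analysis once
            self._reports[digest] = self._analyze(sample)
            self.analysis_runs += 1
        return self._reports[digest]     # known sample: answered from the store


def fake_analyze(sample: bytes) -> dict:
    """Placeholder for the real pipeline; returns a trivial report."""
    return {"size": len(sample)}


cache = AnalysisCache(fake_analyze)
first = cache.get_report(b"MZ\x90\x00 example bytes")
second = cache.get_report(b"MZ\x90\x00 example bytes")
print(cache.analysis_runs)
```

The second request for the same bytes is served from the store without re-running the analysis, which is the behavior the zero-seconds figure for previously analyzed samples refers to.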

4.6. Results
The results of our analysis show that the use of minhash and the Jaccard index for feature
comparison is an effective method for accurately estimating the degree of code sharing
between different ransomware samples. By applying minhash to the strings, Import
Address Table, and API call features extracted from our ransomware samples, we were
able to identify highly similar samples with a high degree of accuracy. This approach
allowed us to cluster the samples into distinct groups, enabling us to identify relationships
between different ransomware families and variants more easily. In addition to minhash
and the Jaccard index, we also employed visualization techniques, such as graph networks
and dendrograms, to further aid in the analysis and interpretation of the data. These
techniques allowed us to visually explore the relationships between different malware
samples and identify patterns and trends that would have been difficult to discern using
other methods.
In our proposed approach, the first step is to store the ransomware samples in a database
or repository. This can be done by manually collecting the samples or by using an automated
tool to gather them from various sources such as online scanners or honeypots. Next, the
samples are indexed using a variety of features such as strings, Import Address Table, or
API calls. These features are extracted from the samples using static or dynamic analysis
techniques and stored in the database for later use. Once the samples are indexed, analysts
can search for specific samples or groups of samples using various search criteria such as
ransomware family, encryption algorithm, or date of discovery. Finally, the similarity between
the samples can be visualized using various techniques such as clustering or similarity
matrices. These visualizations can help analysts quickly understand the relationships between
different ransomware samples and identify patterns or trends in the data.
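
To make the minhash idea concrete, the sketch below estimates the Jaccard index of two feature sets from fixed-length signatures. It is an illustration under assumptions, not the code used in our experiments: the seeded BLAKE2b hash family, the signature length of 128, and the API-call names in the two sets are all hypothetical choices.

```python
import hashlib
from typing import Iterable, List, Set


def _hash(seed: int, item: str) -> int:
    """Seeded 64-bit hash built from BLAKE2b (stands in for a family of hash functions)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(features: Iterable[str], num_hashes: int = 128) -> List[int]:
    """For each seeded hash function, keep the minimum hash value over the feature set."""
    items = list(features)
    if not items:
        raise ValueError("cannot build a signature from an empty feature set")
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching signature positions estimates the Jaccard index."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard index, used here only to compare against the estimate."""
    return len(a & b) / len(a | b)


# Hypothetical API-call feature sets for two samples.
sample_a = {"CreateFileW", "WriteFile", "CryptEncrypt", "DeleteFileW", "RegSetValueExW"}
sample_b = {"CreateFileW", "WriteFile", "CryptEncrypt", "FindFirstFileW", "ExitProcess"}

sig_a = minhash_signature(sample_a)
sig_b = minhash_signature(sample_b)
print(exact_jaccard(sample_a, sample_b), estimate_jaccard(sig_a, sig_b))
```

The signatures have a fixed length regardless of how many features a sample has, which is why comparing signatures scales better than comparing the full feature sets; the estimate approaches the exact value as the number of hash functions grows.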
i. Strings-Based Similarity
We propose a method for identifying the similarity between different ransomware
samples using strings as a feature. By extracting all contiguous printable sequences of char-
acters from the samples and generating the Jaccard index between all pairs of ransomware
samples based on their common string relationships, we can compute the strings-based
ransomware similarity. Strings taken from a binary tend to be format strings established
by the programmer, which compilers in general do not transform, regardless of which
compilers the ransomware authors use or what parameters they provide the compilers.
This strategy allows us to bypass the compiler difficulty and accurately identify similarities
between different ransomware samples. The similarity matrix generated using extracted
static strings as a feature is illustrated in Figure 14.
In our static analysis, the absolute time required per sample is consistently 5 s,
indicative of the efficiency of our static analyzer. The absolute Jaccard index for
similarity among the samples is 0.3. The short processing time highlights the speed of our
static analysis; however, it is important to note that a Jaccard index of 0.3 signifies a
lower level of accuracy in capturing similarities between the samples. This trade-off
between speed and accuracy is a key consideration in our approach, which aims to
strike a balance that aligns with the requirements of timely detection.
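
The strings-based comparison can be sketched as follows: pull every run of printable characters out of two binaries and compute the Jaccard index of the resulting sets. The minimum string length of 4 and the two byte blobs are hypothetical and chosen only to keep the example small.

```python
import re
from typing import Set


def extract_strings(data: bytes, min_len: int = 4) -> Set[bytes]:
    """Collect runs of at least `min_len` printable ASCII bytes (space through tilde)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return set(pattern.findall(data))


def jaccard(a: Set[bytes], b: Set[bytes]) -> float:
    """Jaccard index |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical byte blobs standing in for two ransomware binaries.
blob_a = b"\x00\x01Your files are encrypted\x00\xffkernel32.dll\x00%s.locked\x00ab\x00"
blob_b = b"\x7f\x02Your files are encrypted\x00kernel32.dll\x00\x03README.txt\x00"

strings_a = extract_strings(blob_a)
strings_b = extract_strings(blob_b)
similarity = jaccard(strings_a, strings_b)
print(sorted(strings_a & strings_b), similarity)
```

Here the two blobs share the ransom-note text and a DLL name but differ in one string each, so two of the four distinct strings are common.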

Figure 14. The similarity matrix generated using string features.

ii. ii.
ImportImport Address
Address Table–Based
Table–Based Similarity
Similarity
Ransomwareanalysts
Ransomware analystsand andreverse
reverseengineers
engineerscan canuse usethetheImport
ImportAddress
AddressTable Table(IAT)
(IAT)
featuretotoidentify
feature identifythe theshared
sharedcodecodebetween
betweendifferent
differentransomware
ransomwaresamples. samples.By Bycomparing
comparing
theIAT
the IATofoftwo twosamples,
samples,analysts
analystscan candetermine
determinethe theextent
extentto towhich
whichthe thesamples
samplesuse usethe
the
same imported DLLs and functions. This information
same imported DLLs and functions. This information can be useful in identifying the can be useful in identifying the re-
relationships between different ransomware families and in understanding the evolutionof
lationships between different ransomware families and in understanding the evolution
individual
of individual families
families over
overtime.
time. To To
generate
generate thetheIAT-based
IAT-based similarity
similaritymatrix,
matrix,analysts
analystscan
extract
can extractthe the
IATIAT from eacheach
from sample
sampleandandcompute
compute the Jaccard
the Jaccard index between
index between all pairs of sam-
all pairs of
ples based
samples on their
based on theircommon
common IAT IAT
entries. The resulting
entries. The resulting matrix can then
matrix be visualized
can then using
be visualized
a variety
using of techniques,
a variety of techniques,such such
as clustering
as clustering or network
or network analysis, to identify patterns and trends within the data. By using the IAT feature in combination with other static and dynamic analysis techniques, analysts can gain a more comprehensive understanding of the relationships between different ransomware samples and can more effectively classify and cluster them for further analysis. Overall, the use of the IAT feature in ransomware analysis can greatly improve the efficiency and accuracy of malware classification and clustering efforts. The similarity matrix generated using the extracted static Import Address Table as a feature is illustrated in Figure 15. In our import address table (IAT) analysis, the absolute time required for processing each sample falls within the range of 5 to 10 s. This indicates the efficiency of our IAT analysis, striking a balance between speed and comprehensive examination. Notably, the absolute Jaccard index for similarity among samples in the context of IAT analysis is 0.86. This high Jaccard index value attests to the accuracy of our IAT analysis, showcasing its effectiveness in capturing similarities between samples. This combination of relatively fast processing time and a high Jaccard index underlines the efficacy of our approach in achieving both speed and accuracy in import address table analysis.
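
The Jaccard index reported above measures the overlap between two feature sets: the size of their intersection divided by the size of their union. As a minimal sketch, assuming each sample's IAT has already been extracted into a set of `dll!function` strings, a pairwise similarity matrix could be computed as shown here. The sample names and imports are hypothetical and are not taken from the paper's dataset, and the paper does not state that its matrix was built this way.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(features: dict) -> dict:
    """Return {(name_i, name_j): score} for every ordered pair of samples."""
    names = sorted(features)
    matrix = {(n, n): 1.0 for n in names}
    for x, y in combinations(names, 2):
        score = jaccard(features[x], features[y])
        # The Jaccard index is symmetric, so fill both halves of the matrix.
        matrix[(x, y)] = matrix[(y, x)] = score
    return matrix


# Hypothetical imports for three samples (illustrative only).
iat = {
    "sample_a": {"kernel32!CreateFileW", "kernel32!WriteFile", "advapi32!CryptEncrypt"},
    "sample_b": {"kernel32!CreateFileW", "kernel32!WriteFile", "advapi32!CryptEncrypt",
                 "kernel32!DeleteFileW"},
    "sample_c": {"user32!MessageBoxW"},
}

matrix = similarity_matrix(iat)
print(matrix[("sample_a", "sample_b")])  # 0.75: 3 shared imports out of 4 distinct
print(matrix[("sample_a", "sample_c")])  # 0.0: no shared imports
```

A set-based comparison ignores how often a function is imported and in what order, which keeps the computation cheap and is consistent with the short per-sample processing time reported for static features.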

Figure 15. The similarity matrix generated using the Import Address Table (IAT) feature.

Ransomware clustering is useful for grouping a large set of samples into a known or unknown number of groups or clusters, with objects in each cluster having a high degree of similarity and objects in other clusters being dissimilar. We proposed an efficient malware indexing system that provides search functionalities, similarity checking, and sample classification and clustering. The system mainly targets native binary files. The indexing engine depends on hybrid data from static feature extraction, comparing different ransomware families to find the similarity matrix between those samples. We compared different static features by checking the similarity matrix for different ransomware families. Our research has proven that the Import Address Table (IAT) is the best feature for finding similar ransomware samples. The main limitation in finding similarities between ransomware samples is the classification and clustering of packed samples. Therefore, we focused on using a dynamic analyzer integrated with sandboxing to extract dynamic features like API calls. Using both dynamic analyzer and static analyzer features and comparing the different feature-based similarity matrices will help in clustering and classifying packed and unpacked ransomware samples.
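
Grouping samples from a similarity matrix can be illustrated with a simple threshold rule: any two samples whose similarity meets a cutoff are placed in the same cluster. The sketch below uses a union-find structure for this (single-linkage grouping). Both the clustering rule and the `threshold=0.5` default are assumptions made for illustration, since the paper does not specify its clustering algorithm, and the scores are hypothetical.

```python
def cluster_by_threshold(names, similarity, threshold=0.5):
    """Group samples whose pairwise similarity is >= threshold (single linkage)."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity[(a, b)] >= threshold:
                # Merge the two clusters by linking their representatives.
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical similarity scores between four samples (illustrative only).
names = ["s1", "s2", "s3", "s4"]
similarity = {
    ("s1", "s2"): 0.86, ("s1", "s3"): 0.10, ("s1", "s4"): 0.05,
    ("s2", "s3"): 0.12, ("s2", "s4"): 0.08, ("s3", "s4"): 0.90,
}

print(cluster_by_threshold(names, similarity))
# [['s1', 's2'], ['s3', 's4']]
```

Because the number of clusters falls out of the threshold rather than being fixed in advance, this style of grouping fits the "unknown number of groups" case described above. Raising the threshold yields more, smaller clusters.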
iii. API calls-Based Similarity
To find similarities between ransomware samples, we utilized API calls as a dynamic feature. By analyzing the API calls made by a sample during runtime through sandboxing, we were able to extract valuable information about the sample's behavior and use it to compare with other samples. This method proved particularly effective in identifying packed samples, which can often be difficult to classify using static features alone. Using API calls as a dynamic feature allowed us to accurately cluster and classify a large dataset of ransomware samples, including both packed and unpacked samples. By comparing the API call similarity matrix between different ransomware families, we were able to identify shared behavior and characteristics that helped us better understand the relationships between different samples. The similarity matrix generated using extracted dynamic API calls as a feature is illustrated in Figure 16. In our dynamic analysis of API calls, the absolute time required for processing each sample typically ranges from 30 s to 60 s, contingent upon the complexity of the sample. Despite the relatively longer processing time, this method is designed to provide a thorough and detailed analysis of the dynamic behavior of samples.

Remarkably, the absolute Jaccard index for similarity among samples in the context of dynamic API call analysis is 1. This perfect matching Jaccard index signifies full similarity, indicating that the dynamic analysis method precisely identifies identical patterns across samples. While the method requires more time for analysis, the perfect matching Jaccard index underscores its high accuracy in capturing similarities between samples, making it a robust tool for comprehensive dynamic analysis.
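
The contrast between static and dynamic features for packed samples can be made concrete with a small comparison. The sketch below is illustrative only: it assumes a packed binary exposes just a loader stub's imports statically, while a sandbox trace of the same family records the API calls actually made at runtime. The feature values, the list-of-call-names trace format, and the use of call sets rather than call sequences are all assumptions, not details taken from the paper.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical feature sets for an unpacked sample and a packed variant of it.
static_iat = {
    "unpacked": {"CreateFileW", "ReadFile", "WriteFile", "CryptEncrypt"},
    # A packed binary often exposes only the loader stub's imports.
    "packed": {"LoadLibraryA", "GetProcAddress"},
}
# API calls recorded at runtime, e.g. parsed from a sandbox report (assumed format).
dynamic_api = {
    "unpacked": ["CreateFileW", "ReadFile", "CryptEncrypt", "WriteFile"],
    "packed": ["CreateFileW", "ReadFile", "CryptEncrypt", "WriteFile", "ReadFile"],
}

static_score = jaccard(static_iat["unpacked"], static_iat["packed"])
dynamic_score = jaccard(set(dynamic_api["unpacked"]), set(dynamic_api["packed"]))

print(static_score)   # 0.0: the packer hides the real imports
print(dynamic_score)  # 1.0: both variants invoke the same set of API calls
```

Converting each trace to a set discards call order and repetition, so two samples score 1 whenever they use the same API functions, even if the number of calls differs.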

Figure 16. The similarity matrix generated using the API Call feature.

To provide a concise and comprehensive overview of our ransomware classification system, we present a detailed comparison of key features, time complexities, and analysis methods in the form of a diagram and table. The diagram illustrated in Figure 17 visually encapsulates the essential characteristics of our approach, highlighting the distinct time complexities and trade-offs associated with each analyzed feature: static strings, static Import Address Table (IAT), and dynamic API calls.

Figure 17. Ransomware classification system features.

A comparative analysis of ransomware classification features is described in Table 6, with a numerical comparison between static and dynamic analyzers.

Table 6. Comparative analysis of ransomware classification features.

| Feature | Strings | Import Address Table | API Calls |
|---|---|---|---|
| Analysis Method | Static | Static | Dynamic |
| Time Complexity (per sample) | 5 s | 5–10 s | 30–60 s |
| Similarity Accuracy | 0.3 | 0.86 | 1 (Perfect Matching) |
| Analysis Efficiency | Fast processing | Balanced speed and comprehensive examination | Thorough analysis, longer processing time |
| Trade-off (Speed vs. Accuracy) | Emphasis on speed, lower accuracy | Balanced approach, high accuracy | Longer processing time for thorough dynamic analysis |

5. Conclusions and Future Work

In this paper, we proposed a comprehensive approach for ransomware classification based on the comparison of similarity matrices derived from static analysis, dynamic analysis, and visualization. We extracted features from ransomware samples using multiple analysis techniques and generated similarity matrices based on these features. These matrices were then compared using various comparison algorithms to identify similarities and differences between the samples. The resulting similarity scores were used to classify the samples into different categories, such as families, variants, and versions. We evaluated our approach using a dataset of ransomware samples and demonstrated that it can classify the samples with a high degree of accuracy. One advantage of our approach is the use of visualization, which allows us to classify and cluster large datasets of ransomware in a more intuitive and effective way. In addition, static analysis has the advantage of being fast and accurate, while dynamic analysis allows us to classify and cluster packed ransomware samples.

Our study demonstrates the potential of using a comprehensive approach based on the comparison of multiple analysis techniques, including static analysis, dynamic analysis, and visualization, for the accurate and efficient classification of ransomware. It also highlights the importance of considering multiple analysis techniques in the development of effective ransomware classification methods, especially when dealing with large datasets and packed samples. In conclusion, our proposed comprehensive approach for ransomware classification is an effective and efficient method for accurately classifying and clustering ransomware samples. The use of visualization techniques is particularly useful for large datasets, while static analysis is fast and accurate, and dynamic analysis is useful for classifying and clustering packed ransomware samples. By considering multiple analysis techniques, we can develop more effective methods for classifying and detecting ransomware, helping to protect individuals and organizations from this growing and evolving threat.

In our future work, we plan to incorporate dynamic instrumentation into our ransomware classification approach to improve its accuracy and efficiency. Dynamic instrumentation involves monitoring and modifying the behavior of a program as it is being executed, which can provide valuable insights into the internal functions and communication patterns of ransomware. One approach we plan to explore is using dynamic instrumentation to track the internal functions of ransomware and how they interact with each other. By understanding the function calls and communication patterns of ransomware, we can potentially identify weaknesses and vulnerabilities that can be exploited to limit its damage or even decrypt affected files without paying the ransom. Additionally, we plan to investigate the use of machine learning techniques in combination with dynamic instrumentation to automate the process of identifying and classifying ransomware. By training a model on a large dataset of ransomware samples and their corresponding internal function calls and communication patterns, we can potentially develop a system that can accurately and efficiently classify new ransomware samples in real time.

Author Contributions: Conceptualization, B.Y., N.A. and M.A.A.; methodology, B.Y. and M.A.A.;
software, B.Y.; validation, B.Y., N.A. and M.A.A.; formal analysis, B.Y., N.A. and M.A.A.; investigation,
B.Y., M.S.E. and M.A.A.; resources, B.Y.; data curation, B.Y. and M.A.A.; writing—original draft
preparation, B.Y., A.D.J., N.A. and M.A.A.; writing—review and editing, B.Y., M.S.E., A.D.J., N.A. and
M.A.A.; visualization, B.Y., M.S.E., A.D.J., N.A. and M.A.A.; supervision, A.D.J., M.S.E. and M.A.A.;
project administration, B.Y., M.S.E., A.D.J., N.A. and M.A.A.; funding acquisition, A.D.J. All authors
have read and agreed to the published version of the manuscript.
Funding: This research was funded by the University College Dublin (UCD), School of Computer
Science, Dublin, Ireland, grant number 13/RC/2077.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: This article does not contain any studies with human participants or
animals performed by any of the authors.
Data Availability Statement: Data in this research paper will be shared upon request made to the
corresponding author.
Conflicts of Interest: All authors declare that they have no conflict of interest for the presented work.

References
1. Gopinath, M.; Sethuraman, S.C. A comprehensive survey on deep learning based malware detection techniques. Comput. Sci. Rev.
2023, 47, 100529.
2. Brown, A.; Gupta, M.; Abdelsalam, M. Automated machine learning for deep learning based malware detection. Comput. Secur.
2024, 137, 103582.
3. Kok, S.; Abdullah, A.; Jhanjhi, N.; Supramaniam, M. Ransomware, threat and detection techniques: A review. Int. J. Comput. Sci.
Netw. Secur. 2019, 19, 136.
4. Yadav, C.S.; Singh, J.; Yadav, A.; Pattanayak, H.S.; Kumar, R.; Khan, A.A.; Haq, M.A.; Alhussen, A.; Alharby, S. Malware analysis
in iot & android systems with defensive mechanism. Electronics 2022, 11, 2354.
5. Rey, V.; Sánchez, M.S.; Celdrán, A.H.; Bovet, G. Federated learning for malware detection in IoT devices. Comput. Netw. 2022,
204, 108693. [CrossRef]
6. Johnson, S.; Gowtham, R.; Nair, A.R. Ensemble Model Ransomware Classification: A Static Analysis-based Approach. In Inventive
Computation and Information Technologies: Proceedings of ICICIT 2021; Springer Nature: Singapore, 2022; pp. 153–167.
7. Al-rimy, B.A.S.; Maarof, M.A.; Shaid, S.Z.M. Ransomware threat success factors, taxonomy, and countermeasures: A survey and
research directions. Comput. Secur. 2018, 74, 144–166. [CrossRef]
8. Akhtar, Z. Malware detection and analysis: Challenges and research opportunities. arXiv 2021, arXiv:2101.08429.
9. Tahir, R. A study on malware and malware detection techniques. Int. J. Educ. Manag. Eng. 2018, 8, 20. [CrossRef]
10. Yamany, B.; Elsayed, M.S.; Jurcut, A.D.; Abdelbaki, N.; Azer, M.A. A New Scheme for Ransomware Classification and Clustering
Using Static Features. Electronics 2022, 11, 3307. [CrossRef]
11. Yamany, B.E.M.; Azer, M.A. SALAM Ransomware Behavior Analysis Challenges and Decryption. In Proceedings of the 2021
Tenth International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 5–7 December 2021;
pp. 273–277.
12. Fernando, D.W.; Komninos, N.; Chen, T. A study on the evolution of ransomware detection using machine learning and deep
learning techniques. IoT 2020, 1, 551–604. [CrossRef]
13. Khan, F.; Ncube, C.; Ramasamy, L.K.; Kadry, S.; Nam, Y. A digital DNA sequencing engine for ransomware detection using
machine learning. IEEE Access 2020, 8, 119710–119719. [CrossRef]
14. Liu, K.; Xu, S.; Xu, G.; Zhang, M.; Sun, D.; Liu, H. A review of android malware detection approaches based on machine learning.
IEEE Access 2020, 8, 124579–124607. [CrossRef]
15. Bae, S.I.; Lee, G.B.; Im, E.G. Ransomware detection using machine learning algorithms. Concurr. Comput. Pract. Exp. 2020, 32,
e5422. [CrossRef]
16. Chakkaravarthy, S.S.; Sangeetha, D.; Cruz, M.V.; Vaidehi, V.; Raman, B. Design of intrusion detection honeypot using social
leopard algorithm to detect IoT ransomware attacks. IEEE Access 2020, 8, 169944–169956. [CrossRef]
17. El-Kosairy, A.; Azer, M.A. Intrusion and ransomware detection system. In Proceedings of the 2018 1st International Conference
on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia, 4–6 April 2018; pp. 1–7.
18. Vishwakarma, R.; Jain, A.K. A honeypot with machine learning based detection framework for defending IoT based botnet DDoS
attacks. In Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli,
India, 23–25 April 2019; pp. 1019–1024.
19. Keong Ng, C.; Rajasegarar, S.; Pan, L.; Jiang, F.; Zhang, L.Y. VoterChoice: A ransomware detection honeypot with multiple voting
framework. Concurr. Comput. Pract. Exp. 2020, 32, e5726. [CrossRef]
Information 2024, 15, 46 28 of 29

20. Pont, J.; Arief, B.; Hernandez-Castro, J. Why current statistical approaches to ransomware detection fail. In Proceedings of the
International Conference on Information Security, Bali, Indonesia, 16–18 December 2020; Springer International Publishing:
Cham, Switzerland, 2020; pp. 199–216.
21. Yewale, A.; Singh, M. Malware detection based on opcode frequency. In Proceedings of the 2016 International Conference
on Advanced Communication Control and Computing Technologies (ICACCCT), Ramanathapuram, India, 25–27 May 2016;
pp. 646–649.
22. Rezaei, S.; Afraz, A.; Rezaei, F.; Shamani, M.R. Malware detection using opcodes statistical features. In Proceedings of the 2016
8th International Symposium On Telecommunications (IST), Tehran, Iran, 27–28 September 2016; pp. 151–155.
23. Verma, V.; Muttoo, S.K.; Singh, V.B. Multiclass malware classification via first-and second-order texture statistics. Comput. Secur.
2020, 97, 101895. [CrossRef]
24. Du, P.; Sun, Z.; Chen, H.; Cho, J.H.; Xu, S. Statistical estimation of malware detection metrics in the absence of ground truth. IEEE
Trans. Inf. Forensics Secur. 2018, 13, 2965–2980. [CrossRef]
25. Bijitha, C.V.; Sukumaran, R.; Nath, H.V. A survey on ransomware detection techniques. In Secure Knowledge Management in
Artificial Intelligence Era: 8th International Conference, SKM 2019, Goa, India, 21–22 December 2019; Proceedings 8; Springer: Singapore,
2020; pp. 55–68.
26. Bello, A.; Maurushat, A. Synthesis of Evidence on Existing and Emerging Social Engineering Ransomware Attack Vectors. In
Cybersecurity Issues, Challenges, and Solutions in the Business World; IGI Global: Hershey, PA, USA, 2023; pp. 234–254.
27. Cai, C.X.; Zhao, R. Salience theory and cryptocurrency returns. J. Bank. Financ. 2024, 159, 107052. [CrossRef]
28. Oz, H.; Aris, A.; Levi, A.; Uluagac, A.S. A survey on ransomware: Evolution, taxonomy, and defense solutions. ACM Comput.
Surv. (CSUR) 2022, 54, 1–37. [CrossRef]
29. Alzahrani, S.; Xiao, Y.; Sun, W. An analysis of conti ransomware leaked source codes. IEEE Access 2022, 10, 100178–100193.
[CrossRef]
30. Shu, R.; Xia, T.; Williams, L.; Menzies, T. Omni: Automated ensemble with unexpected models against adversarial evasion attack.
Empir. Softw. Eng. 2022, 27, 26. [CrossRef]
31. Alagappan, A.; Venkatachary, S.K.; Andrews, L.J.B. Augmenting Zero Trust Network Architecture to enhance security in virtual
power plants. Energy Rep. 2022, 8, 1309–1320. [CrossRef]
32. Whyte, C.; Mazanec, B. Understanding Cyber-Warfare: Politics, Policy and Strategy; Routledge: Oxford, UK, 2023.
33. Berrueta, E.; Morato, D.; Magaña, E.; Izal, M. A survey on detection techniques for cryptographic ransomware. IEEE Access 2019,
7, 144925–144944. [CrossRef]
34. Kara, I.; Aydos, M. The rise of ransomware: Forensic analysis for windows based ransomware attacks. Expert Syst. Appl. 2022,
190, 116198. [CrossRef]
35. Gómez-Hernández, J.A.; Sánchez-Fernández, R.; García-Teodoro, P. Inhibiting crypto-ransomware on windows platforms through
a honeyfile-based approach with R-Locker. IET Inf. Secur. 2022, 16, 64–74. [CrossRef]
36. Almomani, I.; Alkhayer, A.; El-Shafai, W. A crypto-steganography approach for hiding ransomware within HEVC streams in
android IoT devices. Sensors 2022, 22, 2281. [CrossRef]
37. Ahmed, M.; Afreen, N.; Ahmed, M.; Sameer, M.; Ahamed, J. An inception V3 approach for malware classification using machine
learning and transfer learning. Int. J. Intell. Netw. 2023, 4, 11–18. [CrossRef]
38. Chaganti, R.; Ravi, V.; Pham, T.D. A multi-view feature fusion approach for effective malware classification using Deep Learning.
J. Inf. Secur. Appl. 2023, 72, 103402. [CrossRef]
39. Eren, M.E.; Bhattarai, M.; Rasmussen, K.; Alexandrov, B.S.; Nicholas, C. MalwareDNA: Simultaneous Classification of Malware,
Malware Families, and Novel Malware. In Proceedings of the 2023 IEEE International Conference on Intelligence and Security
Informatics (ISI), Charlotte, NC, USA, 2–3 October 2023; pp. 1–3.
40. Marques, A.B.; Branco, V.; Costa, R.; Costa, N. Data Visualization in Hybrid Space—Constraints and Opportunities for Design.
In Proceedings of the International Conference on Design and Digital Communication, Barcelos, Portugal, 3–5 October 2022;
Springer Nature: Cham, Switzerland, 2022; pp. 3–15.
41. Rimon, S.I.; Haque, M.M. Malware Detection and Classification Using Hybrid Machine Learning Algorithm. In Proceedings
of the International Conference on Intelligent Computing & Optimization, Hua Hin, Thailand, 27–28 October 2022; Springer
International Publishing: Cham, Switzerland, 2022; pp. 419–428.
42. Mallik, A.; Khetarpal, A.; Kumar, S. ConRec: Malware classification using convolutional recurrence. J. Comput. Virol. Hacking Tech.
2022, 18, 297–313. [CrossRef]
43. Abbasi, M.S.; Al-Sahaf, H.; Mansoori, M.; Welch, I. Behavior-based ransomware classification: A particle swarm optimization
wrapper-based approach for feature selection. Appl. Soft Comput. 2022, 121, 108744. [CrossRef]
44. Kim, J.; Lee, S. Malware Visualization and Similarity via Tracking Binary Execution Path. Teh. Vjesn. 2022, 29, 221–230.
45. Saxe, J.; Sanders, H. Malware Data Science: Attack Detection and Attribution; No Starch Press: San Francisco, CA, USA, 2018.
46. Kong, K.; Zhang, Z.; Guo, C.; Han, J.; Long, G. PMMSA: Security analysis system for android wearable applications based on
permission matching and malware similarity analysis. Future Gener. Comput. Syst. 2022, 137, 349–362. [CrossRef]
47. Mudgil, P.; Gupta, P.; Mathur, I.; Joshi, N. A novel similarity measure for context-based search engine. In Proceedings of the
International Conference on Innovative Computing and Communications: Proceedings of ICICC 2022; Springer Nature: Singapore; 2022,
Volume 2, pp. 791–808.
Information 2024, 15, 46 29 of 29

48. Abbas, A.R.; Mahdi, B.S.; Fadhil, O.Y. Breast and lung anticancer peptides classification using N-Grams and ensemble learning
techniques. Big Data Cogn. Comput. 2022, 6, 40. [CrossRef]
49. Cucchiarelli, A.; Morbidoni, C.; Spalazzi, L.; Baldi, M. Algorithmically generated malicious domain names detection based on
n-grams features. Expert Syst. Appl. 2021, 170, 114551. [CrossRef]
50. Di Mauro, M.; Galatro, G.; Liotta, A. Experimental review of neural-based approaches for network intrusion management. IEEE
Trans. Netw. Serv. Manag. 2020, 17, 2480–2495. [CrossRef]
51. Dong, S.; Xia, Y.; Peng, T. Network abnormal traffic detection model based on semi-supervised deep reinforcement learning. IEEE
Trans. Netw. Serv. Manag. 2021, 18, 4197–4212. [CrossRef]
52. Pelletier, C.; Webb, G.I.; Petitjean, F. Deep learning for the classification of Sentinel-2 image time series. In Proceedings of the
IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019;
pp. 461–464.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

No ratings yet
Informational Writing Skills
14 pages
Practical Research Guide for Students
No ratings yet
Practical Research Guide for Students
8 pages
Blackbook Project Final
100% (2)
Blackbook Project Final
84 pages
Econometrics I Lecture 2 Wooldridge
No ratings yet
Econometrics I Lecture 2 Wooldridge
40 pages
DEEPSEEK
No ratings yet
DEEPSEEK
15 pages
Social Media Management and Information Dissemination
No ratings yet
Social Media Management and Information Dissemination
28 pages
Translation and Adaptation of Child and Adolescent Mindfulness Measurement Into Bahasa Version
No ratings yet
Translation and Adaptation of Child and Adolescent Mindfulness Measurement Into Bahasa Version
10 pages
Buku Labsheet DKM Pmu Material Science 2018
No ratings yet
Buku Labsheet DKM Pmu Material Science 2018
28 pages
Ayush Luthra - Resume
No ratings yet
Ayush Luthra - Resume
1 page
Operations Research
No ratings yet
Operations Research
100 pages
EU Cosmetic Claims Guidelines
No ratings yet
EU Cosmetic Claims Guidelines
15 pages
Writing Professional, Letter, Social Media, Report
No ratings yet
Writing Professional, Letter, Social Media, Report
10 pages
Cognitive Psy 2
No ratings yet
Cognitive Psy 2
28 pages
GPAT 2021 Exam Paper Analysis
No ratings yet
GPAT 2021 Exam Paper Analysis
62 pages
On GLP
100% (3)
On GLP
80 pages
Customer Satisfaction Towards Himalaya Products
No ratings yet
Customer Satisfaction Towards Himalaya Products
37 pages
Gen Math
No ratings yet
Gen Math
5 pages
Foundations of Nursing Prac Tice: Maria Girlie D. Jordan, MSN Course Professor ST Paul University Philippines
No ratings yet
Foundations of Nursing Prac Tice: Maria Girlie D. Jordan, MSN Course Professor ST Paul University Philippines
17 pages
HMPYC80 Scope
No ratings yet
HMPYC80 Scope
3 pages
The Effect of Working Students To Academic Performance
86% (7)
The Effect of Working Students To Academic Performance
8 pages
The Color Purple Thesis Ideas
100% (3)
The Color Purple Thesis Ideas
5 pages
Preferences For Health Information and Decision-Making: Development of The Health Information Wants (HIW) Questionnaire
No ratings yet
Preferences For Health Information and Decision-Making: Development of The Health Information Wants (HIW) Questionnaire
9 pages
Mantel Haenszel Pada Hubungan Antenatal Care (Anc) Terhadap BBLR Di
No ratings yet
Mantel Haenszel Pada Hubungan Antenatal Care (Anc) Terhadap BBLR Di
11 pages

Information 2024, 15, 46. https://doi.org/10.3390/info15010046 https://www.mdpi.com/journal/information

important part of defending against ransomware as it allows organizations and individu-

Figure 1. Ransomware lifecycle from creation to extortion.

cutoff date. Secondly, the effectiveness of ransomware detection can be context-dependent,

2.1. Ransomware Detection Approaches and Techniques

2.1.1. Machine Learning

employ tree-like structures to make decisions based on predefined conditions or rules,

Table 1. Comparison between ransomware detection approaches.

Figure 2. Ransomware infection vectors.

Table 3. Ransomware history timeline.

Date Ransomware Family Event Description

3.2. Ransomware Classification with Visualization Techniques

3.3. Ransomware’s Features Tracking System

Vendors Performance Disparities in Family Sample Detection

Figure 6. Vendors Performance for Samples Detection.

The data depicted

Sample Size Distribution

Figure 7. Sample Size Distribution.

Figure 8. IAT Count Distribution.

Internal Disassembled-Functions Count Distribution

Figure 10. Sample Segments Count Distribution.

Visualization comparison is a powerful tool for analyzing and classifying different

4.2. Static Ransomware Classification System

4.3. Dynamic Ransomware Classification System

Table 5. Machines and tools used in dynamic analyzer lab setup.

Components Purpose Technology Output

Figure 13. Full-matched ransomware classification system block diagram.

4.5.4. Proactive Sample Analysis

Figure 17. Ransomware classification system features.

A comparative analysis of ransomware classification features is described in Table 6,

Table 6. Comparative analysis of ransomware classification features.

Feature Strings Import Address Table API Calls

5. Conclusions and Future Work
