
OPEM: A Static-Dynamic Approach for

Machine-learning-based Malware Detection

Igor Santos, Jaime Devesa, Felix Brezo, Javier Nieves, and Pablo G. Bringas

S3 Lab, DeustoTech - Computing, Deusto Institute of Technology


University of Deusto,
Avenida de las Universidades 24, 48007
Bilbao, Spain
{isantos,jaime.devesa,felix.brezo,pablo.garcia.bringas}@deusto.es

Abstract. Malware is any computer software potentially harmful to both computers and networks. The amount of malware is growing every year and poses a serious global security threat. Signature-based detection is the most widespread method in commercial antivirus software; however, it consistently fails to detect new malware. Supervised machine learning has been adopted to solve this issue. There are two types of features that supervised malware detectors use: (i) static features and (ii) dynamic features. Static features are extracted without executing the sample, whereas dynamic ones require an execution. Both approaches have their advantages and disadvantages. In this paper, we propose for the first time OPEM, a hybrid unknown-malware detector which combines the frequency of occurrence of operational codes (statically obtained) with information from the execution trace of an executable (dynamically obtained). We show that this hybrid approach enhances the performance of both approaches when run separately.

Keywords: malware, hybrid, static, dynamic, machine learning, computer security

1 Introduction

Machine-learning-based malware detectors (e.g., [1–4]) commonly rely on datasets that include several characteristic features of both malicious samples and benign software to build classification tools that detect malware in the wild (i.e., undocumented malware). Two kinds of features can be used to face obfuscated and previously unseen malware: statically or dynamically extracted characteristics. Static analysis extracts several useful features from the executable under inspection without actually executing it, whereas dynamic analysis executes the inspected specimen in a controlled environment called a 'sandbox' [5]. The main advantages of static techniques are that they are safer because they do not execute malware, they are able to analyse all the execution paths of the binary, and the analysis and detection are usually fast [5]. However, they are not resilient to packed malware (executables that have been either compressed or ciphered) [6] or complex obfuscation techniques [7]. On the contrary, dynamic techniques can guarantee that the executed code shows the actual behaviour of the executable and, therefore, they are the preferred choice when a full understanding of the binary is required [8]. However, they also have several shortcomings: they can only analyse a single execution path, they introduce a significant performance overhead, and malware can identify the controlled environment [9].
Given this background, we present here OPEM, the first machine-learning-based malware detector that employs a feature set composed of both static and dynamic features. The static features are based on a novel representation of executables: opcode sequences [10]. This technique models an executable as sequences of operational codes (i.e., the actions to perform in machine-code language) of a fixed length and computes their frequencies to generate a vector of opcode-sequence frequencies. On the other hand, the dynamic features are extracted by monitoring system calls, operations, and raised exceptions during an execution within an emulated environment, finally generating a vector of binary characteristics representing whether or not a specific behaviour is present within an executable [11]. In summary, our main contributions to the state of the art are the following: (i) we present a new hybrid representation of executables composed of both statically and dynamically extracted features, (ii) based upon this representation, we propose a new malware detection method which employs supervised learning to detect previously unseen and undocumented malware, and (iii) we perform an empirical study to determine which benefits this hybrid approach brings over the standalone static and dynamic representations.

2 Overview of OPEM

2.1 Statically Extracted Features

To represent executables using opcodes, we extract the opcode sequences and their frequency of appearance. Specifically, we define a program ρ as an ordered sequence of opcodes o, ρ = (o_1, o_2, o_3, o_4, ..., o_{ℓ−1}, o_ℓ), where ℓ is the number of instructions I of the program ρ. An opcode sequence os is defined as a subsequence of opcodes within the executable file, where os ⊆ ρ; it is made up of opcodes o, os = (o_1, o_2, o_3, ..., o_{m−1}, o_m), where m is the length of the opcode sequence os. Consider an example code formed by the opcodes mov, add, push and add; the following sequences of length 2 can be generated: s1 = (mov, add), s2 = (add, push) and s3 = (push, add).
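As an illustration of this step, the following minimal Python sketch (with hypothetical helper names, since the paper does not publish code) extracts all opcode sequences of length n = 2 from a disassembled opcode list and counts their occurrences, reproducing the example above:

from collections import Counter

def opcode_sequences(opcodes, n=2):
    # Slide a window of length n over the ordered opcode list and
    # return every opcode sequence os of that length.
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

# Example from the text: mov, add, push, add with sequences of length 2.
program = ["mov", "add", "push", "add"]
counts = Counter(opcode_sequences(program, n=2))
print(counts)
# Counter({('mov', 'add'): 1, ('add', 'push'): 1, ('push', 'add'): 1})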
Afterwards, we compute the frequency of occurrence of each opcode sequence within the file by using term frequency (tf) [12], a weight widely used in information retrieval:

tf_{i,j} = n_{i,j} / Σ_k n_{k,j}

where n_{i,j} is the number of times the sequence s_{i,j} (in our case an opcode sequence) appears in an executable e, and Σ_k n_{k,j} is the total number of terms in the executable e (in our case the total number of possible opcode sequences).
We define the Weighted Term Frequency (WTF) as the result of weighting the relevance of each opcode when calculating the term frequency. To calculate the relevance of each individual opcode, we collected malware from the VxHeavens website (http://vx.netlux.org/) to assemble a malware dataset of 13,189 malware executables, and we collected 13,000 benign executables from our computers. Using this dataset, we disassembled each executable and computed the mutual information gain between each opcode and the class:

I(X; Y) = Σ_{y∈Y} Σ_{x∈X} p(x, y) · log( p(x, y) / (p(x) · p(y)) )

where X is the opcode frequency and Y is the class of the file (i.e., malware or benign software), p(x, y) is the joint probability distribution function of X and Y, and p(x) and p(y) are the marginal probability distribution functions of X and Y. In our particular case, we defined the two variables as the single opcode and whether or not the instance was malware. Note that this weight only measures the relevance of a single opcode and not the relevance of an opcode sequence.
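For illustration, a minimal sketch of this per-opcode weight computation, assuming each opcode has been reduced to a binary presence indicator per executable (the paper uses opcode frequency; the helper name and sample data below are hypothetical):

import math
from collections import Counter

def mutual_information(xs, ys):
    # I(X; Y) estimated from two equal-length lists of discrete observations.
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

# x: does a given opcode appear in each executable; y: 1 = malware, 0 = benign.
x = [1, 1, 0, 1, 0, 0]
y = [1, 1, 0, 1, 1, 0]
print(mutual_information(x, y))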
Using these weights, we computed the WTF as the product of the sequence frequency and the previously calculated weight of every opcode in the sequence:

wtf_{i,j} = tf_{i,j} · Π_{o_z ∈ S} ( weight(o_z) / 100 )

where weight(o_z) is the weight calculated, by means of mutual information gain, for the opcode o_z, and tf_{i,j} is the term frequency measure for the given opcode sequence. We obtain a vector v composed of weighted opcode-sequence frequencies, v = ((os_1, wtf_1), ..., (os_n, wtf_n)), where os_i is the opcode sequence and wtf_i is the weighted term frequency for that particular opcode sequence.
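Putting the static side together, a minimal sketch that builds the weighted opcode-sequence vector, assuming a precomputed table of per-opcode mutual-information-gain weights scaled to the 0–100 range (the weight values and helper names are hypothetical):

from collections import Counter

def wtf_vector(opcodes, weights, n=2):
    # Weighted term frequency for every opcode sequence of length n.
    # weights maps each single opcode to its mutual-information-gain weight (0-100).
    seqs = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    counts = Counter(seqs)
    total = sum(counts.values())
    vector = {}
    for seq, count in counts.items():
        tf = count / total
        relevance = 1.0
        for opcode in seq:
            relevance *= weights.get(opcode, 0.0) / 100.0
        vector[seq] = tf * relevance
    return vector

weights = {"mov": 40.0, "add": 65.0, "push": 80.0}   # illustrative values only
print(wtf_vector(["mov", "add", "push", "add"], weights))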

2.2 Dynamically Extracted Features


Behaviour monitoring is a dynamic analysis technique in which the suspicious file is executed inside a contained and secure environment, called a sandbox, in order to obtain a complete and detailed trace of the actions performed on the system. There are two different approaches to dynamic analysis [13]: (i) taking a snapshot of the complete system before running the suspicious program and comparing it with another snapshot of the system after the execution in order to find the differences, and (ii) monitoring the behaviour of the executable during execution with specialised tools.
For our research, we have chosen a sandbox [11] that monitors the behaviour of the executable during execution. The suspicious Windows Portable Executable (PE) files are executed inside the sandbox environment, and relevant Windows API calls are logged, exposing their behaviour. This sandbox is a new approach that combines emulation (Qemu) and simulation (Wine) techniques, with the aim of achieving the greatest possible transparency without interfering with the system.
We now describe the two main platforms of our sandbox solution:

– Wine is an open-source and complete re-implementation (simulation) of the Win-32 Application Programming Interface (API). It allows Windows PE files to run as if natively under Unix-based operating systems. However, there are still some limitations in the implementation, which hinder some programs from working properly.
– Qemu is an open-source, pure-software virtual machine emulator that works by performing equivalent operations in software for any given CPU instruction. Unfortunately, several malicious executables detect that they are being executed in a contained environment by exploiting different bugs within this virtual machine. However, these bugs can be fixed easily [14]. As Peter Ferrie stated [14], only pure-software virtual machine emulators can approach complete transparency, and it should be possible, at least in theory, to reach the point where detection of the virtual machine is unreliable.

Every call made by a process (identified by its PID) to the Windows API (divided into families, e.g., registry, memory or files) is stored in a log, specifying the state of the parameters before (IN) and after (OUT) the body of the functions. Thereby, we can obtain a complete and homogeneous trace of all the behaviour of the processes, without any interference with the system.
For each executable analysed in the sandbox, we obtain a complete raw trace with its detailed behaviour. To automatically extract the relevant information from the traces in a vector format, we developed several regular expression rules, which define various specific actions performed by the binary, and a parser to identify them. Most of the actions defined are characteristic of malicious behaviour, but there are rule definitions for both benign and malicious behaviour.
We have classified them into seven different groups:

– Files: Every action involving manipulation of files, such as creation, opening or searching.
– Protection: Most malware avoids execution if it is being debugged or executed in a virtual environment.
– Persistence: Once installed in the system, the malware tries to survive reboots, e.g., by adding registry keys or creating toolbars.
– Network: Actions related to network connectivity, e.g., creation of an RPC pipe or access to a URL.
– Processes: Manipulation of processes and threads, such as the creation of multiple threads.
– System Information: Retrieval of information about the system, e.g., obtaining the web browsing history.
– Errors: Errors raised by Wine, such as an error loading a DLL or an unhandled page fault.

The behaviour of an executable is a vector made up of the aforementioned features. We represent an executable as a vector v composed of binary characteristics c, where each c can be either 1 (true) or 0 (false), v = (c_1, c_2, c_3, ..., c_{n−1}, c_n), and n is the total number of monitored actions.
In this way, we have characterised the vector information as binary digits, called features, each one representing the corresponding characteristic of the behaviour. When parsing a report, if one of the defined actions is detected by a rule, the corresponding feature is activated; a minimal sketch of this rule-based parsing is given below. The resulting vector for each program's trace is a finite sequence of bits, a suitable input for classifiers to effectively recognise patterns and correlate similarities across a huge number of instances [15]. Likewise, both the raw trace log and the feature sequence of each analysed executable are stored in a database for further treatment.
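The following Python sketch illustrates this rule-based parsing under stated assumptions: the rule patterns, the API names they match, and the trace format are hypothetical stand-ins, since the actual sandbox rules are not published in the paper.

import re

# Hypothetical rules: feature name -> regular expression over the API-call trace.
RULES = {
    "file_creation":        re.compile(r"\bNtCreateFile\b"),
    "registry_persistence": re.compile(r"RegSetValue\w*.*\\Run\b"),
    "url_access":           re.compile(r"InternetOpenUrl\w*"),
    "thread_creation":      re.compile(r"\bCreateThread\b"),
}

def behaviour_vector(trace):
    # Binary feature vector (c_1, ..., c_n): 1 if the corresponding rule fires in the trace.
    return [1 if pattern.search(trace) else 0 for pattern in RULES.values()]

trace = "NtCreateFile(IN ..., OUT ...)\nRegSetValueExA(..., \\Run, ...)"
print(behaviour_vector(trace))   # e.g. [1, 1, 0, 0]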

3 Experimental Validation

To validate our proposed method, we used two different datasets to test the system: a malware dataset and a benign software dataset. We downloaded several malware samples from the VxHeavens website to assemble a malware dataset of 1,000 malicious programs. For the benign dataset, we gathered 1,000 legitimate executables from our computers.
We extracted the opcode-sequence representation of every file in these datasets for an opcode-sequence length n = 2. The number of features obtained with an opcode length of two was very high: 144,598 features. To deal with this, we applied a feature selection step using Information Gain [16] and selected the top 1,000 features. We extracted the dynamic characteristics of the malware and benign files by monitoring them in the emulated environment; the number of dynamic features was 63. We combined these two feature sets into one, thus creating a hybrid static-dynamic dataset. To compare our method, we also kept the datasets with only the static features and only the dynamic features. To validate our approach, we performed the following steps:

– Cross validation: To evaluate the performance of machine-learning classifiers, k-fold cross validation is commonly used in machine-learning experiments [17]. Thereby, for each classifier we tested, we performed a k-fold cross validation [18] with k = 10. In this way, our dataset was split 10 times into different learning sets (90% of the total dataset) and testing sets (10% of the total data).
– Learning the model: For each validation step, we conducted the learning phase of the algorithms with the training datasets, applying different parameters or learning algorithms depending on the concrete classifier. Specifically, we used the following four models:
• Decision Trees: We used Random Forest [19] and J48 (Weka's C4.5 [20] implementation).
• K-Nearest Neighbour: We performed experiments over the range k = 1 to k = 10 to train KNN.
• Bayesian networks: We used several structural learning algorithms: K2 [21], Hill Climber [22] and Tree Augmented Naïve Bayes (TAN) [23]. We also performed experiments with a Naïve Bayes classifier [24].
• Support Vector Machines: We used the Sequential Minimal Optimization (SMO) algorithm [25], and performed experiments with a polynomial kernel [26], a normalised polynomial kernel [26], the Pearson VII function-based universal kernel [27], and a Radial Basis Function (RBF) kernel [26].
Table 1. Accuracy results (%).

Classifier                              Static Approach  Dynamic Approach  Hybrid Approach
KNN K=1 94.83 77.19 96.22
KNN K=2 93.15 76.72 95.36
KNN K=3 94.16 76.68 94.63
KNN K=4 93.89 76.58 94.46
KNN K=5 93.50 76.35 93.68
KNN K=6 93.38 76.34 93.52
KNN K=7 92.87 76.33 93.51
KNN K=8 92.89 76.31 93.30
KNN K=9 92.10 76.29 92.94
KNN K=10 92.24 76.24 92.68
DT: J48 92.61 76.72 93.59
DT: Random Forest N=10 95.26 77.12 95.19
SVM: RBF Kernel 91.93 76.75 93.25
SVM: Polynomial Kernel 95.50 76.87 95.99
SVM: Normalised Polynomial Kernel 95.90 77.26 96.60
SVM: Pearson VII Kernel 94.35 77.23 95.56
Naïve Bayes 90.02 74.36 90.11
Bayesian Network: K2 86.73 75.73 87.20
Bayesian Network: Hill Climber 86.73 75.73 87.22
Bayesian Network: TAN 93.40 75.47 93.53

– Testing the model: To evaluate each classifier's capability, we measured the True Positive Ratio (TPR), i.e., the number of malware instances correctly detected divided by the total number of malware files:

TPR = TP / (TP + FN)    (1)

Table 2. TPR results.

Classifier                              Static Approach  Dynamic Approach  Hybrid Approach
KNN K=1 0.95 0.88 0.95
KNN K=2 0.96 0.88 0.97
KNN K=3 0.94 0.88 0.94
KNN K=4 0.95 0.89 0.96
KNN K=5 0.92 0.89 0.90
KNN K=6 0.93 0.89 0.94
KNN K=7 0.90 0.89 0.92
KNN K=8 0.91 0.89 0.93
KNN K=9 0.88 0.89 0.91
KNN K=10 0.90 0.89 0.91
DT: J48 0.93 0.95 0.94
DT: Random Forest 0.96 0.85 0.96
SVM: RBF Kernel 0.89 0.95 0.90
SVM: Polynomial Kernel 0.96 0.93 0.97
SVM: Normalised Polynomial Kernel 0.94 0.94 0.96
SVM: Pearson VII Kernel 0.95 0.89 0.93
Naïve Bayes 0.90 0.57 0.90
Bayesian Network: K2 0.83 0.63 0.83
Bayesian Network: Hill Climber 0.83 0.63 0.83
Bayesian Network: TAN 0.91 0.85 0.91
Table 3. FPR results.

Classifier                              Static Approach  Dynamic Approach  Hybrid Approach
KNN K=1 0.05 0.34 0.03
KNN K=2 0.10 0.35 0.06
KNN K=3 0.05 0.35 0.05
KNN K=4 0.07 0.36 0.07
KNN K=5 0.05 0.36 0.05
KNN K=6 0.06 0.36 0.07
KNN K=7 0.04 0.36 0.07
KNN K=8 0.05 0.36 0.07
KNN K=9 0.04 0.36 0.07
KNN K=10 0.05 0.36 0.06
DT: J48 0.08 0.34 0.01
DT: Random Forest N=10 0.06 0.31 0.06
SVM: RBF Kernel 0.05 0.42 0.03
SVM: Polynomial Kernel 0.05 0.39 0.05
SVM: Normalised Polynomial Kernel 0.02 0.40 0.03
SVM: Pearson VII Kernel 0.06 0.34 0.01
Naïve Bayes 0.10 0.09 0.10
Bayesian Network: K2 0.09 0.12 0.09
Bayesian Network: Hill Climber 0.09 0.12 0.09
Bayesian Network: TAN 0.04 0.34 0.04

where TP is the number of malware cases correctly classified (true positives) and FN is the number of malware cases misclassified as legitimate software (false negatives).
We also measured the False Positive Ratio (FPR), i.e., the number of benign executables misclassified as malware divided by the total number of benign files:

Table 4. AUC results.

Classifier                              Static Approach  Dynamic Approach  Hybrid Approach
KNN K=1 0.95 0.89 0.96
KNN K=2 0.96 0.88 0.97
KNN K=3 0.97 0.88 0.98
KNN K=4 0.97 0.88 0.98
KNN K=5 0.97 0.88 0.98
KNN K=6 0.98 0.88 0.98
KNN K=7 0.98 0.88 0.98
KNN K=8 0.98 0.88 0.98
KNN K=9 0.98 0.88 0.98
KNN K=10 0.97 0.88 0.98
DT: J48 0.93 0.78 0.93
DT: Random Forest N=10 0.99 0.89 0.99
SVM: RBF Kernel 0.92 0.77 0.93
SVM: Polynomial Kernel 0.95 0.77 0.96
SVM: Normalised Polynomial Kernel 0.96 0.77 0.97
SVM: Pearson VII Kernel 0.94 0.77 0.96
Naïve Bayes 0.93 0.85 0.93
Bayesian Network: K2 0.94 0.86 0.94
Bayesian Network: Hill Climber 0.94 0.86 0.94
Bayesian Network: TAN 0.98 0.87 0.98
FPR = FP / (FP + TN)    (2)
where FP is the number of benign software cases incorrectly detected as malware and TN is the number of legitimate executables correctly classified.
Furthermore, we measured the accuracy, i.e., the total number of the classifier's hits divided by the number of instances in the whole dataset:

Accuracy(%) = (TP + TN) / (TP + FP + FN + TN) · 100    (3)
Besides, we measured the Area Under the ROC Curve (AUC), which establishes the relation between false negatives and false positives [28]. The ROC curve is obtained by plotting the TPR against the FPR. A minimal sketch of this evaluation protocol is given after this list.
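As an illustration of the protocol above (not the authors' original Weka setup), a minimal scikit-learn sketch, assuming the hybrid feature matrix X and labels y have already been built; the placeholder data, model choice and parameters are assumptions:

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

# X: hybrid feature matrix (WTF values plus binary behaviour flags); y: 1 = malware, 0 = benign.
rng = np.random.default_rng(0)
X = rng.random((200, 20))                 # placeholder data for the sketch
y = rng.integers(0, 2, size=200)

clf = SVC(kernel="poly", degree=2, probability=True)   # stand-in for Weka's SMO with a polynomial kernel
tpr, fpr, acc, auc = [], [], [], []

for train, test in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    clf.fit(X[train], y[train])
    pred = clf.predict(X[test])
    score = clf.predict_proba(X[test])[:, 1]
    tp = np.sum((pred == 1) & (y[test] == 1))
    fn = np.sum((pred == 0) & (y[test] == 1))
    fp = np.sum((pred == 1) & (y[test] == 0))
    tn = np.sum((pred == 0) & (y[test] == 0))
    tpr.append(tp / (tp + fn))
    fpr.append(fp / (fp + tn))
    acc.append((tp + tn) / (tp + fp + fn + tn))
    auc.append(roc_auc_score(y[test], score))

print("TPR=%.2f FPR=%.2f Accuracy=%.2f%% AUC=%.2f"
      % (np.mean(tpr), np.mean(fpr), 100 * np.mean(acc), np.mean(auc)))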

Tables 1, 2, 3 and 4 show the results obtained in terms of accuracy, TPR, FPR and AUC, respectively. For nearly every classifier, the results improved when the combination of both static and dynamic features was used. In particular, the best overall results were obtained by the SVM trained with the Polynomial Kernel and the Normalised Polynomial Kernel.
The obtained results validate our initial hypothesis that building an unknown-malware detector based on this hybrid static-dynamic representation is feasible: the machine-learning classifiers achieved high performance in classifying unknown malware. Nevertheless, there are several considerations regarding the viability of this method.
First, regarding the static approach, it cannot counter packed malware. Packed malware is the result of ciphering the payload of the executable and deciphering it when the executable is finally loaded into memory. A way to overcome this obvious limitation of our malware detection method is the use of a generic dynamic unpacking scheme such as PolyUnpack [6], Renovo [29], OmniUnpack [30] or Eureka [31].
Second, with regard to the dynamic approach, in order to gain an advantage over antivirus researchers, malware writers have included diverse evasion techniques [14, 32] based on bugs in virtual machine implementations. Nevertheless, to reduce the impact of these countermeasures, we can improve Qemu's source code [14] so as to fix the bugs and no longer be vulnerable to the above-mentioned techniques. It is also possible that some malicious actions are only triggered under specific circumstances depending on the environment, so a single program execution will not manifest all of a sample's behaviour. This can be addressed with a technique called multiple execution paths [33], which makes the system able to obtain the different behaviours displayed by the suspicious executable.

4 Concluding remarks

While machine-learning methods are a suitable approach for unknown-malware detection, they typically use either static or dynamic features to train the algorithms. A combination of both can be useful to improve on the results of the static and dynamic approaches. In this paper, we have presented OPEM, which is the first combination of both static and dynamic approaches to detect unknown malware.
The future development of this malware detection system will concentrate on three main research areas. First, we will focus on handling packed executables using a dynamic unpacker. Second, we plan to extend both the dynamic and the static analyses in order to improve the results of this hybrid malware detector. Finally, we will study the problem of scalability of malware databases using a combination of feature and instance selection methods.

References
1. Schultz, M., Eskin, E., Zadok, F., Stolfo, S.: Data mining methods for detection of new malicious executables. In: Proceedings of the 22nd IEEE Symposium on Security and Privacy. (2001) 38–49
2. Kolter, J., Maloof, M.: Learning to detect malicious executables in the wild. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM New York, NY, USA (2004) 470–478
3. Moskovitch, R., Stopel, D., Feher, C., Nissim, N., Elovici, Y.: Unknown malcode detection via text categorization and the imbalance problem. In: Proceedings of the 6th IEEE International Conference on Intelligence and Security Informatics (ISI). (2008) 156–161
4. Santos, I., Penya, Y., Devesa, J., Bringas, P.: N-Grams-based file signatures for
malware detection. In: Proceedings of the 11th International Conference on Enter-
prise Information Systems (ICEIS), Volume AIDSS. (2009) 317–320
5. Christodorescu, M.: Behavior-based malware detection. PhD thesis (2007)
6. Royal, P., Halpin, M., Dagon, D., Edmonds, R., Lee, W.: Polyunpack: Automating
the hidden-code extraction of unpack-executing malware. In: Proceedings of the
22nd Annual Computer Security Applications Conference (ACSAC). (2006) 289–
300
7. Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection.
In: Proceedings of the 23rd Annual Computer Security Applications Conference
(ACSAC). (2007) 421–430
8. Kolbitsch, C., Holz, T., Kruegel, C., Kirda, E.: Inspector Gadget: Automated
Extraction of Proprietary Gadgets from Malware Binaries. In: Proceedings of the
30th IEEE Symposium on Security & Privacy. (2010)
9. Cavallaro, L., Saxena, P., Sekar, R.: On the limits of information flow techniques
for malware analysis and containment. Lecture Notes in Computer Science 5137
(2008) 143–163
10. Santos, I., Brezo, F., Nieves, J., Penya, Y.K., Sanz, B., Laorden, C., Bringas, P.G.:
Opcode-sequence-based malware detection. Lecture notes in computer science
5965 (2010) 35–43
11. Devesa, J., Santos, I., Cantero, X., Penya, Y.K., Bringas, P.G.: Automatic
Behaviour-based Analysis and Classification System for Malware Detection. In:
Proceedings of the 12th International Conference on Enterprise Information Sys-
tems (ICEIS). (2010)
12. McGill, M., Salton, G.: Introduction to modern information retrieval. McGraw-Hill
(1983)
13. Willems, C., Holz, T., Freiling, F.: Toward automated dynamic malware analysis
using cwsandbox. IEEE Security & Privacy 5(2) (2007) 32–39
14. Ferrie, P.: Attacks on virtual machine emulators. In: Proc. of AVAR Conference.
(2006) 128–143
15. Lee, T., Mody, J.: Behavioral classification. In: Proceedings of the 15th European
Institute for Computer Antivirus Research (EICAR) Conference. (2006)
16. Kent, J.: Information gain and a general measure of correlation. Biometrika 70(1)
(1983) 163
17. Bishop, C.: Pattern recognition and machine learning. Springer New York. (2006)
18. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation
and model selection. In: International Joint Conference on Artificial Intelligence.
Volume 14. (1995) 1137–1145
19. Breiman, L.: Random forests. Machine learning 45(1) (2001) 5–32
20. Quinlan, J.: C4.5: Programs for machine learning. Morgan Kaufmann Publishers (1993)
21. Cooper, G.F., Herskovits, E.: A bayesian method for constructing bayesian belief
networks from databases. In: Proceedings of the 7th conference on Uncertainty in
artificial intelligence. (1991)
22. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach (Second Edition). Prentice Hall (2003)
23. Geiger, D., Goldszmidt, M., Provan, G., Langley, P., Smyth, P.: Bayesian network
classifiers. In: Machine Learning. (1997) 131–163
24. Lewis, D.: Naive (Bayes) at forty: The independence assumption in information
retrieval. Lecture Notes in Computer Science 1398 (1998) 4–18
25. Platt, J.: Sequential minimal optimization: A fast algorithm for training support
vector machines. Advances in Kernel Methods-Support Vector Learning 208 (1999)
26. Amari, S., Wu, S.: Improving support vector machine classifiers by modifying
kernel functions. Neural Networks 12(6) (1999) 783–789
27. Üstün, B., Melssen, W., Buydens, L.: Facilitating the application of Support Vector
Regression by using a universal Pearson VII function based kernel. Chemometrics
and Intelligent Laboratory Systems 81(1) (2006) 29–40
28. Singh, Y., Kaur, A., Malhotra, R.: Comparative analysis of regression and machine
learning methods for predicting fault proneness models. International Journal of
Computer Applications in Technology 35(2) (2009) 183–193
29. Kang, M., Poosankam, P., Yin, H.: Renovo: A hidden code extractor for packed
executables. In: Proceedings of the 2007 ACM workshop on Recurring malcode.
(2007) 46–53
30. Martignoni, L., Christodorescu, M., Jha, S.: Omniunpack: Fast, generic, and safe
unpacking of malware. In: Proceedings of the 23rd Annual Computer Security
Applications Conference (ACSAC). (2007) 431–441
31. Sharif, M., Yegneswaran, V., Saidi, H., Porras, P., Lee, W.: Eureka: A Framework
for Enabling Static Malware Analysis. In: Proceedings of the European Symposium
on Research in Computer Security (ESORICS). (2008) 481–500
32. Ferrie, P.: Anti-Unpacker Tricks. In: Proc. of the 2nd International CARO Workshop. (2008)
33. Moser, A., Kruegel, C., Kirda, E.: Exploring multiple execution paths for malware
analysis. In: Proceedings of the 28th IEEE Symposium on Security and Privacy.
(2007) 231–245
