Detection of malicious PDF files and directions for enhancements: A state-of-the-art survey

2015. 06. 01

Hyungjin Im
(imhj9121@seoultech.ac.kr)
Table of Contents
1. Introduction

2. Structure of PDF files

3. Techniques and possible attacks via PDF files

4. Advanced methods for the detection of malicious PDF files

5. Dataset collection and preliminary analysis

6. Our suggested active learning based framework

7. Discussion and conclusions

Introduction
• Since 2009, cyber-attacks against businesses and
organizations have increased
• In 2013, 91% of all organizations were hit by a cyber-attack
• 9% were the victims of targeted attacks
• Email containing attachments of malicious files has
become an attractive platform by which to initiate cyber-
attacks against organizations
• Existing tools are limited in their ability to detect and
identify the attacks that occur within email
Introduction
• Attackers usually use social engineering in order to encourage the
recipient to open a malicious email, open an attachment, or click a
link

• As most email servers prevent attachments of executable files to
email messages, the non-executable files attached to an email have
played a major role in many recent cyberattacks.

• Users consider non-executable files safer than executables, and
thus, they are less suspicious toward such files received by email
– Non-executable files are as dangerous as executable files, since their
readers can contain vulnerabilities
– The most popular file types for targeted attacks in 2008-2009 were PDF
and Microsoft Office files.

Introduction
• An incident aimed at the Israeli Ministry of Defense (IMOD) took
place on January 15, 2014
– it identified an attack in which attackers sent email messages, allegedly
from IMOD, with a malicious PDF file attachment posing as an IMOD
document
– When opened, the PDF file installed a Trojan horse that enabled the
attacker to take control of the computer
– clearly demonstrates that the existing solutions previously mentioned
are insufficient in detecting and preventing such attacks

• In this survey paper, they present several significant studies
pertaining to malicious PDF detection using machine learning algorithms based
on static analysis and dynamic analysis.
• This paper also outlines a novel Active Learning (AL) framework and
highlights the correlation between the structural incompatibility of
PDF files and their maliciousness.

Structure of PDF files
• A Portable Document Format (PDF) is a formatting language first
conceived by John Warnock, one of the founders of Adobe Systems.
The first version, version 1.0, was introduced in 1993

• Has many functions beyond simple text: it can include images and
other multimedia elements, be password protected, execute
JavaScript, etc

• Supported in all the prominent operating systems for the PC and
mobile platforms

Structure of PDF files
• A PDF file is composed of four basic parts
– Objects - basic elements in a PDF file
– File Structure - defines how the objects are accessed and how they are
updated.
– Document Structure - defines how objects are logically and hierarchically
organized to reflect the content of a PDF file
– Content Streams - objects that contain instructions which define the appearance
of the page.

Structure of PDF files
• Object
– Indirect objects
• objects referenced by a number
– Direct objects
• objects that are not referenced by a number
– Object types: Boolean, Numeric, String,
Name, Null, Array, Dictionary, Stream

Structure of PDF files
• File Structure
– Header: the first line of a PDF file which specifies the version
number of PDF specification which the document uses. Header
format is “%PDF-[version number]”.
– Body: contains all the PDF objects. The body is used to hold all
of the document's data that is shown to the user.
– Cross reference: a table that includes the position (byte offset) of every
indirect object in the file and allows random access to objects
in the file, so the application does not need to read the whole file
to locate a particular object
– Trailer: provides relevant information about how the application
reading the file should find the cross reference table and other
special objects. The trailer also contains information about the
number of revisions made to the document. All PDF readers
should begin reading a file from this section.
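
The header and trailer conventions above can be illustrated with a minimal Python sketch. It assumes the file is available as raw bytes; the helper names read_header_version and read_startxref and the abbreviated sample are hypothetical and only serve the illustration.

```python
import re


def read_header_version(data: bytes):
    """Return the version from a leading '%PDF-x.y' header, or None if absent."""
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    return match.group(1).decode("ascii") if match else None


def read_startxref(data: bytes):
    """Return the number between the last 'startxref' and '%%EOF', or None."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    return int(matches[-1]) if matches else None


# Hypothetical, heavily abbreviated file: only the header and the file end.
sample = (
    b"%PDF-1.4\n"
    b"... body, cross reference table and trailer dictionary omitted ...\n"
    b"startxref\n1234\n%%EOF\n"
)

print(read_header_version(sample))  # 1.4
print(read_startxref(sample))       # 1234
```

Reading the last startxref entry mirrors the point above that a reader starts from the end of the file; a file with several revisions may contain more than one such entry.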

Techniques and possible attacks via PDF files

• Protected mode uses the sandbox technique in order to
create an isolated environment for the Acrobat Reader
rendering agent to run while reading a PDF file.

• JavaScript code attack (1/2)

– PDF files can contain client-side JavaScript code for
legitimate purposes including: 3D content, form
validation, and calculations.
– The primary goal of the malicious JavaScript code
inside a PDF file is to exploit a vulnerability in the
PDF viewer in order to divert the normal execution
flow to the embedded malicious JavaScript code

Techniques and possible attacks via PDF files

• JavaScript code attack (2/2)

– One way to achieve this is by performing a heap spraying attack,
implemented through JavaScript
– Another malicious activity that can be carried
out using JavaScript is downloading an
executable file from the Internet

Techniques and possible attacks via PDF files

• Code obfuscation is legitimately used to prevent reverse
engineering of proprietary applications
• It can also be used by attackers to conceal malicious JavaScript
code from being recognized

Obfuscation techniques and details:
– Separating malicious code over multiple objects: malicious code is spread
among multiple objects. Code chunks are collected, merged, and compiled to
form a malicious piece of code only during runtime
– Applying filters: filters are used to conceal malicious code
– White space randomization: random white spaces are inserted in the
malicious code in order to evade recognition by signature based
maliciousness detectors. White spaces do not affect the code since
JavaScript ignores them

Techniques and possible attacks via PDF files

Obfuscation techniques and details (continued):
– Comment randomization: random comments are inserted in the
malicious code in order to evade recognition by signature based
maliciousness detectors
– Variable name randomization: changing the variable's name randomly in
order to fool signature based maliciousness detectors.
– Integer obfuscation: representing numbers in a different way. For
example, this can be used to hide a specific memory address.
– String obfuscation: making changes to a string in order to make it
difficult for a human analyst to understand the code. For example, by
splitting a string into several substrings
– Function name obfuscation: hiding the name of the function used, which
can provide a clue about the code's intention. This is done by creating a
pointer with a random name to the required function.
Techniques and possible attacks via PDF files

Obfuscation techniques and details (continued):
– Advanced code obfuscation: a string can hold encrypted malicious code. The
decryption process takes place during runtime, just before usage. Metadata
fields and even the document's words can also be used to store malicious
code.
– Block randomization: changing the syntax of the code but not its
action
– Dead code: inserting blocks of code that are not intended
to be executed.
– Pointless code: inserting blocks of code that do not perform
anything.
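
To make the effect of white space and comment randomization concrete, here is a minimal Python sketch. It is not taken from the survey: the signature string, the sample scripts, and the normalize_js helper are hypothetical, and the comment stripping is deliberately naive (it would also cut a "//" that appears inside a string literal).

```python
import re


def normalize_js(source: str) -> str:
    """Strip comments and all white space so cosmetic variants compare equal."""
    without_block = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    without_line = re.sub(r"//[^\n]*", "", without_block)
    return re.sub(r"\s+", "", without_line)


# Hypothetical signature and two cosmetically different versions of one script.
SIGNATURE = "unescape('%u9090')"
original = "var a = unescape('%u9090');"
randomized = "var   a =\n /* noise */ unescape( /* x */ '%u9090' ) ; // more noise"

# A plain substring signature misses the randomized variant...
print(SIGNATURE in randomized)                              # False
# ...but matches again once both sides are normalized.
print(normalize_js(SIGNATURE) in normalize_js(randomized))  # True
```

Normalization of this kind only addresses the cosmetic techniques; variable renaming, string splitting, and runtime decryption from the list above would still change the normalized text.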

Techniques and possible attacks via PDF files

• Embedded files attack

– A PDF file can contain other file types inside of it, for example,
HTML, JavaScript, SWF, XLSX, EXE, Microsoft Office files or
even another PDF file
– An attacker can use this functionality in order to embed a
malicious file inside a benign file.
– The PDF viewer will not allow the launching of an embedded
executable file because of its blacklist

Techniques and possible attacks via PDF files

• Mimicry attacks attempt to change a malicious file's
structure and objects so that the file is similar to a benign
file.
– embedding a malicious EXE payload into a benign PDF file
– embedding a malicious PDF file into a benign PDF file
– JavaScript injection, in which malicious JavaScript code is
embedded in the PDF file

Techniques and possible attacks via PDF files

• Form submission and URI attack

– Adobe Reader supports the option of
submitting the PDF form from a client to a
specific server using the /submitform
command
– Adobe Reader generates an FDF file from a PDF in
order to send the data to a specified URL. If
the URL belongs to a remote web server, the server is
able to respond. Responses are temporarily
stored in the %APPData% directory and
automatically pop up in the default web
browser
Advanced methods for the detection of malicious
PDF files
• Taxonomy of academic research on detection methods
of malicious PDF files

Advanced methods for the detection of malicious
PDF files
• Detection methods based on static analysis
– Includes methods aimed at statically analyzing the embedded
JavaScript code inside the PDF files
– Conduct static analysis based on the PDF file's metadata.
– JavaScript analysis
• Both methods apply machine learning algorithms to the tokenized code in
order to build a classification model and classify new, unfamiliar PDF files
after the embedded JavaScript code has been extracted from them
– Metadata analysis
• Analyze a PDF file by examining its metadata
• These approaches share a focus on global or statistical information about
the PDF file's objects and structure, rather than on its actual content

Advanced methods for the detection of malicious
PDF files
• Detection methods based on JavaScript analysis
• Lexical analysis
– Srndic and Laskov introduced PJScan
– One-Class Support Vector Machine (OCSVM), a machine learning method,
is used to automatically construct models from available data for
subsequent classification of new data.
– The feature extraction component makes use of an open source PDF
rendering library called “POPPLER” to search for embedded JavaScript
code in a document
– After the JavaScript code has been found and extracted, a lexical analysis
is performed on it using “Mozilla SpiderMonkey”

Advanced methods for the detection of malicious
PDF files
• Detection methods based on JavaScript analysis
• Clustering
– Vatamanu et al. introduced two different static methods for clustering PDF
files based on tokenization of their embedded JavaScript.
– The first is hierarchical bottom up clustering and the second is hash table
clustering.
– The clustering methods support the identification of similar scripts that have been
obfuscated using different techniques
– The fingerprint is a set of unique JavaScript tokens and their frequencies
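
A token-frequency fingerprint of this kind can be sketched in Python as follows. This is a simplified illustration rather than the authors' tokenizer: the regular expression, the fingerprint and similarity helpers, and the overlap measure are assumptions made for the example.

```python
import re
from collections import Counter

# Simplified tokenizer (an assumption): identifiers, integers, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_$][\w$]*|\d+|\S")


def fingerprint(js_source: str) -> Counter:
    """Map each unique token to the number of times it occurs."""
    return Counter(TOKEN_RE.findall(js_source))


def similarity(a: Counter, b: Counter) -> float:
    """Overlap of two token multisets, between 0.0 and 1.0."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


script_a = "var a = 1; a = a + 1;"
script_b = "var  a=1;a=a+1;"           # same code, different spacing
script_c = "function f() { return 0; }"

print(similarity(fingerprint(script_a), fingerprint(script_b)))  # 1.0
print(similarity(fingerprint(script_a), fingerprint(script_c)) < 1.0)  # True
```

Because the fingerprint ignores spacing and token order, scripts that differ only cosmetically receive the same fingerprint, which is what allows a clustering step to group them.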

Advanced methods for the detection of malicious
PDF files
• Detection methods based on Metadata analysis
• Keywords analysis
– Maiorca et al. introduced the PDF Malware Slayer (PDFMS), a static
analysis tool which characterizes PDF files according to the set of
embedded keywords and their occurrence
– Consists of two modules: a data retrieval module which retrieves files for
the training and testing phases, and a feature extractor module which
determines the type of features to be used by the classifier
– To retrieve the keywords from the PDF file, the authors used the PDFid tool
(Python script)
– The files were characterized by keywords such as: /JS, /JavaScript, /Encrypt,
obj, stream, filter, etc.
– Their main contribution is the ability to detect malicious PDF files whether or
not they contain JavaScript code, unlike previously described tools such as
PJScan
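
Keyword-occurrence features of this kind can be sketched in a few lines of Python. This is not the PDFid tool itself; the keyword list (including the spelling /Filter), the count_keywords helper, and the tiny sample are assumptions for illustration, and name obfuscation is not handled.

```python
import re

# Keywords loosely following the list above; the exact spellings are assumptions.
KEYWORDS = [b"/JS", b"/JavaScript", b"/Encrypt", b"obj", b"stream", b"/Filter"]


def count_keywords(data: bytes, keywords=KEYWORDS) -> dict:
    """Count stand-alone occurrences of each keyword in the raw file bytes."""
    counts = {}
    for keyword in keywords:
        # The lookarounds keep 'obj' from matching inside 'endobj', and so on.
        pattern = rb"(?<![A-Za-z0-9])" + re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts


# Hypothetical two-object fragment, not a complete PDF file.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Action /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    b"2 0 obj\n<< /Length 5 /Filter /FlateDecode >>\nstream\nhello\nendstream\nendobj\n"
)

print(count_keywords(sample))
```

The resulting dictionary is a fixed-length feature vector once the keyword list is fixed, which is the form a classifier expects.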

Advanced methods for the detection of malicious
PDF files
• Detection methods based on Metadata analysis
• Hierarchical structure analysis
– Srndic and Laskov introduced a high performance static method for the
detection of malicious PDF documents which, instead of analyzing
JavaScript or any other content, makes use of essential differences in the
structural properties of malicious and benign PDF files.
– When an attacker injects malicious content into the PDF file, the file
structure inevitably changes.
– The PDF is parsed using the PDF parser, POPPLER. The parser extracts
structural paths from malicious and benign real-world PDF files, which are
used to create the training set.
– Two classification models were trained: an SVM (LibSVM implementation) and a
decision tree (C5.0 inference implementation).
– Their main contribution is a novel technique for the detection of malicious
PDF files based on the difference between the underlying structural
properties of benign and malicious PDF files

Advanced methods for the detection of malicious
PDF files
• Detection methods based on Metadata analysis
• Content metadata analysis
– Smutz and Stavrou presented PDFRate, a framework which is based on
meta-features extracted from a document's content for the detection of
malicious PDF files
– The process is based on the use of a self-implemented reliable parser for
feature extraction, because existing tools are unable to deal with malformed
documents.
– Two data sources were used for the research: the first is the Contagio
dataset collection and the second is based on monitoring the HTTP traffic of a
large university's network.

Advanced methods for the detection of malicious
PDF files
• Detection methods based on Metadata analysis
• Term frequency and entropy analysis (1/3)
– In contrast to the aforementioned approaches, which rely upon a PDF parser's ability to
extract relevant data from objects embedded in the PDF file, the following
study proposes two different detection methods that do not employ a PDF
parser
– Pareek and Eswari introduced two static analysis methods for the detection
of malicious PDF. The first method is based on entropy, and the second is
based on n-gram term frequency
– The first entropy based method was used to measure the uncertainty or
randomness in a given dataset. A file is represented as a set of byte
sequences
– Low entropy of a file is not a strong indicator of maliciousness, however it
can be a useful feature in combination with other features.
– The second method, the n-gram based approach, takes substrings of length n from a
given large string, where the units of the n-gram can be words or bytes
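
Both features can be computed directly from the raw bytes without a PDF parser. The Python sketch below assumes Shannon entropy over byte values and byte-level n-grams; the function names are hypothetical and the study's exact feature definitions may differ.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(data).values())


def byte_ngram_frequencies(data: bytes, n: int = 2) -> Counter:
    """Count every contiguous run of n bytes in the data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


print(shannon_entropy(b"aaaa"))              # 0.0
print(shannon_entropy(bytes(range(256))))    # 8.0
print(byte_ngram_frequencies(b"abab", n=2))  # counts for b'ab' and b'ba'
```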

Advanced methods for the detection of malicious
PDF files
• Detection methods based on Metadata analysis
• Term frequency and entropy analysis (2/3)
– The following two papers take a different approach and focus on the
development of an applicable network IDS aimed at the detection of
malicious PDFs that pass through the network.
– The first work presented is that of Kittilsen in which he attempted to
implement an anomaly based network IDS, which employs an SVM
classifier to detect malicious PDF files.
– The IDS uses SNORT, u2boat and tcpflow tools to extract PDF files from
the network stream to the hard drive
– The classification process begins offline after a period of time, and the user
has access to the file in the meantime.
– The author's own pdfextract.py script, written in Python, was used to extract
18 string features from the file and count their occurrences.

Advanced methods for the detection of malicious
PDF files
• Detection methods based on Metadata analysis
• Term frequency and entropy analysis (3/3)
– The second work was presented two years later by Knut Borg as a
continuation of Kittilsen's research
– This thesis focuses on online detection of PDF files, while Kittilsen's thesis
featured offline detection. Kittilsen's proposed IDS extracted PDF files from
the network traffic to the local hard drive and then executed a classification
algorithm to detect maliciousness.
– The thesis concludes that the detection system in its current
form should not be implemented in a real environment because of its many
faults, including the limitations of SNORT
– Due to reasons of insufficient applicability, the last two works described
above will not be listed as solutions in the summary tables presented in the
upcoming section.

Advanced methods for the detection of malicious
PDF files
• Detection methods based on dynamic analysis
• All of the following dynamic analysis methods focus on the
analysis of embedded JavaScript code
• The first sub-category presents studies that statically extract the
JavaScript code and includes three methods.
• Two of these methods, MDScan and PDF Scrutinizer, start with
a static extraction of the embedded JavaScript code from a PDF
file and then execute the extracted code using a JavaScript
engine.
• The third method, ShellOS V1, also appears in the second
sub-category of dynamic extraction as ShellOS V2
• MPScan also belongs to this second subcategory of dynamic
extraction as it extracts the JavaScript code dynamically during
runtime

Advanced methods for the detection of malicious
PDF files
• Detection methods based on dynamic analysis
• Static JavaScript extraction
– MDScan and PDF Scrutinizer rely on a PDF parser that should be capable
of parsing the PDF file, locating the embedded JavaScript, and extracting it
– Tzermias et al introduced the design and implementation of MDScan, a
standalone malicious document scanner which uses both static and
dynamic analysis methods to detect malicious PDF files.
– MDScan then pulls out the embedded JavaScript code and examines it by actually
running it on a SpiderMonkey JavaScript engine
– The string variables used are dynamically analyzed during execution to check
whether they contain some form of shellcode
– MDScan does not rely on previously known vulnerabilities and thus, is able
to detect malicious PDF documents which exploit unknown vulnerabilities
(zero-day) in PDF readers.
– The benign dataset consisted of 2000 benign PDF files found in Google.
Evaluation results show a TPR of 89% and an FPR of 0%

Advanced methods for the detection of malicious
PDF files
• Detection methods based on dynamic analysis
• Static JavaScript extraction
– Schmitt et al introduced PDF Scrutinizer, a malicious PDF detection and
analysis tool that also uses static and dynamic analysis methods to detect
maliciousness
– It consists of three modules. The first is a parser, which simulates the way
Adobe Reader parses a document
– The second is an action extractor
– The third is an actions executor
– During execution, the libemu library is used to analyze variable values for the
existence of shellcode
– Both static and dynamic heuristics are applied to detect maliciousness
– Static heuristics focus on JavaScript code string analysis to find signatures
of known suspicious, vulnerable, or malicious functions
– Dynamic heuristics focus on the detection of malicious code behavior

Advanced methods for the detection of malicious
PDF files
• Detection methods based on dynamic analysis
• Static JavaScript extraction
– The following study differs from the previous work presented in this section
in several respects.
– First, ShellOS is an operating system.
– Second, unlike previous runtime analysis techniques that use software-
based CPU emulation, the proposed framework leverages hardware
virtualization technology
– Finally, it can't examine a PDF file as a whole, and instead it relies on a host
operating system

Advanced methods for the detection of malicious
PDF files
• Detection methods based on dynamic analysis
• Static JavaScript extraction
– Snow et al. presented ShellOS, a framework for the detection of code
injection attacks, based on code analysis during runtime
– ShellOS is a new lightweight operating system kernel designed for efficient
execution of code streams.
– ShellOS runs as a guest under a host operating system using the Kernel-based
Virtual Machine (KVM)
– When shellcode is executed, ShellOS collects useful information, such as
logged function names and parameters.
– The increased analysis performance enables the framework to process
more of the network stream and execute longer code sequences

Advanced methods for the detection of malicious
PDF files
• Detection methods based on dynamic analysis
• Dynamic JavaScript extraction
– Lu et al. introduced MPScan, a technique that integrates static malware
detection and dynamic JavaScript de-obfuscation
– MPScan is composed of two modules: an embedded code extraction
module and a multilevel malware detection module that includes a
shellcode/heap spraying detection component and an opcode signature
matching component that searches for malicious signatures in the
JavaScript opcode
– The JavaScript code extracted dynamically at runtime is then evaluated by the
static detection module
– Previous methods such as MDScan and PDFphoneyC statically parse the
PDF file and extract JavaScript code and then examine the code
dynamically by running it in the emulated environment of the SpiderMonkey
JavaScript engine
– For the evaluation phase, the authors collected 198 malicious PDF samples
from the Internet and nine malicious PDF samples from the Metasploit
framework.

Advanced methods for the detection of malicious
PDF files
• Advanced methods and coping with existing attacks (1/2)
• Each of the aforementioned analytical approaches has its pros
and cons
• Consequently, a hybrid detection framework meshing static and
dynamic detection techniques could reduce the likelihood of
evasion of the detection mechanism by a malicious PDF.
• With static analysis, the malicious code inside the PDF does not know that it is
being analyzed, because the file is not opened by the PDF reader or by an
emulator.
• The static analysis approaches can be divided roughly into two
groups: the first group analyzes the JavaScript code embedded
inside the PDF in a variety of representations. The second group
relies upon meta-feature based approaches and focuses on the
content and structure of the PDF file

Advanced methods for the detection of malicious
PDF files
• Advanced methods and coping with existing attacks (2/2)
• Looking at the disadvantages, static analysis can be evaded
using code obfuscation
• Whenever machine learning methods based on static analysis
are used for detecting unknown malicious code applications,
there is a question about the capability of the suggested
framework for detecting obfuscated code inside PDF files
• We have also presented studies employing a dynamic analysis
approach for detecting malicious PDF files
• In most of these studies, this approach dynamically runs the
JavaScript code embedded in a PDF file by performing pre-static
analysis of the PDF file in order to extract JavaScript code which
will be analyzed dynamically.

Dataset collection and preliminary analysis
• We acquired a total of 50,908 PDF files, including 45,763 malicious and
5,145 benign files, from four sources

• The malicious PDF files contain several types of malware families such as
viruses, Trojans, and backdoors. We also included obfuscated PDF files.
• Analysis of our large dataset of 50,908 files by the parser shows that most
of the malicious files are not compatible with the PDF file format
specifications

Dataset collection and preliminary analysis
• The incompatibility observed was located at the end of the file, in the
line between the “startxref” and “%%EOF” lines.
• This line should contain a number serving as a reference (offset) to
where the last cross reference table section is located in the file.
• In cases of incompatibility, the number that appears is incorrect.
• The dataset table includes the number of compatible files (bracketed) in each of our
collected datasets.
• Note that while incompatible benign files were not present in our
dataset, this does not mean that there weren't any incompatible
benign files.
• It might, however, suggest the very low probability of incompatibility
among benign files and provides support for our observation
mentioned above
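
The check described above can be sketched in Python as follows. This is only an approximation of the compatibility test, not the parser used in the survey: the function name startxref_is_consistent is hypothetical, and accepting an object header at the offset (for files that store the cross reference data in a stream object) is an added assumption.

```python
import re


def startxref_is_consistent(data: bytes) -> bool:
    """Check that the number after the last 'startxref' points at xref data."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    if not matches:
        return False
    offset = int(matches[-1])
    target = data[offset:offset + 32]
    # A classic table starts with 'xref'; a cross reference stream starts
    # with an object header such as '12 0 obj' (assumption, see lead-in).
    return target.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", target) is not None


# Build a tiny hypothetical file so that the offset is known exactly.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref = (b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
        b"trailer\n<< /Size 2 /Root 1 0 R >>\n")


def with_offset(offset: int) -> bytes:
    return body + xref + b"startxref\n" + str(offset).encode("ascii") + b"\n%%EOF\n"


print(startxref_is_consistent(with_offset(len(body))))      # True
print(startxref_is_consistent(with_offset(len(body) + 7)))  # False
```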

Our suggested active learning based
framework
• In this survey we presented many studies that were based on
machine learning approaches and were successfully used to induce
malicious PDF detection models. However, all of them focus on
passive learning
• With passive learning, the induced detection model, as accurate as
it is, quickly becomes obsolete since it is incapable of adaptive
learning and integrating new malicious PDF files
• The detection model must be sustained and updated with newly
labeled, informative PDF files
• In cases in which the PDF files are labeled as malicious by the
human expert, they will be used to update the antivirus tool as well,
which is currently the most common solution for organizations.

Our suggested active learning based
framework
• The PDF files transported over the Internet are collected and
scrutinized within our framework
• Then, the “known files module” filters all the known benign and
malicious PDF files and antivirus signatures
• The unknown PDF files are then checked for their compatibility as
viable PDF files
• The incompatible PDF files are immediately blocked from being
transported into the organizational network
• Since only compatible files are relevant for organizations and
innocent users, just these files are transformed into vector form for
the advanced check

Our suggested active learning based
framework
• This framework provides detection solutions for both instances,
whether the malicious file is compatible or not, and it does so more
efficiently than any other solution that exists today.
• The framework uses the insight that most of the malicious files are
incompatible as a first layer of filtering, and not as a detection rule.
• As noted, there is no reason to open an incompatible file, be it
benign or malicious. Therefore, this understanding provides a
significant reduction (~96.5%) of the analysis efforts of suspected
malicious files.

Our suggested active learning based
framework
• Specifically, JavaScript code attacks, embedded file attacks, and
form submission and URI attacks are the most common attacks
launched via PDF files, and all three are present in our dataset
• Since the dataset is large, representative, and based upon trusted
sources, our conclusion of high incompatibility among malicious files
is empirically well founded
• The PDF files which are compatible and unknown are then
introduced to the detection model which is a classifier induced by
Machine Learning algorithms.
• The Active Learning methods are aimed at efficiently updating the
detection model and antivirus tool in light of the creation of new PDF
files

Our suggested active learning based
framework
• We consider employing several algorithms in order to induce detection
models; one of them is the SVM classification algorithm with the
radial basis function (RBF) kernel in a supervised learning approach

• The kernel's projection into higher dimensional space actually makes the
induced model complex and thus more difficult for an attacker to
understand.

• The detection model scrutinizes PDF files and provides two outputs

– A classification decision using the SVM classification algorithm
– Distance calculation from the SVM's separating hyperplane using Equation
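
The two outputs can be illustrated with a small Python sketch. The equation referred to above is not reproduced in these slides, so the sketch uses the standard kernel SVM decision function f(x) = sum_i alpha_i * y_i * K(x_i, x) + b and divides it by the norm of the weight vector in the kernel-induced space. The support vectors, coefficients, bias, gamma, and the convention that the positive side means "malicious" are all hypothetical.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias, with coef_i = alpha_i * y_i."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


def weight_norm(support_vectors, dual_coefs, gamma):
    """Norm of the weight vector in the kernel-induced feature space."""
    pairs = list(zip(support_vectors, dual_coefs))
    squared = sum(ci * cj * rbf_kernel(si, sj, gamma)
                  for si, ci in pairs for sj, cj in pairs)
    return math.sqrt(squared)


def classify_with_distance(x, support_vectors, dual_coefs, bias, gamma):
    """Return (label, signed distance from the separating hyperplane)."""
    f = decision_value(x, support_vectors, dual_coefs, bias, gamma)
    label = "malicious" if f > 0 else "benign"  # sign convention is an assumption
    return label, f / weight_norm(support_vectors, dual_coefs, gamma)


# Hypothetical toy model with one support vector per class.
SUPPORT_VECTORS = [(0.0, 0.0), (1.0, 1.0)]
DUAL_COEFS = [-1.0, 1.0]
BIAS = 0.0
GAMMA = 1.0

print(classify_with_distance((1.0, 1.0), SUPPORT_VECTORS, DUAL_COEFS, BIAS, GAMMA))
print(classify_with_distance((0.5, 0.5), SUPPORT_VECTORS, DUAL_COEFS, BIAS, GAMMA))
```

A file whose feature vector lies midway between the two support vectors gets a distance of zero, which is the case where the classifier is least certain.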

Our suggested active learning based
framework
• Accordingly, in our context, there are two types of files that may be
considered informative.

• The first type includes PDF files in which the classifier has limited
confidence as to their classification.
– Acquiring them as labeled examples will probably improve the model's detection
capabilities.
– In practical terms, these PDF files will have new features or special combinations
of existing features that should fairly represent their operations and ambience
• The second type of informative file includes those that lie deep
inside the malicious side of the SVM margin and are a maximal
distance from the separating hyperplane according to Equation
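
The two kinds of informative files can be picked from a batch once each file has a signed distance from the hyperplane. The Python sketch below is one possible reading of the selection step, not the framework's exact method: the file names, distance values, batch sizes, and the convention that positive distances lie on the malicious side are hypothetical.

```python
def select_informative(distances, n_uncertain, n_deep_malicious):
    """Pick files for expert labeling from a {file_id: signed distance} mapping.

    Returns (uncertain, deep): the files closest to the hyperplane, then the
    remaining files that lie deepest inside the malicious (positive) side.
    """
    by_uncertainty = sorted(distances, key=lambda f: abs(distances[f]))
    uncertain = by_uncertainty[:n_uncertain]
    malicious_side = [f for f in distances
                      if distances[f] > 0 and f not in uncertain]
    deep = sorted(malicious_side, key=lambda f: distances[f],
                  reverse=True)[:n_deep_malicious]
    return uncertain, deep


# Hypothetical signed distances for five unknown files.
batch = {"a.pdf": 0.05, "b.pdf": -0.10, "c.pdf": 2.4, "d.pdf": 1.1, "e.pdf": -1.9}

print(select_informative(batch, n_uncertain=2, n_deep_malicious=1))
# (['a.pdf', 'b.pdf'], ['c.pdf'])
```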

Our suggested active learning based
framework
• Training: A detection model is trained over an initial training set that
includes both malicious and benign PDF files.
• Detection and updating: For every unknown PDF file that is both
transported over the Internet traffic and through the framework, the
framework's detection model provides a classification, and its active
learning method provides a rank representing how informative the
file is
• The purpose of this framework is to provide a better solution than
random selection or passive learning employed nowadays

Discussion and conclusions
• This survey aimed to review the methods, techniques, and tools used for the
detection of malicious PDF files
• These PDFs are usually attached to emails that are sent to
organizations in order to perform the initial penetration of an APT
attack; therefore, their detection is a significant concern which
requires attention
• One should note that we don't claim that every malicious PDF file is
incompatible
• And therefore, after the incompatibility check within our framework,
we aim at providing a comprehensive static and dynamic analysis
based on advanced Machine Learning algorithms and detection
models
• The Framework does not rely upon the fact that most of the
malicious files are incompatible, therefore in the case that an
attacker crafts a malicious PDF as an incompatible file, it will be
filtered out and will not be transported to the organizational network
Discussion and conclusions
• In this survey paper we do not provide an elaborate
segmentation on our dataset and the attacks which
occurred within it
• Based on this survey, they propose that the detection
model include a hybrid detection approach that conducts
both static and dynamic analysis
• For the static analysis phase, the key to precise and
sensitive detection is preliminary knowledge of the
primary attack and evasion techniques that could be
used by a PDF file

Discussion and conclusions
• All the extracted features mentioned in this article can be
leveraged by an ensemble of classifiers such that each
classifier will be induced from different sets of features.
• It was shown by Menahem et al. that using an ensemble
of classifiers using different features can significantly
improve detection capabilities
• This detection approach provides a comprehensive
indication of the file's purposes and is robust against
many evasion techniques
• This paper also suggests running each suspicious PDF
file through several versions of Adobe Reader in order to
compare its behavior

Discussion and conclusions
• Future work: while machine learning has been used successfully
to induce malicious PDF detection models, all methods utilizing this
approach focus on passive learning
• A further suggestion pertains to the fact that PDFs are one of the most
common types of files used as malicious attachments; however, one cannot
ignore the phenomenon of malicious Microsoft Office files attached
to email
• We suggest combining email features (mentioned previously) with
features extracted from attached Microsoft Office files, thus
enhancing the detection of malicious Office files as was explained in
reference to the PDF files
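
To make the contrast with passive learning concrete: an active learner chooses which unlabeled files an expert should label next, instead of training on whatever labels happen to be available. The sketch below shows generic uncertainty sampling; it is not necessarily the selection strategy of the suggested framework, and the file names, scores and `budget` parameter are hypothetical.

```python
def select_for_labeling(scores, budget):
    """Pick the files whose malicious-probability estimate is closest to 0.5.

    scores: dict mapping a file identifier to the current model's estimated
    probability that the file is malicious (assumed to be in [0, 1]).
    budget: how many files the human expert can label in this round.
    """
    ranked = sorted(scores, key=lambda fid: abs(scores[fid] - 0.5))
    return ranked[:budget]

# Hypothetical scores from a current detection model.
scores = {"a.pdf": 0.97, "b.pdf": 0.52, "c.pdf": 0.10,
          "d.pdf": 0.45, "e.pdf": 0.70}
print(select_for_labeling(scores, budget=2))
```

The selected files would be labeled by an expert and added to the training set before the model is retrained, so labeling effort goes to the files the model is least sure about.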

Q&A
Thank you for your attention!
