Jurnal Teknik Informatika dan Sistem Informasi ISSN 2407-4322

Vol. 12, No. 1, March 2025, pp. 319-328 E-ISSN 2503-2933 319

Malware Analysis of Data Theft Utilizing PDF Files


Rosihan*1, Yasir Muin*2, Saiful Do Abdullah3
1,2,3 Informatics, Faculty of Engineering, Khairun University
e-mail: 1rosihan@unkhair.ac.id, *2yasirmuin@unkhair.ac.id, 3saifuladobdullah@unkhair.ac.id

Abstract
The increasing use of PDF files as a medium for cyberattacks has created significant challenges
in data security, especially in detecting and mitigating malware designed for data theft.
This research analyzes malware threats in PDF files using static and dynamic analysis approaches
to identify patterns that characterize malicious activity leading to the leakage of sensitive information.
Static analysis identifies specific elements in the PDF file structure, such as metadata, signatures,
and malicious indicators that are usually hidden within scripts and encapsulated objects. Dynamic
analysis uses a sandbox environment to monitor the runtime behavior of PDF files, including network
activity and file system access. The results show that combining static and dynamic analysis provides
more comprehensive detection: static analysis quickly identifies signs of malware, while dynamic
analysis provides deeper insight into behavior patterns that static analysis alone does not detect.
This research provides a framework for identifying malware in PDF files based on static and dynamic
analysis, together with insights into the effectiveness of these methods in the context of modern data
security. The findings have the potential to support the development of security systems.

Keywords: Malware, Static Analysis, Dynamic Analysis.

1. INTRODUCTION

The rapid advancement of information and communication technology has brought
significant changes to various aspects of life, including the exchange and storage of
information [1]. One of the most widely used file formats for digital document distribution
is the Portable Document Format (PDF) [2]. PDF files are favored for their ability to preserve
the original document layout, their compatibility across multiple platforms, and their security
features, which are stronger than those of other document formats [3]. However, behind these
advantages, PDFs have also become a primary target for cyberattacks, particularly for data theft
through malware dissemination [4]. According to recent reports, the most common threats
targeting mobile devices are scams (45%), phishing (30.8%), and malvertising (15.8%).
These figures indicate that cybersecurity threats, including data theft attempts, continue to
escalate with increasingly diverse and complex methods. One prevalent method uses social
engineering to deliver malware embedded in various file formats, including PDF files.
PDFs are often chosen because many users perceive them as safe, which makes these
seemingly innocuous files effective tools for deceiving victims. When a malware-infected PDF
file is opened, the malicious software can be activated immediately to access sensitive data such
as personal information, user credentials, or financial data [5].
The increasing number of data theft cases in which malware is embedded in PDF files has
become a serious concern in cybersecurity. Such attacks exploit vulnerabilities in the PDF
structure or use social engineering techniques to trick users into opening malicious files [6].
Once the infected PDF file is opened, the malware can be activated to steal sensitive data from
the victim's system, including personal information, credentials, and financial data. This risk
poses a significant threat to individuals and organizations that store important information in
PDF format [7].

Received June1st,2012; Revised June25th, 2012; http://jurnal.mdp.ac.id jatisi@mdp.ac.id

Previous studies have shown that the Portable Document Format (PDF) has become a popular
medium for malware dissemination because it can embed objects that attackers can exploit,
such as JavaScript code and encrypted objects. Saputra et al. (2023) employed the Support
Vector Machine (SVM) algorithm with a Byte Frequency Distribution (BFD)-based feature
extraction method to detect malware in PDF files. In their study, the Sequential Forward Selection
(SFS) method was used to reduce feature dimensionality, achieving a detection accuracy of
99.11% and an F1 score of 99.65% [8].
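To make the BFD feature concrete, the following is a minimal sketch of the feature-extraction step only: each file is mapped to a 256-element vector holding the relative frequency of every byte value. It is an illustration, not the implementation used by Saputra et al.; the SFS feature selection and the SVM classifier are not shown, and the function name is hypothetical.

```python
from collections import Counter


def byte_frequency_distribution(data: bytes) -> list[float]:
    """Return a 256-element vector with the relative frequency of each byte value 0x00-0xFF."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints in range 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


if __name__ == "__main__":
    header = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    vector = byte_frequency_distribution(header)
    print(len(vector), round(sum(vector), 6))  # prints "256 1.0"
```

Because the vector is normalized by file length, files of very different sizes can be compared directly, which is what makes it usable as a fixed-length classifier input.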
Previous research has revealed various techniques that attackers use to inject malware
into PDF files, such as obfuscation, JavaScript, and embedded malicious objects [8].
In addition, various static and dynamic malware analysis methods have been developed
to detect and prevent the spread of malware through PDFs [9]. The main challenge, however,
is how these methods can effectively detect and address increasingly sophisticated and hidden
threats [10].
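One simple obfuscation trick illustrates why naive keyword searches fail. In PDF name objects, a "#" followed by two hexadecimal digits encodes a single byte, so /J#61vaScript denotes the same name as /JavaScript. The sketch below, with hypothetical function names, decodes these escapes and reports names that changed. Its tokenizer is deliberately simplified (it does not skip literal strings or decompress streams), so it is an illustration rather than a complete PDF parser.

```python
import re

# "#" followed by two hex digits encodes one byte inside a PDF name object.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")
# A name starts with "/" and ends at whitespace or a PDF delimiter: ( ) < > [ ] { } / %
_NAME = re.compile(rb"/[^\s/<>\[\]()%{}]+")


def normalize_name(raw: bytes) -> bytes:
    """Decode #XX hex escapes inside a single PDF name token."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def find_obfuscated_names(data: bytes) -> list[tuple[bytes, bytes]]:
    """Return (raw, decoded) pairs for every name whose spelling changes after decoding."""
    results = []
    for match in _NAME.finditer(data):
        raw = match.group(0)
        decoded = normalize_name(raw)
        if decoded != raw:
            results.append((raw, decoded))
    return results


if __name__ == "__main__":
    sample = b"<< /S /J#61vaScript /JS (app.alert(1)) >>"
    print(find_obfuscated_names(sample))  # prints [(b'/J#61vaScript', b'/JavaScript')]
```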
This research aims to conduct a comprehensive analysis of the malware analysis methods
used to detect data theft through PDFs. The analysis compares static and dynamic analysis
techniques, with an emphasis on how effectively each method identifies malware attacks and
how well it explains the way the malware operates.

2. METHODOLOGY

This research applies a quantitative approach with an experimental design to analyze and
assess the effectiveness of malware analysis techniques in detecting and preventing data theft
through PDF files. The research was conducted in several stages, namely sample collection,
static analysis, dynamic analysis, and result evaluation.

Figure 1. Research Method

2.1 Sample Collection


The first stage of this research is the collection of PDF file samples suspected of
containing malware. Samples were taken from various sources, including public malware
databases, security incident reports, and other relevant sources. These samples were then
classified based on the type of malware contained within them and the type of attack carried out.
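Samples gathered from several sources often overlap, and Table 1 identifies the analyzed file by its SHA-256 hash. The following minimal sketch, assuming the samples sit in a local directory, hashes each file in chunks and groups identical files under one hash before classification. The directory layout and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large samples need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def catalog_samples(sample_dir: Path) -> dict[str, list[Path]]:
    """Group sample files by SHA-256 so duplicates from different sources collapse together."""
    catalog: dict[str, list[Path]] = {}
    for path in sorted(sample_dir.rglob("*")):
        if path.is_file():
            catalog.setdefault(sha256_of_file(path), []).append(path)
    return catalog
```

Keying on the hash rather than the file name matters here because the same sample can circulate under different names (for example, an "invitation" file in different languages).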

2.1.1 Static Analysis


Static analysis is performed on the PDF samples to identify harmful elements without
executing the file. The technique used in this analysis is reverse engineering, which is applied
to the internal structure of the PDF to detect suspicious elements such as JavaScript code or
hidden objects. Reverse engineering is the process of dismantling or analyzing the internal
structure of a program or file to understand how it works without access to the original source
code used to create it. In the context of malware analysis on PDF files, reverse engineering is
performed to uncover hidden or suspicious elements that may be used to carry out cyberattacks,
such as inserted malware.

Rosihan, et., al [Malware Analysis of Data Theft Untilizing PDF File]
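A common first step in static triage of a PDF is to count the names that enable active content, similar in spirit to the keyword counts reported by Didier Stevens' pdfid tool. The sketch below uses an illustrative keyword selection and a hypothetical function name. It scans the raw bytes only: it does not decode #XX name escapes or decompress streams and object streams, so obfuscated or compressed keywords are missed.

```python
import re

# Illustrative selection of names associated with active or hidden content.
SUSPICIOUS_KEYWORDS = (
    b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch",
    b"/EmbeddedFile", b"/URI", b"/ObjStm", b"/XFA", b"/AcroForm",
)


def keyword_counts(data: bytes) -> dict[str, int]:
    """Count whole-name occurrences of each keyword in the raw PDF bytes."""
    counts = {}
    for keyword in SUSPICIOUS_KEYWORDS:
        # The negative lookahead requires the name to end at whitespace or a
        # delimiter, so "/JS" does not match inside "/JSON".
        pattern = re.escape(keyword) + rb"(?![^\s/<>\[\]()%{}])"
        counts[keyword.decode()] = len(re.findall(pattern, data))
    return counts
```

Non-zero counts for /OpenAction together with /JS or /JavaScript are a typical reason to escalate a sample to manual reverse engineering.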

2.1.2 Dynamic Analysis


Dynamic analysis is performed by executing the PDF file in a controlled environment to
observe the behavior of the malware. It involves two stages: (1) behavioral analysis, which
monitors the activities performed by the malware when it is executed; and (2) anomaly detection,
which first establishes what counts as normal behavior for the file type, that is, how such a file
typically behaves. Any activity beyond this baseline, such as attempting to execute code, opening
network ports, or accessing system files, is treated as an anomaly.
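The anomaly-detection stage can be sketched as a comparison of a sandbox trace against a baseline of expected behavior. In this minimal sketch the event record format, the event kinds, and the baseline contents are all hypothetical; a real sandbox exports its own report schema, which would need to be mapped onto something like this first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One behavior record from a sandbox trace (hypothetical record format)."""

    kind: str    # e.g. "file_read", "process_create", "network_connect"
    target: str  # a path, command line, or host:port, depending on the kind


# Hypothetical baseline: event kinds a document viewer is expected to produce
# when it merely opens and renders a document.
BASELINE_KINDS = frozenset({"file_read", "font_load", "render"})


def find_anomalies(events: list[Event], baseline: frozenset[str] = BASELINE_KINDS) -> list[Event]:
    """Return, in trace order, every event whose kind falls outside the baseline."""
    return [event for event in events if event.kind not in baseline]
```

For example, a trace containing a file read of the document, a page render, and a connection to a remote host would yield only the connection event as an anomaly.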

2.2. Report
The report is the final stage that explains the results obtained from static and dynamic
analysis, providing an overview of the analyzed malware.

3. RESULTS AND DISCUSSION

The results of this research reveal various methods and techniques used by malware to
steal data through PDF files. The analysis covered both static and dynamic analysis of PDF
files suspected of containing malware. The findings are presented below as several key
discoveries obtained from the analysis.

1. Static Analysis
Static analysis is conducted by thoroughly examining the structure and code of the malware
file without executing the malware file. The main focus of this analysis is to identify the
permissions requested by the malware file, the components involved, and potentially
suspicious behavior based on its source code.


Figure 2. Script Malware PDF.

The program code in Figure 2 shows that the AndroidManifest.xml file requests several
permissions that indicate potentially dangerous behavior, especially data theft through SMS.
The visible permissions are:
RECEIVE_SMS: allows the malware to receive incoming SMS messages. This permission can be
used to monitor incoming messages without the user's consent.
INTERNET: provides internet access, enabling the malware to send data to external servers.
This permission can be used to transmit stolen data elsewhere.
SEND_SMS: allows the malware to send SMS messages without the user's knowledge. Malicious
apps often use this permission to send messages to premium-rate numbers or to spread malware
to other contacts.
READ_PHONE_STATE: allows the malware to read phone status information, such as the IMEI
and device number. This information can be used to track users or devices.
RECEIVE_BOOT_COMPLETED: allows the malware to start automatically when the device finishes
booting. Malware often uses this permission to remain active without user interaction.
WAKE_LOCK: allows the malware to prevent the device from entering sleep mode, keeping it
active in the background.
The following components are also visible:
MainActivity: the main component (activity), launched first when the malware is opened. This
activity is likely used for the main user interface.
ReceiveSms: a broadcast receiver configured to listen for the
android.provider.Telephony.SMS_RECEIVED event. This component is triggered every time an SMS
arrives and can be used to capture the message content, which can then be forwarded to another
part of the malware or to an external server.
SendSms: possibly another component that handles SMS delivery. Combined with the SEND_SMS
permission, it can be used to send messages without the user's knowledge.
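The manifest review above can be partly automated. APK manifests are stored as binary XML, so this minimal sketch assumes the manifest has already been decoded to text (for example with apktool). It lists which of the permissions discussed above are requested and which receivers subscribe to the SMS_RECEIVED broadcast. The permission set mirrors the list above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# The permissions discussed above, treated here as SMS/data-theft indicators.
SUSPICIOUS_PERMISSIONS = {
    "android.permission.RECEIVE_SMS",
    "android.permission.SEND_SMS",
    "android.permission.READ_PHONE_STATE",
    "android.permission.INTERNET",
    "android.permission.RECEIVE_BOOT_COMPLETED",
    "android.permission.WAKE_LOCK",
}
SMS_RECEIVED = "android.provider.Telephony.SMS_RECEIVED"


def inspect_manifest(manifest_xml: str) -> dict[str, list[str]]:
    """Return requested suspicious permissions and receivers that listen for incoming SMS."""
    root = ET.fromstring(manifest_xml)
    permissions = sorted(
        el.get(ANDROID_NS + "name")
        for el in root.iter("uses-permission")
        if el.get(ANDROID_NS + "name") in SUSPICIOUS_PERMISSIONS
    )
    sms_receivers = []
    for receiver in root.iter("receiver"):
        actions = {action.get(ANDROID_NS + "name") for action in receiver.iter("action")}
        if SMS_RECEIVED in actions:
            sms_receivers.append(receiver.get(ANDROID_NS + "name"))
    return {"permissions": permissions, "sms_receivers": sms_receivers}
```

A permission such as INTERNET is harmless on its own; the signal comes from the combination of SMS access, network access, and an SMS receiver in the same package.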
Further evidence found in the malware, related to the permissions described above, is shown in
Figure 3.


Figure 3. Script malware PDF.

Based on Figure 3, the script contains a BroadcastReceiver designed to process SMS messages
received by the device. The relevant parts of the script are as follows.
ReceiveSms: a subclass of BroadcastReceiver that handles broadcast events, in this case the
SMS_RECEIVED event. This class processes each SMS received by the device and extracts certain
data from it.
OkHttpClient: the HTTP client class of the OkHttp library, used to make HTTP requests that send
the SMS data to the attacker through the Telegram API. The data sent includes the sender's
number, the message content, and device information such as Brand, User, Model, and Product.
HTTP request callback: in the enqueue call, the code uses the onResponse and onFailure callbacks
to handle the server's response. If delivery succeeds, the server's response is logged; if it
fails, an error message is printed.
This code exhibits highly dangerous behavior characteristic of SMS-based data-theft malware.
It automatically monitors incoming SMS messages and sends them, together with device
information, to the attacker's server. This is a typical example of malware designed to
capture SMS data and send it to an external server for data theft purposes.
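The behavior described above leaves recognizable strings in decompiled source code. The following minimal sketch flags those strings with regular expressions. The indicator names and patterns are hypothetical and illustrative: they are plain-text matches over source produced by a decompiler, not a semantic analysis, and obfuscated code can evade them. The Telegram pattern assumes the Bot API URL form `https://api.telegram.org/bot<token>/...`.

```python
import re

# Illustrative indicator patterns for the behavior described above.
INDICATORS = {
    "telegram_bot_api": re.compile(r"api\.telegram\.org/bot"),
    "okhttp_client": re.compile(r"\bOkHttpClient\b"),
    "sms_received_action": re.compile(r"SMS_RECEIVED"),
    "sms_body_read": re.compile(r"\bgetMessageBody\s*\("),
    "device_fingerprint": re.compile(r"\bBuild\.(?:BRAND|MODEL|PRODUCT|USER)\b"),
}


def scan_source(source: str) -> set[str]:
    """Return the names of all indicators whose pattern occurs in the source text."""
    return {name for name, pattern in INDICATORS.items() if pattern.search(source)}
```

No single hit is conclusive; several indicators co-occurring in one receiver class, as in Figure 3, is what characterizes SMS exfiltration.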

2. Dynamic Analysis
Dynamic analysis is conducted to observe the behavior of the malware directly when it is
executed in a controlled environment. This method reveals activities that cannot be detected
through static analysis. The process includes monitoring network communication, detecting
malicious activities, and observing how the malware handles user data.


Figure 4. Malware detection result (dynamic analysis)

Figure 4 shows the MetaDefender detection results for the malware file invitation.pdf.
MetaDefender automatically identified that the invitation file was in fact an APK (Android
package) disguised as a PDF file. The file was detected as malware with a detection rate of 8%:
two of the twenty-four antivirus engines used in this analysis flagged it, with the detection
label Trojan.Android.SmsSpy. This result indicates that the file contains a harmful component
capable of accessing and monitoring users' text message (SMS) communications.
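Two checks support this result. First, the reported rate is consistent with the engine counts: 2/24 ≈ 8.3%, reported as 8%. Second, a file named .pdf that is actually an APK can be recognized from its content alone: a PDF starts with the "%PDF-" header, whereas an APK is a ZIP archive (magic bytes "PK\x03\x04") that contains AndroidManifest.xml and classes.dex. The minimal sketch below, with hypothetical function names, applies that rule. It checks the PDF header only at offset 0, which is a simplification because some readers tolerate leading bytes.

```python
import io
import zipfile


def identify_container(data: bytes) -> str:
    """Classify a sample by its content rather than by its file name."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            return "unknown"
        # An APK is a ZIP archive with a manifest and compiled Dalvik code.
        if "AndroidManifest.xml" in names and "classes.dex" in names:
            return "apk"
        return "zip"
    return "unknown"


def extension_mismatch(filename: str, data: bytes) -> bool:
    """True when a file named *.pdf does not actually contain a PDF."""
    return filename.lower().endswith(".pdf") and identify_container(data) != "pdf"
```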


Figure 5. Sandbox Dynamic Analysis

Figure 5 shows that the sandbox detected this invitation file as malware with a detection
percentage of 50% and a threat score of 100/100, classifying it as Trojan.Android.SmsSpy.
The tests from both tools yielded the same outcome: the invitation PDF is a type of SMS trojan
spyware.

3. Report
Dynamic analysis of the malware file identifies it as Trojan.Android.SmsSpy. Based on the
dynamic testing, the malware poses a very serious threat, reflected in a threat score of 100/100.
During the analysis, a number of suspicious activities were detected, such as the extraction of
SMS data, the transmission of user information to an external server via the Telegram API, and
access to device information such as the user ID and device model. The malware exploits the
permissions granted to it to run a continuous data collection process in the background, and it
uses encryption to hide its data transmission.
Furthermore, the activities monitored during the dynamic analysis indicate that the
malware actively accesses SMS messages and sends them to an external server, which poses
a significant risk to user data privacy and security. These results are consistent with the findings
of the static analysis, which identified permissions and functions in the source code that allow
access to SMS messages and device data. For example, the static analysis revealed permissions
that allow the malware to access SMS messages and use external APIs, which was corroborated
in the dynamic analysis through direct observation of data transmission via the Telegram API.
The correlation between static and dynamic analysis shows consistency in detecting the
type of malware and the level of threat generated. Static analysis provides indications of access
to sensitive data through existing permissions, while dynamic analysis offers concrete evidence
through direct observation of malware file behavior when executed. These two analysis methods
complement each other; static analysis offers insights into potential threats based on permissions
and code structure, while dynamic analysis confirms these threats through direct observation of
malware file behavior.
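The correlation described above amounts to comparing two sets of findings. In this minimal sketch, findings are represented as short labels; the labels and the function name are hypothetical shorthand for the entries in Table 1, not output from the tools used. Static findings that dynamic analysis also observed are "confirmed", while behavior seen only at runtime (such as background collection and encryption) shows what static analysis missed.

```python
def correlate(static_findings: set[str], dynamic_findings: set[str]) -> dict[str, set[str]]:
    """Split findings into confirmed, static-only, and dynamic-only groups."""
    return {
        "confirmed": static_findings & dynamic_findings,
        "static_only": static_findings - dynamic_findings,
        "dynamic_only": dynamic_findings - static_findings,
    }


if __name__ == "__main__":
    static = {"sms_access", "device_data_access", "external_api"}
    dynamic = {"sms_access", "device_data_access", "external_api",
               "background_process", "encryption"}
    for group, labels in correlate(static, dynamic).items():
        print(group, sorted(labels))
```

An empty "static_only" group means every potential threat flagged statically was observed at runtime, which is the consistency the text reports.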

Table 1. Static and Dynamic Analysis Results


Aspect | Static Analysis | Dynamic Analysis
Application identity | Name: UNDANGAN.PDF | Name: UNDANGAN.PDF
Size | 6.6 MiB | 6.6 MiB
SHA256 | bc867d5c4c67b8a6731f70918ab689c768c1791583fa777b47a89d5ef1b821f6 | bc867d5c4c67b8a6731f70918ab689c768c1791583fa777b47a89d5ef1b821f6
Type | Android Package (APK) | Android Package (APK)
Threat score | - | 100/100
Malware type | | Trojan.Android.SmsSpy
Identified permissions | - SMS access; - Device data access; - Use of external API | Confirmed: SMS and device access permissions were used for data exfiltration through an external API (Telegram)
Dangerous activities | Potential access to SMS and device data through the listed permissions | - SMS retrieval; - Data transmission to an external server via the Telegram API; - Access to user ID and device model
Background process | - | The data collection process runs in the background without the user's knowledge.
Data encryption | - | Encryption activities were detected in an attempt to obscure the data exfiltration process.
Correlation of findings | Identifying permissions for accessing sensitive data | Confirming, through direct observation, the hazardous activities identified in static analysis

4. CONCLUSION

This research analyzed data theft methods carried out through malicious PDF files using
malware analysis techniques. The following conclusions can be drawn from the results:
(1) PDF structure as an attack vector: PDF files can be manipulated to hide malicious payloads,
especially in objects such as streams, metadata, and embedded content. Static analysis shows that
PDF objects such as annotations and embedded JavaScript are often used to conceal malicious code,
which can be executed when a user opens or interacts with the PDF file.
(2) Strings and metadata in PDFs: the strings inserted into documents often include URLs or
scripts that lead to dangerous external sites. This facilitates data theft by directing users to
malicious sites that aim to collect personal information or credentials. Unnaturally altered
metadata, including the name of the application used to create the document, often serves as an
indicator of malware.
(3) Network activity related to malicious PDFs: the network activity observed during interaction
with the infected PDF file shows attempts to communicate with a remote server. This indicates that
malicious PDFs not only distribute malware but also act as tools for extracting data and sending
it to third parties, and that they often have advanced networking capabilities for stealing
sensitive data.
(4) Detected system changes: the results of the static analysis also show that the malicious PDF
files cause changes in the system, such as registry modifications and the addition of new
malicious processes or services. This indicates that dangerous PDFs do not rely only on
exploitation at the application level but also attempt to embed themselves deeper into the
operating system to steal data continuously.
(5) Effectiveness of the analysis methods: static analysis proved effective in this research for
detecting harmful patterns in the structure and content of PDF files. It can be complemented by
dynamic analysis to capture the runtime behavior of malicious files, which static analysis alone
may miss.

5. SUGGESTIONS

Based on the results of this research on malware in PDF files used for data theft, the following
suggestions for further research are proposed:
(1) Development of machine learning-based detection tools: future research could develop malware
detection tools based on machine learning or artificial intelligence to detect threats in PDF
files in real time. Algorithms that learn patterns in metadata, object structures, and hidden
malicious code within PDF files can improve detection accuracy and reduce false positive rates.
(2) Testing on various operating systems and platforms: future research can examine how malicious
PDF files interact with different operating systems and platforms (Windows, macOS, Linux, and
mobile devices). This will help explain variations in malware behavior caused by differences in
the security mechanisms and architecture of those operating systems.
(3) Development of PDF detection-based security policies: further research is proposed to support
the development of specific security policies for managing and monitoring PDF files in corporate
environments. Such policies may include stricter rules on the use of PDF files in communication
and increased monitoring of PDF files uploaded to internal systems.

EXPRESSION OF GRATITUDE

I would like to express my deepest gratitude to the Faculty of Engineering at Khairun
University for the funding support provided for this research. Without the assistance and financial
support of the Faculty of Engineering, this research could not have been carried out properly.
This support played a crucial role in completing the research on malware that exploits PDF files
for data theft. Hopefully, the results of this research will provide real benefits and contributions
to the development of science and data security in the future.

REFERENCES

[1] Basuki, A., Hadi, I. K., & Raindra, M. (2024). Pengaruh Perkembangan Teknologi
Informasi dan Komunikasi Terhadap Pelaksanaan Tugas TNI. Jurnal
Mahatvavirya, 11(2), 123-129.

[2] Saputra, H., Stiawan, D., & Satria, H. (2023). Malware Detection in Portable Document
Format (PDF) Files with Byte Frequency Distribution (BFD) and Support Vector Machine
(SVM). Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI), 9(4), 1144-1153.


[3] Wiharja, S., Pradeka, D., & Suteddy, W. (2024). Comparative Study of The Effect of
Datasets and Machine Learning Algorithms for PDF Malware Detection. Digital Zone:
Jurnal Teknologi Informasi dan Komunikasi, 15(1), 80-93.

[4] Ariyaningsih, S., Andrianto, A. A., Kusuma, A. S., & Prastyanti, R. A. (2023). Korelasi
Kejahatan Siber dengan Percepatan Digitalisasi di Indonesia. Justisia: Jurnal Ilmu
Hukum, 1(1), 1-11.

[5] Avast Decoded. Avast Q1 2024 Threat Report. https://decoded.avast.io/threatresearch/avast-q1-2024-threat-report/

[5] Butarbutar, R. (2023). Kejahatan Siber Terhadap Individu: Jenis, Analisis, dan
Perkembangannya. Technology and Economics Law Journal, 2(2), 3.

[6] Prayudi, Y., Kom, M., Kom, F. Y. S., & Kom, M. (2018). Aplikasi Pengujian Celah
Keamanan pada Aplikasi Berbasis Web.

[7] Santoso, J. T. (2023). Teknologi Keamanan Siber (Cyber Security). Penerbit Yayasan
Prima Agus Teknik, 1-173.

[8] Saputra, H., Stiawan, D., & Satria, H. (2023). Malware Detection in Portable Document
Format (PDF) Files with Byte Frequency Distribution (BFD) and Support Vector Machine
(SVM). Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI), 9(4), 1144-1153.

[8] Rafsanjani, A. S. (2023). A Malicious Url Detection Framework Using Priority Coefficient
and Feature Evaluation (Doctoral dissertation, Universiti Teknologi Malaysia).

[9] Hadiprakoso, R. B., Qomariasih, N., & Yasa, R. N. (2021). Identifikasi Malware Android
Menggunakan Pendekatan Analisis Hibrid Dengan Deep Learning. Jurnal Teknologi
Informasi Universitas Lambung Mangkurat (JTIULM), 6(2), 77-84.

[10] Sianipar, V. R. (2023). Analisis dan Deteksi Malware pada Protokol Jaringan
Menggunakan Metode Malware Analisis Dinamis dan Malware Analisis Statis (Doctoral
dissertation, Program Studi Teknik Informatika).
