
International Journal of Engineering Science and Advanced Technology (IJESAT)

Vol 25 Issue 01, JAN, 2025

Hybrid Machine Learning Models for the Detection and Prevention of Cyber Fraud Apps
Sk. Jabeerulla¹, Asst. Professor, Department of Computer Science and Engineering, Vignan’s Lara Institute of Technology and Science, Guntur, India, jabeershaik1482@gmail.com
Madati Krishna Vamsi², Department of Computer Science and Engineering, Vignan’s Lara Institute of Technology and Science, Guntur, India, krishnavamsi.madati@gmail.com
Koritala Nagaraju³, Department of Computer Science and Engineering, Vignan’s Lara Institute of Technology and Science, Guntur, India, koritalanagaraju8998@gmail.com
Govindu Krishna Sai⁴, Department of Computer Science and Engineering, Vignan’s Lara Institute of Technology and Science, Guntur, India, krishnasaigovindu789@gmail.com
Nagam Guru Venkata Lokesh⁵, Department of Computer Science and Engineering, Vignan’s Lara Institute of Technology and Science, Guntur, India, nagamlokesh93@gmail.com

Abstract: The rapid growth of Android applications has increased the risk of cyber fraud, because malicious applications exploit user-granted permissions to carry out unauthorized activities such as data theft, premium SMS fraud, and ransomware attacks. Traditional malware detection relies on signature- and heuristic-based approaches, which cannot cope with sophisticated attacks such as zero-day malware and obfuscation techniques. This paper introduces a hybrid methodology that combines Support Vector Machines (SVM) and Artificial Neural Networks (ANN) to identify malware from permission patterns. A Genetic Algorithm (GA) is used for feature selection, optimizing the dataset for accuracy and reducing computational overhead.

The ANN model shows better detection performance, with an accuracy of 94.2% compared with 91.8% for the SVM. The framework is further enhanced by a real-time detection system implemented as a Flask-based web application, where users can upload Android Package (APK) files for analysis. The application extracts permissions using static analysis techniques and returns a classification result, either malicious or benign, with a confidence score.

The key contributions of this research include the integration of an ANN to improve nonlinear feature modeling, the use of a GA to select features, and the creation of a real-time malware detection system that is scalable and user-friendly. The proposed framework shows the power of machine learning in enhancing Android application security and presents a stepping stone for further research in malware detection.

I. INTRODUCTION

A. Background

Android is the leader in the mobile landscape, with billions of users and devices spread worldwide. Being an open-source platform, it encourages an active ecosystem of developers and applications. Yet such openness leaves users exposed to serious security risks, mainly from malicious applications distributed by third-party platforms and sometimes even by official application stores like Google Play.

Malicious applications, or malware, exploit user permissions to execute unauthorized actions such as:
• Data Theft: Accessing sensitive user data such as contacts, messages, and browsing history.
• Premium SMS Fraud: Sending expensive SMS messages without user consent.
• Ransomware Attacks: Encrypting user files and demanding money for decryption.

These threats have sharply increased cyber fraud, with global financial losses exceeding billions of dollars per year. With mobile applications now central to essential daily functions, including banking, communication, and healthcare, the threats and risks are amplified.

B. Problem Statement

Conventional approaches to malware detection rely on either signature-based or heuristic-based methods. Signature-based methods are highly efficient for known threats but fail against zero-day attacks or malware employing obfuscation techniques. Heuristic-based methods are prone to high false-positive rates, often flagging harmless applications as malicious.

Because of these limitations, more adaptive and scalable solutions are needed. Machine Learning (ML) models offer a promising alternative, but several challenges remain:

ISSN No: 2250-3676 www.ijesat.com Page | 37


• High-Dimensional Data: Android applications request a large number of permissions, many of which are redundant or irrelevant for detecting malicious intent.
• Model Scalability: The ability of models to perform well on large datasets and varied types of malware.
• Real-Time Applicability: Deploying ML models in user-friendly, scalable systems for real-world use.

C. Objectives

This research aims to develop a hybrid machine learning model that detects and prevents cyber fraud in Android applications, addressing the following challenges:
• Permissions-Based Detection: Using static analysis to extract permissions from APK files as features for ML models.
• Hybrid Model Integration: Employing SVM for baseline performance and ANN for non-linear patterns.
• Feature Optimization: Using GA to reduce redundant permissions and improve computational efficiency.
• Real-Time Deployment: Designing a scalable, web-based detection system using Flask to enable real-time analysis.

D. Contributions

Key contributions of this work are:
• ANN for Malware Detection: Integration of ANN with permissions-based malware detection to achieve better accuracy and flexibility.
• Feature Selection Using GA: Demonstrates that GA can optimize large permission datasets, minimizing model complexity without losing performance.
• Flask Real-Time Application: A user-friendly system that makes machine learning predictions actionable and useful for end users.
• Benchmarking Analysis: The performance of SVM and ANN models is critically evaluated on Android malware datasets, providing a comparative assessment of the strengths and weaknesses of both.

E. Significance

This paper addresses significant challenges in Android malware detection by integrating current state-of-the-art machine learning approaches with deployable solutions. The proposed system demonstrates the efficacy of combining static analysis with hybrid ML models to detect and mitigate cyber threats, paving the way for future innovations in mobile cybersecurity.

II. LITERATURE SURVEY

A. Static Analysis

Static analysis involves examining an application’s code, manifest files, and permissions without executing the APK. This approach is widely used in malware detection because of its computational efficiency, which lets it handle large datasets easily. Tools like Androguard allow for permissions extraction to identify patterns indicative of malicious behavior.

Strengths of Static Analysis:
• Efficiency: Processes a large number of applications quickly.
• Scalability: Suitable for analyzing thousands of apps simultaneously.
• Reproducibility: Consistent feature extraction across analyses.

Limitations of Static Analysis:
• Code Obfuscation: Malware developers often hide their code to evade detection.
• Lack of Behavioral Insights: Does not capture runtime behavior like API calls or network activity.
• Limited Zero-Day Detection: Relies heavily on predefined patterns, which may not generalize to new threats.

B. Dynamic Analysis

Dynamic analysis evaluates an application’s behavior during runtime in a sandbox environment. This approach identifies malicious activities such as network communication, file access, and API calls.

Advantages:
• Behavioral Insights: Captures real-time malicious activities.
• Zero-Day Detection: Identifies new malware based on suspicious behavior patterns.
• Comprehensive: Observes all operational contexts of an application.

Disadvantages:
• Resource-Intensive: Demands significant time and computation.
• Scalability Issues: Not feasible for large-scale analysis.
• Evasion Techniques: Malware can detect sandbox environments and modify its behavior.

C. Machine Learning for Malware Detection

Machine Learning offers significant advantages over traditional methods by identifying complex data patterns. Features extracted from Android applications, such as permissions and system calls, serve as input for ML models.

1) Support Vector Machine (SVM): The robustness of SVM in handling high-dimensional data has established it as a go-to approach for detecting malware.
Limitations:
• Struggles with non-linear patterns without kernel functions.
• Computationally expensive for large datasets.

2) Artificial Neural Networks (ANNs): ANNs are well suited to modeling non-linear relationships and interactions between features; hence, they can handle large and complex datasets efficiently.
Limitations:
• Require hyperparameter tuning.
• Computationally intensive during training.
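The point about non-linear patterns can be illustrated with a toy case. The sketch below is not taken from the study: it assumes two hypothetical binary permission features and an XOR-style label (flagged only when exactly one of the two is requested), and it uses a brute-force search over a small weight grid purely as a demonstration aid. It suggests why a purely linear decision rule cannot fit such data, while adding one product feature (an explicit non-linear feature map, which is the idea that kernel functions apply implicitly) makes the same data linearly separable.

```python
from itertools import product

# Toy data: two hypothetical binary permission features per app.
# Label 1 ("malicious") only when exactly one of the two is requested (XOR).
SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def linearly_separable(samples, grid):
    """Return weights (w_1, ..., w_n, b) that classify every sample, or None.

    Brute-force search over a small grid of candidate weights. This is a
    demonstration aid, not how an SVM is actually trained.
    """
    n_features = len(samples[0][0])
    for *w, b in product(grid, repeat=n_features + 1):
        if all((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == bool(y)
               for x, y in samples):
            return (*w, b)
    return None


def add_product_feature(samples):
    """Explicit non-linear feature map: (x1, x2) -> (x1, x2, x1 * x2)."""
    return [((x1, x2, x1 * x2), y) for (x1, x2), y in samples]


GRID = [v / 2 for v in range(-6, 7)]  # candidate weights from -3.0 to 3.0

raw_fit = linearly_separable(SAMPLES, GRID)
mapped_fit = linearly_separable(add_product_feature(SAMPLES), GRID)
print("linear fit on raw features:", raw_fit)
print("linear fit after feature map:", mapped_fit)
```

On the raw features the search finds no separating weights, which matches the known fact that XOR is not linearly separable; with the extra product feature a separating rule exists. An RBF kernel or a hidden layer in an ANN plays a comparable role without the feature having to be engineered by hand.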

D. Feature Selection Methods

Feature selection enhances model performance by eliminating irrelevant features.

1) Genetic Algorithm (GA): This approach uses a natural selection-based strategy to identify an optimal subset of features, where their relevance is assessed by a fitness score reflecting the performance of the model.
Advantages:
• Handles high-dimensional feature spaces.
• Captures interactions between permissions.

III. PROPOSED METHODOLOGY

A. Framework Overview

This section outlines the methodology adopted for the detection and prevention of cyber fraud targeting Android applications. The proposed framework encompasses static analysis, feature selection using a Genetic Algorithm (GA), hybrid machine learning models, and real-time deployment. Each phase is explained using flowcharts.
1) Dataset Preparation: Collect and preprocess benign and malicious APK files.
2) Permissions Extraction: Extract static features (permissions) from APKs through manifest file analysis.
3) Feature Selection: Use GA to optimize the permission set by removing irrelevant and redundant features.
4) Model Training: Train SVM and ANN models using the optimized feature set.
5) Real-Time Deployment: Implement a Flask-based web application for real-time malware detection.

Fig. 1. Overall Framework

B. Dataset Preparation

The dataset for this study is categorized into two types of APK files: benign and malicious.
• Benign Apps: 1,500 APKs gathered from the period of 2015–2017.
• Malicious Apps: 1,200 APKs distributed across various malware categories:
– Adware
– Ransomware
– SMS Malware
– Scareware
– Premium SMS Malware

All APK files were gathered from reliable sources such as research datasets and repositories. To avoid bias in model training and achieve accurate results, the dataset is kept roughly balanced between the two classes (1,500 benign versus 1,200 malicious).

C. Permissions Extraction

Permissions are extracted from APK files using static analysis techniques. Each APK contains an AndroidManifest.xml file that lists the permissions the application requests. Permissions are crucial features for detecting malware, as certain permissions (e.g., SEND_SMS, WRITE_EXTERNAL_STORAGE) are often misused by malicious apps.

Permissions Encoding (one binary feature per permission):
• 1: Permission is requested.
• 0: Permission is not requested.

A total of 428 distinct permissions were identified, forming the initial feature space for analysis.

Fig. 2. Permissions Extraction Process

D. Feature Selection Using Genetic Algorithm (GA)

Effective feature selection is critical for achieving higher model accuracy and efficiency while lowering computational cost. The initial feature space of 428 permissions is reduced, as many permissions are redundant or irrelevant. The study uses a Genetic Algorithm to select the most discriminative permissions for accurate malware identification.

Steps in Genetic Algorithm:
1) Initialization: A randomly generated population of feature subsets is created. Each subset corresponds to a unique combination of permissions.
2) Measuring Subset Quality: The fitness of each subset is measured as the classification accuracy of an SVM classifier trained on that subset.
3) Selection: The best-performing subsets are selected for the next generation.
4) Crossover: Selected subsets are combined to produce new subsets, simulating biological reproduction.
5) Mutation: Random changes are introduced to subsets to explore additional features.
6) Termination: The process repeats over successive generations until the best-performing feature subset is identified.

E. Model Training

The refined feature set is used to train two machine learning models: SVM and ANN.

Support Vector Machine (SVM):


• Purpose: Used as a baseline model for handling high-dimensional datasets.
• Kernel: Radial Basis Function (RBF) for handling non-linear relationships.
• Optimization: Hyperparameters (e.g., C, gamma) are tuned using grid search.

Fig. 3. Genetic Algorithm Process

Artificial Neural Network (ANN):
• Purpose: Models the complex interdependencies and non-linear effects present in permissions data.
• Architecture:
– Input Layer: 428 nodes, or the reduced subset from GA.
– Hidden Layers: Four layers with 256, 128, 128, and 32 nodes, using ReLU activation.
– Output Layer: One node with sigmoid activation for binary classification.
• Regularization: To avoid overfitting, dropout is applied with a rate of 0.2.
• Training: Uses binary cross-entropy loss with the Adam optimizer.

Fig. 4. Model Training Workflow

F. Real-Time Deployment

The trained models are deployed in a Flask-based web application for real-time analysis of uploaded APK files.

Workflow for Real-Time System:
1) APK Upload: Users upload APK files through a web interface.
2) Permissions Extraction: The system extracts permissions from the APK’s manifest file.
3) Feature Encoding: Extracted permissions are encoded into a binary vector based on the selected feature subset.
4) Model Prediction: The SVM or ANN model classifies the APK as malicious or benign.
5) Result Display: Outputs the classification result along with a confidence score.

Fig. 5. Real-Time Deployment Workflow

IV. RESULTS AND DISCUSSION

A. Performance Metrics

The performance of the SVM and ANN models is reported using the following evaluation metrics:
• Accuracy: The overall proportion of correct predictions.
• Precision: The accuracy of positive predictions, calculated as true positives divided by total predicted positives.
• Recall: The proportion of correctly identified positives among all actual positive instances.
• F1-Score: The harmonic mean of precision and recall, giving a single measure of model performance.

TABLE I
COMPARISON OF PERFORMANCE METRICS

Metric      SVM      ANN
Accuracy    91.8%    94.2%
Precision   90.7%    93.4%
Recall      89.6%    92.8%
F1-Score    90.1%    93.1%

B. Confusion Matrices

Confusion matrices break down prediction results into four quadrants: correct identifications (true positives, TP, and true negatives, TN) and incorrect identifications (false positives, FP, and false negatives, FN).

Fig. 6. Confusion Matrices for SVM and ANN

C. Feature Importance

Feature importance analysis identifies the permissions that contribute most to malware detection, giving insight into the model’s decision-making process and improving explainability.

Top permissions identified include:
• SEND_SMS: Associated with SMS fraud.
• RECEIVE_BOOT_COMPLETED: Enables persistence after reboot, a common malware tactic.

• WRITE_EXTERNAL_STORAGE: Frequently exploited for modifying or encrypting files.

Fig. 7. Feature Importance Analysis

D. Artificial Neural Network (ANN) Architecture

The ANN model is designed to capture non-linear patterns in permission data. It consists of:
• Input Layer: 428 nodes representing permission features (or fewer when the GA-reduced subset is used).
• Hidden Layers: Four layers with ReLU activation:
– Layer 1: 256 nodes
– Layer 2: 128 nodes
– Layer 3: 128 nodes
– Layer 4: 32 nodes
• Output Layer: A single node with sigmoid activation for binary classification.
• Dropout Regularization: Prevents overfitting by adding dropout layers after each hidden layer.

Fig. 8. Artificial Neural Network (ANN) Architecture

E. Observations

• ANN Outperforms SVM: ANN achieves higher accuracy and F1-score, demonstrating better capability to model non-linear interactions.
• Feature Optimization: GA effectively reduces feature dimensionality while maintaining performance.
• Real-Time Viability: Flask deployment supports scalability and usability for malware detection in practical scenarios.

F. Final Results

The hybrid framework demonstrates strong malware detection performance for Android applications, with the following outcomes:
1) ANN Outperforms SVM: The ANN effectively captures non-linear interactions, reaching an accuracy of 94.2% and a precision of 93.4%, compared with an accuracy of 91.8% for the SVM (Table I).
2) Feature Optimization via GA: GA reduces permissions from 428 to an optimal subset, improving computational efficiency without compromising accuracy.
3) Practical Real-Time System: The Flask-based application allows real-time APK analysis, providing actionable insights to users.

V. CONCLUSION AND FUTURE WORK

This research proposed a hybrid machine learning framework for detecting and preventing cyber fraud in Android applications. By combining Support Vector Machines (SVM) and Artificial Neural Networks (ANN) with feature selection using Genetic Algorithms (GA), the framework achieved high detection accuracy and efficiency. The ANN model outperformed the SVM baseline with an accuracy of 94.2%, showcasing its capability to model non-linear interactions in the data.

The use of GA reduced feature dimensionality, minimizing computational overhead while retaining predictive performance. The real-time deployment of the framework through a Flask-based web application demonstrated its practical applicability, enabling end users to detect malware effectively.

Key Contributions:
• ANN Integration: Enabled the framework to model non-linear relationships in permission data, outperforming traditional models like SVM.
• Feature Optimization via GA: Reduced the dimensionality of the feature space and the computational overhead without affecting accuracy.
• Real-Time Deployment: Developed a user-friendly Flask-based application ensuring practical relevance and scalability.

Future Work:
• Integration of Dynamic Analysis: Incorporating runtime behavioral characteristics, such as API calls and system logs, to enhance detection capabilities.
• Ensemble Models: Exploring ensemble techniques to combine multiple machine learning models for improved robustness and accuracy.
• Expanded Dataset: Extending the dataset with additional and more diverse benign and malicious APK samples.
• Cross-Platform Support: Adapting the framework for malware detection in other mobile operating systems, including iOS.

In conclusion, this research presents an effective, scalable, and efficient machine learning-based solution for detecting Android malware. The system strengthens mobile ecosystem security and sets the stage for further advances in malware detection and prevention technologies.
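As a compact illustration of the GA-based feature selection summarized above (Section III-D), the loop of initialization, fitness evaluation, selection, crossover, mutation, and termination can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy dataset is synthetic, a nearest-centroid classifier is swapped in as a lightweight stand-in for the SVM accuracy used as fitness in the paper, and the per-feature penalty, population size, number of generations, and mutation rate are hypothetical values chosen only for illustration.

```python
import random


def make_toy_data(n_samples=200, n_features=12, seed=0):
    """Synthetic stand-in for a permission matrix (rows: apps, columns: 0/1 flags).

    Assumption for illustration: only features 0 and 3 drive the label.
    """
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [1 if (row[0] and row[3]) else 0 for row in X]
    return X, y


def fitness(mask, X_train, y_train, X_val, y_val, penalty=0.005):
    """Validation accuracy of a nearest-centroid classifier on the selected
    columns, minus a small per-feature penalty (stand-in for SVM accuracy)."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        if not rows:
            return 0.0
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    correct = 0
    for x, t in zip(X_val, y_val):
        dist = {lab: sum((x[c] - m) ** 2 for c, m in zip(cols, cen))
                for lab, cen in centroids.items()}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(X_val) - penalty * len(cols)


def ga_select(X, y, pop_size=20, generations=15, mutation_rate=0.05, seed=1):
    """Return (best_mask, best_fitness) after a fixed number of generations."""
    rng = random.Random(seed)
    n = len(X[0])
    split = int(0.75 * len(X))
    data = (X[:split], y[:split], X[split:], y[split:])

    def score(mask):
        return fitness(mask, *data)

    # 1) Initialization: random bitmasks, one bit per permission.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # 2) Fitness evaluation and 3) selection: keep the top half unchanged.
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # 4) single-point crossover
            child = a[:cut] + b[cut:]
            # 5) Mutation: flip each bit with a small probability.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = parents + children
    # 6) Termination: here simply a fixed generation budget.
    best = max(population, key=score)
    return best, score(best)


X, y = make_toy_data()
best_mask, best_score = ga_select(X, y)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected, "fitness:", round(best_score, 3))
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. In a setting closer to the paper, the `fitness` function would be replaced by the cross-validated accuracy of an SVM trained on the selected permission columns, and each mask would have 428 bits.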