Major Report

major report on project

Uploaded by

Shruti

MOBILE MALWARE DETECTION

Major Project Stage-II Report

Submitted in partial fulfillment of


the requirements for the award of the degree of

Bachelor of Technology
in
Computer Science and Engineering

Submitted by

B. Mounika (20SS1A0505)
Ch. Sailaja (20SS1A0509)
S. Sony (20SS1A0545)
J. Raja Sreekanth (20SS1A0523)

Under the guidance of


Dr. G. Narsimha
Professor and Principal

Department of Computer Science and Engineering


JNTUH University College of Engineering Sultanpur
Sultanpur (V), Pulkal (M), Sangareddy District, Telangana - 502273
May 2024
JNTUH UNIVERSITY COLLEGE OF ENGINEERING
SULTANPUR
Sultanpur (V), Pulkal (M), Sangareddy - 502273, Telangana

Department of Computer Science and Engineering

Certificate

This is to certify that the Major Project report entitled “MOBILE MALWARE
DETECTION” through Machine Learning Techniques is a bonafide work carried out
by a team consisting of B. Mounika bearing Roll No. 20SS1A0505, Ch. Sailaja
bearing Roll No. 20SS1A0509, S. Sony bearing Roll No. 20SS1A0545, and J. Raja
Sreekanth bearing Roll No. 20SS1A0523, in partial fulfillment of the requirements for
the degree of BACHELOR OF TECHNOLOGY in COMPUTER SCIENCE AND
ENGINEERING, submitted to Jawaharlal Nehru Technological University Hyderabad
University College of Engineering Sultanpur during the academic year 2023-2024.

The results embodied in this report have not been submitted to any other University
or Institution for the award of any degree or diploma.

Guide: Dr. G. Narsimha, Professor and Principal
Head: Dr. G. Narsimha, Professor and Principal

EXTERNAL EXAMINER

Declaration

We hereby declare that the Major Project entitled “MOBILE MALWARE DETECTION”
is a bonafide work carried out by a team consisting of B. Mounika bearing
Roll No. 20SS1A0505, Ch. Sailaja bearing Roll No. 20SS1A0509, S. Sony bearing
Roll No. 20SS1A0545, and J. Raja Sreekanth bearing Roll No. 20SS1A0523, in
partial fulfillment of the requirements for the degree of Bachelor of Technology
in Computer Science and Engineering, submitted to Jawaharlal Nehru Technological
University Hyderabad University College of Engineering Sultanpur during the
academic year 2023-2024. The results embodied in this report have not been
submitted to any other University or Institution for the award of any degree or
diploma.

B. Mounika (20SS1A0505)

Ch. Sailaja (20SS1A0509)

S. Sony (20SS1A0545)

J. Raja Sreekanth (20SS1A0523)

Acknowledgment

We wish to take this opportunity to express our deep gratitude to all those who helped
us in various ways during our Major Project work. It is our pleasure to acknowledge
the help of all those individuals who contributed to the successful completion of our
Major Project report.

We express our sincere gratitude to Dr. G. Narsimha, Professor and Principal,
JNTUHUCES, for his support during the course period.

We express our sincere gratitude to Dr. Y. Raghavender Rao, Professor and Vice
Principal, JNTUHUCES, for his effective suggestions during the course period.

We are thankful to Shri. Joshi Shripad, Associate Professor and Training and
Placement Officer, JNTUHUCES, for his support and guidance in the completion of our
Major Project.

We express our sincere gratitude to our guide Dr. G. Narsimha, Professor and
Principal, JNTUHUCES, for his support during the course period.

Finally, we express our gratitude with great admiration and respect to our faculty
for their moral support and encouragement throughout the course.

B. Mounika (20SS1A0505)
Ch. Sailaja (20SS1A0509)
S. Sony (20SS1A0545)
J. Raja Sreekanth (20SS1A0523)

Contents

Certificate
Declaration
Acknowledgment
Abstract
List of Figures

1 INTRODUCTION
1.1 Project Overview
1.2 Problem Statement
1.3 Purpose
1.4 Existing System
1.4.1 Limitations
1.5 Proposed System
1.6 Scope
1.7 Conclusion

2 LITERATURE SURVEY
2.1 Conclusion

3 ANALYSIS
3.1 Data Collection
3.2 Reliability
3.3 Availability
3.3.1 Constraints
3.4 Portability
3.5 Performance
3.6 Software Requirements
3.7 Hardware Requirements
3.8 Objectives
3.9 Conclusion

4 SYSTEM ARCHITECTURE
4.1 System
4.2 User
4.3 Conclusion

5 DESIGN
5.1 Use Case Diagram
5.2 Activity Diagram
5.3 Class Diagram
5.4 Sequence Diagram

6 IMPLEMENTATION
6.1 Code
6.2 Mobile Malware Detection
6.2.1 Classification
6.2.2 Counting Malicious And Benign Apps
6.2.3 Extracting Best Results
6.2.4 Main code
6.2.5 Plot
6.3 Input
6.4 Output
6.5 Testing
6.5.1 Functionality Testing
6.5.2 Usability Testing
6.5.3 Interface Testing
6.5.4 Security Testing

CONCLUSION

REFERENCES
Abstract

Android Malware Detection is a project that involves collecting a dataset of Android
applications and extracting various features such as permissions, API calls, and code
obfuscation techniques. The dataset is then used to train machine learning models to
classify new applications as benign or malicious. The machine learning models used in
the project include decision trees and support vector machines. To develop the system,
Python is used as the programming language along with several popular machine
learning libraries, such as scikit-learn and Keras. The project’s code includes feature
extraction, data preprocessing, model training, and evaluation scripts. The system is
designed to be scalable and can handle a large number of applications. Overall, the
project aims to develop an effective and reliable method for detecting Android
malware using machine learning in Python. This can help protect users from potential
security threats and ensure the security and integrity of their devices.

List of Figures

4.1 System Architecture

5.1 Use Case Diagram

5.2 Activity Diagram

5.3 Class Diagram

5.4 Sequence Diagram

Chapter 1

INTRODUCTION

1.1 Project Overview

The Android Malware Detection project employs machine learning techniques in
Python, utilizing the scikit-learn and Keras libraries. The initiative involves collecting a
diverse dataset of Android applications and extracting features such as permissions,
API calls, and code obfuscation. Decision trees and support vector machines are
chosen as the classification models.
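
To make the idea of permission features concrete, the following is a minimal sketch of how the permissions requested by an app could be turned into a binary feature vector that a classifier can consume. The permission vocabulary, the helper name encode_permissions, and the two sample apps are assumptions made for illustration; they are not taken from the project's dataset or code.

```python
# Hypothetical permission vocabulary (assumption); a real one would be
# built from all permissions observed in the collected dataset.
PERMISSION_VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
]


def encode_permissions(requested, vocabulary=PERMISSION_VOCABULARY):
    """Return a 0/1 list with one entry per permission in the vocabulary."""
    requested_set = set(requested)
    return [1 if permission in requested_set else 0 for permission in vocabulary]


# Two made-up apps, used only to show the shape of the output.
flashlight_app = ["android.permission.INTERNET"]
sms_app = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
]

flashlight_vector = encode_permissions(flashlight_app)
sms_vector = encode_permissions(sms_app)

print(flashlight_vector)  # [1, 0, 0, 0, 0]
print(sms_vector)         # [1, 1, 1, 0, 0]
```

Each row of such vectors, together with a benign or malicious label, forms one training example. API-call features could be encoded in the same way with a different vocabulary.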

1.2 Problem Statement

The project’s core objective is to construct a machine learning model dedicated to the
detection of Android malware in mobile applications. The methodology involves the
implementation of cutting-edge machine learning techniques, ensuring a reliable and
efficient system for Android malware detection. Overall, the project addresses the
critical need for enhanced security measures within the dynamic landscape of mobile
applications.

1.3 Purpose

• Develop a robust system to identify and classify Android applications, enhancing
user protection against potential security threats.

• Utilize machine learning for early detection of malicious behavior, allowing users
to preemptively address security risks.

• Employ decision trees and support vector machines to enhance the accuracy of
malware detection.

1.4 Existing System

The existing system for the project involves analyzing an Android malware dataset
using various data analysis and machine learning techniques. The code provided in the
project loads the dataset, explores the data, handles class imbalance, performs feature
selection, trains a Decision Tree Classifier, and evaluates the model’s performance. It
provides insights into the dataset and predicts malware based on the selected features.

1.4.1 Limitations

• Does not include a comprehensive user interface or advanced visualization
capabilities, making it less user-friendly and limiting the interpretation of results.

1.5 Proposed System

The proposed system aims to enhance the existing project by incorporating additional
features to improve user experience and result interpretation. The proposed system
will include a user-friendly interface where users can easily load their own datasets,
visualize the data, and obtain insights through interactive visualizations. It will also
provide more advanced evaluation metrics and visualization techniques to help users
better understand the model’s performance. Additionally, the proposed system will
allow users to export the results in various formats for further analysis and reporting.
Overall, the proposed system aims to provide a more comprehensive and user-centric
approach to Android malware dataset analysis.

1.6 Scope

The scope of Android malware detection is vast and encompasses a wide range of
activities, technologies, and research areas aimed at identifying, mitigating, and
preventing malicious software threats targeting Android devices. Here are some key
aspects that fall within the scope of Android malware detection

1.7 Conclusion

In conclusion, the proposed system seeks to elevate the existing project by introducing
a suite of user-centric features aimed at enhancing the overall experience of Android
malware dataset analysis. Through a user-friendly interface, users will gain the ability to
seamlessly load their datasets, visualize data, and derive valuable insights using
interactive visualizations. The incorporation of advanced evaluation metrics and
visualization techniques will empower users to better comprehend and assess the
performance of the model.

Chapter 2

LITERATURE SURVEY

A recent study conducted by an organization dedicated to security vulnerability
research reported a 400 percent increase in software designed to operate stealthily
behind the user interface while gathering sensitive information. This information can
be used for many different purposes, ranging from targeted advertising and tracking
to fraud, which in many cases leaves customers with unrecoverable bills. The main
point is not merely that malware for mobile devices exists, but that malware targeting
mobile devices is growing rapidly. As smartphones become more popular, hackers have
moved from targeting PCs to targeting smartphones, particularly the Google Android OS.

An Android Application Sandbox (AASandbox) has been proposed that performs both
static and dynamic analysis on Android programs to automatically detect suspicious
applications. Static analysis scans the software for malicious patterns without
installing it. Dynamic analysis executes the application in a fully isolated
environment, i.e. a sandbox, which intervenes and logs low-level interactions with the
[...] following two limitations: permission re-delegation is not covered, and neither
the code nor the behavior of the application is analyzed.

In 2010, Mohammad et al. [13] presented a paper in the ACM proceedings titled Apex:
Extending Android Permission Model and Enforcement with User-defined Runtime
Constraints. Apex provides a framework for Android that allows a user to selectively
grant permissions to applications as well as impose constraints on the usage of
resources. The authors also describe an extended package installer that allows the user
to set these constraints through an easy-to-use interface.

In 2012, Sarma et al. [12] presented a paper in the ACM proceedings titled Android
Permissions: A Perspective Combining Risks and Benefits. They investigate the
feasibility of using the permissions an app requests, the category of the app, and the
permissions requested by other apps in the same category to better inform users whether
the risk of installing an app is commensurate with its expected benefit.

In 2012, Kern et al. [9] presented a paper in the UBICOMM proceedings titled Permission
Tracking in Android. They take a closer look at the permissions that users grant to apps
in Android, a widespread operating system for mobile devices such as smartphones. They
developed a tool that allows users to administer the permissions of their applications
and to allow or deny permissions at any time.

In 2012, Yarn et al. [10] presented a paper in the ACM proceedings titled Short paper:
enhancing users’ comprehension of android permissions. They propose to help Android
users better understand application permissions through crowdsourcing. In this
approach, users of the same application use the tool to help each other understand
permissions by sharing their permission reviews. The authors developed a tool called
Droidganger, which is built using two techniques: record/replay and permission
suppression.

2.1 Conclusion

The literature survey shows that research on Android security has concentrated on two
directions: sandbox-based static and dynamic analysis of applications, and extensions
to the Android permission model that help users grant, restrict, track, and understand
permissions. The surveyed approaches still have limitations, such as not covering
permission re-delegation and not analyzing the code or behavior of applications. These
gaps motivate the machine learning approach to Android malware detection pursued in
this project.

Chapter 3

ANALYSIS

3.1 Data Collection

• Malicious Apps: Collect samples of known malicious mobile applications.

• Benign Apps: Include diverse samples of legitimate mobile applications.

• Behavioral Data: Capture app behavior, including network activities and system
calls.

• Permissions and Features: Analyze app permissions and network traffic for
potential risks.

• Continuous Updates: Regularly update the dataset with the latest threat
intelligence for evolving risks.

3.2 Reliability

• Detection Techniques: Effective use of signature-based, behavioral, and heuristic
methods.

• Machine Learning Quality: Quality feature extraction and diverse training data for
machine learning models.

• Real-Time Updates: Timely updates for new malware definitions and behaviors.

• Low False Positives: Minimizing false positives to enhance reliability.

• Adaptability: Adapting to new threats while maintaining efficiency on mobile
devices.

3.3 Availability

Mobile malware detection systems provide continuous monitoring for real-time threat
identification.

Efficient design minimizes resource usage, ensuring mobile malware detection without
hindering device performance.

3.3.1 Constraints

• Limited Resources: Mobile devices have constrained processing power and memory.

• Privacy Challenges: Balancing effective detection with user privacy considerations.

• Network Dependency: Reliance on network connectivity for real-time updates.

• App Diversity: Diverse mobile apps require adaptable detection methods.

3.4 Portability

• Cross-Platform Compatibility: Mobile malware detection is portable, functioning
across different operating systems.

• Lightweight Design: Portable systems are designed to be lightweight, ensuring
efficient performance on various devices.

• Ecosystem Integration: Portability is achieved through seamless integration with
diverse mobile ecosystems.

3.5 Performance

• Real-Time Response: Mobile malware detection systems offer swift, real-time
response for threat identification.

• Low Latency and Resource Usage: Optimized performance with low latency and
minimal impact on device resources.

3.6 Software Requirements

• Operating System : Windows 7/8/10

• Language : Python

• IDE : Visual Studio Code

3.7 Hardware Requirements

• Processor : Intel Core i5

• Memory : 8GB

• Storage : SSD

• Graphics processing unit (GPU) : NVIDIA GeForce GTX series

3.8 Objectives

The basic objective of developing this project is to build a machine learning model
that classifies Android applications as benign or malicious, using classifiers such as
decision trees and support vector machines, so that users are protected against
potential security threats.

3.9 Conclusion

In conclusion, the analysis identifies what the mobile malware detection system
requires: a dataset of malicious and benign apps that is updated regularly, reliable
detection with few false positives, continuous availability within the limited
resources of mobile devices, portability across platforms, and real-time performance.
The listed software and hardware requirements define the environment in which the
machine learning models are trained and evaluated.

Chapter 4

SYSTEM ARCHITECTURE

Figure 4.1: System Architecture

• In the first step, the dataset containing Android app data is input.

• Following that, the dataset undergoes data preprocessing, including shuffling,
class balancing, and column dropping.

• Feature selection techniques, such as SelectKBest, are applied to identify relevant
features.

• A Decision Tree Classifier is trained using the selected features and preprocessed
data.

• The trained model’s performance is evaluated using a separate test dataset, with
accuracy calculated and a confusion matrix generated.

• The confusion matrix is then visualized using a heatmap.

• User interaction comes into play, as the user triggers a request to view the
confusion matrix results.

• Finally, the system displays the heatmap of the confusion matrix to the user.
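
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual code: the dataset is synthetic (generated with make_classification), class balancing is done by simple downsampling of the larger class, the scoring function for SelectKBest is f_classif, and the values of k and max_depth are arbitrary. Column dropping is omitted because the synthetic data has no identifier columns.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import shuffle

# Synthetic stand-in for the Android dataset (assumption): 0 = benign, 1 = malware.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)

# Preprocessing: shuffle, then balance the classes by downsampling the larger one.
X, y = shuffle(X, y, random_state=0)
rng = np.random.default_rng(0)
benign_idx = np.flatnonzero(y == 0)
malware_idx = np.flatnonzero(y == 1)
n_keep = min(len(benign_idx), len(malware_idx))
keep = np.concatenate([
    rng.choice(benign_idx, size=n_keep, replace=False),
    rng.choice(malware_idx, size=n_keep, replace=False),
])
X_bal, y_bal = X[keep], y[keep]

# Hold out a separate test set before any fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.25, stratify=y_bal, random_state=0)

# Feature selection: fit on the training split only, then apply to both splits.
selector = SelectKBest(score_func=f_classif, k=10)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Train the Decision Tree Classifier and evaluate it on the test set.
model = DecisionTreeClassifier(max_depth=5, random_state=0)
model.fit(X_train_sel, y_train)
y_pred = model.predict(X_test_sel)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: actual class, columns: predicted class
print("accuracy:", round(accuracy, 3))
print(cm)

# Optional visualization, if seaborn and matplotlib are installed:
# import seaborn as sns; sns.heatmap(cm, annot=True, fmt="d")
```

Fitting the selector on the training split only keeps information from the test set out of the feature choice, which would otherwise make the reported accuracy optimistic.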

4.1 System

The system, in this context, refers to the set of code and libraries designed to perform the
task of malware prediction based on the provided dataset. It includes several modules,
functionalities, and components working together to achieve the following tasks:

1. Data Preprocessing: The system performs data preprocessing to clean and prepare
the dataset for model training. This step may involve shuffling the data to avoid biases,
handling class imbalance, and selecting relevant features.

2. Model Training: The system utilizes the Decision Tree Classifier to train a machine
learning model on the preprocessed dataset. During training, the model learns patterns
and relationships between features to make predictions.

3. Model Prediction: After model training, the system uses the trained classifier to
predict the target class labels for unseen data (the test set). The model’s predictions are
compared with the actual labels to evaluate its accuracy and performance.

4. Model Evaluation: The system calculates various evaluation metrics, such as
accuracy, precision, recall, and F1-score, to quantify the model’s effectiveness in
predicting malware presence.

5. Visualization: To aid user understanding, the system generates visualizations using
libraries like matplotlib and seaborn. These visualizations include count plots, bar
plots for feature importance, and a heatmap for the confusion matrix.

6. Feedback and Reporting: The system provides feedback and reports on the model’s
performance to the user. This feedback allows the user to make informed decisions
about the model’s reliability and suitability for the malware prediction task.

7. Warnings and Error Handling: The system employs the warnings module to manage
warning messages during execution. It handles any potential errors gracefully to ensure
smooth code execution.
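
To show what the evaluation metrics named in task 4 measure, here is a small sketch that computes accuracy, precision, recall, and F1-score from the four confusion-matrix counts in plain Python. The helper names and the sample labels are assumptions made for illustration; in practice the equivalent functions in sklearn.metrics would normally be used. Malware is treated as the positive class (label 1).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 (0.0 when a ratio is undefined)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts)   # (3, 1, 3, 1)
print(metrics)  # every metric equals 0.75 for this sample
```

Precision answers how many apps flagged as malware really are malware, while recall answers how much of the malware was caught. A detector with few false positives has high precision.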

4.2 User

1. Data Collection: The user may be responsible for collecting the dataset required
for the malware prediction task. This dataset could contain various features related to
Android applications, including performance metrics and behavior characteristics.

2. Data Preparation: Before feeding the data into the system, the user may preprocess
and clean the dataset to ensure it is in a suitable format for analysis and model training.
This step may involve handling missing values, encoding categorical variables, and
scaling numerical features.

3. Input and Interaction: The user interacts with the system by initiating the execution
of the provided code. This may involve running the Python script using an IDE like
Visual Studio Code or executing it via the command line.

4. Monitoring and Feedback: During the execution of the code, the user may monitor
the progress and receive feedback, such as the status of data loading, model training,
and evaluation metrics. The feedback provided by the system allows the user to assess
the effectiveness of the model in predicting malware presence.

5. Interpretation of Results: After the code execution, the user interprets the results
and visualizations generated by the system. This includes analyzing metrics such as
accuracy, precision, recall, and the confusion matrix to evaluate the model’s
performance.
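
The data preparation described in step 2 (handling missing values, encoding categorical variables, and scaling numerical features) can be illustrated with a short plain-Python sketch. The column names, the sample rows, and the helper functions are assumptions made for this example and do not come from the project's dataset. Mean imputation, one-hot encoding, and min-max scaling are used here as one common choice for each step.

```python
from statistics import mean

# Made-up rows (assumption): (apk_size_mb, category, n_permissions).
# None marks a missing value.
raw_rows = [
    (12.0, "game", 4),
    (None, "tool", 10),
    (30.0, "game", None),
    (18.0, "social", 7),
]


def impute_mean(column):
    """Replace missing entries with the mean of the observed entries."""
    fill = mean(value for value in column if value is not None)
    return [fill if value is None else value for value in column]


def min_max_scale(column):
    """Rescale a numeric column to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


def one_hot(column):
    """Return the sorted category names and one 0/1 vector per entry."""
    names = sorted(set(column))
    return names, [[1 if value == name else 0 for name in names]
                   for value in column]


sizes, categories, permissions = (list(col) for col in zip(*raw_rows))
sizes = min_max_scale(impute_mean(sizes))
permissions = min_max_scale(impute_mean(permissions))
category_names, category_vectors = one_hot(categories)

# One prepared row per app: [size, permissions, one-hot category...].
prepared_rows = [[s, p, *c]
                 for s, p, c in zip(sizes, permissions, category_vectors)]
print(category_names)    # ['game', 'social', 'tool']
print(prepared_rows[0])  # [0.0, 0.0, 1, 0, 0]
```

In a real workflow the fill values and the scaling range would be computed on the training split only and then reused for the test split.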

4.3 Conclusion

In conclusion, the Android malware prediction system offers a user-centric framework
that guides the user through dataset input, preprocessing, feature selection, and model
training with a Decision Tree Classifier. The subsequent evaluation phase, featuring
accuracy metrics and a visualized confusion matrix, gives a clear picture of the
model’s performance. The system’s user interaction capability, which allows users to
request and view the confusion matrix results, enhances its practical utility.

Chapter 5

DESIGN

Design is the abstraction of a solution: it is a general description of the solution to a
problem without the details. In design, the patterns identified during the analysis phase
are carried over as patterns in the design phase. A completed design reduces the time
required to create the implementation.
A UML diagram is a diagram based on the UML (Unified Modeling Language)
with the purpose of visually representing a system along with its main actors, roles,
actions, artifacts or classes, in order to better understand, alter, maintain, or document
information about the system.

What is UML?
UML is an acronym that stands for Unified Modeling Language. Simply put, UML
is a modern approach to modelling and documenting software [7]. In fact, it is one of
the most popular business process modelling techniques.
It is based on diagrammatic representations of software components. As the old
proverb says: “a picture is worth a thousand words”.

Building Blocks of the UML: The vocabulary of the UML encompasses three kinds
of building blocks.

• Things: Things are the abstractions that are first-class citizens in a model.

• Relationships: Relationships tie these things together.

• Diagrams: Diagrams group interesting collections of things.

5.1 Use Case Diagram

Use case diagrams are a set of use cases, actors, and their relationships. They represent
the use case view of a system.
A use case represents a particular functionality of a system. Hence, a use case
diagram is used to describe the relationships among the functionalities and their
internal/external controllers. These controllers are known as actors. In this project,
the user and the system are the actors.

Figure 5.1: Use Case Diagram

5.2 Activity Diagram

Activity diagrams are used to document workflows in a system, from the business level
down to the operational level. The general purpose of activity diagrams is to focus on
flows driven by internal processing rather than external events.
Activities are the functions of a system. Several activity diagrams are prepared
to capture the entire flow in a system.

Figure 5.2: Activity Diagram

5.3 Class Diagram

A class diagram is a static diagram. It represents the static view of an application. A class
diagram is used not only for visualizing, describing, and documenting different aspects
of a system but also for constructing executable code of the software application.
A class diagram describes the attributes and operations of a class and also the
constraints imposed on the system. Class diagrams are widely used in the modelling
of object-oriented systems because they are the only UML diagrams that can be
mapped directly to object-oriented languages. A class diagram shows a collection of
classes, interfaces, associations, collaborations, and constraints. It is also known as a
structural diagram.

Figure 5.3: Class Diagram

5.4 Sequence Diagram

A sequence diagram, a fundamental component in Unified Modeling Language
(UML), provides a dynamic view of a system by illustrating the chronological flow of
interactions among objects or components. In this diagram, lifelines represent
individual instances of classes or components, depicted as vertical dashed lines. The
focus of control, drawn as a narrow rectangle on a lifeline, shows when an object is
actively processing a message.
Self-invocation occurs when an object sends a message to itself, indicated by a
loop on the lifeline. Return messages show responses to prior messages, and combined
fragments represent conditional or iterative behavior.

Figure 5.4: Sequence Diagram

Chapter 6

IMPLEMENTATION

This chapter presents the Python implementation of the mobile malware detection
system. The code trains several scikit-learn classifiers (gradient boosting, decision
tree, k-nearest neighbors, logistic regression, and a linear support vector machine) on
the prepared dataset, counts the malicious and benign apps, extracts the best results,
and plots them.

6.1 Code

• Train the model on the training data

• Test the model on the test data

• Create the model

• Find the accuracy

• Print the results

6.2 Mobile Malware Detection

6.2.1 Classification

# Standard library
import random
import warnings

# Third-party libraries
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier
from tqdm import tqdm

# Other modules imported by the original listing
from plt import plotting
from setting import config
from utils import export_to_json

# Silence FutureWarning messages during execution
warnings.simplefilter(action='ignore', category=FutureWarning)


def train_modelGradientBoostingClassifier(X_train, y_train, n_estimators,
                                          learning_rate, random_state_val):
    """
    Train a Gradient Boosting classifier model on the input training data.

    Inputs:
        X_train (numpy.ndarray): The feature values for the training data.
        y_train (numpy.ndarray): The labels for the training data.
        n_estimators (int): The number of decision trees in the ensemble.
        learning_rate (float): The learning rate of the algorithm.
        random_state_val (int): The seed used by the random number generator.

    Returns:
        model (sklearn.ensemble.GradientBoostingClassifier): The trained
            Gradient Boosting model.
    """
    # Initialize the model, then fit it to the training data
    model = GradientBoostingClassifier(
        n_estimators=n_estimators, learning_rate=learning_rate,
        random_state=random_state_val)
    model.fit(X_train, y_train)
    return model


def train_modelDecisionTreeClassifier(X_train, y_train, max_depth,
                                      random_state_val):
    """
    Train a Decision Tree classifier model on the input training data.

    Inputs:
        X_train (numpy.ndarray): The feature values for the training data.
        y_train (numpy.ndarray): The labels for the training data.
        max_depth (int): The maximum depth of the tree.
        random_state_val (int): The seed used by the random number generator.

    Returns:
        model (sklearn.tree.DecisionTreeClassifier): The trained Decision
            Tree model.
    """
    # Initialize the model, then fit it to the training data
    model = DecisionTreeClassifier(
        max_depth=max_depth, random_state=random_state_val)
    model.fit(X_train, y_train)
    return model


def train_modelKNeighborsClassifier(X_train, y_train, n_neighbors):
    """
    Train a KNN classifier model on the input training data.

    Inputs:
        X_train (numpy.ndarray): The feature values for the training data.
        y_train (numpy.ndarray): The labels for the training data.
        n_neighbors (int): The number of nearest neighbors used for the
            classification.

    Returns:
        model (sklearn.neighbors.KNeighborsClassifier): The trained KNN model.
    """
    # Initialize the model, then fit it to the training data
    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X_train, y_train)
    return model


def train_modelLogisticRegression(X_train, y_train, C, epsilon,
                                  random_state_val):
    """
    This function trains a Logistic Regression classifier model on the
    input training data.

    Inputs:
    X_train (numpy.ndarray): The feature values for the training data.
    y_train (numpy.ndarray): The labels for the training data.
    C (float): The regularization parameter.
    epsilon (float): A small constant used to determine when to stop the
    training (passed to the model as tol).
    random_state_val (int): The seed used by the random number generator.

    Returns:
    model (sklearn.linear_model.LogisticRegression): The trained Logistic
    Regression model.
    """

    # Initialize the model with an L2 penalty and fit it to the training data
    model = LogisticRegression(
        C=C, penalty='l2', random_state=random_state_val, tol=epsilon)
    model.fit(X_train, y_train)
    return model

def train_modelLinearSVC(X_train, y_train, C, epsilon, random_state_val):
    """
    This function trains a linear SVM (LinearSVC) classifier model on the
    input training data.

    Inputs:
    X_train (numpy.ndarray): The feature values for the training data.
    y_train (numpy.ndarray): The labels for the training data.
    C (float): The regularization parameter.
    epsilon (float): A small constant used to determine when to stop the
    training (passed to the model as tol).
    random_state_val (int): The seed used by the random number generator.

    Returns:
    model (sklearn.svm.LinearSVC): The trained linear SVM model.
    """

    # Initialize the model and fit it to the training data
    model = LinearSVC(C=C, random_state=random_state_val, tol=epsilon)
    model.fit(X_train, y_train)
    return model

def evaluate_model(model, X_test, y_test, modelName):
    """
    This function evaluates a model on the test data.

    Inputs:
    model: The trained scikit-learn classifier.
    X_test (numpy.ndarray): The feature values for the test data.
    y_test (numpy.ndarray): The labels for the test data.
    modelName (str): The model name, used as the plot title when plotting
    is enabled.

    Returns:
    accuracy (float): The accuracy score of the model.
    precision (float): The precision score of the model.
    recall (float): The recall score of the model.
    """
    # Predict the labels on the test data using the model
    y_pred = model.predict(X_test)

    # Compute the accuracy, precision, and recall scores
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    # plotting(y_pred[:100], y_test[:100], modelName)

    return accuracy, precision, recall

def classify_apk(model, apk_features):
    """
    This function makes a prediction for a new APK by using the trained
    model to classify the APK based on its features.

    Inputs:
    model: The trained scikit-learn classifier.
    apk_features (numpy.ndarray): The feature values for the new APK, as a
    2-D array with one row per APK.

    Returns:
    label (numpy.ndarray): The predicted label for each row of apk_features.
    """
    return model.predict(apk_features)
 

Training Functions

• Gradient Boosting Classifier: train_modelGradientBoostingClassifier trains a
model using gradient boosting.

• Decision Tree Classifier: train_modelDecisionTreeClassifier trains a decision
tree model.

• K-Nearest Neighbors Classifier: train_modelKNeighborsClassifier trains a
KNN model.

• Logistic Regression: train_modelLogisticRegression trains a logistic regression
model.

• Linear SVM: train_modelLinearSVC trains a linear support vector machine.

• Evaluate Model: evaluate_model computes accuracy, precision, and recall for a
trained model on test data.

• Classify APK: classify_apk predicts the label of a new APK from its feature
values using a trained model.
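
The training and evaluation functions above can be exercised together as in the following minimal sketch. It assumes synthetic binary features, a made-up labelling rule, and illustrative hyperparameter values (n_estimators=50, max_depth=5, and so on); none of these are the project's actual data or settings, and max_iter is raised only so the linear models converge on the toy data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real feature matrix: 300 apps, 20 binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 20))
# Made-up labelling rule so the classes are learnable: an app counts as
# malware (1) when at least two of the first three features are present.
y = (X[:, :3].sum(axis=1) >= 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# One estimator per training function; hyperparameters are illustrative only.
models = {
    "GradientBoosting": GradientBoostingClassifier(
        n_estimators=50, learning_rate=0.1, random_state=0),
    "DecisionTree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "KNeighbors": KNeighborsClassifier(n_neighbors=5),
    "LogisticRegression": LogisticRegression(
        C=1.0, penalty="l2", tol=1e-4, random_state=0, max_iter=1000),
    "LinearSVC": LinearSVC(C=1.0, tol=1e-4, random_state=0, max_iter=10000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Same three metrics that evaluate_model returns.
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Keeping the estimators in one dictionary makes it easy to compare all five classifiers on the same split.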

6.2.2 Counting Malicious and Benign Apps

import json
from setting import config
from utils import export_to_json, exportToTextFile, sort_by_malicious


def count_malicious_and_benign():
    json_file = config['apksResultJsonPath']
    resultPath = '{path}/Features/countMaliciousAndBenignFeatures-Output.txt'.format(
        path=config['resultApksPath'])
    with open(json_file, "r") as f:
        data = json.load(f)
    arr = []
    result = {}
    for app in data:
        for key, value in app.items():
            if key == "label":
                continue
            if key not in result:
                result[key] = {"malicious": 0, "benign": 0}
            if app["label"] == 1:
                result[key]["malicious"] += 1
            else:
                result[key]["benign"] += 1

    orderLst = sort_by_malicious(result)
    export_to_json(orderLst, resultPath)
    return result


count_malicious_and_benign()
 

The code defines a function, count_malicious_and_benign, that processes a JSON file,
specified in a configuration setting, to count occurrences of features in malicious versus
benign applications. It reads the JSON data, initializes a result dictionary to track the
counts of each feature, and iterates over each application's data. For each feature key
(excluding "label"), it increments the malicious counter if the application's label is 1
and the benign counter otherwise. The features are then sorted by the number of malicious
occurrences using the utility function sort_by_malicious, and the sorted data is exported
in JSON format to the configured output path. The function returns the unsorted counts of
features categorized by their occurrence in malicious and benign apps.
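
The counting logic can be tried without the project's configuration or utility modules. The sketch below is self-contained under two assumptions: each app record lists only the features it contains, and sort_by_malicious_count is a hypothetical stand-in for the project's sort_by_malicious helper, whose implementation is not shown in this report.

```python
from collections import defaultdict


def count_features(apps):
    """Count, per feature, how many malicious and benign apps contain it."""
    counts = defaultdict(lambda: {"malicious": 0, "benign": 0})
    for app in apps:
        bucket = "malicious" if app["label"] == 1 else "benign"
        for feature in app:
            if feature == "label":
                continue
            counts[feature][bucket] += 1
    return dict(counts)


def sort_by_malicious_count(counts):
    """Hypothetical stand-in: order features by malicious count, highest first."""
    return sorted(counts.items(),
                  key=lambda item: item[1]["malicious"], reverse=True)


# Made-up sample records for illustration only.
apps = [
    {"SEND_SMS": 1, "INTERNET": 1, "label": 1},
    {"INTERNET": 1, "label": 0},
    {"SEND_SMS": 1, "READ_SMS": 1, "label": 1},
]

counts = count_features(apps)
ranked = sort_by_malicious_count(counts)
for feature, tally in ranked:
    print(feature, tally)
```

Note that, like the original function, this counts every key present in a record regardless of its value, so records that store absent features as 0 would need an extra check on the value.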

6.2.3 Extracting Best Results

from setting import config
from utils import read_information_from_result_from_models

import warnings
from glob import glob

warnings.simplefilter(action='ignore', category=FutureWarning)


def main():

    path = config['apksResultJsonPath']
    resultApksPath = config['resultApksPath']
    resultModelsPath = f'{resultApksPath}/resultModels'
    files = glob(resultModelsPath + '/*.json')

    data = read_information_from_result_from_models(
        files, resultModelsPath, resultApksPath)


if __name__ == '__main__':
    main()
 

The provided script reads and processes JSON files containing model results for
Android application packages (APKs). It begins by importing the configuration
settings and a utility function, and suppresses future warnings to keep the
output clean. The main function builds the path of the results directory
(resultModelsPath) from the configuration settings and uses the glob module to
gather all JSON files in that directory. The collected files and the relevant paths
are passed to the read_information_from_result_from_models utility function for further
processing. The __name__ guard ensures that main is executed only when the script is run
directly.
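
How the best result might be picked from such files is sketched below. This is not the project's read_information_from_result_from_models function, whose implementation is not shown; the record layout (a "model" name and an "accuracy" value per file) and the sample numbers are assumptions made purely for illustration.

```python
import json
import tempfile
from glob import glob
from pathlib import Path


def best_result(result_dir, metric="accuracy"):
    """Return the record with the highest value of `metric` among *.json files."""
    best = None
    for file_name in sorted(glob(f"{result_dir}/*.json")):
        with open(file_name, "r") as f:
            record = json.load(f)
        if best is None or record[metric] > best[metric]:
            best = record
    return best


# Write two made-up result files to a temporary directory and pick the best.
with tempfile.TemporaryDirectory() as tmp:
    samples = [
        {"model": "DecisionTree", "accuracy": 0.95},  # illustrative value
        {"model": "KNeighbors", "accuracy": 0.91},    # illustrative value
    ]
    for sample in samples:
        Path(tmp, f"{sample['model']}.json").write_text(json.dumps(sample))
    best = best_result(tmp)

print(best["model"])
```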

6.2.4 Main code

from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from setting import config
from utils import get_random_number, load_data, count_apps, printResults, \
    export_to_json, read_information_from_result_from_models
from classification_utils import preprocess_data, print_best_model
from classification import train_modelGradientBoostingClassifier, evaluate_model, \
    train_modelDecisionTreeClassifier, train_modelKNeighborsClassifier, \
    train_modelLogisticRegression, train_modelLinearSVC, trainAndEval
import warnings
import os
from glob import glob
from tqdm import tqdm
import json

warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=ConvergenceWarning)

# need to
global c_val, epsilon_val, test_size_val, random_state_val
global c_val_max, epsilon_val_max, random_state_val_max
global accuracy_max, precision_max, recall_max
global num_benign_apps, num_malicious_apps


def main():
    """
    This is the main function that executes the entire process of loading
    the data, preprocessing it, training the models for several test-set
    sizes, evaluating their performance, and exporting the results.

    Inputs:
    None.
    Returns:
    None.
    """

    path = config['apksResultJsonPath']
    resultApksPath = config['resultApksPath']
    resultModelsPath = f'{resultApksPath}/resultModels'
    # Load data
    df, malicious_count, benign_count = load_data(path)

    df = df.rename(columns={'label': 'class'})

    # Split data into features and labels
    X = df.drop(columns=['class'])
    y = df['class']

    # Preprocess data
    X_scaled, y_encoded = preprocess_data(X, y)

    # Optimization for results
    test_size = [0.1, 0.2, 0.3]
    results = []

    for idx in tqdm(range(0, len(test_size)), total=len(test_size),
                    desc=f"Run on: {test_size}"):
        # Split data into training and test sets
        X_train, X_test, y_train, y_test = train_test_split(
            X_scaled, y_encoded, test_size=test_size[idx], shuffle=True)

        # Train and evaluate the models for this test size
        resultsTrainAndTest = trainAndEval(X_train, X_test, y_train,
                                           y_test, test_size[idx])

        results.append({test_size[idx]: [resultsTrainAndTest]})

    path = f'{resultApksPath}/result.json'
    export_to_json(results, path)


if __name__ == '__main__':
    main()
 

The main script executes the entire process of loading the data, preprocessing it,
training the models, and evaluating their performance. It loads the feature data from
the path given by apksResultJsonPath, renames the label column to class, separates the
features from the labels, and preprocesses both. For each test size (0.1, 0.2, and 0.3)
it splits the data into training and test sets with shuffling, calls trainAndEval, and
collects the returned results. The collected results are then exported to result.json
under resultApksPath. The dataset itself is described in Section 6.3.
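
One detail of the results structure is worth illustrating: main uses the test size (a float) as a dictionary key, and Python's json module writes such keys as strings. The sketch below assumes export_to_json relies on the standard json module, and the metrics dictionary is a placeholder for whatever trainAndEval returns.

```python
import json

test_sizes = [0.1, 0.2, 0.3]
results = []
for size in test_sizes:
    # Placeholder for the value returned by trainAndEval; None marks metrics
    # that are not computed in this sketch.
    metrics = {"accuracy": None, "precision": None, "recall": None}
    results.append({size: [metrics]})

# Round trip through JSON, as happens when result.json is written and read back.
serialized = json.dumps(results)
loaded = json.loads(serialized)

keys_after_reload = [next(iter(entry)) for entry in loaded]
print(keys_after_reload)
```

Code that later reads result.json therefore has to look up the test size as a string such as "0.1", or convert the keys back with float().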

6.2.5 Plot

import json
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


def plotting(y_pred, y_test, modelName):
    # Plot predicted labels in red and expected labels in blue
    plt.title(modelName)
    plt.plot(y_pred, "r", label="predicted")
    plt.plot(y_test, "b", label="expected")
    # Show the legend so the two series can be told apart
    plt.legend()
    plt.show()
 

The provided code snippet introduces a function, plotting, to visualize model
predictions versus actual values using Matplotlib. It includes imports for handling
JSON, plotting, and suppressing future warnings to avoid non-critical warning
messages cluttering the output. The plotting function takes in predicted labels (y_pred),
actual labels (y_test), and a model name (modelName). It generates a plot with the
model name as the title, displaying predicted values in red and actual values in blue,
and labels the two series as "predicted" and "expected" so that they can be told apart.
This visualization aids in comparing the model's predictions against the actual outcomes.
The function is intended to be called from evaluate_model, which computes and returns
the accuracy, precision, and recall metrics; the call there, which is commented out in
the listing in Section 6.2.1, passes the first 100 predictions and actual values so that
the evaluation can be complemented by a visual assessment of model performance.

6.3 Input

The input is a dataset consisting of feature vectors of 215 attributes extracted from
1817 applications (165 malware apps from the Drebin project and 1652 benign apps). The
dataset has been used to develop and evaluate a multilevel classifier fusion approach for
Android malware detection, published in the IEEE Transactions on Cybernetics paper
'DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware
Detection'. The supporting file contains the description of the feature vectors/attributes
obtained via static code analysis of the Android apps.
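
Because the two classes are far from balanced, accuracy alone can be misleading on this dataset. The short calculation below uses only the class counts stated above and a hypothetical baseline that labels every app as benign; it is an illustration, not a result produced by the project.

```python
# Class counts as stated in the dataset description.
n_malware = 165
n_benign = 1652
n_total = n_malware + n_benign

# Hypothetical baseline: a classifier that labels every app as benign.
true_positives = 0            # malware correctly flagged
false_negatives = n_malware   # malware missed
true_negatives = n_benign     # benign apps correctly passed

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / (true_positives + false_negatives)

print(f"total apps: {n_total}")
print(f"baseline accuracy: {accuracy:.3f}")
print(f"baseline recall: {recall:.3f}")
```

The baseline reaches an accuracy of about 0.909 while detecting no malware at all, which is why evaluate_model also reports precision and recall.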

transact API call signature
onServiceConnected API call signature
bindService API call signature
attachInterface API call signature
ServiceConnection API call signature
android.os.Binder API call signature
SEND_SMS Manifest Permission
Ljava.lang.Class.getCanonicalName API call signature
Ljava.lang.Class.getMethods API call signature
Ljava.lang.Class.cast API call signature
Ljava.net.URLDecoder API call signature
android.content.pm.Signature API call signature
android.telephony.SmsManager API call signature
READ_PHONE_STATE Manifest Permission
getBinder API call signature
ClassLoader API call signature
Landroid.content.Context.registerReceiver API call signature
Ljava.lang.Class.getField API call signature
Landroid.content.Context.unregisterReceiver API call signature
GET_ACCOUNTS Manifest Permission
RECEIVE_SMS Manifest Permission
Ljava.lang.Class.getDeclaredField API call signature
READ_SMS Manifest Permission
getCallingUid API call signature
Ljavax.crypto.spec.SecretKeySpec API call signature
android.intent.action.BOOT_COMPLETED Intent
USE_CREDENTIALS Manifest Permission
MANAGE_ACCOUNTS Manifest Permission
android.content.pm.PackageInfo API call signature
KeySpec API call signature
TelephonyManager.getLine1Number API call signature
DexClassLoader API call signature
HttpGet.init API call signature
SecretKey API call signature
Ljava.lang.Class.getMethod API call signature
System.loadLibrary API call signature
android.intent.action.SEND API call signature
Ljavax.crypto.Cipher API call signature
WRITE_SMS Manifest Permission
READ_SYNC_SETTINGS Manifest Permission
AUTHENTICATE_ACCOUNTS Manifest Permission
android.telephony.gsm.SmsManager API call signature
WRITE_HISTORY_BOOKMARKS Manifest Permission
TelephonyManager.getSubscriberId API call signature
mount Commands signature
INSTALL_PACKAGES Manifest Permission
Runtime.getRuntime API call signature
CAMERA Manifest Permission
Ljava.lang.Object.getClass API call signature
WRITE_SYNC_SETTINGS Manifest Permission
READ_HISTORY_BOOKMARKS Manifest Permission
Ljava.lang.Class.forName API call signature
INTERNET Manifest Permission
android.intent.action.PACKAGE_REPLACED Intent
Binder API call signature
android.intent.action.SEND_MULTIPLE Intent
RECORD_AUDIO Manifest Permission
IBinder API call signature
android.os.IBinder API call signature
createSubprocess API call signature
NFC Manifest Permission
ACCESS_LOCATION_EXTRA_COMMANDS Manifest Permission
URLClassLoader API call signature
WRITE_APN_SETTINGS Manifest Permission
abortBroadcast API call signature
BIND_REMOTEVIEWS Manifest Permission
android.intent.action.TIME_SET Intent
READ_PROFILE Manifest Permission
TelephonyManager.getDeviceId API call signature
MODIFY_AUDIO_SETTINGS Manifest Permission
getCallingPid API call signature
READ_SYNC_STATS Manifest Permission
BROADCAST_STICKY Manifest Permission
android.intent.action.PACKAGE_REMOVED Intent
android.intent.action.TIMEZONE_CHANGED Intent
WAKE_LOCK Manifest Permission
RECEIVE_BOOT_COMPLETED Manifest Permission
RESTART_PACKAGES Manifest Permission
Ljava.lang.Class.getPackage API call signature
chmod Commands signature
Ljava.lang.Class.getDeclaredClasses API call signature
android.intent.action.ACTION_POWER_DISCONNECTED Intent
android.intent.action.PACKAGE_ADDED Intent
PathClassLoader API call signature
TelephonyManager.getSimSerialNumber API call signature
Runtime.load API call signature
TelephonyManager.getCallState API call signature
BLUETOOTH Manifest Permission
READ_CALENDAR Manifest Permission
READ_CALL_LOG Manifest Permission
SUBSCRIBED_FEEDS_WRITE Manifest Permission
READ_EXTERNAL_STORAGE Manifest Permission
TelephonyManager.getSimCountryIso API call signature
sendMultipartTextMessage API call signature
PackageInstaller API call signature
VIBRATE Manifest Permission
remount Commands signature
android.intent.action.ACTION_SHUTDOWN Intent
sendDataMessage API call signature
ACCESS_NETWORK_STATE Manifest Permission
chown Commands signature
HttpPost.init API call signature
Ljava.lang.Class.getClasses API call signature
SUBSCRIBED_FEEDS_READ Manifest Permission
TelephonyManager.isNetworkRoaming API call signature
CHANGE_WIFI_MULTICAST_STATE Manifest Permission
WRITE_CALENDAR Manifest Permission
android.intent.action.PACKAGE_DATA_CLEARED Intent
MASTER_CLEAR Manifest Permission
HttpUriRequest API call signature
UPDATE_DEVICE_STATS Manifest Permission
WRITE_CALL_LOG Manifest Permission
DELETE_PACKAGES Manifest Permission
GET_TASKS Manifest Permission
GLOBAL_SEARCH Manifest Permission
DELETE_CACHE_FILES Manifest Permission
WRITE_USER_DICTIONARY Manifest Permission
android.intent.action.PACKAGE_CHANGED Intent
android.intent.action.NEW_OUTGOING_CALL Intent
REORDER_TASKS Manifest Permission
WRITE_PROFILE Manifest Permission
SET_WALLPAPER Manifest Permission
BIND_INPUT_METHOD Manifest Permission
divideMessage API call signature
READ_SOCIAL_STREAM Manifest Permission
READ_USER_DICTIONARY Manifest Permission
PROCESS_OUTGOING_CALLS Manifest Permission
CALL_PRIVILEGED Manifest Permission
Runtime.exec API call signature
BIND_WALLPAPER Manifest Permission
RECEIVE_WAP_PUSH Manifest Permission
DUMP Manifest Permission
BATTERY_STATS Manifest Permission
ACCESS_COARSE_LOCATION Manifest Permission
SET_TIME Manifest Permission
android.intent.action.SENDTO Intent
WRITE_SOCIAL_STREAM Manifest Permission
WRITE_SETTINGS Manifest Permission
REBOOT Manifest Permission
BLUETOOTH_ADMIN Manifest Permission
TelephonyManager.getNetworkOperator API call signature
/system/bin Commands signature
MessengerService API call signature
BIND_DEVICE_ADMIN Manifest Permission
WRITE_GSERVICES Manifest Permission
IRemoteService API call signature
KILL_BACKGROUND_PROCESSES Manifest Permission
SET_ALARM API call signature
ACCOUNT_MANAGER API call signature
/system/app Commands signature
android.intent.action.CALL Intent
STATUS_BAR Manifest Permission
TelephonyManager.getSimOperator API call signature
PERSISTENT_ACTIVITY Manifest Permission
CHANGE_NETWORK_STATE Manifest Permission
onBind API call signature
Process.start API call signature
android.intent.action.SCREEN_ON Intent
Context.bindService API call signature
RECEIVE_MMS Manifest Permission
SET_TIME_ZONE Manifest Permission
android.intent.action.BATTERY_OKAY Intent
CONTROL_LOCATION_UPDATES Manifest Permission
BROADCAST_WAP_PUSH Manifest Permission
BIND_ACCESSIBILITY_SERVICE Manifest Permission
ADD_VOICEMAIL Manifest Permission
CALL_PHONE Manifest Permission
ProcessBuilder API call signature
BIND_APPWIDGET Manifest Permission
FLASHLIGHT Manifest Permission
READ_LOGS Manifest Permission
Ljava.lang.Class.getResource API call signature
defineClass API call signature
SET_PROCESS_LIMIT Manifest Permission
android.intent.action.PACKAGE_RESTARTED Intent
MOUNT_UNMOUNT_FILESYSTEMS Manifest Permission
BIND_TEXT_SERVICE Manifest Permission
INSTALL_LOCATION_PROVIDER Manifest Permission
android.intent.action.CALL_BUTTON Intent
android.intent.action.SCREEN_OFF Intent
findClass API call signature
SYSTEM_ALERT_WINDOW Manifest Permission
MOUNT_FORMAT_FILESYSTEMS Manifest Permission
CHANGE_CONFIGURATION Manifest Permission
CLEAR_APP_USER_DATA Manifest Permission
intent.action.RUN Intent
android.intent.action.SET_WALLPAPER Intent
CHANGE_WIFI_STATE Manifest Permission
READ_FRAME_BUFFER Manifest Permission
ACCESS_SURFACE_FLINGER Manifest Permission
Runtime.loadLibrary API call signature
BROADCAST_SMS Manifest Permission
EXPAND_STATUS_BAR Manifest Permission
INTERNAL_SYSTEM_WINDOW Manifest Permission
android.intent.action.BATTERY_LOW Intent
SET_ACTIVITY_WATCHER Manifest Permission
WRITE_CONTACTS Manifest Permission
android.intent.action.ACTION_POWER_CONNECTED Intent
BIND_VPN_SERVICE Manifest Permission
DISABLE_KEYGUARD Manifest Permission
ACCESS_MOCK_LOCATION Manifest Permission
GET_PACKAGE_SIZE Manifest Permission
MODIFY_PHONE_STATE Manifest Permission
CHANGE_COMPONENT_ENABLED_STATE Manifest Permission
CLEAR_APP_CACHE Manifest Permission
SET_ORIENTATION Manifest Permission
READ_CONTACTS Manifest Permission
DEVICE_POWER Manifest Permission
HARDWARE_TEST Manifest Permission
ACCESS_WIFI_STATE Manifest Permission
WRITE_EXTERNAL_STORAGE Manifest Permission
ACCESS_FINE_LOCATION Manifest Permission
SET_WALLPAPER_HINTS Manifest Permission
SET_PREFERRED_APPLICATIONS Manifest Permission
WRITE_SECURE_SETTINGS Manifest Permission
class B=Benign; S=Malware
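
The class column uses the letters B and S, whereas the counting script in Section 6.2.2 checks for a label equal to 1. A minimal sketch of the conversion follows, assuming the rows are available as CSV text with one 0/1 column per attribute; the three sample rows are made up for illustration.

```python
import csv
import io

# Tiny made-up sample in the shape of the attribute list above: one column
# per feature (1 = present, 0 = absent) and a final "class" column (B or S).
sample_csv = """SEND_SMS,INTERNET,READ_PHONE_STATE,class
1,1,1,S
0,1,0,B
0,0,1,B
"""

CLASS_TO_LABEL = {"B": 0, "S": 1}  # B = benign, S = malware

features, labels = [], []
for row in csv.DictReader(io.StringIO(sample_csv)):
    # Remove the class column from the row and map it to a numeric label.
    labels.append(CLASS_TO_LABEL[row.pop("class")])
    features.append([int(value) for value in row.values()])

print(features)
print(labels)
```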
6.4 Output

6.5 Testing

In today's ever-changing and competitive world, the internet has become an integral part
of our lives. Most of us now make decisions by searching for information on the internet,
so hosting a website is no longer optional but mandatory for all kinds of businesses [9].
It is the first step in becoming and staying relevant in the market.
Just having a website is not enough. An organization needs to develop a website that
is informative, accessible, and user-friendly. To maintain these qualities, the website
should be well tested, and this process of testing a website is known as web testing.

What Is Web Testing?


Web testing is a software testing practice for finding potential bugs in websites or web
applications. A web-based system needs to be checked completely, from end to end, before
it goes live for end users. By performing website testing, an organization can make sure
that the web-based system is functioning properly and can be accepted by real users. UI
design and functionality are the main focus of website testing.

Web Testing Checklists


1) Functionality testing
2) Usability testing
3) Interface testing
4) Security testing

6.5.1 Functionality Testing

Test all the links in the web pages, the database connection, and the forms used for
submitting or getting information from the user in the web pages.

Check all the links:

• Test the outgoing links from all the pages to the specific domain under test.

• Test all internal links.

• Test links that jump within the same page.

• Test links used to send email to admin or other users from web pages.

• Test to check if there are any orphan pages.

• Finally, check for broken links among all the links mentioned above.
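
The link categories in this checklist can be collected automatically before each link is tested. The sketch below uses only the Python standard library; the page URL and HTML snippet are made-up examples, and the sketch only groups links. It does not request them, so detecting broken links would require an additional HTTP check.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_links(page_url, html):
    """Groups the links of one page into the categories from the checklist."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    groups = {"internal": [], "outgoing": [], "same_page": [], "email": []}
    for href in collector.hrefs:
        if href.startswith("#"):
            groups["same_page"].append(href)
        elif href.startswith("mailto:"):
            groups["email"].append(href)
        else:
            # Resolve relative links against the page URL before comparing hosts.
            absolute = urljoin(page_url, href)
            key = "internal" if urlparse(absolute).netloc == site else "outgoing"
            groups[key].append(absolute)
    return groups


sample_html = """
<a href="/about.html">About</a>
<a href="https://other.example.org/">Partner</a>
<a href="#top">Back to top</a>
<a href="mailto:admin@example.com">Contact admin</a>
"""
groups = classify_links("https://www.example.com/index.html", sample_html)
for category, links in groups.items():
    print(category, links)
```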

Test forms on all pages:

Forms are an integral part of any website. Forms are used for receiving information
from users and to interact with them. So what should be checked in these forms?

• First, check all the validations on each field.

• Check for default values of the fields.

• Enter wrong inputs in the form fields and check how they are handled.

• Check the options to create, delete, view, or modify forms, if any.

Validate your HTML/CSS:

If you are optimizing your site for search engines, then HTML/CSS validation is
especially important. Mainly validate the site for HTML syntax errors.

Database Testing:

Data consistency is also very important in a web application. Check for data integrity
and errors while you edit, delete, modify the forms or do any DB related functionality.
Check if all the database queries are executing correctly, data is retrieved and also
updated correctly.

6.5.2 Usability Testing

Usability testing is the process by which the human-computer interaction characteristics
of a system are measured, and weaknesses are identified for correction.

Usability Testing includes the following:

• The website should be easy to use.

• The instructions provided should be very clear.

• Check that the instructions provided are sufficient to satisfy their purpose.

• The main menu should be provided on each page.

• The website should be consistent throughout.

6.5.3 Interface Testing

In web testing, the server-side interface should be tested. This is done by verifying that
communication is done properly. Compatibility of the server with software, hardware,
network, and the database should be tested.
The main interfaces are:

• Web server and application server interface

• Application server and Database server interface.

Check if all the interactions between these servers are executed and errors are handled
properly. If the database or web server returns an error message for any query by the
application server, then the application server should catch and display these error
messages appropriately to the users.

6.5.4 Security Testing

The primary reason for testing the security of a web application is to identify potential
vulnerabilities and subsequently repair them.
Following are some of the test cases for web security testing:

• Test by pasting the internal URL directly into the browser address bar without
login. Internal pages should not open.

• Try some invalid inputs in input fields like login username, password, input text
boxes, etc. Check the system’s reaction to all invalid inputs.

CONCLUSION
We conclude that we have built a model that detects malware in Android applications
using machine learning algorithms, specifically the Decision Tree Classifier, to analyze
and predict malware presence in Android apps. The project encompasses data
preprocessing, model training, evaluation, and visualization, providing valuable insights
into the performance of the classifier. The visualization of the confusion matrix heatmap
enhances user understanding of model predictions and aids in assessing the model's
effectiveness. Future work could enhance model accuracy by implementing ensemble
techniques like Random Forests, leveraging the strength of multiple models for better
predictions. The Decision Tree Classifier's performance could be improved by fine-tuning
its hyperparameters using techniques like Grid Search or Random Search. Continuous
learning could be enabled by integrating online learning methods, allowing the model to
adapt to new data and enhance accuracy over time.
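
As an indication of how the Grid Search tuning mentioned above might look, the sketch below uses scikit-learn's GridSearchCV on synthetic data. The parameter grid, the recall scoring choice, the 3-fold cross-validation, and the labelling rule are all assumptions made for illustration, not settings used in this project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 200 apps, 10 binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))
# Made-up rule: an app is labelled malware (1) when features 0 and 1 are both set.
y = (X[:, 0] & X[:, 1]).astype(int)

# Illustrative grid over two Decision Tree hyperparameters.
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid=param_grid,
    scoring="recall",  # favour catching malware over raw accuracy
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, round(search.best_score_, 3))
```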

REFERENCES

[1] A Survey of Malware Detection in Android Apps

[2] Intelligent Mobile Malware Detection Using Permission Requests and API Calls

[3] DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket

[4] Android Malware Dataset (AndroZoo)

[5] MIT Mobile Security Initiative

[6] National Institute of Standards and Technology (NIST) Mobile Malware Dataset

[7] Intelligent Mobile Malware Detection by Xiangliang Zhou and Dongqing Zhang
