Major Report
Bachelor of Technology
in
Computer Science and Engineering
Submitted by
B. Mounika (20SS1A0505)
Ch. Sailaja (20SS1A0509)
S. Sony (20SS1A0545)
J. Raja Sreekanth (20SS1A0523)
Certificate
This is to certify that the Major Project report work entitled “MOBILE MALWARE
DETECTION” through Machine Learning Techniques is a bona fide work carried out
by a team consisting of B. Mounika bearing Roll No. 20SS1A0505, Ch. Sailaja
bearing Roll No. 20SS1A0509, S. Sony bearing Roll No. 20SS1A0545, and J. Raja
Sreekanth bearing Roll No. 20SS1A0523, in partial fulfillment of the requirements for
the degree of BACHELOR OF TECHNOLOGY in the COMPUTER SCIENCE AND
ENGINEERING discipline to Jawaharlal Nehru Technological University Hyderabad
University College of Engineering Sultanpur during the academic year 2023-2024.
The results embodied in this report have not been submitted to any other University
or Institution for the award of any degree or diploma.
Guide: Dr. G. Narsimha, Professor and Principal
Head: Dr. G. Narsimha, Professor and Principal
EXTERNAL EXAMINER
Declaration
B. Mounika (20SS1A0505)
Ch. Sailaja (20SS1A0509)
S. Sony (20SS1A0545)
Acknowledgment
We wish to take this opportunity to express our deep gratitude to all those who helped
us in various ways during our Major Project report work. It is our pleasure to
acknowledge the help of all those individuals who contributed to the successful
completion of our Major Project report.
Finally, we express our gratitude, with great admiration and respect, to our faculty
for their moral support and encouragement throughout the course.
B. Mounika (20SS1A0505)
Ch. Sailaja (20SS1A0509)
S. Sony (20SS1A0545)
J. Raja Sreekanth (20SS1A0523)
Contents
Certificate
Declaration
Acknowledgment
Abstract
List of Figures
1 INTRODUCTION
1.3 Purpose
1.4.1 Limitations
1.5 Proposed System
1.6 Scope
1.7 Conclusion
2 LITERATURE SURVEY
2.1 Conclusion
3 ANALYSIS
3.2 Reliability
3.3 Availability
3.3.1 Constraints
3.4 Portability
3.5 Performance
3.8 Objectives
3.9 Conclusion
4 SYSTEM ARCHITECTURE
4.1 System
4.2 User
4.3 Conclusion
5 DESIGN
6 IMPLEMENTATION
6.1 Code
6.2.1 Classification
6.2.5 Plot
6.3 Input
6.4 Output
6.5 Testing
CONCLUSION
REFERENCES
Abstract
List of Figures
Chapter 1
INTRODUCTION
The project’s core objective is to construct a machine learning model dedicated to the
detection of Android malware in mobile applications. The methodology involves the
implementation of cutting-edge machine learning techniques, ensuring a reliable and
efficient system for Android malware detection. Overall, the project addresses the
critical need for enhanced security measures within the dynamic landscape of mobile
applications.
1.3 Purpose
• Utilize machine learning for early detection of malicious behavior, allowing users
to preemptively address security risks.
• Employ decision trees and support vector machines to enhance the accuracy of
malware detection.
The existing system for the project involves analyzing an Android malware dataset
using various data analysis and machine learning techniques. The code provided in the
project loads the dataset, explores the data, handles class imbalance, performs feature
selection, trains a Decision Tree Classifier, and evaluates the model’s performance. It
provides insights into the dataset and predicts malware based on the selected features.
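The text mentions class-imbalance handling without naming the technique. The following is a minimal sketch, assuming random oversampling of the minority class; the function name `oversample_minority` and the toy data are hypothetical and are not taken from the project code.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    Assumes binary labels (1 = malware, 0 = benign). This is one common
    balancing technique, not necessarily the one used in the project.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    # Draw as many extra minority rows as needed to match the majority count.
    extra = [rng.choice(minority_rows) for _ in range(n_major - n_minor)]
    balanced_rows = list(rows) + extra
    balanced_labels = list(labels) + [minority] * len(extra)
    return balanced_rows, balanced_labels


# Toy data: four benign apps and one malware app.
rows = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the test data.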
1.4.1 Limitations
1.5 Proposed System
The proposed system aims to enhance the existing project by incorporating additional
features to improve user experience and result interpretation. The proposed system
will include a user-friendly interface where users can easily load their own datasets,
visualize the data, and obtain insights through interactive visualizations. It will also
provide more advanced evaluation metrics and visualization techniques to help users
better understand the model’s performance. Additionally, the proposed system will
allow users to export the results in various formats for further analysis and reporting.
Overall, the proposed system aims to provide a more comprehensive and user-centric
approach to Android malware dataset analysis.
1.6 Scope
The scope of Android malware detection is vast and encompasses a wide range of
activities, technologies, and research areas aimed at identifying, mitigating, and
preventing malicious software threats targeting Android devices. Here are some key
aspects that fall within the scope of Android malware detection
1.7 Conclusion
Chapter 2
LITERATURE SURVEY
through an easy-to-use interface.
In 2012, Sarma et al. [12] presented the paper “Android Permissions: A Perspective
Combining Risks and Benefits” in ACM proceedings. They investigate the feasibility of
using the permissions an app requests, the category of the app, and the permissions
requested by other apps in the same category to better inform users whether the risk
of installing an app is commensurate with its expected benefit.
In 2012, Kern et al. [9] presented the paper “Permission Tracking in Android” in the
UBICOMM proceedings. They take a closer look at the permissions that users grant to
apps in Android, a widespread operating system for mobile devices such as
smartphones. They developed a tool that allows users to administer the permissions of
their applications and to allow or deny permissions at any time.
In 2012, Yang et al. [10] presented the paper “Short Paper: Enhancing Users’
Comprehension of Android Permissions” in ACM proceedings. They propose to help
Android users better understand application permissions through crowdsourcing. In
this approach, users of the same application use the tool to help each other
understand permissions by sharing their permission reviews. The tool, called
Droidganger, is built on two techniques: record/replay and permission suppression.
2.1 Conclusion
The literature survey on Android malware detection shows that application
permissions are a recurring focus. The surveyed works weigh the risk of the
permissions an app requests against its expected benefit, give users tools to grant
or deny permissions at any time, and use crowdsourcing to help users understand what
permissions mean. Together, these studies indicate that permission information is a
useful signal for judging the risk an app poses.
Chapter 3
ANALYSIS
• Behavioral Data: Capture app behavior, including network activities and system
calls.
• Permissions and Features: Analyze app permissions and network traffic for
potential risks.
• Continuous Updates: Regularly update the dataset with the latest threat
intelligence for evolving risks.
3.2 Reliability
• Machine Learning Quality: Quality feature extraction and diverse training data for
machine learning models.
• Low False Positives: Minimizing false positives to enhance reliability.
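To make the false-positive requirement measurable, the false positive rate can be computed from true and predicted labels. The sketch below assumes that label 1 means malware and label 0 means benign; the function name and the sample labels are illustrative only.

```python
def false_positive_rate(y_true, y_pred):
    """Fraction of benign apps (label 0) that were flagged as malware (label 1)."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    benign_total = false_pos + true_neg
    # Guard against a test set that contains no benign apps.
    return false_pos / benign_total if benign_total else 0.0


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0]
fpr = false_positive_rate(y_true, y_pred)
print(f"false positive rate: {fpr:.2f}")
```

Here one of the four benign apps is flagged, so the rate is 0.25.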
3.3 Availability
Mobile malware detection systems provide continuous monitoring for real-time threat
identification.
Efficient design minimizes resource usage, ensuring mobile malware detection without
hindering device performance.
3.3.1 Constraints
3.4 Portability
3.5 Performance
• Low Latency and Resource Usage: Optimized performance with low latency and
minimal impact on device resources.
• Language: Python
• IDE: VS Code
• Memory: 8 GB
• Storage: SSD
3.8 Objectives
The basic objective of developing this project is to construct a machine learning
model that detects Android malware in mobile applications, giving users a reliable
and efficient means of identifying malicious apps.
3.9 Conclusion
In conclusion, the analysis identifies the requirements of the mobile malware
detection system. The dataset should capture app behavior, permissions, and network
traffic, and should be updated regularly with the latest threat intelligence. The
system should be reliable, with quality feature extraction, diverse training data,
and few false positives; available for continuous monitoring; and efficient, with
low latency and minimal impact on device resources.
Chapter 4
SYSTEM ARCHITECTURE
• In the first step, the dataset containing Android app data is input.
• A Decision Tree Classifier is trained using the selected features and preprocessed
data.
• The trained model’s performance is evaluated using a separate test dataset, with
accuracy calculated and a confusion matrix generated.
• User interaction comes into play, as the user triggers a request to view the
confusion matrix results.
• Finally, the system displays the heatmap of the confusion matrix to the user.
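The steps above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions: the data is synthetic, the labelling rule is invented for the example, and the hyperparameters are not those of the project. The resulting matrix `cm` holds the counts that a heatmap would display; the rendering itself is left out.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Android dataset: 200 apps, 10 binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))
# Toy labelling rule (an assumption for illustration): an app counts as
# malware when both of the first two features are present.
y = (X[:, 0] & X[:, 1]).astype(int)

# Hold out a separate test set, keeping the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Rows are true classes, columns are predicted classes (0 = benign, 1 = malware).
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(accuracy)
print(cm)
```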
4.1 System
The system, in this context, refers to the set of code and libraries designed to perform the
task of malware prediction based on the provided dataset. It includes several modules,
functionalities, and components working together to achieve the following tasks:
1. Data Preprocessing: The system performs data preprocessing to clean and prepare
the dataset for model training. This step may involve shuffling the data to avoid biases,
handling class imbalance, and selecting relevant features.
2. Model Training: The system utilizes the Decision Tree Classifier to train a machine
learning model on the preprocessed dataset. During training, the model learns patterns
and relationships between features to make predictions.
3. Model Prediction: After model training, the system uses the trained classifier to
predict the target class labels for unseen data (the test set). The model’s predictions are
compared with the actual labels to evaluate its accuracy and performance.
6. Feedback and Reporting: The system provides feedback and reports on the model’s
performance to the user. This feedback allows the user to make informed decisions
about the model’s reliability and suitability for the malware prediction task.
7. Warnings and Error Handling: The system employs the 'warnings' module to manage
warning messages during execution. It handles any potential errors gracefully to ensure
smooth code execution.
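As an illustration of the warning handling in item 7, the sketch below suppresses FutureWarning messages with the standard 'warnings' module. The function `legacy_step` is a hypothetical stand-in for a library call that emits such a warning. Unlike a module-level filter, the context manager restores the previous filters when the block ends.

```python
import warnings


def legacy_step():
    """Stand-in for a library call that emits a FutureWarning."""
    warnings.warn("this behaviour will change", FutureWarning)
    return "done"


# Limit the filter to one block so that warnings elsewhere remain visible.
# record=True collects any warnings that are not ignored into `caught`.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore", category=FutureWarning)
    result = legacy_step()

print(result, len(caught))
```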
4.2 User
1. Data Collection: The user may be responsible for collecting the dataset required
for the malware prediction task. This dataset could contain various features related to
Android applications, including performance metrics and behavior characteristics.
2. Data Preparation: Before feeding the data into the system, the user may preprocess
and clean the dataset to ensure it is in a suitable format for analysis and model training.
This step may involve handling missing values, encoding categorical variables, and
scaling numerical features.
3. Input and Interaction: The user interacts with the system by initiating the execution
of the provided code. This may involve running the Python script using an IDE like
Visual Studio Code or executing it via the command line.
4. Monitoring and Feedback: During the execution of the code, the user may monitor
the progress and receive feedback, such as the status of data loading, model training,
and evaluation metrics. The feedback provided by the system allows the user to assess
the effectiveness of the model in predicting malware presence.
5. Interpretation of Results: After the code execution, the user interprets the results
and visualizations generated by the system. This includes analyzing metrics such as
accuracy, precision, recall, and the confusion matrix to evaluate the model’s
performance.
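To support the interpretation step, the sketch below shows how accuracy, precision, and recall follow from the four confusion-matrix counts. It assumes label 1 (malware) is the positive class; the function name and sample labels are illustrative.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision and recall, treating label 1 (malware) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign correctly passed
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Precision answers "of the apps flagged as malware, how many really are?", while recall answers "of the real malware, how much was caught?". On an imbalanced dataset both are more informative than accuracy alone.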
4.3 Conclusion
Chapter 5
DESIGN
What is UML?
UML is an acronym that stands for Unified Modelling Language. Simply put, UML [7]
is a modern approach to modelling and documenting software. In fact, it’s one of
the most popular business process modelling techniques.
It is based on diagrammatic representations of software components. As the old
proverb says: “a picture is worth a thousand words”.
Building Blocks of the UML: The vocabulary of the UML encompasses three kinds
of building blocks.
• Things: Things are the abstractions that are first-class citizens in a model.
5.1 Use Case Diagram
Use case diagrams are a set of use cases, actors, and their relationships. They represent
the use case view of a system.
A use case represents a particular functionality of a system. Hence, a use case
diagram is used to describe the relationships among the functionalities and their
internal/external controllers. These controllers are known as actors. In this project,
the user and the system are the actors.
5.2 Activity Diagram
Activity diagrams are used to document workflows in a system, from the business level
down to the operational level. The general purpose of Activity diagrams is to focus on
flows driven by internal processing vs. external events.
Activities are the functions of a system. A number of activity diagrams are
prepared to capture the entire flow in a system.
5.3 Class Diagram
A class diagram is a static diagram. It represents the static view of an application.
A class diagram is used not only for visualizing, describing, and documenting different
aspects of a system but also for constructing executable code of the software application.
A class diagram describes the attributes and operations of a class and also the
constraints imposed on the system. Class diagrams are widely used in the modelling
of object-oriented systems because they are the only UML diagrams that can be
mapped directly to object-oriented languages. A class diagram shows a collection of
classes, interfaces, associations, collaborations, and constraints. It is also known as a
structural diagram.
5.4 Sequence Diagram
Chapter 6
IMPLEMENTATION
This chapter presents the implementation of the mobile malware detection system. The
code is organized into modules for classification, counting features in malicious and
benign apps, extracting the best results, the main driver, and plotting. The sections
that follow describe the input dataset, the output, and testing.
6.1 Code
6.2 Mobile Malware Detection
6.2.1 Classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.svm import LinearSVC
from plt import plotting
import warnings
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
import random
from utils import export_to_json
from tqdm import tqdm
from setting import config

warnings.simplefilter(action='ignore', category=FutureWarning)

# ...
    Inputs:
        X_train (numpy.ndarray): The feature values for the training data.
        y_train (numpy.ndarray): The labels for the training data.
        n_estimators (int): The number of decision trees in the ensemble.
        learning_rate (float): The learning rate of the algorithm.
        random_state_val (int): The seed used by the random number generator.

    Returns:
        model (sklearn.ensemble.GradientBoostingClassifier): The trained Gradient Boosting model.
    """
# ...
    Inputs:
        X_train (numpy.ndarray): The feature values for the training data.
        y_train (numpy.ndarray): The labels for the training data.
        max_depth (int): The maximum depth of the tree.
        random_state_val (int): The seed used by the random number generator.

    Returns:
        model (sklearn.tree.DecisionTreeClassifier): The trained Decision Tree model.
    """
# ...
    Inputs:
        X_train (numpy.ndarray): The feature values for the training data.
        y_train (numpy.ndarray): The labels for the training data.
        n_neighbors (int): The number of nearest neighbors used for the classification.

    Returns:
        model (sklearn.neighbors.KNeighborsClassifier): The trained KNN model.
    """
# ...
    Inputs:
        X_train (numpy.ndarray): The feature values for the training data.
        y_train (numpy.ndarray): The labels for the training data.
        C (float): The regularization parameter.
        epsilon (float): A small constant used to determine when to stop the training.

    Returns:
        model (sklearn.linear_model.LogisticRegression): The trained Logistic Regression model.
    """
# ...
    """
    This function trains a Sec-SVM classifier model on the input training data.

    Inputs:
        X_train (numpy.ndarray): The feature values for the training data.
        y_train (numpy.ndarray): The labels for the training data.
        C (float): The regularization parameter.
        epsilon (float): A small constant used to determine when to stop the training.

    Returns:
        model (sklearn.svm.LinearSVC): The trained Sec-SVM model.
    """
# ...
    Inputs:
        model (sklearn.svm.LinearSVC): The trained model.
        X_test (numpy.ndarray): The feature values for the test data.
        y_test (numpy.ndarray): The labels for the test data.

    Returns:
        accuracy (float): The accuracy score of the model.
        precision (float): The precision score of the model.
        recall (float): The recall score of the model.
    """
    # Predict the labels on the test data using the model
    y_pred = model.predict(X_test)

    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    # plotting(y_pred[:100], y_test[:100], modelName)
# ...
    Inputs:
        model (sklearn.svm.LinearSVC): The trained model.
        apk_features (numpy.ndarray): The feature values for the new APK.

    Returns:
        label (int): The predicted label for the new APK.
    """
    return model.predict(apk_features)
Training Functions
• Decision Tree Classifier: the decision tree training function trains a decision
tree model.
• Linear SVM: train_model_LinearSVC trains a linear support vector machine.
• Evaluate Model: evaluate_model computes accuracy, precision, and recall for a
trained model on test data.
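The train-then-evaluate pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the data is synthetic, the helper name `evaluate_split` is hypothetical, and the hyperparameter values are arbitrary.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def evaluate_split(model, X_test, y_test):
    """Return (accuracy, precision, recall) of a fitted model on the test split."""
    y_pred = model.predict(X_test)
    return (
        accuracy_score(y_test, y_pred),
        precision_score(y_test, y_pred, zero_division=0),
        recall_score(y_test, y_pred, zero_division=0),
    )


# Synthetic binary feature matrix; the labelling rule is invented for the example.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(300, 8))
y = ((X[:, 0] + X[:, 1] + X[:, 2]) >= 2).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Each classifier is trained on the same split so the scores are comparable.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "linear_svm": LinearSVC(C=1.0, random_state=42, max_iter=5000),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = evaluate_split(model, X_test, y_test)

for name, (acc, prec, rec) in results.items():
    print(f"{name}: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```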
6.2.2 Counting Malicious And Benign Apps
import json
from setting import config
from utils import export_to_json, exportToTextFile, sort_by_malicious


def count_malicious_and_benign():
    json_file = config['apksResultJsonPath']
    resultPath = '{path}/Features/countMaliciousAndBenignFeatures-Output.txt'.format(
        path=config['resultApksPath'])
    with open(json_file, "r") as f:
        data = json.load(f)
    arr = []
    result = {}
    for app in data:
        for key, value in app.items():
            if key == "label":
                continue
            if key not in result:
                result[key] = {"malicious": 0, "benign": 0}
            if app["label"] == 1:
                result[key]["malicious"] += 1
            else:
                result[key]["benign"] += 1
    # ...


count_malicious_and_benign()
The code defines a function count_malicious_and_benign that processes a JSON file,
specified in a configuration setting, to count occurrences of features in malicious versus
benign applications. It reads the JSON data, initializes a result dictionary to track counts
of each feature, and iterates over each application’s data. For each feature (excluding the
"label" key), it increments counters based on whether the application is labeled as
malicious or benign. The features are then sorted by the number of malicious occurrences
using a utility function, and the sorted data is exported to a specified output path in JSON
format. The function returns the final counts of features categorized by their occurrence
in malicious and benign apps.
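The counting and sorting steps described above can be sketched in a self-contained form. The sketch replaces the project's utility functions with the built-in `sorted`, reads its input from an inline JSON string instead of the configured file, and uses a hypothetical function name `count_features`; these are assumptions made for illustration.

```python
import json
from collections import defaultdict


def count_features(apps):
    """Count, per feature, how many malicious (label 1) and benign apps list it."""
    counts = defaultdict(lambda: {"malicious": 0, "benign": 0})
    for app in apps:
        kind = "malicious" if app["label"] == 1 else "benign"
        for key in app:
            if key != "label":
                counts[key][kind] += 1
    # Most frequent in malicious apps first; ties are broken by feature name.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1]["malicious"], kv[0]))
    return dict(ordered)


# Inline stand-in for the JSON file of per-app features.
apps = json.loads("""[
    {"SEND_SMS": 1, "INTERNET": 1, "label": 1},
    {"INTERNET": 1, "label": 0},
    {"SEND_SMS": 1, "label": 1}
]""")
feature_counts = count_features(apps)
print(json.dumps(feature_counts, indent=2))
```

Features that appear often in malicious apps but rarely in benign ones, such as SEND_SMS in this toy input, are natural candidates for feature selection.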
6.2.3 Extracting Best Results
from setting import config
from utils import read_information_from_result_from_models

import warnings
from glob import glob

# ...
def main():
    # ...
    data = read_information_from_result_from_models(files, resultModelsPath, resultApksPath)


if __name__ == '__main__':
    main()
The provided script reads and processes JSON files containing model results for
Android application packages (APKs). It begins by importing the necessary configuration
settings and utility functions, and suppresses future warnings to keep the
output clean. The main function sets up paths using configuration settings to locate the
JSON files within the specified directory (resultModelsPath). It then uses the glob
module to gather all JSON files in that directory. The collected files and relevant paths
are passed to the read_information_from_result_from_models utility function for further
processing. The script runs the main function only when it is executed directly.
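The file-gathering step can be sketched as follows. The internals of the project's utility function are not shown in this report, so the sketch assumes a hypothetical schema in which each JSON file holds one record with "model" and "accuracy" keys, and it selects the record with the highest accuracy. The function name `best_result` and the demonstration values are made up for the example.

```python
import json
import tempfile
from glob import glob
from pathlib import Path


def best_result(result_dir, metric="accuracy"):
    """Load every *.json file in result_dir and return the record with the highest metric."""
    records = []
    for path in sorted(glob(str(Path(result_dir) / "*.json"))):
        with open(path, "r") as f:
            records.append(json.load(f))
    if not records:
        raise FileNotFoundError(f"no JSON result files in {result_dir}")
    return max(records, key=lambda record: record[metric])


# Demonstration with two throwaway result files (illustrative values only).
with tempfile.TemporaryDirectory() as tmp:
    for name, acc in [("decision_tree", 0.91), ("linear_svm", 0.88)]:
        with open(Path(tmp) / f"{name}.json", "w") as f:
            json.dump({"model": name, "accuracy": acc}, f)
    best = best_result(tmp)

print(best["model"])
```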
6.2.4 Main code
from sklearn.exceptions import ConvergenceWarning
# ...
# need to
global c_val, epsilon_val, test_size_val, random_state_val
global c_val_max, epsilon_val_max, random_state_val_max
global accuracy_max, precision_max, recall_max
global num_benign_apps, num_malicious_apps


def main():
    """
    This is the main function that executes the entire process of loading the data,
    preprocessing it, training a model, and evaluating its performance. It also
    allows for the classification of a new APK.

    Inputs:
        None.
    Returns:
        None.
    """

    path = config['apksResultJsonPath']
    resultApksPath = config['resultApksPath']
    resultModelsPath = f'{resultApksPath}/resultModels'
    # Load data
    df, malicious_count, benign_count = load_data(path)
    # ...
    # Preprocess data
    X_scaled, y_encoded = preprocess_data(X, y)
    # ...
    # Train model
    resultsTrainAndTest = trainAndEval(X_train, X_test, y_train, y_test, test_size[idx])
    # ...


if __name__ == '__main__':
    main()
The main function executes the entire process of loading the data, preprocessing it,
training a model, and evaluating its performance. It also allows for the
classification of a new APK.
6.2.5 Plot
import json
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
6.3 Input
The dataset consists of feature vectors of 215 attributes extracted from 1817 applications
(165 malware apps from the Drebin project and 1652 benign apps). The dataset has been
used to develop and evaluate a multilevel classifier fusion approach for Android
malware detection, published in the IEEE Transactions on Cybernetics paper
'DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware
Detection'. The supporting file contains the description of the feature vectors/attributes
obtained via static code analysis of the Android apps.
transact API call signature
onServiceConnected API call signature
bindService API call signature
attachInterface API call signature
ServiceConnection API call signature
android.os.Binder API call signature
SEND_SMS Manifest Permission
Ljava.lang.Class.getCanonicalName API call signature
Ljava.lang.Class.getMethods API call signature
Ljava.lang.Class.cast API call signature
Ljava.net.URLDecoder API call signature
android.content.pm.Signature API call signature
android.telephony.SmsManager API call signature
READ_PHONE_STATE Manifest Permission
getBinder API call signature
ClassLoader API call signature
Landroid.content.Context.registerReceiver API call signature
Ljava.lang.Class.getField API call signature
Landroid.content.Context.unregisterReceiver API call signature
GET_ACCOUNTS Manifest Permission
RECEIVE_SMS Manifest Permission
Ljava.lang.Class.getDeclaredField API call signature
READ_SMS Manifest Permission
getCallingUid API call signature
Ljavax.crypto.spec.SecretKeySpec API call signature
android.intent.action.BOOT_COMPLETED Intent
USE_CREDENTIALS Manifest Permission
MANAGE_ACCOUNTS Manifest Permission
android.content.pm.PackageInfo API call signature
KeySpec API call signature
TelephonyManager.getLine1Number API call signature
DexClassLoader API call signature
HttpGet.init API call signature
SecretKey API call signature
Ljava.lang.Class.getMethod API call signature
System.loadLibrary API call signature
android.intent.action.SEND API call signature
Ljavax.crypto.Cipher API call signature
WRITE_SMS Manifest Permission
READ_SYNC_SETTINGS Manifest Permission
AUTHENTICATE_ACCOUNTS Manifest Permission
android.telephony.gsm.SmsManager API call signature
WRITE_HISTORY_BOOKMARKS Manifest Permission
TelephonyManager.getSubscriberId API call signature
mount Commands signature
INSTALL_PACKAGES Manifest Permission
Runtime.getRuntime API call signature
CAMERA Manifest Permission
Ljava.lang.Object.getClass API call signature
WRITE_SYNC_SETTINGS Manifest Permission
READ_HISTORY_BOOKMARKS Manifest Permission
Ljava.lang.Class.forName API call signature
INTERNET Manifest Permission
android.intent.action.PACKAGE_REPLACED Intent
Binder API call signature
android.intent.action.SEND_MULTIPLE Intent
RECORD_AUDIO Manifest Permission
IBinder API call signature
android.os.IBinder API call signature
createSubprocess API call signature
NFC Manifest Permission
ACCESS_LOCATION_EXTRA_COMMANDS Manifest Permission
URLClassLoader API call signature
WRITE_APN_SETTINGS Manifest Permission
abortBroadcast API call signature
BIND_REMOTEVIEWS Manifest Permission
android.intent.action.TIME_SET Intent
READ_PROFILE Manifest Permission
TelephonyManager.getDeviceId API call signature
MODIFY_AUDIO_SETTINGS Manifest Permission
getCallingPid API call signature
READ_SYNC_STATS Manifest Permission
BROADCAST_STICKY Manifest Permission
android.intent.action.PACKAGE_REMOVED Intent
android.intent.action.TIMEZONE_CHANGED Intent
WAKE_LOCK Manifest Permission
RECEIVE_BOOT_COMPLETED Manifest Permission
RESTART_PACKAGES Manifest Permission
Ljava.lang.Class.getPackage API call signature
chmod Commands signature
Ljava.lang.Class.getDeclaredClasses API call signature
android.intent.action.ACTION_POWER_DISCONNECTED Intent
android.intent.action.PACKAGE_ADDED Intent
PathClassLoader API call signature
TelephonyManager.getSimSerialNumber API call signature
Runtime.load API call signature
TelephonyManager.getCallState API call signature
BLUETOOTH Manifest Permission
READ_CALENDAR Manifest Permission
READ_CALL_LOG Manifest Permission
SUBSCRIBED_FEEDS_WRITE Manifest Permission
READ_EXTERNAL_STORAGE Manifest Permission
TelephonyManager.getSimCountryIso API call signature
sendMultipartTextMessage API call signature
PackageInstaller API call signature
VIBRATE Manifest Permission
remount Commands signature
android.intent.action.ACTION_SHUTDOWN Intent
sendDataMessage API call signature
ACCESS_NETWORK_STATE Manifest Permission
chown Commands signature
HttpPost.init API call signature
Ljava.lang.Class.getClasses API call signature
SUBSCRIBED_FEEDS_READ Manifest Permission
TelephonyManager.isNetworkRoaming API call signature
CHANGE_WIFI_MULTICAST_STATE Manifest Permission
WRITE_CALENDAR Manifest Permission
android.intent.action.PACKAGE_DATA_CLEARED Intent
MASTER_CLEAR Manifest Permission
HttpUriRequest API call signature
UPDATE_DEVICE_STATS Manifest Permission
WRITE_CALL_LOG Manifest Permission
DELETE_PACKAGES Manifest Permission
GET_TASKS Manifest Permission
GLOBAL_SEARCH Manifest Permission
DELETE_CACHE_FILES Manifest Permission
WRITE_USER_DICTIONARY Manifest Permission
android.intent.action.PACKAGE_CHANGED Intent
android.intent.action.NEW_OUTGOING_CALL Intent
REORDER_TASKS Manifest Permission
WRITE_PROFILE Manifest Permission
SET_WALLPAPER Manifest Permission
BIND_INPUT_METHOD Manifest Permission
divideMessage API call signature
READ_SOCIAL_STREAM Manifest Permission
READ_USER_DICTIONARY Manifest Permission
PROCESS_OUTGOING_CALLS Manifest Permission
CALL_PRIVILEGED Manifest Permission
Runtime.exec API call signature
BIND_WALLPAPER Manifest Permission
RECEIVE_WAP_PUSH Manifest Permission
DUMP Manifest Permission
BATTERY_STATS Manifest Permission
ACCESS_COARSE_LOCATION Manifest Permission
SET_TIME Manifest Permission
android.intent.action.SENDTO Intent
WRITE_SOCIAL_STREAM Manifest Permission
WRITE_SETTINGS Manifest Permission
REBOOT Manifest Permission
BLUETOOTH_ADMIN Manifest Permission
TelephonyManager.getNetworkOperator API call signature
/system/bin Commands signature
MessengerService API call signature
BIND_DEVICE_ADMIN Manifest Permission
WRITE_GSERVICES Manifest Permission
IRemoteService API call signature
KILL_BACKGROUND_PROCESSES Manifest Permission
SET_ALARM API call signature
ACCOUNT_MANAGER API call signature
/system/app Commands signature
android.intent.action.CALL Intent
STATUS_BAR Manifest Permission
TelephonyManager.getSimOperator API call signature
PERSISTENT_ACTIVITY Manifest Permission
CHANGE_NETWORK_STATE Manifest Permission
onBind API call signature
Process.start API call signature
android.intent.action.SCREEN_ON Intent
Context.bindService API call signature
RECEIVE_MMS Manifest Permission
SET_TIME_ZONE Manifest Permission
android.intent.action.BATTERY_OKAY Intent
CONTROL_LOCATION_UPDATES Manifest Permission
BROADCAST_WAP_PUSH Manifest Permission
BIND_ACCESSIBILITY_SERVICE Manifest Permission
ADD_VOICEMAIL Manifest Permission
CALL_PHONE Manifest Permission
ProcessBuilder API call signature
BIND_APPWIDGET Manifest Permission
FLASHLIGHT Manifest Permission
READ_LOGS Manifest Permission
Ljava.lang.Class.getResource API call signature
defineClass API call signature
SET_PROCESS_LIMIT Manifest Permission
android.intent.action.PACKAGE_RESTARTED Intent
MOUNT_UNMOUNT_FILESYSTEMS Manifest Permission
BIND_TEXT_SERVICE Manifest Permission
INSTALL_LOCATION_PROVIDER Manifest Permission
android.intent.action.CALL_BUTTON Intent
android.intent.action.SCREEN_OFF Intent
findClass API call signature
SYSTEM_ALERT_WINDOW Manifest Permission
MOUNT_FORMAT_FILESYSTEMS Manifest Permission
CHANGE_CONFIGURATION Manifest Permission
CLEAR_APP_USER_DATA Manifest Permission
intent.action.RUN Intent
android.intent.action.SET_WALLPAPER Intent
CHANGE_WIFI_STATE Manifest Permission
READ_FRAME_BUFFER Manifest Permission
ACCESS_SURFACE_FLINGER Manifest Permission
Runtime.loadLibrary API call signature
BROADCAST_SMS Manifest Permission
EXPAND_STATUS_BAR Manifest Permission
INTERNAL_SYSTEM_WINDOW Manifest Permission
android.intent.action.BATTERY_LOW Intent
SET_ACTIVITY_WATCHER Manifest Permission
WRITE_CONTACTS Manifest Permission
android.intent.action.ACTION_POWER_CONNECTED Intent
BIND_VPN_SERVICE Manifest Permission
DISABLE_KEYGUARD Manifest Permission
ACCESS_MOCK_LOCATION Manifest Permission
GET_PACKAGE_SIZE Manifest Permission
MODIFY_PHONE_STATE Manifest Permission
CHANGE_COMPONENT_ENABLED_STATE Manifest Permission
CLEAR_APP_CACHE Manifest Permission
SET_ORIENTATION Manifest Permission
READ_CONTACTS Manifest Permission
DEVICE_POWER Manifest Permission
HARDWARE_TEST Manifest Permission
ACCESS_WIFI_STATE Manifest Permission
WRITE_EXTERNAL_STORAGE Manifest Permission
ACCESS_FINE_LOCATION Manifest Permission
SET_WALLPAPER_HINTS Manifest Permission
SET_PREFERRED_APPLICATIONS Manifest Permission
WRITE_SECURE_SETTINGS Manifest Permission
class B=Benign; S=Malware
6.4 Output
6.5 Testing
In today’s ever-changing and competitive world, the internet has become an integral
part of our lives. Most of us now make decisions by searching for information on the
internet, so hosting a website is no longer optional but mandatory for all kinds of
businesses [9]. It is the first step in becoming and staying relevant in the market.
Just having a website is not enough. An organization needs to develop a website that
is informative, accessible, and user-friendly. To maintain these qualities, the
website should be well tested; this process of testing a website is known as web
testing.
Test all the links in the web pages, the database connection, and the forms used for
submitting information to or getting information from the user.
• Test the outgoing links from all the pages to the specific domain under test.
• Test links jumping on the same pages.
• Test links used to send email to admin or other users from web pages.
• Finally, check for broken links among all the links mentioned above.
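The link checks listed above can be sketched with Python's standard library. The example below is a minimal illustration, not part of our project code: the `LinkCollector` class and `classify_links` function are hypothetical names, and the sketch only sorts links into the categories named in the list. Checking for broken links would additionally require requesting each collected URL, which is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(html, page_url):
    """Group links into outgoing, internal, same-page, and mailto buckets."""
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc
    groups = {"outgoing": [], "internal": [], "same_page": [], "mailto": []}
    for href in collector.hrefs:
        if href.startswith("#"):
            # Link that jumps within the same page.
            groups["same_page"].append(href)
        elif href.lower().startswith("mailto:"):
            # Link used to send email.
            groups["mailto"].append(href)
        else:
            # Resolve relative links, then compare hosts.
            absolute = urljoin(page_url, href)
            host = urlparse(absolute).netloc
            key = "internal" if host == page_host else "outgoing"
            groups[key].append(absolute)
    return groups
```

Each bucket can then be tested separately, for example by asserting that every same-page anchor has a matching element id on the page.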
Forms are an integral part of any website. They are used to receive information from
users and to interact with them. So what should be checked in these forms?
• Check the options to create, view, modify, or delete forms, if any.
If you are optimizing your site for search engines, HTML/CSS validation is especially
important. Validate the site mainly for HTML syntax errors.
Database Testing:
Data consistency is also very important in a web application. Check for data integrity
and errors when you edit, delete, or modify forms, or perform any other
database-related operation. Check that all database queries execute correctly and that
data is retrieved and updated correctly.
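As an illustration of these database checks, the sketch below runs insert, update, delete, and constraint tests against an in-memory SQLite database. The `users` table and the `run_db_checks` function are assumptions made for this example only; a real application would run the same kind of checks against its own schema.

```python
import sqlite3


def run_db_checks():
    """Exercise insert, update, delete, and a constraint on a throwaway database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
    )
    results = {}

    # Insert a row, then read back what was stored.
    cur = conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    user_id = cur.lastrowid
    row = conn.execute("SELECT email FROM users WHERE id = ?", (user_id,)).fetchone()
    results["insert_ok"] = row == ("a@example.com",)

    # Update the row and confirm the new value is returned.
    conn.execute("UPDATE users SET email = ? WHERE id = ?", ("b@example.com", user_id))
    row = conn.execute("SELECT email FROM users WHERE id = ?", (user_id,)).fetchone()
    results["update_ok"] = row == ("b@example.com",)

    # A duplicate email must be rejected by the UNIQUE constraint.
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", ("b@example.com",))
        results["duplicate_rejected"] = False
    except sqlite3.IntegrityError:
        results["duplicate_rejected"] = True

    # Delete the row and confirm the table is empty again.
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    results["delete_ok"] = count == 0

    conn.close()
    return results
```

Every entry in the returned dictionary should be `True`; a `False` value points to the operation whose result did not match what was written.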
Usability Testing includes the following:
In web testing, the server-side interfaces should be tested by verifying that the
servers communicate properly. The compatibility of the server with the software,
hardware, network, and database should also be tested.
The main interfaces are:
Check that all interactions between these servers execute correctly and that errors
are handled properly. If the database or web server returns an error message for a
query made by the application server, the application server should catch the error
and display it appropriately.
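The error-handling check described above can be illustrated with a small sketch. It assumes an SQLite database stands in for the database server; the `messages` table, the `fetch_message` function, and the wording of the error text are hypothetical and are not part of our project code. The application layer catches the database error, logs the detail, and returns a message suitable for display.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)

# Hypothetical user-facing text shown when the database layer fails.
GENERIC_ERROR = "A database error occurred. Please try again later."


def fetch_message(conn, message_id):
    """Return (ok, payload); on failure, payload is a user-facing message."""
    try:
        row = conn.execute(
            "SELECT body FROM messages WHERE id = ?", (message_id,)
        ).fetchone()
    except sqlite3.Error as exc:
        # Keep the technical detail in the log, not on the page.
        logger.error("Query failed: %s", exc)
        return False, GENERIC_ERROR
    if row is None:
        return False, "Message not found."
    return True, row[0]
```

A test can then trigger the failure deliberately, for example by querying before the table exists, and confirm that the caller receives the generic message instead of an unhandled exception.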
The primary reason for testing the security of a web application is to identify
potential vulnerabilities and subsequently repair them.
Following are some of the test cases for web security testing:
• Test by pasting the internal URL directly into the browser address bar without
login. Internal pages should not open.
• Try some invalid inputs in input fields like login username, password, input text
boxes, etc. Check the system’s reaction to all invalid inputs.
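The two test cases above can be sketched against a toy application. Everything in this example is hypothetical: the `INTERNAL_PATHS` set, the `handle_request` function, and the username rule are stand-ins chosen for illustration, and a real test would send HTTP requests to the deployed site instead.

```python
import re

# Hypothetical pages that must not open without a login.
INTERNAL_PATHS = {"/dashboard", "/settings"}

# Assumed rule: 3 to 20 letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")


def handle_request(path, logged_in):
    """Return (status_code, location) for a request to the toy application."""
    if path in INTERNAL_PATHS and not logged_in:
        # Redirect anonymous visitors to the login page.
        return 302, "/login"
    return 200, path


def is_valid_username(value):
    """Accept only strings that fully match the assumed username rule."""
    return isinstance(value, str) and USERNAME_RE.fullmatch(value) is not None


# Invalid inputs a tester might paste into the login form.
INVALID_USERNAMES = ["", "ab", "a" * 21, "admin' OR '1'='1", "<script>alert(1)</script>"]
```

A test then asserts that an internal URL requested without a login is redirected and that every entry in `INVALID_USERNAMES` is rejected.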
CONCLUSION
We conclude that we have built a model that detects malware in Android applications
using machine learning, specifically the Decision Tree Classifier, which analyzes an
app’s features and predicts whether malware is present. The project encompasses data
preprocessing, model training, evaluation, and visualization, providing insight into
the performance of the classifier. The confusion matrix heatmap makes the model’s
predictions easier to understand and aids in assessing the model’s effectiveness.
In future work, model accuracy could be enhanced by implementing ensemble techniques
such as Random Forests, leveraging the strength of multiple models for better
predictions. The Decision Tree Classifier’s performance could be improved by
fine-tuning its hyperparameters using techniques like Grid Search or Random Search.
Continuous learning could be enabled by integrating online learning methods, allowing
the model to adapt to new data and improve its accuracy over time.
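To make the role of the confusion matrix concrete, the sketch below computes one for the two class labels used in the dataset (B for benign, S for malware) and derives accuracy, precision, and recall from it. It is a plain-Python illustration with made-up labels, not the project’s evaluation code or its results; the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels=("B", "S")):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def summarize(matrix):
    """Derive metrics from a 2x2 matrix, treating malware (S) as positive."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only.
y_true = ["B", "B", "S", "S", "S", "B"]
y_pred = ["B", "S", "S", "S", "B", "B"]
matrix = confusion_matrix(y_true, y_pred)
metrics = summarize(matrix)
```

The off-diagonal cells are the ones to watch: the bottom-left cell counts malware classified as benign, which is the costlier error for a detector.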
REFERENCES
[2] Intelligent Mobile Malware Detection Using Permission Requests and API Calls.
[3] DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket.
[6] National Institute of Standards and Technology (NIST) Mobile Malware Dataset.
[7] Intelligent Mobile Malware Detection by Xiangliang Zhou and Dongqing Zhang.