Phishing Detection
(Autonomous)
Accredited by NAAC with A+ Grade
Ramanthapur, Hyderabad, Telangana – 500013
PROJECT REPORT ON
AT
Project Report submitted in partial fulfilment of the requirements for the award of the Degree of
Submitted by
THAKUR VINEETHA BAI
(H.T No. 1302-23-862-133)
MR.DEVENDER RAO
CERTIFICATE
This is to certify that the project work entitled
I, Thakur Vineetha Bai, hereby declare that the project entitled “Phishing
Detection System Through Hybrid Machine Learning Based on URL”
has been carried out by me at Manac Infotech. This project is submitted
to Osmania University, Hyderabad, in partial fulfilment of the requirements for the
award of the degree of “MASTER OF COMPUTER APPLICATIONS”. The
results embodied in this dissertation have not been submitted to any other
University or institution for the award of Degree or Diploma.
I would like to express my deep gratitude and respect to all those behind the
scenes who guided and inspired me in the completion of this project work.
Last but not least, I am very thankful to the faculty members of my college and my friends
for their suggestions and for helping me complete this project successfully.
1302-23-862-133
TABLE OF CONTENTS
TITLE Page No’s
2. Abstract 4-5
6. IMPLEMENTATION 29-79
6.1 MODULES
6.1.1 Module Description
6.2 SOFTWARE ENVIRONMENT
6.2.1 PYTHON
6.2.2 Source Code
7. TESTING AND RESULT 80-88
7.1 System Test
7.2 Output Screens
8. CONCLUSION 89-91
9. REFERENCES/BIBLIOGRAPHY 92-94
1. ORGANIZATION PROFILE:
internationally. His recent work includes cloud-based applications and data-secure platforms for
educational institutions, fintech, and other sectors.
Associated Organizations:
VISIONARY GROUP OF COLLEGES https://visionaryedu.in
TOMEINTERNATIONALSCHOOL https://tomeinternationalschool.com
BLUEX TECHNO https://bluextechno.com
REPX https://repx.in
INFINITE INNOVATIVE https://infiniteinnovative.com
INTELLIVISION https://intellivisioninternational.com
These projects showcase his ability to blend academic knowledge with practical software solutions tailored to
organizational needs.
Role in Project:
As the lead developer and domain expert, Mr. Rahman has overseen the secure design, encryption
model, and cloud architecture involved in this project, ensuring both innovation and data protection.
2. ABSTRACT
PHISHING DETECTION SYSTEM THROUGH HYBRID
MACHINE LEARNING BASED ON URL
Currently, numerous types of cybercrime are organized through the internet. Hence, this
study mainly focuses on phishing attacks. Although phishing was first used in 1996, it has
become the most severe and dangerous cybercrime on the internet. Phishing utilizes email
distortion as its underlying mechanism for tricky correspondences, followed by mock sites, to
obtain the required data from people in question. Different studies have presented their work on
the precaution, identification, and knowledge of phishing attacks; however, there is currently no
complete and proper solution for frustrating them. Therefore, machine learning plays a vital role
in defending against cybercrimes involving phishing attacks. The proposed study is based on the
phishing URL-based dataset extracted from the famous dataset repository, which consists of
phishing and legitimate URL attributes collected from 11000+ website datasets in vector form.
After preprocessing, many machine learning algorithms have been applied and designed to
prevent phishing URLs and provide protection to the user. This study uses machine learning
models such as decision tree (DT), linear regression (LR), random forest (RF), naive Bayes
(NB), gradient boosting classifier (GBM), K-neighbors classifier (KNN), support vector
classifier (SVC), and proposed hybrid LSD model, which is a combination of logistic regression,
support vector machine, and decision tree (LR+SVC+DT) with soft and hard voting, to defend
against phishing attacks with high accuracy and efficiency. The canopy feature selection
technique, cross-fold validation, and grid search hyperparameter optimization
are used with the proposed LSD model. Furthermore, to evaluate the proposed approach, different
evaluation parameters were adopted, such as the precision, accuracy, recall, F1-score, and
specificity, to illustrate the effects and efficiency of the models. The results of the comparative
analyses demonstrate that the proposed approach outperforms the other models and achieves the
best results.
3. SYSTEM ANALYSIS
and glare affect individual SEN students to different extents, while they felt tired and irritated
because of lighting discomfort, in general [14]. However, teachers and therapists often have no
control over lighting characteristics except switching on or off (p.105).
Emotion can affect learning and engagement in students with and without SEN. In
particular, students with ID often exhibit anxiety due to internal stress. Blood pressure, body
temperature, and heart rate are physiological markers for stress that hinder learning [15]. It was
shown that mild conditions could reduce these inhibitors in SEN students [16]. It is known that
abnormally high or low levels of skin conductance (measured through galvanic skin response,
GSR) hindered the learning performance of SEN students [17]. Besides, a study also found that
body movement facilitated by motion-based technology positively impacted SEN students’
short-term memory skills.
MMLA employs multiple sources and formats of educational data such as activity logs,
audio, video and biosensors to enrich learning analytics [19]. MMLA is significantly enhanced
by the Internet of Things (IoT) technologies because the latter allows convenient capturing of
multimodal data from the complex learning environment [20]. Multimodal educational data
collected by IoT sensors include those detecting learners’ motion (e.g., head and body) and
physiological (e.g., heart, brain, and skin) behavior, as well as those measuring the ambient
learning environment (e.g., light, humidity, temperature, and noise). These data were collected
from physical objects or human bodies, then encoded into a machine-interpretable format and
served as input to MMLA [21]. Possible interpretations of the observed learning process can be
assigned based on validated learning theories.
Disadvantages:
1. The complexity of data: Existing machine learning models must
accurately interpret large and complex datasets to detect phishing URLs.
2. Data availability: Most machine learning models require large amounts of data to make
accurate predictions. If sufficient data is not available, model accuracy
may suffer.
3. Incorrect labeling: Existing machine learning models are only as accurate as the data
they are trained on. If the data has been incorrectly labeled, the model cannot
make accurate predictions.
3.2 PROPOSED SYSTEM:
Phishing URL-based cyberattack detection is proposed in this study to prevent crime and
protect people’s privacy.
The dataset consists of attributes of 11000+ phishing and legitimate URLs, and these
attributes are used to classify a URL as phishing or legitimate.
Machine learning models have been applied, such as decision tree (DT), linear regression
(LR), naive Bayes (NB), random forest (RF), gradient boosting machine (GBM), support
vector classifier (SVC), K-Neighbors classifier (KNN), and the proposed hybrid model
(LR+SVC+DT) LSD with soft and hard voting, which can accurately classify the threats
of phishing URLs.
Cross-fold validation with a grid search parameter based on the canopy feature selection
technique was used with the proposed LSD hybrid model to improve prediction results.
The proposed methodology is evaluated using evaluation parameters such as
accuracy, precision, recall, specificity, and F1-score.
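The evaluation parameters listed above can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch in plain Python; the function name evaluation_metrics and the counts passed to it are hypothetical values chosen for illustration and are not results of this project.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute five evaluation metrics from binary confusion-matrix counts.

    tp: phishing URLs correctly flagged, fp: legitimate URLs wrongly flagged,
    tn: legitimate URLs correctly passed, fn: phishing URLs that were missed.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, used only to exercise the function.
metrics = evaluation_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Specificity is reported alongside recall because a phishing detector that flags every URL would have perfect recall but would block legitimate sites; specificity exposes that failure.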
Advantages:
The classification of phishing URLs was implemented using machine learning
algorithms. Cybercrime is growing along with Internet infrastructure
worldwide, so a security mechanism is needed to prevent attackers from
obtaining confidential content by breaching the network through fake and malicious
URLs. A phishing dataset was used to perform the experiments.
The dataset is in the form of data vectors that require null-value removal to remove
unnecessary empty values. Multiple machine learning algorithms, such as decision
tree (DT), linear regression (LR), naive Bayes (NB), random forest (RF), gradient
boosting machine (GBM), support vector classifier (SVC), K-neighbors classifier,
and the proposed hybrid model (LR+SVC+DT) LSD with soft and hard voting were
used based on functional features.
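The difference between hard and soft voting in a three-model ensemble such as LR+SVC+DT can be sketched in plain Python. This is an illustration of the voting idea only, not the project's implementation: the helper names hard_vote and soft_vote and the probabilities below are assumptions made for the example.

```python
from collections import Counter


def hard_vote(labels):
    """Return the class label predicted by the majority of the models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average the per-class probabilities of all models and pick the largest."""
    classes = probabilities[0].keys()
    averaged = {
        cls: sum(p[cls] for p in probabilities) / len(probabilities)
        for cls in classes
    }
    return max(averaged, key=averaged.get), averaged


# Hypothetical class probabilities from three models for a single URL.
model_probs = [
    {"phishing": 0.45, "legitimate": 0.55},  # logistic regression
    {"phishing": 0.40, "legitimate": 0.60},  # support vector classifier
    {"phishing": 0.95, "legitimate": 0.05},  # decision tree
]

# Each model's hard label is its most probable class.
model_labels = [max(p, key=p.get) for p in model_probs]

hard_result = hard_vote(model_labels)
soft_result, averaged = soft_vote(model_probs)
print("hard voting:", hard_result)
print("soft voting:", soft_result, averaged)
```

With these made-up numbers the two schemes disagree: two of the three models lean slightly towards "legitimate", so hard voting follows them, while soft voting lets the one very confident model dominate the average.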
3.3 FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase, and a business proposal is put
forth with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out to ensure that the proposed
system is not a burden to the company. For feasibility analysis, some understanding of the major
requirements for the system is essential.
ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can invest in the research and development of
the system is limited, so the expenditures must be justified. The developed system was well
within the budget, and this was achieved because most of the technologies used are freely
available. Only the customized products had to be purchased.
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical requirements of
the system. Any system developed must not place a high demand on the available technical
resources, as this would lead to high demands being placed on the client. The developed system must
have modest requirements, so that only minimal or no changes are required for implementing this
system.
SOCIAL FEASIBILITY
This aspect of the study is to check the level of acceptance of the system by the user. This includes
the process of training the user to use the system efficiently. The user must not feel threatened by the
system; instead, they must accept it as a necessity. The level of acceptance by the users solely
depends on the methods that are employed to educate them about the system and to make them
familiar with it. Their level of confidence must be raised so that they are also able to make some
constructive criticism, which is welcomed, as they are the final users of the system.
SYSTEM REQUIREMENTS
4.1 FUNCTIONAL REQUIREMENTS
Functional requirements will vary for different types of software. For example, functional
requirements for a website or mobile application should define user flows and various interaction
scenarios.
1. Resource manager
2. Interactive User
Nonfunctional requirements are not related to the system's functionality but rather define how
the system should perform. They are crucial for ensuring the system's usability, reliability, and
efficiency, and often influence the overall user experience.
HARDWARE REQUIREMENTS:
Component    Minimum (Required for Execution)    My System (Development)
RAM          1 GB                                4 GB
SOFTWARE REQUIREMENTS
Operating System    Windows 10/11
Database            MySQL
Framework           Django
SYSTEM DESIGN
5.1 SYSTEM ARCHITECTURE:
and was created by the Object Management Group (OMG).
The goal is for UML to become a common language for creating models of object-
oriented computer software. In its current form, UML is comprised of two major components: a
meta-model and a notation. In the future, some form of method or process may also be added
to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying, visualizing,
constructing, and documenting the artifacts of software systems, as well as for business
modeling and other non-software systems.
UML represents a collection of best engineering practices that have proven successful in
the modeling of large and complex systems.
UML is a very important part of developing object-oriented software and the overall
software development process. It uses mostly graphical notations to express the design of
software projects.
GOALS:
The primary goals in the design of UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling language so that they can develop and
exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the object-oriented (OO) tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns, and
components.
7. Integrate best practices.
5.2.1 USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented as
use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show which system functions are performed for which actor. The roles of the actors in the
system can also be depicted.
5.2.2 CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language (UML) is a
type of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes. It
shows which class contains which information.
5.2.3 SEQUENCE DIAGRAM:
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a construct
of a Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, or timing diagrams.
5.2.4 COLLABORATION DIAGRAM
5.2.5 ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities and
actions with support for choice, iteration and concurrency. In the Unified Modeling Language,
activity diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.
5.2.6 COMPONENT DIAGRAM:
5.2.7 DEPLOYMENT DIAGRAM:
Deployment diagrams are used to visualize the topology of the physical components of a
system, where the software components are deployed. Deployment diagrams are used to describe the
static deployment view of a system. Deployment diagrams consist of nodes and their relationships.
5.2.8 E-R DIAGRAM
An Entity Relationship Diagram is a diagram that represents relationships among entities in a
database. It is commonly known as an ER Diagram. An ER Diagram plays a crucial role in
designing the database in a DBMS. In practice, the requirements demanded by the users are often previewed in
the form of an ER Diagram.
5.2.9 DATA DICTIONARY
Database: phishing_detection_system
Table Name: auth_User
5.3 INPUT/OUTPUT DESIGN
Input design: Considering the requirements, the procedures to collect the necessary input data are
designed to be as efficient as possible. The input design has been done so that the interaction
of the user with the system is as effective and simple as possible.
Measures are also taken for the following:
Controlling the amount of input
Avoiding unauthorized access to the system
Eliminating extra steps
Keeping the process simple
At this stage, the input forms and screens are designed.
Output design: All the screens of the system are designed with a view to provide the user with easy
operations in a simpler and efficient way, with the minimum keystrokes possible. Instructions and
important information are emphasized on the screen. Almost every screen is provided with error-free
and important messages, and option selection facilities. Emphasis is given to speedy processing and
quick transactions between the screens. Each screen is designed to be as user-friendly as possible by
using interactive procedures. So to say, the user can operate the system without much help from the
operating manual.
IMPLEMENTATION
6.1 MODULES
1. Resource manager
2. Interactive User
MODULE DESCRIPTION
Operations Manager
In this module, the Service Provider has to log in using a valid user name and password.
After a successful login, the Service Provider can perform operations such as Browse URL Data Sets and
Train & Test, View Trained and Tested URL Data Sets Accuracy in Bar Chart, View
Trained and Tested URL Data Sets Accuracy Results, View Prediction Of URL Type,
View URL Type Ratio, Download Predicted Data Sets, View URL Type Ratio Results, and
View All Remote Users.
Interactive User
In this module, any number of users may be present. A user must register before
performing any operations. Once a user registers, their details are stored in the database.
After successful registration, the user has to log in using an authorized user name and
password. Once the login is successful, the user can perform operations such as Register And
Login, Predict Url Type, and View Your Profile.
6.2 SOFTWARE ENVIRONMENT
6.2.1 PYTHON
Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to
compile your program before executing it. This is similar to PERL and PHP.
Python is Interactive: You can actually sit at a Python prompt and interact with the interpreter
directly to write your programs.
Python was developed by Guido van Rossum in the late eighties and early nineties at the National
Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
SmallTalk, and Unix shell and other scripting languages.
Python is copyrighted. Its source code is available under the Python Software Foundation
License, an open-source license that is compatible with the GNU General Public License (GPL).
Python is now maintained by a core development team under the Python Software Foundation; Guido van Rossum,
its creator, directed its progress for many years.
Easy-to-learn: Python has few keywords, simple structure, and a clearly defined syntax. This
allows the student to pick up the language quickly.
Easy-to-read: Python code is more clearly defined and visible to the eyes.
A broad standard library: The bulk of Python's library is very portable and cross-platform
compatible on UNIX, Windows, and Macintosh.
Interactive Mode: Python has support for an interactive mode which allows interactive testing
and debugging of snippets of code.
Portable: Python can run on a wide variety of hardware platforms and has the same interface
on all platforms.
Extendable: You can add low-level modules to the Python interpreter. These modules enable
programmers to add to or customize their tools to be more efficient.
GUI Programming: Python supports GUI applications that can be created and ported to many
system calls, libraries and windows systems, such as Windows MFC, Macintosh, and the X
Window system of Unix.
Scalable: Python provides a better structure and support for large programs than shell scripting.
It can be used as a scripting language or can be compiled to byte-code for building large
applications.
It provides very high-level dynamic data types and supports dynamic type checking.
It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
2.1 ARITHMETIC OPERATORS
-    Subtraction: Subtracts the right-hand operand from the left-hand operand. Example: a - b = -10
%    Modulus: Divides the left-hand operand by the right-hand operand and returns the remainder. Example: b % a = 0
2.3 IDENTITY OPERATOR
&    Binary AND: Copies a bit to the result if it exists in both operands. Example: (a & b) = 12 (means 0000 1100)
^    Binary XOR: Copies the bit if it is set in one operand but not both. Example: (a ^ b) = 49 (means 0011 0001)
~    Binary Ones Complement: It is unary and has the effect of 'flipping' bits. Example: (~a) = -61 (means 1100 0011 in 2's complement form due to a signed binary number)
<<   Binary Left Shift: The left operand's value is moved left by the number of bits specified by the right operand. Example: a << 2 = 240 (means 1111 0000)
>>   Binary Right Shift: The left operand's value is moved right by the number of bits specified by the right operand. Example: a >> 2 = 15 (means 0000 1111)
and  Logical AND: If both the operands are true then the condition becomes true. Example: (a and b) is true.
not  Logical NOT: Used to reverse the logical state of its operand. Example: not (a and b) is false.
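The bitwise results in the table can be checked directly in Python. The sketch below assumes a = 60 and b = 13; these operand values are not stated in the surviving table rows but are consistent with every result listed there.

```python
# Assumed operands (consistent with the table's results, not stated in it).
a = 60  # 0011 1100
b = 13  # 0000 1101

results = {
    "a & b": a & b,    # bits set in both operands
    "a | b": a | b,    # bits set in either operand
    "a ^ b": a ^ b,    # bits set in exactly one operand
    "~a": ~a,          # every bit flipped (negative in two's complement)
    "a << 2": a << 2,  # shifted left by two bit positions
    "a >> 2": a >> 2,  # shifted right by two bit positions
}

for expression, value in results.items():
    # Mask to 8 bits so that negative values print as two's complement.
    bits = format(value & 0xFF, "08b")
    print(f"{expression:7} = {value:4}  ({bits})")
```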
not in   Evaluates to true if it does not find a variable in the specified sequence and false otherwise. Example: x not in y; here not in results in 1 if x is not a member of sequence y.
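A short sketch of the membership operators follows. The variable names and values are chosen here for illustration only.

```python
y = [1, 2, 3, 4, 5]  # the sequence being searched
x = 3
z = 10

x_in_y = x in y          # True because 3 is an element of y
z_not_in_y = z not in y  # True because 10 is not an element of y

# Membership also works on strings (substring test) and tuples.
has_http = "http" in "http://example.com"
missing_from_tuple = "c" not in ("a", "b")

print(x_in_y, z_not_in_y, has_http, missing_from_tuple)
```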
~ + -   Complement, unary plus and minus (method names for the last two are +@ and -@)
**=
3.1 LIST
The list is the most versatile data type available in Python and is written as a list of comma-
separated values (items) between square brackets. An important point about a list is that its items need
not be of the same type.
Creating a list is as simple as putting different comma-separated values between square brackets. For
example −
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]
Expression: ['Hi!'] * 4    Result: ['Hi!', 'Hi!', 'Hi!', 'Hi!']    Description: Repetition
1 cmp(list1, list2) (available in Python 2 only; removed in Python 3)
2 len(list)
3 max(list)
4 min(list)
5 list(seq)
Python includes following list methods
1 list.append(obj)
2 list.count(obj)
3 list. extend(seq)
4 list.index(obj)
5 list.insert(index, obj)
6 list.pop(obj=list[-1])
7 list.remove(obj)
8 list.reverse()
9 list.sort([func]): Sorts the objects of the list, using the compare function if given
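The list methods above can be tried out as follows. This is a minimal sketch written for Python 3, where sort() accepts a key function rather than the compare function of Python 2; the sample values are illustrative.

```python
items = ["physics", "chemistry", 1997, 2000]

items.append(2024)                          # add one object at the end
items.extend(["biology", "maths"])          # append every element of a sequence
physics_count = items.count("physics")      # how many times the object occurs
chemistry_index = items.index("chemistry")  # lowest index of the object
items.insert(1, "algebra")                  # insert the object at index 1
last_item = items.pop()                     # remove and return the last object
items.remove(1997)                          # remove the first matching object
items.reverse()                             # reverse the list in place

# Sorting needs mutually comparable items, so a separate list is used here.
words = ["pear", "fig", "banana"]
words.sort(key=len)                         # shortest word first

print(items)
print(words)
```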
3.2 TUPLES
A tuple is a sequence of immutable Python objects. Tuples are sequences, just like lists. The
differences between tuples and lists are, the tuples cannot be changed unlike lists and tuples use
parentheses, whereas lists use square brackets.
Creating a tuple is as simple as putting different comma-separated values. Optionally we can put these
comma-separated values between parentheses also. For example −
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5)
tup3 = "a", "b", "c", "d"
The empty tuple is written as two parentheses containing nothing −
tup1 = ()
To write a tuple containing a single value you have to include a comma, even though there is only one
value −
tup1 = (50,)
Like string indices, tuple indices start at 0, and they can be sliced, concatenated, and so on.
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 6, 7)
print("tup1[0]:", tup1[0])
print("tup2[1:5]:", tup2[1:5])
Result −
tup1[0]: physics
tup2[1:5]: (2, 3, 4, 5)
Updating Tuples:
Tuples are immutable which means you cannot update or change the values of tuple elements. We are
able to take portions of existing tuples to create new tuples as the following example demonstrates −
tup1 = (12, 34.56)
tup2 = ('abc', 'xyz')
tup3 = tup1 + tup2
print(tup3)
To explicitly remove an entire tuple, just use the del statement. For example:
tup = ('physics', 'chemistry', 1997, 2000)
print(tup)
del tup
print("After deleting tup : ")
print(tup)  # raises NameError because tup no longer exists
Expression: (1, 2, 3) + (4, 5, 6)    Result: (1, 2, 3, 4, 5, 6)    Description: Concatenation
1 cmp(tuple1, tuple2): Compares elements of both tuples (Python 2 only).
2 len(tuple): Gives the total length of the tuple.
3 max(tuple): Returns the item from the tuple with the max value.
4 min(tuple): Returns the item from the tuple with the min value.
5 tuple(seq): Converts a list into a tuple.
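The built-in tuple functions, and the immutability described earlier, can be sketched as follows. The values are illustrative, and cmp() is left out because it does not exist in Python 3.

```python
scores = (70, 95, 82)

length = len(scores)         # number of items in the tuple
highest = max(scores)        # item with the largest value
lowest = min(scores)         # item with the smallest value
converted = tuple([1, 2, 3]) # build a tuple from a list

# Tuples are immutable: item assignment raises TypeError.
try:
    scores[0] = 100
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(length, highest, lowest, converted, assignment_failed)
```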
3.3 DICTIONARY
Each key is separated from its value by a colon (:), the items are separated by commas, and the whole
thing is enclosed in curly braces. An empty dictionary without any items is written with just two curly
braces, like this: {}.
Keys are unique within a dictionary while values may not be. The values of a dictionary can be of any
type, but the keys must be of an immutable data type such as strings, numbers, or tuples.
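The rules for keys and values can be illustrated with a short sketch; the dictionary contents are made up for the example.

```python
# Keys may be strings, numbers, or tuples (all immutable); values may be any type.
record = {"Name": "Zara", 7: "seven", ("x", "y"): [1, 2, 3]}

# A list is mutable, so using it as a key raises TypeError.
try:
    record[["a", "b"]] = "invalid"
    list_key_rejected = False
except TypeError:
    list_key_rejected = True

# Keys are unique: assigning to an existing key replaces its value
# instead of creating a second entry.
record["Name"] = "Manni"

print(record)
print("list key rejected:", list_key_rejected)
```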
Accessing Values in Dictionary:
To access dictionary elements, you can use the familiar square brackets along with the key to obtain its
value. Following is a simple example −
dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
print("dict['Name']:", dict['Name'])
print("dict['Age']:", dict['Age'])
Result −
dict['Name']: Zara
dict['Age']: 7
Updating Dictionary
We can update a dictionary by adding a new entry or a key-value pair, modifying an existing entry, or
deleting an existing entry as shown below in the simple example −
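dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
dict['Age'] = 8  # update an existing entry
dict['School'] = "DPS School"  # add a new entry
print("dict['Age']:", dict['Age'])
print("dict['School']:", dict['School'])
Result −
dict['Age']: 8
dict['School']: DPS School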
To explicitly remove an entire dictionary, just use the del statement. Following is a simple example –
dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
del dict['Name']  # remove entry with key 'Name'
dict.clear()      # remove all entries in dict
del dict          # delete entire dictionary
1 cmp(dict1, dict2)
2 len(dict): Gives the total length of the dictionary. This is equal to the number of items in the dictionary.
3 str(dict)
4 type(variable): Returns the type of the passed variable. If the passed variable is a dictionary, it returns a dictionary type.
3 dict.fromkeys():Create a new dictionary with keys from seq and values set to value.
A function is a block of organized, reusable code that is used to perform a single, related action.
Functions provide better modularity for your application and a high degree of code reusing. Python
gives you many built-in functions like print(), etc. but you can also create your own functions. These
functions are called user-defined functions.
Defining a Function
Simple rules to define a function in Python.
Function blocks begin with the keyword def followed by the function name and parentheses ( ).
Any input parameters or arguments should be placed within these parentheses. You can also
define parameters inside these parentheses.
The first statement of a function can be an optional statement - the documentation string of the
function or docstring.
The code block within every function starts with a colon (:) and is indented.
The statement return [expression] exits a function, optionally passing back an expression to the
caller. A return statement with no arguments is the same as return None.
def functionname(parameters):
   "function_docstring"
   function_suite
   return [expression]
Calling a Function
Defining a function only gives it a name, specifies the parameters that are to be included in the
function and structures the blocks of code. Once the basic structure of a function is finalized, you can
execute it by calling it from another function or directly from the Python prompt. Following is the
example to call printme() function −
# Function definition is here
def printme(str):
   "This prints a passed string into this function"
   print(str)
   return

# Now you can call printme function
printme("I'm first call to user defined function!")
printme("Again second call to the same function")
Function Arguments
You can call a function by using the following types of formal arguments:
Required arguments
Keyword arguments
Default arguments
Variable-length arguments
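The four kinds of arguments listed above can be shown with a single function. This is a minimal sketch; the function name describe and its parameters are hypothetical and chosen only for illustration.

```python
def describe(name, age=35, *tags):
    """name is required, age has a default value, and tags collects
    any additional positional arguments (variable-length arguments)."""
    return {"name": name, "age": age, "tags": tags}


# Required argument only: age falls back to its default, tags is empty.
required_only = describe("Zara")

# Keyword arguments: the order no longer matters because names are given.
keyword_call = describe(age=7, name="Zara")

# Variable-length arguments: extra positional values are gathered into tags.
with_extras = describe("Zara", 7, "student", "first")

print(required_only)
print(keyword_call)
print(with_extras)
```

Calling describe() with no arguments raises TypeError, because the required argument name is missing.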
Scope of Variables
All variables in a program may not be accessible at all locations in that program. This depends on
where you have declared a variable.
The scope of a variable determines the portion of the program where you can access a particular
identifier. There are two basic scopes of variables in Python −
Global vs. Local variables
Variables that are defined inside a function body have a local scope, and those defined outside have a
global scope.
This means that local variables can be accessed only inside the function in which they are declared,
whereas global variables can be accessed throughout the program body by all functions. When you call
a function, the variables declared inside it are brought into scope. Following is a simple example −
return total;
sum(10,20);
Result −
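The difference between local and global scope can be sketched as follows. This is a minimal illustration; the names total, add and add_global are chosen here for the example (add is used rather than sum to avoid hiding Python's built-in sum function).

```python
total = 0  # global variable


def add(a, b):
    total = a + b  # local variable: hides the global name inside this function
    return total


def add_global(a, b):
    global total   # refer to the module-level variable instead
    total = a + b
    return total


local_result = add(10, 20)
total_after_add = total         # the global total was not changed by add()

global_result = add_global(1, 2)
total_after_add_global = total  # the global total was changed by add_global()

print(local_result, total_after_add)
print(global_result, total_after_add_global)
```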
A module allows you to logically organize your Python code. Grouping related code into a module
makes the code easier to understand and use. A module is a Python object with arbitrarily named
attributes that you can bind and reference. Simply, a module is a file consisting of Python code. A
module can define functions, classes and variables. A module can also include runnable code.
Example:
The Python code for a module named aname normally resides in a file named aname.py. Here's an
example of a simple module, support.py
print("Hello : ", par)
return
When the interpreter encounters an import statement, it imports the module if the module is present in
the search path. A search path is a list of directories that the interpreter searches before importing a
module. For example, to import the module support.py, you need to put the following command at the
top of the script −
A module is loaded only once, regardless of the number of times it is imported. This prevents the
module execution from happening over and over again if multiple imports occur.
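The search path and the load-once behaviour can be observed with the standard library. The sketch below uses the standard json module as a stand-in for a user-defined module; sys.path and sys.modules are the interpreter's actual search path and module cache.

```python
import sys

# The search path is a list of directories consulted when importing a module.
search_path_is_list = isinstance(sys.path, list)

import json                   # first import: the module is loaded and cached
first = sys.modules["json"]   # the cache entry created by that import

import json as again          # second import: the cached object is reused
same_object = again is first

print("search path entries:", len(sys.path))
print("same module object on re-import:", same_object)
```

Because the second import returns the cached object, any top-level code in the module runs only on the first import.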
Packages in Python
A package is a hierarchical file directory structure that defines a single Python application
environment that consists of modules and sub packages and sub-sub packages.
Consider a file Pots.py available in the Phone directory. This file has the following line of source code −
def Pots():
In a similar way, we have another two files having different functions with the same name as above −
Phone/__init__.py
To make all of your functions available when you've imported Phone, you need to put explicit import statements
in __init__.py as follows −
from Pots import Pots
from Isdn import Isdn
from G3 import G3
After you add these lines to __init__.py, you have all of these classes available when you import the
Phone package.
import Phone
Phone.Pots()
Phone.Isdn()
Phone.G3()
RESULT:
I'm 3GPhone
In the above example, we have taken example of a single functions in each file, but you can keep
multiple functions in your files. You can also define different Python classes in those files and then
you can create your packages out of those classes. This chapter covers all the basic I/O functions
available in Python.
print("Python is really a great language,", "isn't it?")
Result:
raw_input
input
The raw_input([prompt]) function reads one line from standard input and returns it as a string
(removing the trailing newline).
This prompts you to enter any string and it would display same string on the screen. When I typed
"Hello Python!", its output is like this −
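The input([prompt]) function is equivalent to raw_input, except that it assumes the input is a valid
Python expression and returns the evaluated result to you. (Both functions are described here as in
Python 2; in Python 3, raw_input() was renamed to input(), which always returns a string.)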
This would produce the following result against the entered input −
Until now, you have been reading and writing to the standard input and output. Now, we will see how
to use actual data files. Python provides basic functions and methods necessary to manipulate files by
default. You can do most of the file manipulation using a file object.
Before you can read or write a file, you have to open it using Python's built-in open() function. This
function creates a file object, which would be utilized to call other support methods associated with it.
Syntax
file_object = open(file_name[, access_mode][, buffering])
file_name: The file_name argument is a string value that contains the name of the file that you
want to access.
Modes Description
r Opens a file for reading only. The file pointer is placed at the beginning of the file. This
is the default mode.
rb Opens a file for reading only in binary format. The file pointer is placed at the
beginning of the file. This is the default mode.
r+ Opens a file for both reading and writing. The file pointer is placed at the beginning of the
file.
rb+ Opens a file for both reading and writing in binary format. The file pointer is placed at the
beginning of the file.
w Opens a file for writing only. Overwrites the file if the file exists. If the file does not
exist, creates a new file for writing.
wb Opens a file for writing only in binary format. Overwrites the file if the file exists. If the
file does not exist, creates a new file for writing.
w+ Opens a file for both writing and reading. Overwrites the existing file if the file exists. If
the file does not exist, creates a new file for reading and writing.
wb+ Opens a file for both writing and reading in binary format. Overwrites the existing file if
the file exists. If the file does not exist, creates a new file for reading and writing.
a Opens a file for appending. The file pointer is at the end of the file if the file exists. That
is, the file is in the append mode. If the file does not exist, it creates a new file for
writing.
ab Opens a file for appending in binary format. The file pointer is at the end of the file if
the file exists. That is, the file is in the append mode. If the file does not exist, it creates
a new file for writing.
a+ Opens a file for both appending and reading. The file pointer is at the end of the file if
the file exists. The file opens in the append mode. If the file does not exist, it creates a
new file for reading and writing.
55
access_mode: The access_mode determines the mode in which the file has to be opened, i.e.,
read, write, append, etc. A complete list of possible values is given below in the table. This is
optional parameter and the default file access mode is read (r).
buffering: If the buffering value is set to 0, no buffering takes place. If the buffering value is 1,
line buffering is performed while accessing a file. If you specify the buffering value as an
integer greater than 1, then buffering action is performed with the indicated buffer size. If
negative, the buffer size is the system default(default behavior).
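As a minimal sketch of how the access modes differ, the following Python 3 snippet writes, appends to, and then reads one file. The file name foo.txt inside a temporary directory and the helper name demo_modes are assumptions made only for this illustration; the with statement is used so that each file is closed automatically.

```python
import os
import tempfile


def demo_modes(path):
    """Write, append to, and read back one file using the r, w and a modes."""
    # "w" creates the file, or truncates it if it already exists.
    with open(path, "w") as fo:
        fo.write("first line\n")
    # "a" keeps the existing content and writes at the end of the file.
    with open(path, "a") as fo:
        fo.write("second line\n")
    # "r" (the default mode) opens the file for reading only.
    with open(path, "r") as fo:
        return fo.read()


# A temporary directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as tmp:
    content = demo_modes(os.path.join(tmp, "foo.txt"))

print(content)
```

Opening the same path with "w" a second time would discard both lines, which is the main practical difference between the write and append modes.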
Once a file is opened and you have one file object, you can get various information related to that file.
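Attribute Description
file.closed Returns true if the file is closed, false otherwise.
file.mode Returns the access mode with which the file was opened.
file.name Returns the name of the file.
file.softspace Python 2 only: returns false if space explicitly required with print, true otherwise.
Example
# Open a file
fo = open("foo.txt", "wb")
print("Name of the file:", fo.name)
print("Closed or not:", fo.closed)
print("Opening mode:", fo.mode)
# fo.softspace exists only in Python 2, so it is not printed here
fo.close()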
The close() method of a file object flushes any unwritten information and closes the file object, after
which no more writing can be done. Python automatically closes a file when the reference object of a
file is reassigned to another file. It is a good practice to use the close() method to close a file.
Syntax
fileObject.close()
Example
# Open a file
fo = open("foo.txt", "wb")
print("Name of the file:", fo.name)
# Close the opened file
fo.close()
Result −
Name of the file: foo.txt
The file object provides a set of access methods to make our lives easier. We would see how to
use read() and write() methods to read and write files.
The write() method writes a string to an open file. In Python 3, a file opened in text mode accepts
str objects, while a file opened in binary mode (for example "wb") accepts bytes objects. The write()
method does not add a newline character ('\n') to the end of the string.
Syntax
fileObject.write(string)
Here, the passed parameter is the content to be written into the opened file.
Example
# Open a file in text mode for writing
fo = open("foo.txt", "w")
fo.write("Python is a great language.\nYeah its great!!\n")
# Close the opened file
fo.close()
The above code creates the file foo.txt, writes the given content to it, and finally closes the
file. If you open this file, it has the following content.
Python is a great language.
Yeah its great!!
The read() method reads data from an open file: a str in text mode, or a bytes object in binary mode.
Syntax
fileObject.read([count])
Here, the passed parameter is the number of characters (bytes in binary mode) to be read from the
opened file. This method starts reading from the current file position, which is the beginning of a
newly opened file. If count is missing, it reads as much as possible, up to the end of the file.
Example
# Open a file
fo = open("foo.txt", "r+")
text = fo.read(10)
print("Read String is:", text)
# Close the opened file
fo.close()
Result −
Read String is: Python is
File Positions
The tell() method tells you the current position within the file; in other words, the next read or write
will occur at that many bytes from the beginning of the file.
The seek(offset[, from]) method changes the current file position. The offset argument indicates the
number of bytes to be moved. The from argument specifies the reference position from where the
bytes are to be moved.
If from is set to 0, it means use the beginning of the file as the reference position and 1 means use the
current position as the reference position and if it is set to 2 then the end of the file would be taken as
the reference position.
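Example
# Open a file
fo = open("foo.txt", "r+")
text = fo.read(10)
print("Read String is:", text)
# Check current position
position = fo.tell()
print("Current file position:", position)
# Reposition the pointer at the beginning once again
fo.seek(0, 0)
text = fo.read(10)
print("Again read String is:", text)
# Close the opened file
fo.close()
Result −
Read String is: Python is
Current file position: 10
Again read String is: Python is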
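The three reference positions accepted by seek() can be illustrated with a short Python 3 sketch. The file is opened in binary mode here because Python 3 restricts seeking relative to the current position or the end of the file in text mode; the temporary file and its content are assumptions made for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.txt")
    with open(path, "wb") as fo:
        fo.write(b"Python is a great language.\n")

    with open(path, "rb") as fo:
        first = fo.read(10)       # read the first 10 bytes
        position = fo.tell()      # offset of the next read, counted from the start
        fo.seek(0, 0)             # from=0: move relative to the beginning of the file
        again = fo.read(6)
        fo.seek(-10, 2)           # from=2: move relative to the end of the file
        tail = fo.read()          # the last 10 bytes

print(first, position, again, tail)
```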
Python os module provides methods that help you perform file-processing operations, such as
renaming and deleting files.
To use this module you need to import it first and then you can call any related functions.
The rename() method takes two arguments, the current filename and the new filename.
Syntax
os.rename(current_file_name, new_file_name)
Example
import os
# Rename a file from test1.txt to test2.txt
os.rename("test1.txt","test2.txt")
You can use the remove() method to delete files by supplying the name of the file to be deleted as the
argument.
Syntax
os.remove(file_name)
Example
import os
# Delete the file test2.txt (the name used in the rename example above)
os.remove("test2.txt")
Directories in Python
All files are contained within various directories, and Python has no problem handling these too.
The os module has several methods that help you create, remove, and change directories.
You can use the mkdir() method of the os module to create directories in the current directory. You
need to supply an argument to this method which contains the name of the directory to be created.
Syntax
os.mkdir("newdir")
Example
import os
# Create a directory named "newdir" in the current directory
os.mkdir("newdir")
You can use the chdir() method to change the current directory. The chdir() method takes an argument,
which is the name of the directory that you want to make the current directory.
Syntax
os.chdir("newdir")
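Example
import os
# Make "newdir" the current directory
os.chdir("newdir")
The getcwd() method returns the current working directory.
Syntax
os.getcwd()
Example
import os
# This prints the location of the current directory
print(os.getcwd())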
The rmdir() method deletes the directory, which is passed as an argument in the method.
Syntax:
os.rmdir('dirname')
Example
Following is the example to remove "/tmp/test" directory. It is required to give fully qualified name of
the directory, otherwise it would search for that directory in the current directory.
import os
# This would remove "/tmp/test" directory.
os.rmdir("/tmp/test")
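The os calls described above (mkdir, rename, remove and rmdir) can be tried safely inside a temporary directory. The following is a minimal Python 3 sketch; the names newdir, test1.txt and test2.txt are taken from the examples above, while the temporary base directory is an assumption added so that no real files are touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    newdir = os.path.join(base, "newdir")
    os.mkdir(newdir)                      # create the directory

    old_name = os.path.join(newdir, "test1.txt")
    new_name = os.path.join(newdir, "test2.txt")
    with open(old_name, "w") as fo:
        fo.write("sample\n")

    os.rename(old_name, new_name)         # test1.txt becomes test2.txt
    renamed = os.path.exists(new_name) and not os.path.exists(old_name)

    os.remove(new_name)                   # delete the file
    os.rmdir(newdir)                      # rmdir only removes an empty directory
    removed = not os.path.exists(newdir)

print(renamed, removed)
```

Note that os.rmdir() fails if the directory still contains files, which is why the file is removed first.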
EXCEPTION NAME        DESCRIPTION
StopIteration         Raised when the next() method of an iterator does not point to any object.
StandardError         Base class for all built-in exceptions except StopIteration and SystemExit (Python 2 only).
ArithmeticError       Base class for all errors that occur for numeric calculation.
ZeroDivisionError     Raised when division or modulo by zero takes place for all numeric types.
EOFError              Raised when there is no input from either the raw_input() or input() function and the end of file is reached.
KeyboardInterrupt     Raised when the user interrupts program execution, usually by pressing Ctrl+c.
KeyError              Raised when the specified key is not found in the dictionary.
NameError             Raised when an identifier is not found in the local or global namespace.
UnboundLocalError     Raised when trying to access a local variable in a function or method but no value has been assigned to it.
EnvironmentError      Base class for all exceptions that occur outside the Python environment (an alias of OSError in Python 3).
IOError               Raised when an input/output operation fails, such as the print statement or the open() function when trying to open a file that does not exist (an alias of OSError in Python 3).
SystemError           Raised when the interpreter finds an internal problem, but when this error is encountered the Python interpreter does not exit.
SystemExit            Raised when Python interpreter is quit by using the sys.exit() function. If not handled in the code, causes the interpreter to exit.
TypeError             Raised when an operation or function is attempted that is invalid for the specified data type.
ValueError            Raised when the built-in function for a data type has the valid type of arguments, but the arguments have invalid values specified.
RuntimeError          Raised when a generated error does not fall into any category.
NotImplementedError   Raised when an abstract method that needs to be implemented in an inherited class is not actually implemented.
File & Directory Related Methods
There are two important sources, which provide a wide range of utility methods to handle and
manipulate files & directories on Windows and Unix operating systems. They are as follows −
File Object Methods: The file object provides functions to manipulate files.
OS Object Methods: The os module provides methods to process files as well as directories.
Python provides two very important features to handle any unexpected error in your
Python programs and to add debugging capabilities in them: exception handling and assertions.
What is Exception?
An exception is an event, which occurs during the execution of a program that disrupts
the normal flow of the program's instructions. In general, when a Python script
encounters a situation that it cannot cope with, it raises an exception. An exception is a
Python object that represents an error.
When a Python script raises an exception, it must either handle the exception
immediately or terminate and quit.
Handling an exception
If you have some suspicious code that may raise an exception, you can defend your
program by placing the suspicious code in a try: block. After the try: block, include
an except: statement, followed by a block of code which handles the problem as
elegantly as possible.
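A minimal sketch of this pattern in Python 3 is shown below. The function name read_first_line and the file name are hypothetical; the file is assumed not to exist, so the except block is the path that runs.

```python
def read_first_line(path):
    """Return the first line of a file, or None if the file cannot be opened."""
    try:
        # The "suspicious" code: open() raises an exception if the file is missing.
        with open(path, "r") as fo:
            return fo.readline().rstrip("\n")
    except OSError as err:
        # Handle the problem instead of letting the script terminate.
        print("Could not open file:", err)
        return None


# Hypothetical file name that is assumed not to exist.
result = read_first_line("no_such_file_for_this_example.txt")
print(result)
```

Catching OSError rather than using a bare except: keeps unrelated errors, such as a misspelled variable name, visible during debugging.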
The Python standard for database interfaces is the Python DB-API. Most Python database interfaces
adhere to this standard.
You can choose the right database for your application. Python Database API supports a wide range of
database servers such as −
GadFly
mSQL
MySQL
PostgreSQL
Informix
Interbase
Oracle
Sybase
The DB API provides a minimal standard for working with databases using Python structures and
syntax wherever possible. This API includes the following:
Importing the API module.
Acquiring a connection with the database.
Issuing SQL statements and stored procedures.
Closing the connection.
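As a minimal sketch of a DB-API session (import the module, connect, issue SQL through a cursor, close the connection), the following uses sqlite3 from the Python standard library. This is a stand-in chosen so the example runs without a server; the project itself is configured for MySQL, and the table name, columns and URLs below are hypothetical.

```python
import sqlite3

# Acquire a connection; ":memory:" creates a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    # Issue SQL statements through the cursor.
    cur.execute("CREATE TABLE predictions (url TEXT, label TEXT)")
    cur.executemany(
        "INSERT INTO predictions (url, label) VALUES (?, ?)",
        [
            ("http://example.com/login", "Normal URL"),
            ("http://example.test/verify-account", "Phishing URL"),
        ],
    )
    conn.commit()
    cur.execute(
        "SELECT COUNT(*) FROM predictions WHERE label = ?", ("Phishing URL",)
    )
    phishing_count = cur.fetchone()[0]
finally:
    # Close the connection even if a statement fails.
    conn.close()

print(phishing_count)
```

The ? placeholders let the driver handle quoting, which avoids building SQL strings by hand from user-supplied URLs.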
6.2.2 SOURCE CODE:
Settings.py
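import os
# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))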
# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'm+1edl5m-5@u9u!b8-=4-4mq&o1%agco2xpl8c!7sn7!eowjk#'
ALLOWED_HOSTS = []
# Application definition
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'Remote_User',
    'Service_Provider',
]
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
ROOT_URLCONF = 'phishing_detection_system.urls'
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'Template/htmls')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
WSGI_APPLICATION = 'phishing_detection_system.wsgi.application'
# Database
# https://docs.djangoproject.com/en/3.0/ref/settings/#databases
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'phishing_detection_system',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '127.0.0.1',
        'PORT': '3306',
    }
}
# Password validation
# https://docs.djangoproject.com/en/3.0/ref/settings/#auth-password-validators
AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]
# Internationalization
# https://docs.djangoproject.com/en/3.0/topics/i18n/
LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/3.0/howto/static-files/
STATIC_URL = '/static/'
STATICFILES_DIRS = [os.path.join(BASE_DIR,'Template/images')]
MEDIA_URL = '/media/'
MEDIA_ROOT = os.path.join(BASE_DIR, 'Template/media')
STATIC_ROOT = '/static/'
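Views.py
# Imports reconstructed from the names used in this listing. The module that
# defines the models (Remote_User.models) is an assumption; adjust it to the
# actual project layout.
import pandas as pd
import xlwt
from django.db.models import Avg, Count, Q
from django.http import HttpResponse
from django.shortcuts import redirect, render
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from Remote_User.models import (ClientRegister_Model, detection_accuracy,
                                detection_ratio, phishing_detection)
def serviceproviderlogin(request):
    if request.method == "POST":
        admin = request.POST.get('username')
        password = request.POST.get('password')
        if admin == "Admin" and password == "Admin":
            detection_accuracy.objects.all().delete()
            return redirect('View_Remote_Users')
    return render(request, 'SProvider/serviceproviderlogin.html')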
def View_Prediction_Of_URL_Type_Ratio(request):
    # Rebuild the ratio table from scratch on every request.
    detection_ratio.objects.all().delete()
    total = phishing_detection.objects.count()
    for kword in ('Phishing URL', 'Normal URL'):
        print(kword)
        count = phishing_detection.objects.filter(Prediction=kword).count()
        # Guard against division by zero when no predictions are stored yet.
        ratio = (count / total) * 100 if total else 0
        if ratio != 0:
            detection_ratio.objects.create(names=kword, ratio=ratio)
    obj = detection_ratio.objects.all()
    return render(request, 'SProvider/View_Prediction_Of_URL_Type_Ratio.html', {'objs': obj})
def View_Remote_Users(request):
    obj = ClientRegister_Model.objects.all()
    return render(request, 'SProvider/View_Remote_Users.html', {'objects': obj})
def ViewTrendings(request):
    topic = phishing_detection.objects.values('topics').annotate(dcount=Count('topics')).order_by('-dcount')
    return render(request, 'SProvider/ViewTrendings.html', {'objects': topic})
def charts(request, chart_type):
    chart1 = detection_ratio.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/charts.html", {'form': chart1, 'chart_type': chart_type})
def charts1(request, chart_type):
    chart1 = detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/charts1.html", {'form': chart1, 'chart_type': chart_type})
def View_Prediction_Of_URL_Type(request):
    obj = phishing_detection.objects.all()
    return render(request, 'SProvider/View_Prediction_Of_URL_Type.html', {'list_objects': obj})
def likeschart(request, like_chart):
    charts = detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/likeschart.html", {'form': charts, 'like_chart': like_chart})
def Download_Predicted_DataSets(request):
    response = HttpResponse(content_type='application/ms-excel')
    # decide file name
    response['Content-Disposition'] = 'attachment; filename="Predicted_Data.xls"'
    # creating workbook
    wb = xlwt.Workbook(encoding='utf-8')
    # adding sheet
    ws = wb.add_sheet("sheet1")
    # Row 0 is left empty; data rows start at row 1
    row_num = 0
    font_style = xlwt.XFStyle()
    # every written cell uses a bold font
    font_style.font.bold = True
    obj = phishing_detection.objects.all()
    for my_row in obj:
        row_num = row_num + 1
        ws.write(row_num, 0, my_row.url, font_style)
        ws.write(row_num, 1, my_row.Prediction, font_style)
    wb.save(response)
    return response
def train_model(request):
    detection_accuracy.objects.all().delete()
    data = pd.read_csv("Datasets.csv", encoding='latin-1')
    def apply_results(results):
        # Map the dataset label to the integer class used for training.
        if results == "benign":
            return 0
        elif results == "phishing":
            return 1
    data['Results'] = data['type'].apply(apply_results)
    x = data['url']
    y = data['Results']
    print(x)
    print("Y")
    print(y)
    # cv is not defined in the listing as extracted; a CountVectorizer
    # (bag-of-words over the URL strings) is assumed here.
    from sklearn.feature_extraction.text import CountVectorizer
    cv = CountVectorizer()
    x = cv.fit_transform(x)
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.20)
    print("Naive Bayes")
    NB = MultinomialNB()
    NB.fit(X_train, y_train)
    predict_nb = NB.predict(X_test)
    naivebayes = accuracy_score(y_test, predict_nb) * 100
    print("ACCURACY")
    print(naivebayes)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, predict_nb))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, predict_nb))
    detection_accuracy.objects.create(names="Naive Bayes", ratio=naivebayes)
    # SVM Model
    print("SVM")
    from sklearn import svm
    lin_clf = svm.LinearSVC()
    lin_clf.fit(X_train, y_train)
    predict_svm = lin_clf.predict(X_test)
    svm_acc = accuracy_score(y_test, predict_svm) * 100
    print("ACCURACY")
    print(svm_acc)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, predict_svm))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, predict_svm))
    detection_accuracy.objects.create(names="SVM", ratio=svm_acc)
    print("Logistic Regression")
    print("Gradient Boosting Classifier")
    # Save the labelled data set for later use.
    labeled = 'labeled_data.csv'
    data.to_csv(labeled, index=False)
    obj = detection_accuracy.objects.all()
    return render(request, 'SProvider/train_model.html', {'objs': obj})
Remote User
Predict_URL_Type.html
{% extends 'RUser/design.html' %}
{% block userblock %}
color: #FF0000;
font-weight: bold;
}
.style4 {color: #FFFF00; font-weight: bold; }
.style6 {
font-size: 24px;
color: #FFFF00;
font-weight: bold;
}
</style>
<body>
<div class="container-fluid">
<div class="container">
<div class="row">
<div class="col-md-5">
<form role="form" method="POST" >
{% csrf_token %}
<fieldset>
<table width="568" align="center">
<tr>
<td width="287" height="44" bgcolor="#FF0000"><div align="center"><span class="style4 style2">Enter URL
Here</span></div></td>
<td width="269"><textarea name="url" cols="40" rows="10"></textarea></td>
</tr>
<tr>
<td><p> </p>
<p>
<input name="submit" type="submit" class="style1" value="Predict">
</p></td>
</tr>
</table>
</fieldset>
</form>
<form role="form" method="POST" >
{% csrf_token %}
<fieldset>
<hr>
<div>
<table height="85" border="0" align="center" >
<tr><td width="383" bgcolor="#FF0000"><div align="center"><span class="style6">URL TYPE</span><span
class="style4">:: ----></span></div></td>
<td width="227" bgcolor="#FFFFFF" style="color:red; font-size:20px; font-family:fantasy" ><div
align="center"><strong>{{objs}}</strong></div></td>
</tr>
</table>
</div>
</fieldset>
</form>
</div>
<div class="col-md-2">
<!-------null------>
</div>
</div>
</div>
</div>
{% endblock %}
TESTING AND RESULT
What do you mean by software testing?
Testing involves operation of a system or application under controlled conditions
and evaluating the results. The controlled conditions should include both normal and
abnormal conditions. Testing should intentionally attempt to make things go wrong to
determine if things happen when they shouldn't or things don't happen when they should.
It is oriented to 'detection'.
Unit Testing:
Unit testing is a software development process in which the smallest testable parts
of an application, called units, are individually and independently scrutinized for proper
operation.
Unit testing is often automated but it can also be done manually. This testing mode is a
component of Extreme Programming (XP), a pragmatic method of software
development that takes a meticulous approach to building a product by means of
continual testing and revision.
Unit tests are written from a programmer's perspective. They ensure that a
particular method of a class successfully performs a set of specific tasks. Each test
confirms that a method produces the expected output when given a known input.
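As an illustration, a unit test for a small piece of this project could look like the following sketch, written with the unittest module from the Python standard library. The function label_to_int is a hypothetical stand-alone copy of the apply_results helper used in the train_model view; the test class and its cases are assumptions made for the example.

```python
import unittest


def label_to_int(label):
    """Map a dataset label to the integer class used for training."""
    if label == "benign":
        return 0
    if label == "phishing":
        return 1
    return None


class LabelToIntTest(unittest.TestCase):
    # Each test gives a known input and checks the expected output.
    def test_benign_maps_to_zero(self):
        self.assertEqual(label_to_int("benign"), 0)

    def test_phishing_maps_to_one(self):
        self.assertEqual(label_to_int("phishing"), 1)

    def test_unknown_label_maps_to_none(self):
        self.assertIsNone(label_to_int("unknown"))


# Run the tests programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LabelToIntTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```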
Performance Testing:
Performance testing is the process of determining the speed or effectiveness of a
computer, network, software program or device.
This process can involve quantitative tests done in a lab, such as measuring the response
time or the number of MIPS (millions of instructions per second) at which a system
functions.
Qualitative attributes such as reliability, scalability and interoperability may also be
evaluated. Performance testing is often done in conjunction with stress testing.
Performance testing can verify that a system meets the specifications claimed by
its manufacturer or vendor.
The process can compare two or more devices or programs in terms of parameters such
as speed, data transfer rate, bandwidth, throughput, efficiency or reliability.
Performance testing can also be used as a diagnostic aid in locating
communications bottlenecks.
Often a system will work much better if a problem is resolved at a single point or in a
single component.
For example, even the fastest computer will function poorly on today's Web if the
connection occurs at only 40 to 50 Kbps (kilobits per second).
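Response time, one of the quantities mentioned above, can be measured with a few lines of Python. The sketch below averages the time of repeated calls using time.perf_counter; the function count_dots is only a hypothetical stand-in workload, not part of the project code.

```python
import time


def average_response_time(func, arg, repeats=100):
    """Return the mean wall-clock time in seconds of one call to func(arg)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return (time.perf_counter() - start) / repeats


def count_dots(url):
    # Hypothetical stand-in workload: a trivial URL feature.
    return url.count(".")


avg = average_response_time(count_dots, "http://www.sample.com/login")
print(f"average time per call: {avg:.9f} s")
```

Averaging over many repeats reduces the effect of timer resolution and of other processes running on the same machine.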
Integration Testing:
Integration testing, also known as integration and testing (I&T), is a software
development process in which program units are combined and tested as groups in
multiple ways.
In this context, a unit is defined as the smallest testable part of an application.
Integration testing can expose problems with the interfaces among program components
before trouble occurs in real-world program execution.
Integration testing is a component of Extreme Programming (XP), a pragmatic method
of software development that takes a meticulous approach to building a product by
means of continual testing and revision.
Test Cases:
Test Case 1
Functionality: Login Use case
Test Case: Navigate to www.sample.com. Click on Submit button without entering Username and Password.
Expected Results: A validation should be as below: “Please enter valid Username & Password”
Actual Results: A validation has been populated as expected
Result: Pass
Priority: High
Test Case 2
Functionality: Test Username Field
Test Case: Navigate to www.sample.com. Click on Submit button without filling Password and with valid Username.
Expected Results: A validation should be as below: “Please enter valid Password or Password field cannot be empty”
Actual Results: A validation is shown as expected
Result: Pass
Priority: High
Test Case 3
Test Case: Navigate to www.sample.com. Enter both Username and Password wrong and hit enter.
Expected Results: A validation shown as below: “The username entered is wrong”
Actual Results: A validation is shown as expected
Result: Pass
Priority: High
Test Case 4
Test Case: Navigate to www.sample.com. Enter valid Username and Password and click on Submit.
Expected Results: Validate Username and Password in database and if correct then show main page
Actual Results: Main Page / Home Page has been displayed
Result: Pass
SCREENSHOTS
Fig 3: Resource Manager Operation View Trained and Tested URL Dataset Accuracy in Bar Chart
Fig 5: Resource Manager Operation View Predicted Dataset
Fig 7: Resource Manager Operation Download Predicted Dataset
Fig 9: Resource Manager Operation View All Remote Users
CONCLUSION
The Internet now reaches almost the whole world and is still growing rapidly. With this growth,
cybercrimes that rely on suspicious and malicious URLs are also increasing daily, and they have a
significant impact on the quality of services provided by the Internet and industrial companies.
Privacy and confidentiality are now essential issues on the Internet. To breach security and disrupt
otherwise strong networks, attackers use phishing emails or URLs, which are an easy and effective
means of intrusion into private or confidential networks. Phishing URLs simply masquerade as
legitimate URLs. A machine-learning-based phishing detection system is proposed in this study. A
dataset consisting of 32 URL attributes and more than 11054 URLs was extracted from 11000+ websites.
This dataset was obtained from the Kaggle repository and used as a benchmark for research. It is
already presented in the form of vectors that can be used in machine learning models. Decision tree,
linear regression, random forest, support vector machine, gradient boosting machine, K-Neighbor
classifier, naive Bayes, and hybrid (LR+SVC+DT) models with soft and hard voting were applied in the
experiments to achieve the highest performance results. The canopy feature selection with cross-fold
validation and grid search hyperparameter optimization techniques are used with the LSD ensemble
model. The proposed approach is evaluated by first experimenting with separate machine learning
models, and further evaluation was then carried out. The proposed approach successfully achieves its
aim efficiently. Future phishing detection systems should combine list-based and
machine-learning-based systems to prevent and detect phishing URLs more efficiently.
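The difference between the hard and soft voting mentioned above can be shown with a small pure-Python sketch. It is an illustration of the two voting rules only, not the trained LSD ensemble; the three probabilities are hypothetical outputs of the LR, SVC and DT models for a single URL.

```python
from collections import Counter


def hard_vote(labels):
    """Hard voting: return the class predicted by the majority of models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Soft voting: average the phishing probabilities and threshold at 0.5."""
    mean = sum(probabilities) / len(probabilities)
    return 1 if mean >= 0.5 else 0


# Hypothetical phishing probabilities from LR, SVC and DT for one URL.
probs = [0.45, 0.40, 0.95]
# Each model's own class decision (1 = phishing, 0 = benign).
labels = [1 if p >= 0.5 else 0 for p in probs]

hard = hard_vote(labels)   # two of the three models say benign
soft = soft_vote(probs)    # the mean probability is 0.6

print("hard vote:", hard, "soft vote:", soft)
```

In this example the two rules disagree: one very confident model outweighs two weakly confident ones under soft voting, but is outvoted under hard voting.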
FUTURE SCOPE
The future scope of the proposed phishing detection system highlights several promising
directions for enhancing cybersecurity measures. Firstly, integrating list-based and machine-learning-
based systems can significantly improve the detection and prevention of phishing URLs. By combining
these methodologies, future systems can leverage the strengths of each approach, ensuring more
comprehensive coverage and reducing false positives and negatives. Additionally, the incorporation of
real-time data analysis and continuous learning capabilities will enable the system to adapt to evolving
phishing tactics rapidly. Furthermore, expanding the dataset to include a more diverse range of URLs
and attributes can enhance the model's robustness and generalizability. The integration of advanced
techniques like deep learning and neural networks may also provide more sophisticated detection
capabilities. Finally, developing user-friendly interfaces and automated response mechanisms will
ensure that even non-expert users can benefit from advanced phishing protection. By addressing these
areas, future systems can offer more reliable, efficient, and user-centric solutions to combat the ever-
growing threat of cybercrimes.
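One possible way to combine a list-based check with a machine-learning score is sketched below. It is an assumption about how such a system might be arranged, not the design of the present project: the block list, the allow list and the function toy_score (a stand-in for a trained model's phishing probability) are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical lists; a real system would load these from maintained sources.
BLOCKLIST = {"phish.example.test"}
ALLOWLIST = {"www.sample.com"}


def toy_score(url):
    """Hypothetical stand-in for a trained model's phishing probability."""
    suspicious = ("login", "verify", "update")
    hits = sum(word in url.lower() for word in suspicious)
    return min(1.0, 0.3 * hits)


def classify(url, score=toy_score, threshold=0.5):
    """Check the lists first and fall back to the model score for unknown hosts."""
    host = urlparse(url).hostname or ""
    if host in BLOCKLIST:
        return "Phishing URL"
    if host in ALLOWLIST:
        return "Normal URL"
    return "Phishing URL" if score(url) >= threshold else "Normal URL"


results = [classify(u) for u in (
    "http://phish.example.test/home",
    "http://www.sample.com/login",
    "http://unknown.example.test/verify-login-update",
    "http://unknown.example.test/about",
)]
print(results)
```

The list lookups are cheap and exact for known hosts, while the model handles URLs that no list has seen yet.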
REFERENCES
[1] N. Z. Harun, N. Jaffar, and P. S. J. Kassim, ‘‘Physical attributes significant in preserving the social
sustainability of the traditional Malay settlement,’’ in Reframing the Vernacular: Politics, Semiotics,
and Representation. Springer, 2020, pp. 225–238.
[2] D. M. Divakaran and A. Oest, ‘‘Phishing detection leveraging machine learning and deep learning:
A review,’’ 2022, arXiv:2205.07411.
[3] A. Akanchha, ‘‘Exploring a robust machine learning classifier for detecting phishing domains using
SSL certificates,’’ Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada, Tech.
Rep. 10222/78875, 2020.
[4] H. Shahriar and S. Nimmagadda, ‘‘Network intrusion detection for TCP/IP packets with machine
learning techniques,’’ in Machine Intelligence and Big Data Analytics for Cybersecurity Applications.
Cham, Switzerland: Springer, 2020, pp. 231–247.
[5] J. Kline, E. Oakes, and P. Barford, ‘‘A URL-based analysis of WWW structure and dynamics,’’ in
Proc. Netw. Traffic Meas. Anal. Conf. (TMA), Jun. 2019, p. 800.
[6] A. K. Murthy and Suresha, ‘‘XML URL classification based on their semantic structure orientation
for web mining applications,’’ Proc. Comput. Sci., vol. 46, pp. 143–150, Jan. 2015.
[7] A. A. Ubing, S. Kamilia, A. Abdullah, N. Jhanjhi, and M. Supramaniam, ‘‘Phishing website
detection: An improved accuracy through feature selection and ensemble learning,’’ Int. J. Adv.
Comput. Sci. Appl., vol. 10, no. 1, pp. 252–257, 2019.
[8] A. Aggarwal, A. Rajadesingan, and P. Kumaraguru, ‘‘PhishAri: Automatic real-time phishing
detection on Twitter,’’ in Proc. eCrime Res. Summit, Oct. 2012, pp. 1–12.
[9] S. N. Foley, D. Gollmann, and E. Snekkenes, Computer Security—ESORICS 2017, vol. 10492.
Oslo, Norway: Springer, Sep. 2017.
[10] P. George and P. Vinod, ‘‘Composite email features for spam identification,’’ in Cyber Security.
Singapore: Springer, 2018, pp. 281–289.
[11] H. S. Hota, A. K. Shrivas, and R. Hota, ‘‘An ensemble model for detecting phishing attack with
proposed remove-replace feature selection technique,’’ Proc. Comput. Sci., vol. 132, pp. 900–907, Jan.
2018.
[12] G. Sonowal and K. S. Kuppusamy, ‘‘PhiDMA—A phishing detection model with multi-filter
approach,’’ J. King Saud Univ., Comput. Inf. Sci., vol. 32, no. 1, pp. 99–112, Jan. 2020.
[13] M. Zouina and B. Outtaj, ‘‘A novel lightweight URL phishing detection system using SVM and
similarity index,’’ Hum.-Centric Comput. Inf. Sci., vol. 7, no. 1, p. 17, Jun. 2017.
[14] R. Ø. Skotnes, ‘‘Management commitment and awareness creation—ICT safety and security in
electric power supply network companies,’’ Inf. Comput. Secur., vol. 23, no. 3, pp. 302–316, Jul. 2015.
[15] R. Prasad and V. Rohokale, ‘‘Cyber threats and attack overview,’’ in Cyber Security: The Lifeline
of Information and Communication Technology. Cham, Switzerland: Springer, 2020, pp. 15–31.
[16] T. Nathezhtha, D. Sangeetha, and V. Vaidehi, ‘‘WC-PAD: Web crawling-based phishing attack
detection,’’ in Proc. Int. Carnahan Conf. Secur. Technol. (ICCST), Oct. 2019, pp. 1–6.
[17] R. Jenni and S. Shankar, ‘‘Review of various methods for phishing detection,’’ EAI Endorsed
Trans. Energy Web, vol. 5, no. 20, Sep. 2018, Art. no. 155746.
[18] (2020). Accessed: Jan. 2020. [Online]. Available: https://catches-of-the-month-phishing-scams-
for-january-2020
[19] S. Bell and P. Komisarczuk, ‘‘An analysis of phishing blacklists: Google Safe Browsing,
OpenPhish, and PhishTank,’’ in Proc. Australas. Comput. Sci. Week Multiconf. (ACSW), Melbourne,
VIC, Australia. New York, NY, USA: Association for Computing Machinery, 2020, pp. 1–11, Art. no.
3, doi: 10.1145/3373017.3373020.
[20] A. K. Jain and B. Gupta, ‘‘PHISH-SAFE: URL features-based phishing detection system using
machine learning,’’ in Cyber Security. Switzerland: Springer, 2018, pp. 467–474.
[21] Y. Cao, W. Han, and Y. Le, ‘‘Anti-phishing based on automated individual white-list,’’ in Proc.
4th ACM Workshop Digit. Identity Manage., Oct. 2008, pp. 51–60.
[22] G. Diksha and J. A. Kumar, ‘‘Mobile phishing attacks and defence mechanisms: State of art and
open research challenges,’’ Comput. Secur., vol. 73, pp. 519–544, Mar. 2018.
[23] M. Khonji, Y. Iraqi, and A. Jones, ‘‘Phishing detection: A literature survey,’’ IEEE Commun.
Surveys Tuts., vol. 15, no. 4, pp. 2091–2121, 4th Quart, 2013.
[24] S. Sheng, M. Holbrook, P. Kumaraguru, L. F. Cranor, and J. Downs, ‘‘Who falls for phish? A
demographic analysis of phishing susceptibility and effectiveness of interventions,’’ in Proc. SIGCHI
Conf. Hum. Factors Comput. Syst., Apr. 2010, pp. 373–382.
[25] P. Prakash, M. Kumar, R. R. Kompella, and M. Gupta, ‘‘PhishNet: Predictive blacklisting to detect
phishing attacks,’’ in Proc. IEEE INFOCOM, Mar. 2010, pp. 1–5.