Phishing Detection

AURORA’S PG COLLEGE (MBA)

(Autonomous)
Accredited by NAAC with A+ Grade
Ramanthapur, Hyderabad, Telangana – 500013

PROJECT REPORT ON

Phishing Detection System Through Hybrid Machine Learning Based on URL

AT

Manac Infotech Pvt. Ltd.

Project Report submitted in partial fulfilment of the requirements for the award of the Degree of

MASTER OF COMPUTER APPLICATIONS

Submitted by
THAKUR VINEETHA BAI
(H.T No. 1302-23-862-133)

Under the Supervision of

MR.DEVENDER RAO

2023 – 2025 Batch


AURORA’S POST-GRADUATE COLLEGE (MCA)
Ramanthapur, Hyderabad – 500 013

CERTIFICATE
This is to certify that the project work entitled

Phishing Detection System Through Hybrid Machine Learning Based on URL

is a bona fide work done by

Thakur Vineetha Bai


1302-23-862-133

as part of the curriculum in the

DEPARTMENT OF COMPUTER SCIENCE


AURORA’S POST GRADUATE COLLEGE (MBA)
Ramanthapur, Hyderabad – 500013

In partial fulfillment of requirement for award of

Master of Computer Applications

Osmania University, Hyderabad

This work has been carried out under our guidance.

Internal Guide Head of Department Principal

Internal Examiner External Examiner


DECLARATION

I, Thakur Vineetha Bai, hereby declare that the project entitled “Phishing
Detection System Through Hybrid Machine Learning Based on URL”
has been carried out by me at Manac Infotech. This project is submitted
to Osmania University, Hyderabad, in partial fulfillment of the requirements for the
award of the degree of “MASTER OF COMPUTER APPLICATIONS”. The
results embodied in this dissertation have not been submitted to any other
University or institution for the award of a Degree or Diploma.

(Signature of the student)


Thakur Vineetha Bai
1302-23-862-133
ACKNOWLEDGEMENT

I would like to express my deep gratitude and respect to all those behind the
scenes who guided and inspired me in the completion of this project work.

I wish to convey my sincere thanks to the Principal and the Head of the Department of Computer
Science, our project in-charge Mr. MOHAMMED ISMAIL, and project guide
Mr. G. SRIKANTH for giving me the required guidance during this project work.

Last but not least, I am very thankful to the faculty members of my college and my friends
for their suggestions and for helping me successfully complete this project.

THAKUR VINEETHA BAI

1302-23-862-133
TABLE OF CONTENTS

1. Organization Profile

2. Abstract

3. SYSTEM ANALYSIS
3.1 Existing System
3.2 Proposed System
3.3 Feasibility Study

4. SYSTEM REQUIREMENTS
4.1 Functional Requirements
4.2 Non-functional Requirements

5. SYSTEM DESIGN
5.1 System Architecture
5.2 UML Diagrams
5.2.1 Use Case Diagram
5.2.2 Class Diagram
5.2.3 Sequence Diagram
5.2.4 Collaboration Diagram
5.2.5 Activity Diagram
5.2.6 Component Diagram
5.2.7 Deployment Diagram
5.2.8 ER Diagram
5.2.9 Data Dictionary
5.3 Input and Output Design

6. IMPLEMENTATION
6.1 Modules
6.1.1 Module Description
6.2 Software Environment
6.2.1 Python
6.2.2 Source Code

7. TESTING AND RESULT
7.1 System Test
7.2 Output Screens

8. CONCLUSION

9. REFERENCES/BIBLIOGRAPHY
1. ORGANIZATION PROFILE:

Client Organization: MANAC Infotech Pvt. Ltd.


About the Organization:
MANAC Infotech Pvt. Ltd. (MIPL) is a forward-thinking and innovative IT company founded in 1998,
originally serving as a training partner for TCS iON, a subsidiary of Tata Consultancy Services (TCS
Ltd.). MANAC specializes in multiple domains, including:
• Software Development
• Staffing and Recruitment
• IT Training and Education
Core Focus Areas:
MIPL is renowned for its contributions to the education and training sector, offering technology-driven
solutions such as online assessments, analytics, ERP systems for training companies, and online course
delivery platforms. Their training programs are tailored to meet the needs of IT companies, with a
proven track record of producing skilled professionals now working in top IT roles globally.
Vision:
To become a globally recognized center of excellence in IT education and training, producing
industry-ready professionals who lead with innovation and integrity.
Mission:
To empower individuals with essential technical knowledge, industry-relevant skills, and the
confidence to excel in a dynamic world through high-quality education and continuous learning
opportunities.
Developer Profile: Mr. Fazal Ur Rahman
Position: Senior Faculty and Software Developer
Experience: 14+ years in software development and training
Technical Expertise:
• Programming: Python, Java, C#.NET, ASP.NET
• Web Technologies: HTML5, CSS3, JavaScript
• Frameworks & Tools: Django, Android, SQL, Data Science
Project Experience:
Mr. Fazal Ur Rahman has developed real-time applications for various clients both in India and
internationally. His recent work includes cloud-based applications and data-secure platforms for
educational institutions, fintech, and other sectors.
Associated Organizations:
• VISIONARY GROUP OF COLLEGES https://visionaryedu.in
• TOMEINTERNATIONALSCHOOL https://tomeinternationalschool.com
• BLUEX TECHNO https://bluextechno.com
• REPX https://repx.in
• INFINITE INNOVATIVE https://infiniteinnovative.com
• INTELLIVISION https://intellivisioninternational.com
These projects showcase his ability to blend academic knowledge with practical software solutions
tailored to organizational needs.

Role in Project:
As the lead developer and domain expert, Mr. Rahman has overseen the secure design, encryption
model, and cloud architecture involved in this project, ensuring both innovation and data protection.

2. ABSTRACT
PHISHING DETECTION SYSTEM THROUGH HYBRID
MACHINE LEARNING BASED ON URL
Numerous types of cybercrime are now carried out over the internet; this
study focuses on phishing attacks. Although phishing was first used in 1996, it has
become one of the most severe and dangerous cybercrimes on the internet. Phishing relies on
deceptive emails as its underlying mechanism, followed by mock websites, to
obtain the required data from its victims. Different studies have presented work on
the prevention, identification, and awareness of phishing attacks; however, there is currently no
complete and proper solution for thwarting them. Machine learning therefore plays a vital role
in defending against cybercrimes involving phishing attacks. The proposed study is based on a
phishing URL dataset taken from a well-known dataset repository, which consists of
phishing and legitimate URL attributes collected from more than 11,000 websites, in vector form.
After preprocessing, several machine learning algorithms are applied to
detect phishing URLs and protect the user. This study uses machine learning
models such as decision tree (DT), logistic regression (LR), random forest (RF), naive Bayes
(NB), gradient boosting classifier (GBM), K-neighbors classifier (KNN), support vector
classifier (SVC), and the proposed hybrid LSD model, which is a combination of logistic regression,
support vector machine, and decision tree (LR+SVC+DT) with soft and hard voting, to defend
against phishing attacks with high accuracy and efficiency. The canopy feature selection
technique with cross-fold validation and grid search hyperparameter optimization
is used with the proposed LSD model. To evaluate the proposed approach, different
evaluation parameters were adopted, such as precision, accuracy, recall, F1-score, and
specificity, to illustrate the effects and efficiency of the models. The results of the comparative
analyses demonstrate that the proposed approach outperforms the other models and achieves the
best results.

3. SYSTEM ANALYSIS

3.1 EXISTING SYSTEM:


Applied Behavior Analysis (ABA) is an intervention method in which pedagogical
strategies derived from the principles of behavior are systematically applied to promote socially
significant behaviors and reduce problem behaviors [4]. The set of basic principles, which are
statements about how environmental variables act as input to a function of behavior, have been
evaluated scientifically by experimental analyses of behaviors (p.155). In ABA, behavior is
viewed as the learner’s interaction with his or her surrounding environment and involves the
movement of some part(s) of the learner’s body. Learning behavior occurs within the
environmental context. At the same time, the learning environment is regarded as the full set of
physical circumstances in which the learner is situated.
The learning outcome of ABA lessons is the achievement of behavior changes that
improve learners’ quality of life in communication and daily living skills. A systematic and
measurable behavior assessment scheme is defined before the ABA lessons. The target behavior
is often broken down into smaller tasks, while positive reinforcements are often used to
encourage goal achievement. Assessment criteria include whether the target task is achieved
(plus) or not (minus), whether a prompt from the therapist (prompt) is needed to facilitate task
achievement, or if the student is behaving in a way that is unrelated to the task (off task).
Furthermore, behavior change is effective if it is durable over time [11]. Therefore, a subsequent
follow-up reassessment of the developed behavior is needed to ensure the effectiveness of the
therapy.
Students with special needs can be susceptible to ambient environmental conditions due
to their dysfunction in sensory processing. A previous study showed that high levels of CO2
content caused fatigue and difficulties in concentration in SEN students, especially those with
ADHD [12]. Another study performed with intellectually disabled preschool students revealed
that classroom thermal discomfort (e.g., high nearby ambient temperature) could distract them
from learning and influence their mood and health [13]. The same study also suggested that
students with intellectual disabilities (ID) are more vulnerable to acoustic discomforts due to
their psychologically stressful conditions (p.115). Researchers also studied the relationship
between classroom lighting and SEN students’ comfort. They found that inappropriate lighting
and glare affect individual SEN students to different extents, and that students generally felt
tired and irritated because of lighting discomfort [14]. However, teachers and therapists often have no
control over lighting characteristics beyond switching the lights on or off (p.105).
Emotion can affect learning and engagement in students with and without SEN. In
particular, students with ID often exhibit anxiety due to internal stress. Blood pressure, body
temperature, and heart rate are physiological markers for stress that hinder learning [15]. It was
shown that mild conditions could reduce these inhibitors in SEN students [16]. It is known that
abnormally high or low levels of skin conductance (measured through galvanic skin response,
GSR) hindered the learning performance of SEN students [17]. Besides, a study also found that
body movement facilitated by motion-based technology positively impacted SEN students’
short-term memory skills.
MMLA employs multiple sources and formats of educational data such as activity logs,
audio, video and biosensors to enrich learning analytics [19]. MMLA is significantly enhanced
by the Internet of Things (IoT) technologies because the latter allows convenient capturing of
multimodal data from the complex learning environment [20]. Multimodal educational data
collected by IoT sensors include those detecting learners’ motion (e.g., head and body) and
physiological (e.g., heart, brain, and skin) behavior, as well as those measuring the ambient
learning environment (e.g., light, humidity, temperature, and noise). These data were collected
from physical objects or human bodies, then encoded into a machine-interpretable format and
served as input to MMLA [21]. Possible interpretations of the observed learning process can be
assigned based on validated learning theories.

Disadvantages:

1. The complexity of data: Most existing machine learning models must accurately
interpret large and complex datasets to detect phishing URLs.
2. Data availability: Most machine learning models require large amounts of data to make
accurate predictions. If data is not available in sufficient quantities, model accuracy
may suffer.
3. Incorrect labeling: Existing machine learning models are only as accurate as the data
they are trained on. If the data has been incorrectly labeled, the model cannot
make accurate predictions.

8
3.2 PROPOSED SYSTEM:

• Phishing URL-based cyberattack detection is proposed in this study to prevent crime and
protect people’s privacy.
• The dataset consists of 11000+ records of URL attributes that help classify phishing URLs
based on these attributes.
• Machine learning models have been applied, such as decision tree (DT), logistic regression
(LR), naive Bayes (NB), random forest (RF), gradient boosting machine (GBM), support
vector classifier (SVC), K-neighbors classifier (KNN), and the proposed hybrid LSD model
(LR+SVC+DT) with soft and hard voting, which can accurately classify the threats
of phishing URLs.
• Cross-fold validation with grid search parameter tuning, based on the canopy feature selection
technique, was used with the proposed LSD hybrid model to improve prediction results.
• The proposed methodology is evaluated using evaluation parameters such as
accuracy, precision, recall, specificity, and F1-score.
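The evaluation parameters named in the last point can all be derived from the counts of true and false positives and negatives. The following is a minimal sketch using only the Python standard library; it is not the project's evaluation code, and the function name `classification_metrics` and the sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, specificity and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Guard against division by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical labels: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Specificity is reported alongside recall because it measures how many legitimate URLs are correctly left alone, which recall does not capture.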
Advantages:
• The classification of phishing URLs was implemented using machine learning
algorithms. Cybercrimes are growing with the growth of Internet architecture
worldwide, which calls for a security mechanism that prevents an attacker from
obtaining confidential content by breaching the network through fake and malicious
URLs. A phishing dataset was used to perform the experiments.
• The dataset is in the form of data vectors that require null-value removal to remove
unnecessary empty values. Multiple machine learning algorithms, such as decision
tree (DT), logistic regression (LR), naive Bayes (NB), random forest (RF), gradient
boosting machine (GBM), support vector classifier (SVC), K-neighbors classifier (KNN),
and the proposed hybrid LSD model (LR+SVC+DT) with soft and hard voting, were
used based on functional features.
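The difference between hard and soft voting in a three-model hybrid can be illustrated with a small standard-library sketch. It is only an illustration of the voting idea, not the project's implementation: the functions `hard_vote` and `soft_vote` and the probability values are hypothetical, and in practice the probabilities would come from trained LR, SVC, and DT models.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the label predicted by the most base models (majority vote)."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average per-class probabilities across models and return the top class."""
    n_models = len(probabilities)
    classes = probabilities[0].keys()
    averaged = {c: sum(p[c] for p in probabilities) / n_models for c in classes}
    return max(averaged, key=averaged.get), averaged


# Hypothetical class probabilities from the three base models for one URL.
model_probs = [
    {"phishing": 0.45, "legitimate": 0.55},  # logistic regression
    {"phishing": 0.40, "legitimate": 0.60},  # support vector classifier
    {"phishing": 0.95, "legitimate": 0.05},  # decision tree
]
# Each model's hard prediction is its most probable class.
model_labels = [max(p, key=p.get) for p in model_probs]

hard_label = hard_vote(model_labels)
soft_label, averaged = soft_vote(model_probs)
print(hard_label, soft_label, averaged)
```

In this made-up case the two schemes disagree: two of three models lean slightly towards "legitimate", so hard voting follows them, while soft voting lets the one very confident model outweigh them.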

3.3 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put
forth with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out. This is to ensure that the proposed
system is not a burden to the company. For feasibility analysis, some understanding of the major
requirements for the system is essential.

Three key considerations involved in the feasibility analysis are:

• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and development of
the system is limited, so the expenditures must be justified. The developed system was well
within the budget, which was achieved because most of the technologies used are freely
available. Only the customized products had to be purchased.

TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical requirements of
the system. Any system developed must not place a high demand on the available technical
resources, as this would lead to high demands being placed on the client. The developed system must
have modest requirements, as only minimal or no changes are required for implementing this
system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This includes
the process of training the user to use the system efficiently. The user must not feel threatened by the
system; instead, they must accept it as a necessity. The level of acceptance by the users depends
solely on the methods that are employed to educate them about the system and to make them
familiar with it. Their level of confidence must be raised so that they are also able to offer
constructive criticism, which is welcomed, as they are the final users of the system.

4. SYSTEM REQUIREMENTS

4.1 FUNCTIONAL REQUIREMENTS

Functional requirements will vary for different types of software. For example, functional
requirements for a website or mobile application should define user flows and various interaction
scenarios.

The major modules of the project are

1. Resource manager
2. Interactive User

4.2 NON-FUNCTIONAL REQUIREMENTS

Nonfunctional requirements are not related to the system's functionality but rather define how
the system should perform. They are crucial for ensuring the system's usability, reliability, and
efficiency, and often influence the overall user experience. The hardware and software
requirements of this project are listed below.

HARDWARE REQUIREMENTS:

| Component | Minimum (Required for Execution) | My System (Development) |
|---|---|---|
| System | Pentium IV 2.2 GHz | i3 Processor 5th Gen |
| Hard Disk | 20 GB | 512 GB |
| RAM | 1 GB | 4 GB |

SOFTWARE REQUIREMENTS

| Requirement | Specification |
|---|---|
| Operating System | Windows 10/11 |
| Development Software | Python 3.7.0 |
| Programming Language | Python |
| Integrated Development Environment (IDE) | Python IDE |
| Front End Technologies | HTML5, CSS3, JavaScript |
| Database Language | SQL |
| Database | MySQL |
| Database Software | WAMP Server (MySQL) |
| Web Server or Deployment Server | Apache Tomcat |
| Design/Modelling | Rational Rose |
| Framework | Django |
| Graphical User Interface (Database) | SQLyog 6.56 Enterprise |

5. SYSTEM DESIGN

5.1 SYSTEM ARCHITECTURE:

5.2 UML DIAGRAMS:


UML stands for Unified Modeling Language. UML is a standardized, general-purpose
modeling language in the field of object-oriented software engineering. The standard is managed,
and was created, by the Object Management Group (OMG).
The goal is for UML to become a common language for creating models of object-
oriented computer software. In its current form, UML is comprised of two major components: a
meta-model and a notation. In the future, some form of method or process may also be added
to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying, visualizing,
constructing, and documenting the artifacts of software systems, as well as for business
modeling and other non-software systems.
UML represents a collection of best engineering practices that have proven successful in
the modeling of large and complex systems.
UML is a very important part of developing object-oriented software and the overall
software development process. It uses mostly graphical notations to express the design of
software projects.

GOALS:
The primary goals in the design of UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling language so that they can develop and
exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the object-oriented (OO) tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns, and
components.
7. Integrate best practices.

5.2.1 USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented as
use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor. Roles of the actors in the
system can be depicted.

5.2.2 CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language (UML) is a
type of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes. It
shows which class contains which information.

5.2.3 SEQUENCE DIAGRAM:
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a construct
of a Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, and timing diagrams.

5.2.4 COLLABORATION DIAGRAM

A collaboration diagram, also known as a communication diagram, is an illustration of the
relationships and interactions among software objects in the Unified Modeling Language (UML).
Developers can use these diagrams to portray the dynamic behavior of a particular use case and
define the role of each object.

5.2.5 ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities and
actions with support for choice, iteration and concurrency. In the Unified Modeling Language,
activity diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.

5.2.6 COMPONENT DIAGRAM:

Component diagrams are used in modeling the physical aspects of object-oriented systems. They
are used for visualizing, specifying, and documenting component-based systems and also for
constructing executable systems through forward and reverse engineering. Component diagrams are
essentially class diagrams that focus on a system's components and are often used to model the static
implementation view of a system.

5.2.7 DEPLOYMENT DIAGRAM:

Deployment diagrams are used to visualize the topology of the physical components of a
system, where the software components are deployed. Deployment diagrams are used to describe the
static deployment view of a system. Deployment diagrams consist of nodes and their relationships.

5.2.8 E-R DIAGRAM
An Entity Relationship Diagram is a diagram that represents relationships among entities in a
database. It is commonly known as an ER Diagram. An ER Diagram in a DBMS plays a crucial role in
designing the database. In practice, the data requirements gathered from users are commonly captured in
the form of an ER Diagram.

5.2.9 DATA DICTIONARY

Database: phishing_detection_system

Table name: auth_group

| Column | Data Type | Constraints | Description |
|---|---|---|---|
| id | Int(11) | Primary key | Unique Identifier |
| name | Varchar(1000) | Not null | name |

Table Name: Auth_group_Permission

| Column | Data Type | Constraints | Description |
|---|---|---|---|
| id | Int(11) | Primary key | Unique Identifier |
| Group id | Int(11) | Primary Key | Unique Identifier |
| Permission_id | Int(11) | Primary Key | Unique Identifier |

Table Name: auth_permission

| Column | Data Type | Constraints | Description |
|---|---|---|---|
| id | Int(11) | Primary key | Unique Identifier |
| Name | Int(255) | Not Null | name |
| Content_type_id | Int(11) | Primary Key | Unique Identifier |
| Code name | Int(100) | Not Null | name |

Table Name: auth_User

| Column | Data Type | Constraints | Description |
|---|---|---|---|
| id | Int(11) | Primary key | Unique Identifier |
| Password | varchar(128) | Not Null | password |
| Last_Login | Datetime(6) | Not Null | Last login |
| Is_superuser | tinyint(1) | Not Null | Name |
| username | Varchar(150) | Not Null | Username |
| lastname | Varchar(30) | Not Null | Lastname |
| email | Varchar(150) | Not Null | Email id |
| Is_staff | Tinyint(1) | Not Null | Staff |
| Is_active | Tinyint(1) | Not Null | Active |
| Date_joined | Datetime(6) | Not Null | Date and time |

Table Name: auth_user_groups

| Column | Data Type | Constraints | Description |
|---|---|---|---|
| id | Int(11) | Primary key | Unique Identifier |
| User_id | Int(11) | Primary Key | Unique Identifier |
| Group_id | Int(11) | Primary Key | Unique Identifier |

5.3 INPUT/OUTPUT DESIGN

Input design: Considering the requirements, the procedures to collect the necessary input data are
designed to be as efficient as possible. The input design has been done so that the interaction
of the user with the system is as effective and simple as possible.
Measures are also taken for the following:
• Controlling the amount of input
• Avoiding unauthorized access to the system
• Eliminating extra steps
• Keeping the process simple
• At this stage, the input forms and screens are designed.

Output design: All the screens of the system are designed to provide the user with easy
operations in a simple and efficient way, with the minimum keystrokes possible. Instructions and
important information are emphasized on the screen. Almost every screen provides clear,
important messages and option selection facilities. Emphasis is given to speedy processing and
quick transitions between the screens. Each screen is designed to be as user-friendly as possible by
using interactive procedures, so that the user can operate the system without much help from the
operating manual.

6. IMPLEMENTATION

6.1 MODULES

The major modules of the project are

1. Resource manager
2. Interactive User

MODULE DESCRIPTION

Operations Manager
In this module, the Service Provider has to log in using a valid user name and password.
After a successful login, the Service Provider can perform operations such as Browse URL Data Sets and
Train & Test, View Trained and Tested URL Data Sets Accuracy in Bar Chart, View
Trained and Tested URL Data Sets Accuracy Results, View Prediction Of URL Type,
View URL Type Ratio, Download Predicted Data Sets, View URL Type Ratio Results, and
View All Remote Users.

Interactive User
In this module, there are n users present. A user should register before
performing any operations. Once a user registers, their details are stored in the database.
After successful registration, the user has to log in using an authorized user name and
password. Once login is successful, the user can perform operations such as Register And
Login, Predict Url Type, and View Your Profile.

6.2 SOFTWARE ENVIRONMENT

6.2.1 PYTHON

Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is
designed to be highly readable. It frequently uses English keywords where other languages use
punctuation, and it has fewer syntactical constructions than other languages.

• Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to
compile your program before executing it. This is similar to Perl and PHP.

• Python is Interactive: You can actually sit at a Python prompt and interact with the interpreter
directly to write your programs.

• Python is Object-Oriented: Python supports the object-oriented style or technique of
programming that encapsulates code within objects.

• Python is a Beginner's Language: Python is a great language for beginner-level
programmers and supports the development of a wide range of applications, from simple text
processing to WWW browsers to games.

1.2 History of Python

Python was developed by Guido van Rossum in the late eighties and early nineties at the National
Research Institute for Mathematics and Computer Science in the Netherlands.

Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
SmallTalk, and Unix shell and other scripting languages.

Python is copyrighted. Python source code is available under the Python Software Foundation
License, an open-source license that is compatible with the GNU General Public License (GPL).

Python is now maintained by a team of core developers under the Python Software Foundation. Guido
van Rossum directed its progress for decades and stepped down from that leadership role in 2018.

1.3 Python Features


Python's features include:

• Easy-to-learn: Python has few keywords, a simple structure, and a clearly defined syntax. This
allows the student to pick up the language quickly.

• Easy-to-read: Python code is clearly defined and easy on the eyes.

• Easy-to-maintain: Python's source code is fairly easy to maintain.

• A broad standard library: The bulk of Python's library is very portable and cross-platform
compatible on UNIX, Windows, and Macintosh.

• Interactive Mode: Python has support for an interactive mode which allows interactive testing
and debugging of snippets of code.

• Portable: Python can run on a wide variety of hardware platforms and has the same interface
on all platforms.

• Extendable: You can add low-level modules to the Python interpreter. These modules enable
programmers to add to or customize their tools to be more efficient.

• Databases: Python provides interfaces to all major commercial databases.

• GUI Programming: Python supports GUI applications that can be created and ported to many
system calls, libraries and windows systems, such as Windows MFC, Macintosh, and the X
Window system of Unix.

• Scalable: Python provides a better structure and support for large programs than shell scripting.

Python has a big list of good features:

• It supports functional and structured programming methods as well as OOP.

• It can be used as a scripting language or can be compiled to byte-code for building large
applications.

• It provides very high-level dynamic data types and supports dynamic type checking.

• It supports automatic garbage collection.

• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.

2.1 ARITHMETIC OPERATORS

The examples assume a = 10 and b = 20.

| Operator | Description | Example |
|---|---|---|
| + Addition | Adds values on either side of the operator. | a + b = 30 |
| - Subtraction | Subtracts right hand operand from left hand operand. | a - b = -10 |
| * Multiplication | Multiplies values on either side of the operator. | a * b = 200 |
| / Division | Divides left hand operand by right hand operand. In Python 3 the result is always a float. | b / a = 2.0 |
| % Modulus | Divides left hand operand by right hand operand and returns the remainder. | b % a = 0 |
| ** Exponent | Performs exponential (power) calculation on operands. | a ** b = 10 to the power 20 |
| // Floor Division | The division of operands where the result is the quotient in which the digits after the decimal point are removed. If one of the operands is negative, the result is floored, i.e., rounded away from zero (towards negative infinity). | 9 // 2 = 4 and 9.0 // 2.0 = 4.0; -11 // 3 = -4 and -11.0 // 3 = -4.0 |
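The arithmetic operators can be checked directly in the interpreter. A short sketch, assuming Python 3 and the same illustrative values a = 10 and b = 20:

```python
a, b = 10, 20

results = {
    "a + b": a + b,        # addition
    "a - b": a - b,        # subtraction
    "a * b": a * b,        # multiplication
    "b / a": b / a,        # true division always yields a float in Python 3
    "b % a": b % a,        # remainder
    "a ** b": a ** b,      # exponentiation
    "9 // 2": 9 // 2,      # floor division on integers
    "-11 // 3": -11 // 3,  # floors towards negative infinity
    "-11.0 // 3": -11.0 // 3,
}

for expression, value in results.items():
    print(expression, "=", value)
```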

2.2 ASSIGNMENT OPERATOR

| Operator | Description | Example |
|---|---|---|
| = | Assigns values from right side operands to left side operand. | c = a + b assigns value of a + b into c |
| += Add AND | It adds right operand to the left operand and assigns the result to left operand. | c += a is equivalent to c = c + a |
| -= Subtract AND | It subtracts right operand from the left operand and assigns the result to left operand. | c -= a is equivalent to c = c - a |
| *= Multiply AND | It multiplies right operand with the left operand and assigns the result to left operand. | c *= a is equivalent to c = c * a |
| /= Divide AND | It divides left operand by the right operand and assigns the result to left operand. | c /= a is equivalent to c = c / a |
| %= Modulus AND | It takes modulus using two operands and assigns the result to left operand. | c %= a is equivalent to c = c % a |
| **= Exponent AND | Performs exponential (power) calculation on operands and assigns value to the left operand. | c **= a is equivalent to c = c ** a |
| //= Floor Division | It performs floor division on operands and assigns value to the left operand. | c //= a is equivalent to c = c // a |

35
2.3 IDENTITY OPERATOR

Operator   Description                                          Example

is         Evaluates to true if the variables on either side    x is y results in True if
           of the operator point to the same object and         id(x) equals id(y).
           false otherwise.

is not     Evaluates to false if the variables on either side   x is not y results in True if
           of the operator point to the same object and         id(x) is not equal to id(y).
           true otherwise.
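The difference between identity and equality is easy to miss, so a minimal sketch may help. The variable names are illustrative only; the code is written for Python 3.

```python
x = [1, 2, 3]
y = x          # a second name bound to the same list object
z = list(x)    # a new list object with equal contents

same_object = x is y                          # identical objects
ids_match = id(x) == id(y)                    # "is" compares identities
equal_but_distinct = (x == z) and (x is not z)

print(same_object, ids_match, equal_but_distinct)
```

Here x and z compare equal with == because their contents match, yet they are different objects, so x is z would be false.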

2.4 BITWISE OPERATOR

The examples assume a = 60 (0011 1100) and b = 13 (0000 1101).

Operator                  Description                                    Example

&   Binary AND            Copies a bit to the result if it exists in     (a & b) = 12
                          both operands.                                 (means 0000 1100)

|   Binary OR             Copies a bit if it exists in either operand.   (a | b) = 61
                                                                         (means 0011 1101)

^   Binary XOR            Copies the bit if it is set in one operand     (a ^ b) = 49
                          but not both.                                  (means 0011 0001)

~   Binary Ones           It is unary and has the effect of              (~a) = -61
    Complement            'flipping' bits.                               (means 1100 0011 in 2's
                                                                         complement form, because
                                                                         the number is signed)

<<  Binary Left Shift     The left operand's value is moved left by      a << 2 = 240
                          the number of bits specified by the right      (means 1111 0000)
                          operand.

>>  Binary Right Shift    The left operand's value is moved right by     a >> 2 = 15
                          the number of bits specified by the right      (means 0000 1111)
                          operand.
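The bit patterns in this table can be reproduced directly. The sketch below assumes a = 60 and b = 13, which are the values the listed results imply, and uses format() only to display eight-bit patterns.

```python
# Assumed operands: 60 is 0011 1100 and 13 is 0000 1101 in binary.
a, b = 60, 13

and_bits = a & b     # bits set in both operands
or_bits = a | b      # bits set in either operand
xor_bits = a ^ b     # bits set in exactly one operand
inverted = ~a        # ones complement, equal to -(a + 1)
shifted_left = a << 2
shifted_right = a >> 2

and_pattern = format(and_bits, "08b")
print(and_bits, or_bits, xor_bits, inverted, shifted_left, shifted_right)
print(and_pattern)
```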

2.5 LOGICAL OPERATOR

Operator           Description                                         Example

and  Logical AND   If both the operands are true then the condition    (a and b) is true.
                   becomes true.

or   Logical OR    If any of the two operands is non-zero then the     (a or b) is true.
                   condition becomes true.

not  Logical NOT   Used to reverse the logical state of its operand.   not (a and b) is false.

2.6 Membership Operators

Operator   Description                                          Example

in         Evaluates to true if it finds a variable in the      x in y results in True if x
           specified sequence and false otherwise.              is a member of sequence y.

not in     Evaluates to true if it does not find a variable     x not in y results in True if
           in the specified sequence and false otherwise.       x is not a member of
                                                                sequence y.

Python Operators Precedence


Operator                      Description

**                            Exponentiation (raise to the power)

~ + -                         Complement, unary plus and minus (method names for the
                              last two are +@ and -@)

* / % //                      Multiply, divide, modulo and floor division

+ -                           Addition and subtraction

>> <<                         Right and left bitwise shift

&                             Bitwise 'AND'

^ |                           Bitwise exclusive 'OR' and regular 'OR'

<= < > >=                     Comparison operators

<> == !=                      Equality operators

= %= /= //= -= += *= **=      Assignment operators

is  is not                    Identity operators

in  not in                    Membership operators

not or and                    Logical operators
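A few expressions make the effect of precedence visible. This is a small sketch in Python 3; the variable names are placeholders chosen for the illustration.

```python
multiply_first = 2 + 3 * 4       # * binds tighter than +
parenthesised = (2 + 3) * 4      # parentheses override precedence
power_before_minus = -2 ** 2     # ** binds tighter than unary minus
right_to_left = 2 ** 3 ** 2      # ** groups from the right: 2 ** (3 ** 2)
add_before_shift = 1 + 2 << 3    # + binds tighter than <<
not_before_or = not True or True # evaluated as (not True) or True

print(multiply_first, parenthesised, power_before_minus)
print(right_to_left, add_before_shift, not_before_or)
```

When an expression mixes several of these operators, explicit parentheses are usually clearer than relying on the table.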

3.1 LIST

The list is the most versatile data type available in Python. It is written as a sequence of
comma-separated values (items) between square brackets. An important property of a list is that
its items need not be of the same type.

Creating a list is as simple as putting different comma-separated values between square brackets. For
example −

list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]

Basic List Operations


Lists respond to the + and * operators much like strings; they mean concatenation and repetition here
too, except that the result is a new list, not a string.

Python Expression               Results                          Description

len([1, 2, 3])                  3                                Length

[1, 2, 3] + [4, 5, 6]           [1, 2, 3, 4, 5, 6]               Concatenation

['Hi!'] * 4                     ['Hi!', 'Hi!', 'Hi!', 'Hi!']     Repetition

3 in [1, 2, 3]                  True                             Membership

for x in [1, 2, 3]: print x,    1 2 3                            Iteration

Built-in List Functions & Methods:


Python includes the following list functions −

SN   Function             Description

1    cmp(list1, list2)    Compares elements of both lists (Python 2 only).

2    len(list)            Gives the total length of the list.

3    max(list)            Returns the item from the list with the maximum value.

4    min(list)            Returns the item from the list with the minimum value.

5    list(seq)            Converts a tuple into a list.

Python includes the following list methods −

SN   Method                     Description

1    list.append(obj)           Appends object obj to the list.

2    list.count(obj)            Returns the count of how many times obj occurs in the list.

3    list.extend(seq)           Appends the contents of seq to the list.

4    list.index(obj)            Returns the lowest index in the list at which obj appears.

5    list.insert(index, obj)    Inserts object obj into the list at offset index.

6    list.pop([index])          Removes and returns the object at index, or the last object
                                if no index is given.

7    list.remove(obj)           Removes object obj from the list.

8    list.reverse()             Reverses the objects of the list in place.

9    list.sort([func])          Sorts the objects of the list in place, using the compare
                                function if given.
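The list methods can be exercised in a few lines. This sketch is written for Python 3, where sort() takes a key function instead of the compare function mentioned in the table; the sample values follow the earlier list examples and the remaining names are illustrative.

```python
items = ['physics', 'chemistry', 1997, 2000]

items.append(2024)                 # add one object at the end
items.extend(['a', 'b'])           # add every element of another sequence
occurrences = items.count(1997)    # how many times 1997 occurs
position = items.index('chemistry')
items.insert(1, 'maths')           # insert before offset 1
last = items.pop()                 # remove and return the last object
items.remove('physics')            # remove the first matching object

numbers = [3, 1, 2]
numbers.sort()                     # in-place ascending sort
numbers.reverse()                  # in-place reversal

words = ['ccc', 'a', 'bb']
words.sort(key=len)                # Python 3: sort by a key function

print(items, numbers, words)
```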

3.2 TUPLES
A tuple is a sequence of immutable Python objects. Tuples are sequences, just like lists. The
differences are that tuples cannot be changed, unlike lists, and that tuples use parentheses,
whereas lists use square brackets.

Creating a tuple is as simple as putting different comma-separated values. Optionally we can put these
comma-separated values between parentheses also. For example −

tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5)
tup3 = "a", "b", "c", "d"

The empty tuple is written as two parentheses containing nothing −

tup1 = ()

To write a tuple containing a single value you have to include a comma, even though there is only one
value −

tup1 = (50,)

Like string indices, tuple indices start at 0, and they can be sliced, concatenated, and so on.

Accessing Values in Tuples:

To access values in a tuple, use square brackets with the index, or a slice of indices, to obtain
the value or values available at that position. For example −

tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 6, 7)

print "tup1[0]: ", tup1[0]
print "tup2[1:5]: ", tup2[1:5]

When the code is executed, it produces the following result −

tup1[0]: physics
tup2[1:5]: (2, 3, 4, 5)

Updating Tuples:
Tuples are immutable, which means you cannot update or change the values of tuple elements. You can,
however, take portions of existing tuples to create new tuples, as the following example demonstrates −

tup1 = (12, 34.56)
tup2 = ('abc', 'xyz')
tup3 = tup1 + tup2
print tup3

When the above code is executed, it produces the following result −

(12, 34.56, 'abc', 'xyz')

Delete Tuple Elements:


Removing individual tuple elements is not possible. There is, of course, nothing wrong with putting
together another tuple with the undesired elements discarded.

To explicitly remove an entire tuple, just use the del statement. For example:

tup = ('physics', 'chemistry', 1997, 2000)
print tup
del tup
print "After deleting tup : "
print tup

The final print statement raises a NameError, because tup no longer exists after del tup.

Basic Tuples Operations:

Python Expression               Results                          Description

len((1, 2, 3))                  3                                Length

(1, 2, 3) + (4, 5, 6)           (1, 2, 3, 4, 5, 6)               Concatenation

('Hi!',) * 4                    ('Hi!', 'Hi!', 'Hi!', 'Hi!')     Repetition

3 in (1, 2, 3)                  True                             Membership

for x in (1, 2, 3): print x,    1 2 3                            Iteration

Built-in Tuple Functions

SN   Function               Description

1    cmp(tuple1, tuple2)    Compares elements of both tuples (Python 2 only).

2    len(tuple)             Gives the total length of the tuple.

3    max(tuple)             Returns the item from the tuple with the maximum value.

4    min(tuple)             Returns the item from the tuple with the minimum value.

5    tuple(seq)             Converts a list into a tuple.
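The tuple behaviour described in this section can be summarised in one sketch. It is written for Python 3 and reuses the sample values from the examples above; the variable names that hold the results are illustrative.

```python
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 6, 7)

first = tup1[0]          # indexing starts at 0
middle = tup2[1:5]       # slicing a tuple gives a new tuple
combined = (12, 34.56) + ('abc', 'xyz')   # concatenation builds a new tuple
single = (50,)           # the trailing comma makes this a tuple

# Tuples are immutable: item assignment raises TypeError.
try:
    tup1[0] = 'biology'
    assignment_allowed = True
except TypeError:
    assignment_allowed = False

print(first, middle, combined, single, assignment_allowed)
```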

3.3 DICTIONARY
Each key is separated from its value by a colon (:), the items are separated by commas, and the whole
thing is enclosed in curly braces. An empty dictionary without any items is written with just two curly
braces, like this: {}.

Keys are unique within a dictionary while values may not be. The values of a dictionary can be of any
type, but the keys must be of an immutable data type such as strings, numbers, or tuples.

Accessing Values in Dictionary:
To access dictionary elements, you can use the familiar square brackets along with the key to obtain its
value. Following is a simple example −

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

print "dict['Name']: ", dict['Name']
print "dict['Age']: ", dict['Age']

Result –

dict['Name']: Zara
dict['Age']: 7

Updating Dictionary
We can update a dictionary by adding a new entry or a key-value pair, modifying an existing entry, or
deleting an existing entry as shown below in the simple example −

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

dict['Age'] = 8  # update existing entry
dict['School'] = "DPS School"  # Add new entry

print "dict['Age']: ", dict['Age']
print "dict['School']: ", dict['School']

Result −

dict['Age']: 8
dict['School']: DPS School

Delete Dictionary Elements


We can either remove individual dictionary elements or clear the entire contents of a dictionary. You
can also delete the entire dictionary in a single operation.

To explicitly remove an entire dictionary, just use the del statement. Following is a simple example –

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

del dict['Name']  # remove entry with key 'Name'
dict.clear()  # remove all entries in dict
del dict  # delete entire dictionary

print "dict['Age']: ", dict['Age']
print "dict['School']: ", dict['School']

In Python 2 the two print statements raise an exception, because the dictionary no longer exists
after del dict.

Built-in Dictionary Functions & Methods –


Python includes the following dictionary functions −

SN   Function             Description

1    cmp(dict1, dict2)    Compares elements of both dictionaries (Python 2 only).

2    len(dict)            Gives the total length of the dictionary. This is equal to the
                          number of items in the dictionary.

3    str(dict)            Produces a printable string representation of a dictionary.

4    type(variable)       Returns the type of the passed variable. If the passed variable
                          is a dictionary, it returns the dictionary type.

Python includes the following dictionary methods −

SN   Method                                Description

1    dict.clear()                          Removes all elements of dictionary dict.

2    dict.copy()                           Returns a shallow copy of dictionary dict.

3    dict.fromkeys(seq[, value])           Creates a new dictionary with keys from seq and
                                           values set to value.

4    dict.get(key, default=None)           For key key, returns its value, or default if key
                                           is not in the dictionary.

5    dict.has_key(key)                     Returns true if key is in dictionary dict, false
                                           otherwise (Python 2 only).

6    dict.items()                          Returns a list of dict's (key, value) tuple pairs.

7    dict.keys()                           Returns a list of dictionary dict's keys.

8    dict.setdefault(key, default=None)    Similar to get(), but sets dict[key] = default
                                           if key is not already in dict.

9    dict.update(dict2)                    Adds dictionary dict2's key-value pairs to dict.

10   dict.values()                         Returns a list of dictionary dict's values.
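A short sketch of the dictionary methods follows. It is written for Python 3, so the in operator stands in for has_key(), and it uses the name record rather than dict to avoid hiding the built-in type. The sample data follows the earlier examples; other names are illustrative.

```python
record = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

record['Age'] = 8                       # update an existing entry
record['School'] = 'DPS School'         # add a new entry

age = record.get('Age')                 # value for an existing key
city = record.get('City', 'unknown')    # default for a missing key
has_name = 'Name' in record             # Python 3 replacement for has_key()

record.setdefault('Class', 'Second')    # key exists, so the value is kept
record.update({'Age': 9})               # merge another dictionary
del record['Name']                      # remove a single entry

remaining_keys = sorted(record.keys())
blank = dict.fromkeys(['a', 'b'], 0)    # new dictionary with a shared value

print(age, city, has_name, remaining_keys, blank)
```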

A function is a block of organized, reusable code that is used to perform a single, related action.
Functions provide better modularity for your application and a high degree of code reuse. Python
gives you many built-in functions like print(), etc., but you can also create your own functions. These
functions are called user-defined functions.

Defining a Function
Simple rules to define a function in Python.

• Function blocks begin with the keyword def followed by the function name and parentheses ( ).

• Any input parameters or arguments should be placed within these parentheses. You can also
define parameters inside these parentheses.

• The first statement of a function can be an optional statement - the documentation string of the
function or docstring.

• The code block within every function starts with a colon (:) and is indented.

• The statement return [expression] exits a function, optionally passing back an expression to the
caller. A return statement with no arguments is the same as return None.

def functionname( parameters ):
   "function_docstring"
   function_suite
   return [expression]

Calling a Function
Defining a function only gives it a name, specifies the parameters that are to be included in the
function and structures the blocks of code. Once the basic structure of a function is finalized, you can
execute it by calling it from another function or directly from the Python prompt. Following is the
example to call printme() function −

# Function definition is here
def printme( str ):
   "This prints a passed string into this function"
   print str
   return

# Now you can call printme function
printme("I'm first call to user defined function!")
printme("Again second call to the same function")

When the above code is executed, it produces the following result −

I'm first call to user defined function!
Again second call to the same function

Function Arguments
You can call a function by using the following types of formal arguments:

• Required arguments

• Keyword arguments

• Default arguments

• Variable-length arguments
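The four kinds of argument can be shown with one function. This is a hedged sketch in Python 3; the function describe and its parameter names are hypothetical and are not part of the project code.

```python
def describe(name, age=35, *tags, **extra):
    """Return a one-line summary built from every kind of argument."""
    return "{} {} {} {}".format(name, age, list(tags), sorted(extra.items()))

# Required argument: name must be supplied; age falls back to its default.
with_default = describe("miki")

# Keyword arguments: the order no longer matters.
with_keywords = describe(age=50, name="miki")

# Variable-length arguments: extra positionals land in tags,
# extra keywords land in extra.
with_variable = describe("miki", 50, "a", "b", city="Hyderabad")

print(with_default)
print(with_keywords)
print(with_variable)
```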

Scope of Variables
All variables in a program may not be accessible at all locations in that program. This depends on
where you have declared a variable.

The scope of a variable determines the portion of the program where you can access a particular
identifier. There are two basic scopes of variables in Python −

• Global variables

• Local variables

Global vs. Local variables
Variables that are defined inside a function body have a local scope, and those defined outside have a
global scope.

This means that local variables can be accessed only inside the function in which they are declared,
whereas global variables can be accessed throughout the program body by all functions. When you call
a function, the variables declared inside it are brought into scope. Following is a simple example −

total = 0  # This is a global variable.

# Function definition is here
def sum( arg1, arg2 ):
   # Add both the parameters and return them.
   total = arg1 + arg2  # Here total is a local variable.
   print "Inside the function local total : ", total
   return total

sum(10, 20)
print "Outside the function global total : ", total

Result −

Inside the function local total :  30
Outside the function global total :  0
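To contrast the two scopes, the sketch below adds a second function that uses the global statement, which the example above does not cover. It is written for Python 3, and the function names add_local and add_global are chosen only for this illustration.

```python
total = 0  # global variable

def add_local(arg1, arg2):
    total = arg1 + arg2   # creates a new local name; the global is untouched
    return total

def add_global(arg1, arg2):
    global total          # rebinding now affects the module-level name
    total = arg1 + arg2
    return total

local_result = add_local(10, 20)
after_local = total       # still the original global value

global_result = add_global(10, 20)
after_global = total      # the global has been updated

print(local_result, after_local, global_result, after_global)
```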

A module allows you to logically organize your Python code. Grouping related code into a module
makes the code easier to understand and use. A module is a Python object with arbitrarily named
attributes that you can bind and reference. Simply, a module is a file consisting of Python code. A
module can define functions, classes and variables. A module can also include runnable code.

Example:
The Python code for a module named aname normally resides in a file named aname.py. Here's an
example of a simple module, support.py

def print_func( par ):
   print "Hello : ", par
   return

The import Statement


The import statement has the following syntax:

import module1[, module2[,... moduleN]]

When the interpreter encounters an import statement, it imports the module if the module is present in
the search path. A search path is a list of directories that the interpreter searches before importing a
module. For example, to import the module support.py, you need to put the following command at the
top of the script −

import support

A module is loaded only once, regardless of the number of times it is imported. This prevents the
module execution from happening over and over again if multiple imports occur.
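The load-once behaviour can be observed with a throwaway module. This is a minimal Python 3 sketch under stated assumptions: the module name support_demo and its contents are hypothetical, and the file is written to a temporary directory so that the example is self-contained.

```python
import os
import sys
import tempfile

# Write a small module to a temporary directory (hypothetical contents).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "support_demo.py"), "w") as handle:
    handle.write(
        "print('support_demo is being loaded')\n"
        "def print_func(par):\n"
        "    return 'Hello : ' + par\n"
    )

# Put the directory on the search path so the interpreter can find the module.
sys.path.insert(0, module_dir)

import support_demo            # the module body runs here, once
import support_demo as again   # served from sys.modules; the body is not rerun

greeting = support_demo.print_func("Zara")
print(greeting)
```

The loading message is printed a single time even though the module is imported twice, because the second import finds the module already cached in sys.modules.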

Packages in Python
A package is a hierarchical file directory structure that defines a single Python application
environment that consists of modules and sub packages and sub-sub packages.

Consider a file Pots.py available in the Phone directory. This file has the following lines of source code −

def Pots():
   print "I'm Pots Phone"

In a similar way, we have two more files, each holding a function with the same name as its file −

• Phone/Isdn.py file having function Isdn()

• Phone/G3.py file having function G3()

Now, create one more file __init__.py in the Phone directory −

• Phone/__init__.py

To make all of your functions available when you've imported Phone, you need to put explicit import
statements in __init__.py as follows −

from Pots import Pots
from Isdn import Isdn
from G3 import G3

After you add these lines to __init__.py, you have all of these functions available when you import the
Phone package.

# Now import your Phone Package.
import Phone

Phone.Pots()
Phone.Isdn()
Phone.G3()

RESULT:

I'm Pots Phone
I'm ISDN Phone
I'm 3G Phone

In the above example, we have taken the example of a single function in each file, but you can keep
multiple functions in your files. You can also define different Python classes in those files and then
you can create your packages out of those classes.

This chapter covers all the basic I/O functions available in Python.

Printing to the Screen


The simplest way to produce output is using the print statement, where you can pass zero or more
expressions separated by commas. This statement converts the expressions you pass into a string and
writes the result to standard output as follows −

print "Python is really a great language,", "isn't it?"

Result:

Python is really a great language, isn't it?

Reading Keyboard Input


Python provides two built-in functions to read a line of text from standard input, which by default
comes from the keyboard. These functions are −

• raw_input

• input

The raw_input Function

The raw_input([prompt]) function reads one line from standard input and returns it as a string
(removing the trailing newline).

str = raw_input("Enter your input: ")
print "Received input is : ", str

This prompts you to enter any string and displays the same string on the screen. When "Hello Python!"
is typed, the output is like this −

Enter your input: Hello Python!
Received input is :  Hello Python!

The input Function

The input([prompt]) function is equivalent to raw_input, except that it assumes the input is a valid
Python expression and returns the evaluated result to you.

str = input("Enter your input: ")
print "Received input is : ", str

This would produce the following result against the entered input −

Enter your input: [x*5 for x in range(2,10,2)]
Received input is :  [10, 20, 30, 40]

Opening and Closing Files

Until now, you have been reading and writing to the standard input and output. Now, we will see how
to use actual data files. Python provides basic functions and methods necessary to manipulate files by
default. You can do most of the file manipulation using a file object.

The open Function

Before you can read or write a file, you have to open it using Python's built-in open() function. This
function creates a file object, which would be utilized to call other support methods associated with it.

Syntax
file object = open(file_name [, access_mode][, buffering])

Here are parameter details:

• file_name: The file_name argument is a string value that contains the name of the file that you
want to access.

• access_mode: The access_mode determines the mode in which the file has to be opened, i.e.,
read, write, append, etc. A complete list of possible values is given below in the table. This is an
optional parameter and the default file access mode is read (r).

• buffering: If the buffering value is set to 0, no buffering takes place. If the buffering value is 1,
line buffering is performed while accessing a file. If you specify the buffering value as an
integer greater than 1, then buffering action is performed with the indicated buffer size. If
negative, the buffer size is the system default (default behavior).

Here is a list of the different modes of opening a file −

Modes   Description

r       Opens a file for reading only. The file pointer is placed at the beginning of the file.
        This is the default mode.

rb      Opens a file for reading only in binary format. The file pointer is placed at the
        beginning of the file.

r+      Opens a file for both reading and writing. The file pointer is placed at the beginning
        of the file.

rb+     Opens a file for both reading and writing in binary format. The file pointer is placed
        at the beginning of the file.

w       Opens a file for writing only. Overwrites the file if the file exists. If the file does
        not exist, creates a new file for writing.

wb      Opens a file for writing only in binary format. Overwrites the file if the file exists.
        If the file does not exist, creates a new file for writing.

w+      Opens a file for both writing and reading. Overwrites the existing file if the file
        exists. If the file does not exist, creates a new file for reading and writing.

wb+     Opens a file for both writing and reading in binary format. Overwrites the existing
        file if the file exists. If the file does not exist, creates a new file for reading and
        writing.

a       Opens a file for appending. The file pointer is at the end of the file if the file
        exists. That is, the file is in the append mode. If the file does not exist, it creates
        a new file for writing.

ab      Opens a file for appending in binary format. The file pointer is at the end of the file
        if the file exists. That is, the file is in the append mode. If the file does not exist,
        it creates a new file for writing.

a+      Opens a file for both appending and reading. The file pointer is at the end of the file
        if the file exists. The file opens in the append mode. If the file does not exist, it
        creates a new file for reading and writing.
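A few of these access modes can be compared in a short sketch. It is written for Python 3 and works on a file in a temporary directory, so the path is an assumption made for the illustration; the with statement is used only so that each file is closed automatically.

```python
import os
import tempfile

# Hypothetical location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")

with open(path, "w") as fo:    # "w" creates the file or overwrites it
    fo.write("first line\n")

with open(path, "a") as fo:    # "a" appends at the end of the file
    fo.write("second line\n")

with open(path, "r") as fo:    # "r" is the default, read-only mode
    lines = fo.readlines()

with open(path, "w") as fo:    # opening with "w" again empties the file
    pass

with open(path, "rb") as fo:   # "rb" reads raw bytes
    raw = fo.read()

print(lines, raw)
```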

The file Object Attributes

Once a file is opened and you have one file object, you can get various information related to that file.

Here is a list of all attributes related to file object:

Attribute         Description

file.closed       Returns true if the file is closed, false otherwise.

file.mode         Returns the access mode with which the file was opened.

file.name         Returns the name of the file.

file.softspace    Returns false if a space is explicitly required with print, true otherwise.

Example
# Open a file
fo = open("foo.txt", "wb")
print "Name of the file: ", fo.name
print "Closed or not : ", fo.closed
print "Opening mode : ", fo.mode
print "Softspace flag : ", fo.softspace

This produces the following result −

Name of the file:  foo.txt
Closed or not :  False
Opening mode :  wb
Softspace flag :  0

The close() Method

The close() method of a file object flushes any unwritten information and closes the file object, after
which no more writing can be done. Python automatically closes a file when the reference object of a
file is reassigned to another file. It is a good practice to use the close() method to close a file.

Syntax
fileObject.close()

Example
# Open a file
fo = open("foo.txt", "wb")
print "Name of the file: ", fo.name
# Close opened file
fo.close()

Result −

Name of the file: foo.txt
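Calling close() explicitly works, but it is easy to forget. As a hedged alternative, the Python 3 sketch below compares an explicit close() with the with statement, which closes the file when its block ends. The temporary path is an assumption made so that the example runs anywhere.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.txt")

# Explicit close(): the closed attribute changes from False to True.
fo = open(path, "w")
fo.write("Python is a great language.\n")
closed_before = fo.closed
fo.close()
closed_after = fo.closed

# with statement: the file is closed automatically at the end of the block,
# even if an exception is raised inside it.
with open(path) as managed:
    content = managed.read()
closed_by_with = managed.closed

print(closed_before, closed_after, closed_by_with)
```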

Reading and Writing Files

The file object provides a set of access methods to make our lives easier. We will see how to
use the read() and write() methods to read and write files.

The write() Method

The write() method writes any string to an open file. It is important to note that Python strings can
have binary data and not just text. The write() method does not add a newline character ('\n') to the end
of the string.

Syntax
fileObject.write(string)

Here, the passed parameter is the content to be written into the opened file.

Example
# Open a file
fo = open("foo.txt", "wb")
fo.write("Python is a great language.\nYeah its great!!\n")
# Close opened file
fo.close()

The above code creates the file foo.txt, writes the given content to it and finally closes it. If you
open this file, it has the following content.

Python is a great language.
Yeah its great!!

The read() Method

The read() method reads a string from an open file. It is important to note that Python strings can have
binary data, apart from text data.

Syntax
fileObject.read([count])

Here, the passed parameter is the number of bytes to be read from the opened file. This method starts
reading from the beginning of the file and if count is missing, then it tries to read as much as possible,
maybe until the end of file.

Example

Let's take a file foo.txt, which we created above.

# Open a file
fo = open("foo.txt", "r+")
str = fo.read(10)
print "Read String is : ", str
# Close opened file
fo.close()

This produces the following result −

Read String is :  Python is

File Positions

The tell() method tells you the current position within the file; in other words, the next read or write
will occur at that many bytes from the beginning of the file.

The seek(offset[, from]) method changes the current file position. The offset argument indicates the
number of bytes to be moved. The from argument specifies the reference position from where the
bytes are to be moved.

If from is set to 0, the beginning of the file is used as the reference position; 1 means the current
position is used as the reference position; and if it is set to 2, the end of the file is taken as
the reference position.

Example

Let us take a file foo.txt, which we created above.

# Open a file
fo = open("foo.txt", "r+")
str = fo.read(10)
print "Read String is : ", str

# Check current position
position = fo.tell()
print "Current file position : ", position

# Reposition pointer at the beginning once again
position = fo.seek(0, 0)
str = fo.read(10)
print "Again read String is : ", str
# Close opened file
fo.close()

This produces the following result −

Read String is :  Python is
Current file position :  10
Again read String is :  Python is
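The three reference positions can be tried out in one sketch. It is written for Python 3 and opens the file in binary mode, since seeking relative to the end of the file is only supported there for non-zero offsets. The temporary path is an assumption; the file content matches the foo.txt example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "wb") as fo:
    fo.write(b"Python is a great language.\nYeah its great!!\n")

with open(path, "rb") as fo:
    first = fo.read(10)          # read the first 10 bytes
    position = fo.tell()         # offset of the next read

    fo.seek(0, os.SEEK_SET)      # from = 0: relative to the beginning
    again = fo.read(10)

    fo.seek(-17, os.SEEK_END)    # from = 2: relative to the end of the file
    tail = fo.read()

print(first, position, again, tail)
```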

Renaming and Deleting Files

Python os module provides methods that help you perform file-processing operations, such as
renaming and deleting files.

To use this module you need to import it first and then you can call any related functions.

The rename() Method

The rename() method takes two arguments, the current filename and the new filename.

Syntax
os.rename(current_file_name, new_file_name)

Example

Following is the example to rename an existing file test1.txt:

import os

# Rename a file from test1.txt to test2.txt
os.rename("test1.txt", "test2.txt")

The remove() Method

You can use the remove() method to delete files by supplying the name of the file to be deleted as the
argument.

Syntax
os.remove(file_name)

Example

Following is the example to delete an existing file test2.txt −

#!/usr/bin/python
import os

# Delete file test2.txt
os.remove("test2.txt")
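Both calls fail if the source file is missing, so the sketch below creates the file first and checks the result of each step. It is written for Python 3 and works in a temporary directory, which is an assumption made to keep the example self-contained.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
old_name = os.path.join(workdir, "test1.txt")
new_name = os.path.join(workdir, "test2.txt")

# Create the file that will be renamed.
with open(old_name, "w") as fo:
    fo.write("demo")

os.rename(old_name, new_name)
renamed = os.path.exists(new_name) and not os.path.exists(old_name)

os.remove(new_name)
removed = not os.path.exists(new_name)

print(renamed, removed)
```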

Directories in Python

All files are contained within various directories, and Python has no problem handling these too.
The os module has several methods that help you create, remove, and change directories.

The mkdir() Method

You can use the mkdir() method of the os module to create directories in the current directory. You
need to supply an argument to this method which contains the name of the directory to be created.

Syntax
os.mkdir("newdir")

Example

Following is the example to create a directory test in the current directory −

#!/usr/bin/python
import os

# Create a directory "test"
os.mkdir("test")

The chdir() Method

You can use the chdir() method to change the current directory. The chdir() method takes an argument,
which is the name of the directory that you want to make the current directory.

Syntax
os.chdir("newdir")

Example

Following is the example to go into "/home/newdir" directory −

#!/usr/bin/python
import os

# Changing a directory to "/home/newdir"


os.chdir("/home/newdir")

The getcwd() Method

The getcwd() method displays the current working directory.

Syntax
os.getcwd()

Example

Following is the example to give current directory −

import os

# This would give location of the current directory
os.getcwd()

The rmdir() Method

The rmdir() method deletes the directory, which is passed as an argument in the method.

Before removing a directory, all the contents in it should be removed.

Syntax:
os.rmdir('dirname')

Example

Following is the example to remove "/tmp/test" directory. It is required to give fully qualified name of
the directory, otherwise it would search for that directory in the current directory.

import os
# This would remove "/tmp/test" directory.
os.rmdir("/tmp/test")
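The four directory methods can be combined into one round trip. This Python 3 sketch uses a temporary base directory, an assumption made so that it does not touch /tmp/test or /home/newdir, and it restores the original working directory before removing the new one.

```python
import os
import tempfile

start = os.getcwd()                    # remember where we began
base = tempfile.mkdtemp()
target = os.path.join(base, "test")

os.mkdir(target)                       # create the directory
os.chdir(target)                       # make it the current directory
inside = os.path.realpath(os.getcwd()) == os.path.realpath(target)

os.chdir(start)                        # leave the directory before removing it
os.rmdir(target)                       # rmdir only removes empty directories
gone = not os.path.exists(target)

print(inside, gone)
```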

File & Directory Related Methods

Two important sources provide a wide range of utility methods to handle and
manipulate files & directories on Windows and Unix operating systems. They are as follows −

• File Object Methods: The file object provides functions to manipulate files.

• OS Object Methods: This provides methods to process files as well as directories.

Python provides two very important features to handle any unexpected error in your
Python programs and to add debugging capabilities in them −

• Exception Handling: The standard exceptions available in Python are listed below.

• Assertions

List of Standard Exceptions −

EXCEPTION NAME         DESCRIPTION

Exception              Base class for all exceptions.

StopIteration          Raised when the next() method of an iterator does not point to
                       any object.

SystemExit             Raised by the sys.exit() function. If not handled in the code,
                       causes the interpreter to exit.

StandardError          Base class for all built-in exceptions except StopIteration and
                       SystemExit.

ArithmeticError        Base class for all errors that occur for numeric calculation.

OverflowError          Raised when a calculation exceeds the maximum limit for a
                       numeric type.

FloatingPointError     Raised when a floating point calculation fails.

ZeroDivisionError      Raised when division or modulo by zero takes place for all
                       numeric types.

AssertionError         Raised in case of failure of the assert statement.

AttributeError         Raised in case of failure of attribute reference or assignment.

EOFError               Raised when there is no input from either the raw_input() or
                       input() function and the end of file is reached.

ImportError            Raised when an import statement fails.

KeyboardInterrupt      Raised when the user interrupts program execution, usually by
                       pressing Ctrl+c.

LookupError            Base class for all lookup errors.

IndexError             Raised when an index is not found in a sequence.

KeyError               Raised when the specified key is not found in the dictionary.

NameError              Raised when an identifier is not found in the local or global
                       namespace.

UnboundLocalError      Raised when trying to access a local variable in a function or
                       method but no value has been assigned to it.

EnvironmentError       Base class for all exceptions that occur outside the Python
                       environment.

IOError                Raised when an input/output operation fails, such as the print
                       statement or the open() function when trying to open a file that
                       does not exist.

OSError                Raised for operating system-related errors.

SyntaxError            Raised when there is an error in Python syntax.

IndentationError       Raised when indentation is not specified properly.

SystemError            Raised when the interpreter finds an internal problem, but when
                       this error is encountered the Python interpreter does not exit.

TypeError              Raised when an operation or function is attempted that is
                       invalid for the specified data type.

ValueError             Raised when the built-in function for a data type has the valid
                       type of arguments, but the arguments have invalid values
                       specified.

RuntimeError           Raised when a generated error does not fall into any category.

NotImplementedError    Raised when an abstract method that needs to be implemented
                       in an inherited class is not actually implemented.

What is Exception?
An exception is an event, which occurs during the execution of a program that disrupts
the normal flow of the program's instructions. In general, when a Python script
encounters a situation that it cannot cope with, it raises an exception. An exception is a
Python object that represents an error.

When a Python script raises an exception, it must either handle the exception
immediately or it terminates and quits.

Handling an exception
If you have some suspicious code that may raise an exception, you can defend your
program by placing the suspicious code in a try: block. After the try: block, include
an except: statement, followed by a block of code which handles the problem as
elegantly as possible.
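The try and except blocks described above can be sketched as follows. The example is written for Python 3; the function safe_divide and the message it returns are hypothetical names chosen for the illustration.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers, handling division by zero instead of crashing."""
    try:
        # Suspicious code: this line raises ZeroDivisionError for a zero divisor.
        result = numerator / denominator
    except ZeroDivisionError as error:
        # Handle the problem and return a message instead of terminating.
        return "cannot divide: {}".format(error)
    else:
        # Runs only when the try block raised no exception.
        return result

ok = safe_divide(10, 2)
handled = safe_divide(1, 0)

print(ok)
print(handled)
```

Naming the exception type in the except clause, rather than catching everything, keeps unrelated errors visible.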

The Python standard for database interfaces is the Python DB-API. Most Python database interfaces
adhere to this standard.

You can choose the right database for your application. Python Database API supports a wide range of
database servers such as −

• GadFly

• mSQL

• MySQL

• PostgreSQL

• Microsoft SQL Server 2000

• Informix

• Interbase

• Oracle

• Sybase

The DB API provides a minimal standard for working with databases using Python structures and
syntax wherever possible. This API includes the following:

• Importing the API module.
• Acquiring a connection with the database.
• Issuing SQL statements and stored procedures.
• Closing the connection.
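
The four steps listed above can be sketched with sqlite3, which follows the DB-API and ships with Python. This is only an illustration under stated assumptions: the project itself connects to MySQL through Django, and the table name, column names and function name below are hypothetical.

```python
import sqlite3  # Step 1: import the API module.


def count_predictions(rows):
    """Store (url, label) rows and return the number of rows per label."""
    # Step 2: acquire a connection (an in-memory database, for illustration).
    connection = sqlite3.connect(":memory:")
    try:
        cursor = connection.cursor()
        # Step 3: issue SQL statements.
        cursor.execute("CREATE TABLE predictions (url TEXT, label TEXT)")
        cursor.executemany("INSERT INTO predictions VALUES (?, ?)", rows)
        connection.commit()
        cursor.execute(
            "SELECT label, COUNT(*) FROM predictions GROUP BY label ORDER BY label"
        )
        return cursor.fetchall()
    finally:
        # Step 4: close the connection, even if a statement failed.
        connection.close()
```

The placeholders (?) let the driver pass values separately from the SQL text, which avoids building queries by string concatenation.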

8.2.1 SOURCE CODE:

Settings.py

import os

# Build paths inside the project like this: os.path.join(BASE_DIR, ...)


BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

# Quick-start development settings - unsuitable for production


# See https://docs.djangoproject.com/en/3.0/howto/deployment/checklist/

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'm+1edl5m-5@u9u!b8-=4-4mq&o1%agco2xpl8c!7sn7!eowjk#'

# SECURITY WARNING: don't run with debug turned on in production!


DEBUG = True

ALLOWED_HOSTS = []

# Application definition

INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'Remote_User',
'Service_Provider',
]

MIDDLEWARE = [
'django.middleware.security.SecurityMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

ROOT_URLCONF = 'phishing_detection_system.urls'

TEMPLATES = [
{
'BACKEND': 'django.template.backends.django.DjangoTemplates',
'DIRS': [(os.path.join(BASE_DIR,'Template/htmls'))],
'APP_DIRS': True,
'OPTIONS': {
'context_processors': [
'django.template.context_processors.debug',
'django.template.context_processors.request',
'django.contrib.auth.context_processors.auth',
'django.contrib.messages.context_processors.messages',
],
},
},
]

WSGI_APPLICATION = 'phishing_detection_system.wsgi.application'

# Database
# https://docs.djangoproject.com/en/3.0/ref/settings/#databases

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
'NAME': 'phishing_detection_system',
'USER':'root',
'PASSWORD': '',
'HOST' :'127.0.0.1',
'PORT' :'3306',
}
}

# Password validation
# https://docs.djangoproject.com/en/3.0/ref/settings/#auth-password-validators

AUTH_PASSWORD_VALIDATORS = [
{
'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
},
{
'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
},
{
'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
},
{
'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
},
]

# Internationalization
# https://docs.djangoproject.com/en/3.0/topics/i18n/

LANGUAGE_CODE = 'en-us'

TIME_ZONE = 'UTC'

USE_I18N = True

USE_L10N = True

USE_TZ = True

# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/3.0/howto/static-files/

STATIC_URL = '/static/'
STATICFILES_DIRS = [os.path.join(BASE_DIR,'Template/images')]
MEDIA_URL = '/media/'
MEDIA_ROOT = os.path.join(BASE_DIR, 'Template/media')

STATIC_ROOT = '/static/'

Views.py

import datetime

import pandas as pd
import xlwt
from django.db.models import Avg, Count, Q
from django.http import HttpResponse
from django.shortcuts import render, redirect
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.tree import DecisionTreeClassifier

# Create your views here.

from Remote_User.models import ClientRegister_Model, phishing_detection, detection_accuracy, detection_ratio

def serviceproviderlogin(request):
    if request.method == "POST":
        admin = request.POST.get('username')
        password = request.POST.get('password')
        # The service provider credentials are hard-coded in this view.
        if admin == "Admin" and password == "Admin":
            detection_accuracy.objects.all().delete()
            return redirect('View_Remote_Users')

    return render(request, 'SProvider/serviceproviderlogin.html')

def View_Prediction_Of_URL_Type_Ratio(request):
    # Recompute the share of each predicted label from scratch.
    detection_ratio.objects.all().delete()
    total = phishing_detection.objects.count()

    for kword in ('Phishing URL', 'Normal URL'):
        count = phishing_detection.objects.filter(Prediction=kword).count()
        # Guard against division by zero when no predictions are stored yet.
        ratio = (count / total) * 100 if total else 0
        if ratio != 0:
            detection_ratio.objects.create(names=kword, ratio=ratio)

    obj = detection_ratio.objects.all()
    return render(request, 'SProvider/View_Prediction_Of_URL_Type_Ratio.html', {'objs': obj})

def View_Remote_Users(request):
    obj = ClientRegister_Model.objects.all()
    return render(request, 'SProvider/View_Remote_Users.html', {'objects': obj})

def ViewTrendings(request):
    topic = phishing_detection.objects.values('topics').annotate(dcount=Count('topics')).order_by('-dcount')
    return render(request, 'SProvider/ViewTrendings.html', {'objects': topic})

def charts(request, chart_type):
    chart1 = detection_ratio.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/charts.html", {'form': chart1, 'chart_type': chart_type})

def charts1(request, chart_type):
    chart1 = detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/charts1.html", {'form': chart1, 'chart_type': chart_type})

def View_Prediction_Of_URL_Type(request):
    obj = phishing_detection.objects.all()
    return render(request, 'SProvider/View_Prediction_Of_URL_Type.html', {'list_objects': obj})

def likeschart(request, like_chart):
    charts = detection_accuracy.objects.values('names').annotate(dcount=Avg('ratio'))
    return render(request, "SProvider/likeschart.html", {'form': charts, 'like_chart': like_chart})

def Download_Predicted_DataSets(request):
    response = HttpResponse(content_type='application/ms-excel')
    # Decide the file name offered to the browser.
    response['Content-Disposition'] = 'attachment; filename="Predicted_Data.xls"'
    # Create the workbook and add a sheet.
    wb = xlwt.Workbook(encoding='utf-8')
    ws = wb.add_sheet("sheet1")
    # Row 0 is left empty; data rows start at row 1.
    row_num = 0
    # Every written cell uses a bold font.
    font_style = xlwt.XFStyle()
    font_style.font.bold = True
    for my_row in phishing_detection.objects.all():
        row_num = row_num + 1
        ws.write(row_num, 0, my_row.url, font_style)
        ws.write(row_num, 1, my_row.Prediction, font_style)

    wb.save(response)
    return response

def train_model(request):
    detection_accuracy.objects.all().delete()
    data = pd.read_csv("Datasets.csv", encoding='latin-1')

    def apply_results(results):
        # Map the text label to a numeric class: benign -> 0, phishing -> 1.
        # Any other label returns None.
        if results == "benign":
            return 0
        elif results == "phishing":
            return 1

    data['Results'] = data['type'].apply(apply_results)

    x = data['url']
    y = data['Results']

    # Bag-of-words features: each URL becomes a vector of token counts.
    cv = CountVectorizer(lowercase=False, strip_accents='unicode', ngram_range=(1, 1))

    print(x)
    print("Y")
    print(y)

    x = cv.fit_transform(x)

    models = []
    from sklearn.model_selection import train_test_split
    # Hold out 20% of the samples for testing.
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.20)

    print("Naive Bayes")

    from sklearn.naive_bayes import MultinomialNB

    NB = MultinomialNB()
    NB.fit(X_train, y_train)
    predict_nb = NB.predict(X_test)
    naivebayes = accuracy_score(y_test, predict_nb) * 100
    print("ACCURACY")
    print(naivebayes)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, predict_nb))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, predict_nb))
    detection_accuracy.objects.create(names="Naive Bayes", ratio=naivebayes)

    # SVM Model
    print("SVM")
    from sklearn import svm

    lin_clf = svm.LinearSVC()
    lin_clf.fit(X_train, y_train)
    predict_svm = lin_clf.predict(X_test)
    svm_acc = accuracy_score(y_test, predict_svm) * 100
    print("ACCURACY")
    print(svm_acc)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, predict_svm))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, predict_svm))
    detection_accuracy.objects.create(names="SVM", ratio=svm_acc)

    print("Logistic Regression")

    from sklearn.linear_model import LogisticRegression

    reg = LogisticRegression(random_state=0, solver='lbfgs').fit(X_train, y_train)
    y_pred = reg.predict(X_test)
    print("ACCURACY")
    print(accuracy_score(y_test, y_pred) * 100)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, y_pred))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, y_pred))
    detection_accuracy.objects.create(names="Logistic Regression", ratio=accuracy_score(y_test, y_pred) * 100)

    print("Decision Tree Classifier")

    dtc = DecisionTreeClassifier()
    dtc.fit(X_train, y_train)
    dtcpredict = dtc.predict(X_test)
    print("ACCURACY")
    print(accuracy_score(y_test, dtcpredict) * 100)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, dtcpredict))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, dtcpredict))
    detection_accuracy.objects.create(names="Decision Tree Classifier",
                                      ratio=accuracy_score(y_test, dtcpredict) * 100)

    print("Gradient Boosting Classifier")

    from sklearn.ensemble import GradientBoostingClassifier

    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0).fit(
        X_train,
        y_train)
    clfpredict = clf.predict(X_test)
    print("ACCURACY")
    print(accuracy_score(y_test, clfpredict) * 100)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, clfpredict))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, clfpredict))
    models.append(('GradientBoostingClassifier', clf))
    detection_accuracy.objects.create(names="Gradient Boosting Classifier",
                                      ratio=accuracy_score(y_test, clfpredict) * 100)

    print("Random Forest Classifier")

    # RandomForestClassifier is already imported at the top of the module.
    rf_clf = RandomForestClassifier()
    rf_clf.fit(X_train, y_train)
    rfpredict = rf_clf.predict(X_test)
    print("ACCURACY")
    print(accuracy_score(y_test, rfpredict) * 100)
    print("CLASSIFICATION REPORT")
    print(classification_report(y_test, rfpredict))
    print("CONFUSION MATRIX")
    print(confusion_matrix(y_test, rfpredict))
    models.append(('RandomForestClassifier', rf_clf))
    detection_accuracy.objects.create(names="Random Forest Classifier",
                                      ratio=accuracy_score(y_test, rfpredict) * 100)

    # Save the labelled dataset next to the project.
    labeled = 'labeled_data.csv'
    data.to_csv(labeled, index=False)

    obj = detection_accuracy.objects.all()
    return render(request, 'SProvider/train_model.html', {'objs': obj})

Remote User
Predict_URL_Type.html
{% extends 'RUser/design.html' %}
{% block userblock %}

<link rel="icon" href="images/icon.png" type="image/x-icon" />

<link href="https://fonts.googleapis.com/css?family=Lobster" rel="stylesheet">


<link href="https://fonts.googleapis.com/css?family=Righteous" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Fredoka+One" rel="stylesheet">
<style>
body {background-color:#000000;}
.container-fluid {padding:50px;}
.container{background-color:white;padding:50px; }
#title{font-family: 'Fredoka One', cursive;}
.text-uppercase{
font-family: 'Righteous', cursive;}
.tweettext{

border: 2px solid yellowgreen;


width: 904px;
height: 202px;
overflow: scroll;
}
.style1 {

color: #FF0000;
font-weight: bold;
}
.style4 {color: #FFFF00; font-weight: bold; }
.style6 {
font-size: 24px;
color: #FFFF00;
font-weight: bold;
}
</style>

<body>
<div class="container-fluid">
<div class="container">

<div class="row">
<div class="col-md-5">

<form role="form" method="POST" >


{% csrf_token %}
<fieldset>
<p class="text-uppercase pull-center style1">PREDICTION OF URL TYPE !!! </p>
<hr>
<table width="568" align="center">
<tr>
<td width="287" height="44" bgcolor="#FF0000"><div align="center"><span class="style4 style2">Enter URL
Here</span></div></td>
<td width="269"><textarea name="url" cols="40" rows="10"></textarea></td>
</tr>
<tr>
<td><p>&nbsp; </p>
<p>
<input name="submit" type="submit" class="style1" value="Predict">
</p></td>
</tr>
</table>
</fieldset>
</form>
<form role="form" method="POST" >
{% csrf_token %}
<fieldset>

<hr>
<div>
<table height="85" border="0" align="center" >
<tr><td width="383" bgcolor="#FF0000"><div align="center"><span class="style6">URL TYPE</span><span
class="style4">:: ----&gt;</span></div></td>
<td width="227" bgcolor="#FFFFFF" style="color:red; font-size:20px; font-family:fantasy" ><div
align="center"><strong>{{objs}}</strong></div></td>
</tr>
</table>
</div>
</fieldset>
</form>
</div>

<div class="col-md-2">
<!-------null------>
</div>
</div>
</div>
</div>
{% endblock %}

TESTING AND RESULT

What do you mean by software testing?
Testing involves operation of a system or application under controlled conditions
and evaluating the results. The controlled conditions should include both normal and
abnormal conditions. Testing should intentionally attempt to make things go wrong to
determine if things happen when they shouldn't or things don't happen when they should.
It is oriented to 'detection'.

Unit Testing:
Unit testing is a software development process in which the smallest testable parts
of an application, called units, are individually and independently scrutinized for proper
operation.
Unit testing is often automated but it can also be done manually. This testing mode is a
component of Extreme Programming (XP), a pragmatic method of software
development that takes a meticulous approach to building a product by means of
continual testing and revision.
Unit tests are written from a programmer's perspective. They ensure that a
particular method of a class successfully performs a set of specific tasks. Each test
confirms that a method produces the expected output when given a known input.
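
A unit test of this kind can be sketched with Python's unittest module. The function label_to_class below is a hypothetical stand-in, written for illustration, for the label mapping used when the model is trained (benign to 0, phishing to 1); it is an assumption and not the project's own test suite.

```python
import unittest


def label_to_class(label):
    """Map a dataset label to a numeric class (benign -> 0, phishing -> 1)."""
    mapping = {"benign": 0, "phishing": 1}
    # Unknown labels return None instead of raising an error.
    return mapping.get(label)


class LabelToClassTest(unittest.TestCase):
    def test_known_labels(self):
        # Known input, expected output.
        self.assertEqual(label_to_class("benign"), 0)
        self.assertEqual(label_to_class("phishing"), 1)

    def test_unknown_label(self):
        self.assertIsNone(label_to_class("unknown"))
```

Saved in a test module, these cases can be run with python -m unittest.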

Performance Testing:
Performance testing is the process of determining the speed or effectiveness of a
computer, network, software program or device.
This process can involve quantitative tests done in a lab, such as measuring the response
time or the number of MIPS (millions of instructions per second) at which a system
functions.
Qualitative attributes such as reliability, scalability and interoperability may also be
evaluated. Performance testing is often done in conjunction with stress testing.
Performance testing can verify that a system meets the specifications claimed by
its manufacturer or vendor.
The process can compare two or more devices or programs in terms of parameters such
as speed, data transfer rate, bandwidth, throughput, efficiency or reliability.
Performance testing can also be used as a diagnostic aid in locating
communications bottlenecks.
Often a system will work much better if a problem is resolved at a single point or in a
single component.
For example, even the fastest computer will function poorly on today's Web if the
connection occurs at only 40 to 50 Kbps (kilobits per second).
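
Measuring response time, one of the quantitative tests mentioned above, can be sketched as follows. The helper measure_response_time is hypothetical and times any callable; in this project the callable could be a function that classifies one URL, which is an assumption made for illustration.

```python
import time


def measure_response_time(func, *args, repeats=5):
    """Return the average wall-clock time of func(*args) in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        # perf_counter() is a high-resolution clock suited to short intervals.
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

Averaging over several runs reduces the effect of one-off delays such as caching or scheduling.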

Integration Testing:
Integration testing, also known as integration and testing (I&T), is a software
development process in which program units are combined and tested as groups in
multiple ways.
In this context, a unit is defined as the smallest testable part of an application.
Integration testing can expose problems with the interfaces among program components
before trouble occurs in real-world program execution.
Integration testing is a component of Extreme Programming (XP), a pragmatic method
of software development that takes a meticulous approach to building a product by
means of continual testing and revision.

Test Cases:

Test Case 1: Login Form

• Function: LOGIN
• Expected Results: Should validate the user and check that the user exists in the database.
• Actual Results: The user is validated and checked against the database.
• Low Priority: No
• High Priority: Yes

Test Case 2: Remote Access User Registration Form

• Function: USER REGISTRATION
• Expected Results: Should check that all the fields are filled in by the user and save
the user to the database.
• Actual Results: All fields are checked through validations and the user is saved.
• Low Priority: No
• High Priority: Yes

Test Case 3: Change Password

When the old password does not match the new password, an error message is
displayed: “OLD PASSWORD DOES NOT MATCH WITH THE NEW PASSWORD”.

• Function: Change Password
• Expected Results: Should check that the old password and new password fields are
filled in by the user and save the user to the database.
• Actual Results: All fields are checked through validations and the user is saved.
• Low Priority: No
• High Priority: Yes

Test Case 4: Forgot Password

When a user forgets the password, the user is asked to enter the login name, ZIP code, and
mobile number.

| Module | Functionality | Test Case | Expected Results | Actual Results | Result | Priority |
| User | Login Use case | Navigate to www.sample.com. Click on Submit button without entering Username and Password | A validation should be as below: “Please enter valid Username & Password” | A validation has been populated as expected | Pass | High |
| | Test Username Field | Navigate to www.sample.com. Click on Submit button without filling Password and with valid Username | A validation should be as below: “Please enter valid Password or Password field cannot be empty” | A validation is shown as expected | Pass | High |
| | | Navigate to www.sample.com. Enter both Username and Password wrong and hit enter | A validation shown as below: “The username entered is wrong” | A validation is shown as expected | Pass | High |
| | | Navigate to www.sample.com. Enter valid Username and Password and click on Submit | Validate Username and Password in database and if correct then show main page | Main Page / Home Page has been displayed | Pass | |

SCREENSHOTS

Fig 1: Resource Manager Login

Fig 2: Resource Manager Operation Browse Train and Test Dataset

Fig 3: Resource Manager Operation View Trained and Tested URL Dataset Accuracy in Bar Chart

Fig 4: Resource Manager Operation View Trained and Tested URL Dataset Accuracy Results

Fig 5: Resource Manager Operation View Predicted Dataset

Fig 6: Resource Manager Operation View URL Type Ratio

Fig 7: Resource Manager Operation Download Predicted Dataset

Fig 8: Resource Manager Operation View URL Type Ratio Results

Fig 9: Resource Manager Operation View All Remote Users

CONCLUSION

The Internet now reaches almost the whole world and is still growing rapidly. With this growth,
cybercrimes that rely on suspicious and malicious URLs are also increasing daily, and they have a
significant impact on the quality of services provided by the Internet and by industrial companies.
Privacy and confidentiality are therefore essential concerns on the Internet. To breach security and
disrupt otherwise strong networks, attackers use phishing emails or URLs, which are an easy and
effective way to intrude into private or confidential networks. Phishing URLs simply pose as
legitimate URLs. This study proposes a machine-learning-based phishing detection system. The dataset
comprises 32 URL attributes and more than 11,054 URLs collected from over 11,000 websites. It was
obtained from the Kaggle repository and used as a benchmark for the research. The dataset is already
provided as feature vectors suitable for machine learning models. Decision tree, logistic
regression, random forest, support vector machine, gradient boosting machine, K-neighbors classifier,
naive Bayes, and a hybrid (LR+SVC+DT) model with soft and hard voting were applied in the
experiments to achieve the highest performance. Canopy feature selection with cross-fold
validation and grid-search hyperparameter optimization were used with the LSD ensemble
model. The proposed approach was evaluated by first experimenting with the individual machine
learning models and then carrying out further evaluation. The proposed approach
achieves its aim effectively. Future phishing detection systems should
combine list-based and machine-learning-based systems to prevent and detect phishing URLs more
efficiently.
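
The difference between the hard and soft voting schemes named above can be sketched in plain Python. The probabilities are made-up values for three hypothetical models and are not results from this study; scikit-learn's VotingClassifier provides the same two schemes through its voting parameter.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over the class labels predicted by each model."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average each class probability across models and pick the largest."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averages = [
        sum(model[c] for model in probabilities) / n_models
        for c in range(n_classes)
    ]
    return averages.index(max(averages))


# Made-up probabilities [P(legitimate), P(phishing)] from three models.
example = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]
# Each model's own label is the class with the highest probability.
labels = [model.index(max(model)) for model in example]
```

With these values two of the three models lean towards class 0, so hard voting returns 0, while the averaged probabilities favour class 1, so soft voting returns 1. Soft voting therefore lets one confident model outweigh two uncertain ones.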

FUTURE SCOPE
The future scope of the proposed phishing detection system highlights several promising
directions for enhancing cybersecurity measures. Firstly, integrating list-based and machine-learning-
based systems can significantly improve the detection and prevention of phishing URLs. By combining
these methodologies, future systems can leverage the strengths of each approach, ensuring more
comprehensive coverage and reducing false positives and negatives. Additionally, the incorporation of
real-time data analysis and continuous learning capabilities will enable the system to adapt to evolving
phishing tactics rapidly. Furthermore, expanding the dataset to include a more diverse range of URLs
and attributes can enhance the model's robustness and generalizability. The integration of advanced
techniques like deep learning and neural networks may also provide more sophisticated detection
capabilities. Finally, developing user-friendly interfaces and automated response mechanisms will
ensure that even non-expert users can benefit from advanced phishing protection. By addressing these
areas, future systems can offer more reliable, efficient, and user-centric solutions to combat the ever-
growing threat of cybercrimes.

REFERENCES

92
[1] N. Z. Harun, N. Jaffar, and P. S. J. Kassim, ‘‘Physical attributes significant in preserving the social
sustainability of the traditional Malay settlement,’’ in Reframing the Vernacular: Politics, Semiotics,
and Representation. Springer, 2020, pp. 225–238.
[2] D. M. Divakaran and A. Oest, ‘‘Phishing detection leveraging machine learning and deep learning:
A review,’’ 2022, arXiv:2205.07411.
[3] A. Akanchha, ‘‘Exploring a robust machine learning classifier for detecting phishing domains using
SSL certificates,’’ Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada, Tech.
Rep. 10222/78875, 2020.
[4] H. Shahriar and S. Nimmagadda, ‘‘Network intrusion detection for TCP/IP packets with machine
learning techniques,’’ in Machine Intelligence and Big Data Analytics for Cybersecurity Applications.
Cham, Switzerland: Springer, 2020, pp. 231–247.
[5] J. Kline, E. Oakes, and P. Barford, ‘‘A URL-based analysis of WWW structure and dynamics,’’ in
Proc. Netw. Traffic Meas. Anal. Conf. (TMA), Jun. 2019, p. 800.
[6] A. K. Murthy and Suresha, ‘‘XML URL classification based on their semantic structure orientation
for web mining applications,’’ Proc. Comput. Sci., vol. 46, pp. 143–150, Jan. 2015.
[7] A. A. Ubing, S. Kamilia, A. Abdullah, N. Jhanjhi, and M. Supramaniam, ‘‘Phishing website
detection: An improved accuracy through feature selection and ensemble learning,’’ Int. J. Adv.
Comput. Sci. Appl., vol. 10, no. 1, pp. 252–257, 2019.
[8] A. Aggarwal, A. Rajadesingan, and P. Kumaraguru, ‘‘PhishAri: Automatic real-time phishing
detection on Twitter,’’ in Proc. eCrime Res. Summit, Oct. 2012, pp. 1–12.
[9] S. N. Foley, D. Gollmann, and E. Snekkenes, Computer Security—ESORICS 2017, vol. 10492.
Oslo, Norway: Springer, Sep. 2017.
[10] P. George and P. Vinod, ‘‘Composite email features for spam identification,’’ in Cyber Security.
Singapore: Springer, 2018, pp. 281–289.
[11] H. S. Hota, A. K. Shrivas, and R. Hota, ‘‘An ensemble model for detecting phishing attack with
proposed remove-replace feature selection technique,’’ Proc. Comput. Sci., vol. 132, pp. 900–907, Jan.
2018.
[12] G. Sonowal and K. S. Kuppusamy, ‘‘PhiDMA—A phishing detection model with multi-filter
approach,’’ J. King Saud Univ., Comput. Inf. Sci., vol. 32, no. 1, pp. 99–112, Jan. 2020.
[13] M. Zouina and B. Outtaj, ‘‘A novel lightweight URL phishing detection system using SVM and
similarity index,’’ Hum.-Centric Comput. Inf. Sci., vol. 7, no. 1, p. 17, Jun. 2017.

93
[14] R. Ø. Skotnes, ‘‘Management commitment and awareness creation—ICT safety and security in
electric power supply network companies,’’ Inf. Comput. Secur., vol. 23, no. 3, pp. 302–316, Jul. 2015.
[15] R. Prasad and V. Rohokale, ‘‘Cyber threats and attack overview,’’ in Cyber Security: The Lifeline
of Information and Communication Technology. Cham, Switzerland: Springer, 2020, pp. 15–31.
[16] T. Nathezhtha, D. Sangeetha, and V. Vaidehi, ‘‘WC-PAD: Web crawling-based phishing attack
detection,’’ in Proc. Int. Carnahan Conf. Secur. Technol. (ICCST), Oct. 2019, pp. 1–6.
[17] R. Jenni and S. Shankar, ‘‘Review of various methods for phishing detection,’’ EAI Endorsed
Trans. Energy Web, vol. 5, no. 20, Sep. 2018, Art. no. 155746.
[18] (2020). Accessed: Jan. 2020. [Online]. Available: https://catches-of-the-month-phishing-scams-
for-january-2020
[19] S. Bell and P. Komisarczuk, ‘‘An analysis of phishing blacklists: Google Safe Browsing,
OpenPhish, and PhishTank,’’ in Proc. Australas. Comput. Sci. Week Multiconf. (ACSW), Melbourne,
VIC, Australia. New York, NY, USA: Association for Computing Machinery, 2020, pp. 1–11, Art. no.
3, doi: 10.1145/3373017.3373020.
[20] A. K. Jain and B. Gupta, ‘‘PHISH-SAFE: URL features-based phishing detection system using
machine learning,’’ in Cyber Security. Switzerland: Springer, 2018, pp. 467–474.
[21] Y. Cao, W. Han, and Y. Le, ‘‘Anti-phishing based on automated individual white-list,’’ in Proc.
4th ACM Workshop Digit. Identity Manage., Oct. 2008, pp. 51–60.
[22] G. Diksha and J. A. Kumar, ‘‘Mobile phishing attacks and defence mechanisms: State of art and
open research challenges,’’ Comput. Secur., vol. 73, pp. 519–544, Mar. 2018.
[23] M. Khonji, Y. Iraqi, and A. Jones, ‘‘Phishing detection: A literature survey,’’ IEEE Commun.
Surveys Tuts., vol. 15, no. 4, pp. 2091–2121, 4th Quart, 2013.
[24] S. Sheng, M. Holbrook, P. Kumaraguru, L. F. Cranor, and J. Downs, ‘‘Who falls for phish? A
demographic analysis of phishing susceptibility and effectiveness of interventions,’’ in Proc. SIGCHI
Conf. Hum. Factors Comput. Syst., Apr. 2010, pp. 373–382.
[25] P. Prakash, M. Kumar, R. R. Kompella, and M. Gupta, ‘‘PhishNet: Predictive blacklisting to detect
phishing attacks,’’ in Proc. IEEE INFOCOM, Mar. 2010, pp. 1–5.

94
