
A MINI-PROJECT REPORT ON

CYBER THREAT DETECTION


Submitted in partial fulfillment of requirements
for the award of the degree of

MASTER OF COMPUTER APPLICATIONS

Submitted by:

ANDIRAJU KESHAVA KRISHNA

(22091F0019)

Under the Guidance of


Mr. M. Ravi Kumar, MCA, M. Tech.
Asst. Professor in Dept. of MCA

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


RAJEEV GANDHI MEMORIAL COLLEGE OF ENGINEERING &
TECHNOLOGY (AUTONOMOUS)
AFFILIATED TO J.N.T UNIVERSITY ANANTAPUR. ACCREDITED BY NBA (TIER-1) &
NAAC OF UGC, NEW DELHI, WITH A+ GRADE.
RECOGNIZED UGC-DDU KAUSHAL KENDRA
NANDYAL-518501 (Estd-1995)
YEAR: 2023-2024
Rajeev Gandhi Memorial College of Engineering & Technology
(AUTONOMOUS)
AFFILIATED TO J.N.T UNIVERSITY ANANTAPUR. ACCREDITED BY NBA (TIER-1) &
NAAC OF UGC, NEW DELHI, WITH A+ GRADE.
RECOGNIZED UGC-DDU KAUSHAL KENDRA
NANDYAL-518501 (Estd-1995)

(ESTD – 1995)

DEPARTMENT OF MASTER OF COMPUTER APPLICATIONS

CERTIFICATE
This is to certify that ANDIRAJU KESHAVA KRISHNA (22091F0019), of MCA III
semester, has carried out the mini-project work entitled “CYBER THREAT DETECTION”
under the supervision and guidance of Mr. M. Ravi Kumar, Assistant Professor, MCA
Department, in partial fulfillment of the requirements for the award of the Degree of Master of
Computer Applications from Rajeev Gandhi Memorial College of Engineering &
Technology (Autonomous), Nandyal, and that it is a bonafide record of the work done by the
candidate during 2023-2024.

Project Guide Date:

Mr. M. Ravi Kumar


MCA, M. Tech.

Assistant Professor, Dept. of MCA

Place: Nandyal
Head of the Department

Dr. K. Subba Reddy, M.Tech., Ph.D.

Professor, Dept. of CSE

External Examiner
ACKNOWLEDGEMENT

I express my heartfelt gratitude to my project guide,


Mr. M. Ravi Kumar, Asst. Prof., M. Tech, (Ph.D), Department of MCA, whose skilled
guidance and constant support brought out this project work with excellence.

I express my gratitude to Dr. K. Subba Reddy garu, Head of the Department of Computer
Science & Engineering and MCA departments, and to all teaching and non-teaching staff of the
Computer Science & Engineering department of Rajeev Gandhi Memorial College of Engineering
and Technology for providing continuous encouragement and cooperation at various steps of my
project.

I also express my sincere gratitude to our Principal, Dr.


T. Jaya Chandra Prasad garu, for his generous support and encouragement of my project work.

I thank our honourable Chairman, Dr. M. Santhi Ramudu garu, for providing
us with exceptional faculty and moral support throughout the course.

Finally, I extend my sincere thanks to all the Staff Members of MCA & CSE
Departments who have co-operated and encouraged us in making my project successful.

Whatever one does and whatever one achieves, the first credit goes to one's parents; but
for their love and affection, none of this would have been possible. I see their love and blessings
in every good thing that happens to us.

BY
ANDIRAJU KESHAVA KRISHNA (22091F0019)
CONTENTS
CHAPTER

1.INTRODUCTION

1.1 About the Project

2. LITERATURE SURVEY

2.1. Existing System

2.1.1. Disadvantages

2.2. Proposed System

2.2.1. Advantages

3. SYSTEM DESIGN

3.1 Software development life cycle (SDLC)

3.2 Spiral Model

3.3 Modules of the project

3.3.1 System Module

3.3.2 Malicious Webpages

3.3.3 Identifying relevant static features

3.3.4 Detect malicious webpages

3.4 Algorithms

3.4.1 Naïve Bayes

3.4.2 Random Forest

3.5 Requirement Analysis

3.5.1 Functional Requirements

3.5.2 Non-Functional requirements

3.6 Feasibility study

3.6.1 Technical Feasibility

3.6.2 Economical Feasibility

3.6.3 Operational Feasibility


3.7 UML Diagrams

3.7.1 Why We Use UML in Projects

3.7.2 Class Diagram

3.7.3 Use case Diagram

3.7.4 Sequence Diagram

3.7.5 Activity Diagram

3.7.6 Deployment Diagram

3.8 Hardware & Software Requirements

3.8.1 Software Requirements

3.8.2 Hardware Requirements

4. IMPLEMENTATION

4.1 Java Technology

4.1.1 Java Platform

4.1.2 ODBC

4.1.3 JDBC

4.2 Overview of DBMS

4.2.1 Data Abstraction

4.2.2 Instances and Schema

4.3 Data Models

4.3.1 The Entity Relationship Model

4.3.2 Relational Model

4.4 Database Languages

4.4.1 Data Definition Language

4.4.2 Data Manipulation Language

4.5 MYSQL

4.5.1 Features of MYSQL


5. TESTING

5.1 Software Testing techniques

5.1.1 Testing Objectives

5.1.2 Test Case Design

5.2 Software Testing strategies

5.2.1 Unit Testing

5.2.2 Integration Testing

5.2.3 Validation Testing

5.2.4 System Testing

5.2.5 Security Testing

5.2.6 Performance Testing

5.3 Test Cases

6. OUTPUT SCREENS

7. CONCLUSION

8. REFERENCES
LIST OF FIGURES

FIG NO. NAME OF THE FIGURE

Fig: 1 System Architecture

Fig: 2 Software Development Life Cycle (SDLC)

Fig: 3 Spiral Model

Fig: 4 Class Diagram

Fig: 5 Use-case Diagram

Fig: 6 Sequence Diagram

Fig:7 Activity Diagram

Fig: 8 Deployment Diagram

Fig: 9 Java Execution


Fig: 10 Java Execution for different connection

TABLES

1. Test Case Results


ABSTRACT

Mobile specific webpages differ significantly from their desktop counterparts in content,
layout and functionality. Accordingly, existing techniques to detect malicious websites are
unlikely to work for such webpages. In this paper, we design and implement KAYO, a
mechanism that distinguishes between malicious and benign mobile webpages. KAYO makes
this determination based on static features of a webpage, ranging from the number of iframes
to the presence of known fraudulent phone numbers.

First, we experimentally demonstrate the need for mobile specific techniques and then
identify a range of new static features that highly correlate with mobile malicious webpages.
We then apply KAYO to a dataset of over 350,000 known benign and malicious mobile
webpages and demonstrate 90% accuracy in classification. Moreover, we discover,
characterize and report a number of webpages missed by Google Safe Browsing and
VirusTotal but detected by KAYO. Finally, we build a browser extension using KAYO to protect
users from malicious mobile websites in real time. In doing so, we provide the first static
analysis technique to detect malicious mobile webpages.

CHAPTER-1

INTRODUCTION

1.1 ABOUT THE PROJECT:

With the emergence of artificial intelligence (AI) techniques, learning-based
approaches for detecting cyber-attacks have further improved, and they have achieved
significant results in many studies. However, owing to constantly evolving cyber-attacks, it is
still highly challenging to protect IT systems against threats and malicious behaviours in
networks. Because of various network intrusions and malicious activities, effective defences
and security considerations have been given high priority in the search for reliable solutions.
Traditionally, there are two primary systems for detecting cyber-threats and network
intrusions. An intrusion prevention system (IPS) is installed in the enterprise network and
primarily examines network protocols with signature-based methods. It generates appropriate
intrusion alerts, called security events, and reports the generated alerts to another system, such
as a SIEM.

The security information and event management (SIEM) system focuses on
collecting and managing the alerts of IPSs. The SIEM is the most common and dependable
solution among various security operations solutions for analysing the collected security events
and logs. Moreover, security analysts make an effort to investigate suspicious alerts using
policies and thresholds, and to discover malicious behaviour by analysing correlations among
events, using knowledge related to attacks. For this, the proposed AI-SIEM system
particularly includes an event pattern extraction method that aggregates events with a
concurrency feature and correlates event sets in the collected data. These event profiles
provide concise input data for various deep neural networks, and they enable the analyst to
handle all the data promptly and efficiently by comparison with long-term history data.
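As a rough illustration of the event-profiling idea described above (a minimal sketch only; the class, field and event-type names below are hypothetical and not taken from the AI-SIEM implementation), raw IPS alerts in a time window can be aggregated by counting how often each event type occurs, producing a fixed-length profile vector that a neural network can consume:

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: turn a window of raw IPS alerts into a fixed-length "event profile"
// (per-event-type frequency counts). Event types and names are illustrative.
class EventProfiler {
    // eventTypes defines the order of the profile vector's dimensions.
    static double[] buildProfile(List<String> windowEvents, List<String> eventTypes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : eventTypes) counts.put(t, 0);
        for (String e : windowEvents) {
            counts.computeIfPresent(e, (k, v) -> v + 1); // ignore unknown event types
        }
        double[] profile = new double[eventTypes.size()];
        int i = 0;
        for (String t : eventTypes) profile[i++] = counts.get(t);
        return profile;
    }

    public static void main(String[] args) {
        List<String> types = List.of("PORT_SCAN", "SQL_INJECTION", "BRUTE_FORCE");
        List<String> window = List.of("PORT_SCAN", "PORT_SCAN", "BRUTE_FORCE");
        System.out.println(java.util.Arrays.toString(buildProfile(window, types))); // [2.0, 0.0, 1.0]
    }
}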


1.2 SYSTEM ARCHITECTURE


Fig 1: System Architecture

The main contributions of our work can be summarized as follows:

Our proposed system aims at converting a large amount of security events into


individual event profiles for processing very large-scale data. We developed a generalizable
security event analysis method by learning normal and threat patterns from a large amount of
collected data, considering the frequency of their occurrence. In this study, we specifically
propose a method to characterize the data sets using base points in the data preprocessing
step. This method can reduce the dimensionality of the feature space, which is often the main
challenge associated with traditional data mining techniques in log analysis. Unlike typical
sequence-based pattern approaches, our event profiling method provides featured input data
for applying various deep-learning techniques.

CHAPTER-2

LITERATURE REVIEW

2.1 EXISTING SYSTEM

Traditionally, there are two primary systems for detecting cyber-threats and network
intrusions. An intrusion prevention system (IPS) is installed in the enterprise network and
primarily examines network protocols and flows with signature-based methods. It generates
appropriate intrusion alerts, called security events, and reports the generated alerts to another
system, such as a SIEM.

The security information and event management (SIEM) system focuses on collecting and
managing the alerts of IPSs.

2.1.1 DISADVANTAGES

 It is still difficult to recognize and detect intrusions against intelligent network attacks
owing to high false alert rates and the huge amount of security data.

 These learning-based approaches need to learn the attack model from historical
threat data and use the trained models to detect intrusions for unknown cyber threats.


2.2 PROPOSED SYSTEM

Our proposed system aims at converting a large amount of security events into


individual event profiles for processing very large-scale data. We developed a generalizable
security event analysis method by learning normal and threat patterns from a large amount of
collected data, considering the frequency of their occurrence. In this study, we specifically
propose a method to characterize the data sets using base points in the data preprocessing
step. This method can significantly reduce the dimensionality of the feature space, which is
often the main challenge associated with traditional data mining techniques in log analysis.
Unlike typical sequence-based pattern approaches, our event profiling method provides
featured input data for applying various deep-learning techniques. Hence, because our
technique facilitates improved classification of true alerts compared with conventional
machine-learning methods, it can remarkably reduce the number of alerts presented to the
analysts in practice.

2.2.1 ADVANTAGES

 For cyber-threat detection, SIEM analysts spend an immense amount of effort and
time differentiating between true security alerts and false security alerts in the collected
events; the proposed system is designed to reduce this burden.
 Data security is improved, since data co-owners can renew the ciphertexts by
appending their access policies as the dissemination conditions.
 The system is more secure due to continuous policy enforcement, in which the data
owner's access policy is enforced in the initial ciphertext as well as the renewed
ciphertext.


CHAPTER-3

SYSTEM DESIGN

Analysis is a logical process. The objective of this phase is to determine exactly what
must be done to solve the problem. Tools such as class diagrams, sequence diagrams, data
flow diagrams and a data dictionary are used in developing a logical model of the system.

3.1 SOFTWARE DEVELOPMENT LIFE CYCLE

Fig 2: Software Development Life Cycle (SDLC)


The software development life cycle (SDLC) is a series of phases that give a


common understanding of the software building process: how the software is conceived and
built, from the business understanding and requirements elicitation stage, through converting
those business considerations and requirements into functions and features, to its deployment
and operation to achieve the business needs. A good software developer should have
adequate knowledge of how to pick the SDLC model best suited to the project context and
the business requirements.

Thus, it is essential to pick the right SDLC model according to the specific
concerns and requirements of the project to ensure its success.

SDLC models can be thought of as tools that can be used to deliver a software
project better. Therefore, knowing each model, when to use it, and its benefits and drawbacks
is essential to decide which one is appropriate for the project context. This section surveys the
common kinds of SDLC models.

Types of Software Development Life Cycles (SDLC)

 Waterfall Model
 V-Shaped Model
 Evolutionary Prototyping Model
 Spiral Method (SDM)
 Iterative and Incremental Method
 Agile development


3.2 Spiral Model (SDM)

The spiral model combines elements of both design and prototyping-in-stages, with an


ultimate objective of merging the advantages of top-down and bottom-up concepts. This model
of development combines the features of the prototyping model and the waterfall model. The
spiral model is preferred for large, expensive, and complicated projects. It uses many of the
same phases as the waterfall model, in essentially the same order, separated by planning, risk
assessment, and the construction of prototypes and simulations.

Fig 3: Spiral Model


3.3 MODULES OF THE PROJECT

3.3.1 SYSTEM MODULE

In the first module, we develop the system environment model. Website providers use
JavaScript or user agent strings to identify mobile users and then redirect them to a mobile
specific version. We note that not all static features used in existing techniques differ when
measured on mobile and desktop webpages. Mobile websites enable access to a user's personal
information and advanced capabilities of mobile devices through web APIs. Existing static
analysis techniques do not consider these mobile specific functionalities in their feature set.
We argue, and later demonstrate, that accounting for mobile specific functionalities helps
identify new threats specific to the mobile web. For example, the presence of a known 'bank'
fraud number on a website might indicate that the webpage is a phishing webpage imitating
that bank.
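As a small illustration of the mobile-specific static feature just mentioned, the sketch below (a hypothetical helper, not the actual KAYO feature extractor; the regex and blacklist are placeholders) scans page text for phone-number-like strings and flags any that appear in a known-fraud list:

import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: flag a page if it contains a phone number from a known-fraud list.
class FraudNumberFeature {
    private static final Pattern PHONE =
            Pattern.compile("\\+?\\d[\\d\\s().-]{7,14}\\d");

    static boolean containsKnownFraudNumber(String pageText, Set<String> fraudNumbers) {
        Matcher m = PHONE.matcher(pageText);
        while (m.find()) {
            String normalized = m.group().replaceAll("[^\\d+]", ""); // keep digits and '+'
            if (fraudNumbers.contains(normalized)) return true;
        }
        return false;
    }
}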

3.3.2 MALICIOUS WEBPAGES

We argue that benign webpage writers take effort to provide a good user experience,
whereas the goal of malicious webpage authors is to trick users into performing
unintentional actions with minimal effort. We therefore examine whether a webpage has
noscript content and measure the number of noscript tags. Intuitively, a benign webpage writer
will include more noscript content in the code to ensure a good experience even for a
security-savvy user.
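A minimal way to compute the noscript feature described above (an illustrative sketch; the real KAYO feature set is much richer) is simply to count the noscript tags in the page source:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: count <noscript> tags as a static feature of the page source.
class NoscriptFeature {
    private static final Pattern NOSCRIPT =
            Pattern.compile("<\\s*noscript\\b", Pattern.CASE_INSENSITIVE);

    static int countNoscript(String html) {
        Matcher m = NOSCRIPT.matcher(html);
        int count = 0;
        while (m.find()) count++;
        return count;
    }
}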

3.3.3 IDENTIFYING RELEVANT STATIC FEATURES

We extract static features from a webpage and make predictions about its potential
maliciousness. We first discuss the feature set used in KAYO, followed by the collection process
of the dataset. Structural and lexical properties of a URL have been used to differentiate
between malicious and benign webpages. However, using only URL features for such
differentiation leads to a high false positive rate.
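For example, simple structural and lexical URL features of the kind referred to above can be computed as in the hedged sketch below (the exact features used by KAYO are not reproduced here; these are generic illustrations):

// Sketch: a few structural/lexical URL features (length, dots, hyphens, digits, IP-like host).
class UrlFeatures {
    static double[] extract(String url) {
        long dots = url.chars().filter(c -> c == '.').count();
        long hyphens = url.chars().filter(c -> c == '-').count();
        long digits = url.chars().filter(Character::isDigit).count();
        boolean hasIpLikeHost = url.matches("https?://\\d{1,3}(\\.\\d{1,3}){3}.*");
        return new double[] { url.length(), dots, hyphens, digits, hasIpLikeHost ? 1 : 0 };
    }
}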


Our data gathering process included accumulating labeled benign and malicious mobile specific
webpages. First, we describe an experiment that identifies and defines 'mobile specific
webpages'. We then describe the data collection process. We use these crawls specifically
because they are closest to the publication of the related work, making them as close to
equivalent as possible.
3.3.4 DETECT MALICIOUS WEBPAGES

We describe the machine learning techniques we considered to tackle the problem of


classifying mobile specific webpages as malicious or benign. We then discuss the strengths
and weaknesses of each classification technique, and the process for selecting the best model
for KAYO. We build and evaluate our chosen model for accuracy, false positive rate and true
positive rate. Finally, we compare KAYO to existing techniques and empirically demonstrate
the significance of KAYO's features. We note that where automated analysis is possible, we use
our full datasets; however, as is commonly done in the research community, we use randomly
selected subsets of our data when extensive manual analysis and verification is required.


3.4 Algorithms
3.4.1 Naïve Bayes
 Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.
 It is mainly used in text classification that includes a high-dimensional training
dataset.
 Naïve Bayes Classifier is one of the simple and most effective Classification
algorithms which helps in building the fast machine learning models that can make
quick predictions.
 It is a probabilistic classifier, which means it predicts on the basis of the probability of
an object.
 Some popular examples of the Naïve Bayes algorithm are spam filtering, sentiment
analysis, and classifying articles.

Step 1: Calculate the prior probability for given class labels


Step 2: Find Likelihood probability with each attribute for each class
Step 3: Put this value in Bayes Formula and calculate posterior probability.
Step 4: See which class has the higher posterior probability; the input is assigned to that
class.
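To make the steps above concrete, the sketch below applies Bayes' rule to pre-computed priors and per-class likelihoods for one input (the counts and probabilities are purely illustrative, not taken from the project's dataset):

// Sketch: apply Bayes' rule to pre-computed priors and likelihoods for one input.
class NaiveBayesDemo {
    public static void main(String[] args) {
        // Step 1: illustrative priors P(benign), P(malicious)
        double priorBenign = 0.7, priorMalicious = 0.3;
        // Step 2: likelihoods of the observed feature values under each class (assumed independent)
        double[] likeBenign = {0.60, 0.80};
        double[] likeMalicious = {0.10, 0.30};

        double scoreBenign = priorBenign, scoreMalicious = priorMalicious;
        for (double l : likeBenign) scoreBenign *= l;
        for (double l : likeMalicious) scoreMalicious *= l;

        // Step 3: normalize to posterior probabilities; Step 4: pick the larger one.
        double evidence = scoreBenign + scoreMalicious;
        System.out.printf("P(benign|x)=%.3f, P(malicious|x)=%.3f%n",
                scoreBenign / evidence, scoreMalicious / evidence);
        System.out.println(scoreBenign >= scoreMalicious ? "benign" : "malicious");
    }
}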
3.4.2 Random Forest
 Random Forest is a popular machine learning algorithm that belongs to the supervised
learning technique. It can be used for both Classification and Regression problems in
ML. It is based on the concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and to improve the performance of the
model.

 As the name suggests, "Random Forest is a classifier that contains a number of


decision trees on various subsets of the given dataset and takes the average to improve
the predictive accuracy of that dataset." Instead of relying on one decision tree, the
random forest takes the prediction from each tree and based on the majority votes of
predictions, and it predicts the final output.
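The majority-voting idea described above can be sketched as follows (a toy illustration in which the "trees" are hand-written predicates; a real random forest would train many decision trees on bootstrapped samples and random feature subsets):

import java.util.List;
import java.util.function.Predicate;

// Sketch: majority vote over an ensemble of simple "trees" (predicates on a feature vector).
class ForestVoteDemo {
    static boolean predict(List<Predicate<double[]>> trees, double[] features) {
        long maliciousVotes = trees.stream().filter(t -> t.test(features)).count();
        return maliciousVotes * 2 > trees.size(); // majority says "malicious"
    }

    public static void main(String[] args) {
        List<Predicate<double[]>> trees = List.of(
                f -> f[0] > 50,   // e.g. URL length threshold (illustrative)
                f -> f[1] >= 3,   // e.g. number of dots in the URL
                f -> f[2] == 0    // e.g. no noscript tags
        );
        double[] page = {72, 4, 0};
        System.out.println(predict(trees, page) ? "malicious" : "benign");
    }
}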


3.5 REQUIREMENT ANALYSIS

A requirement is a relatively short and concise piece of information, expressed as a


fact. It can be written as a sentence or can be expressed using some kind of diagram.

3.5.1 FUNCTIONAL REQUIREMENTS

Functional requirements describe what the system should do. The functional
requirements can be further categorized as follows:

1. What inputs the system should accept?

2. What outputs the system should produce?

3. What data the system must store?

4. What are the computations to be done?

The input design is the link between the information system and the user. It comprises
developing specifications and procedures for data preparation and the steps necessary
to put transaction data into a usable form for processing, which can be achieved either by
having the computer read data from a written or printed document or by having
people key the data directly into the system. The design of input focuses on controlling the
amount of input required, controlling errors, avoiding delay, avoiding extra steps and
keeping the process simple. The input is designed in such a way that it provides security
and ease of use while retaining privacy. Input design considered the following things:

1. What data should be given as input?

2. How the data should be arranged or coded?

3. The dialog to guide the operating personnel in providing input.

4. Methods for preparing input validations and steps to follow when error occur.


3.5.2 NON-FUNCTIONAL REQUIREMENTS

Non-functional requirements are the constraints that must be adhered to during


development. They limit what resources can be used and set bounds on aspects of the
software's quality.

User Interfaces

The User Interface is a GUI developed using Java.

Software Interfaces

The main processing is done in Java and console application.

3.6 FEASIBILITY STUDY

All projects are feasible when provided with unlimited resources and infinite time.
Unfortunately, the development of a computer-based system or product is more likely to be
plagued by a scarcity of resources and difficult delivery dates. It is both necessary and prudent
to evaluate the feasibility of a project at the earliest possible time. Months or years of effort,
thousands or millions of dollars, and untold professional embarrassment can be averted if an
ill-conceived system is recognized early in the definition phase.

Feasibility and risk analysis are related in many ways. If project risk is great, the
feasibility of producing quality software is reduced. During product engineering, however, we
concentrate our attention on the following primary areas of interest.

3.6.1 TECHNICAL FEASIBILITY

This application is going to be used in an Internet environment called the WWW (World


Wide Web). So, it is necessary to use a technology that is capable of providing the
networking facility to the application. This application is also able to work in a distributed
environment. The application is developed with Java technology. One major advantage of this
application is that it is platform neutral: we can deploy and use it on any operating system.


The GUI is developed using HTML to capture the information from the customer. HTML
is used to display the content on the browser. It uses the TCP/IP protocol. It is an interpreted
language, and it is very easy to develop a page/document using HTML. Some RAD (Rapid
Application Development) tools are provided to quickly design and develop the application.
Many objects such as buttons, text fields, and text areas are provided to capture the
information from the customer.

3.6.2 ECONOMICAL FEASIBILITY

The economic issues that usually arise during the economic feasibility stage are whether
the system will be used if it is developed and implemented, and whether the financial benefits
equal or exceed the costs. The cost of developing the project will include the cost of conducting
a full system investigation, the cost of hardware and software for the class of application being
considered, and the benefits in the form of reduced costs or fewer costly errors.

The project is economically feasible if, once developed and installed, it reduces the


workload. Keeping the class of application in view, the cost of hardware and software is
considered to be economically feasible.

3.6.3 OPERATIONAL FEASIBILITY

In our application the front end is developed using a GUI, so it is very easy for the customer
to enter the necessary information. However, the customer must have some knowledge of using
web applications before using our application.


3.7 UML DIAGRAMS

3.7.1 Why We Use UML in projects?

As the strategic value of software increases for many companies, the industry looks
for techniques to automate the production of software and to improve quality and reduce cost
and time-to-market. These techniques include component technology, visual programming,
patterns and frameworks. Businesses also seek techniques to manage the complexity of
systems as they increase in scope and scale. In particular, they recognize the need to solve
recurring architectural problems, such as physical distribution, concurrency, replication,
security, load balancing and fault tolerance. Additionally, the development for the World
Wide Web, while making some things simpler, has exacerbated these architectural problems.
The Unified Modelling Language (UML) was designed to respond to these needs. Simply,
Systems design refers to the process of defining the architecture, components, modules,
interfaces, and data for a system to satisfy specified requirements which can be done easily
through UML diagrams.

In this project, the following basic UML diagrams are explained:

 Class Diagram

 Use Case Diagram

 Sequence Diagram

 Activity Diagram

 Deployment Diagram


3.7.2 CLASS DIAGRAM

In software engineering, a class diagram in the Unified Modelling Language (UML)


is a type of static structure diagram that describes the structure of a system by showing the
system's classes, their attributes, and the relationships between the classes.

This is one of the most important diagrams in development. The diagram
breaks the class into three layers: one has the name, the second describes its attributes and
the third its methods. A padlock to the left of the name represents a private attribute.

The relationships are drawn between the classes. Developers use the class diagram
to develop the classes. Analysts use it to show the details of the system. Architects look at
class diagrams to see if any class has too many functions and whether it needs to be
split.

Fig 4: Class Diagram


3.7.3 USE CASE DIAGRAM

In software engineering, a use case diagram in the Unified Modeling Language


(UML) is a type of behavioral diagram defined by and created from a Use-case analysis. Its
purpose is to present a graphical overview of the functionality provided by a system in terms
of actors, their goals (represented as use cases), and any dependencies between those use
cases. The main purpose of a use case diagram is to show what system functions are
performed for which actor. Roles of the actors in the system can be depicted. Use cases are
used during requirements elicitation and analysis to represent the functionality of the system.
Use cases focus on the behavior of the system from the external point of view. The actors are
outside the boundary of the system, whereas the use cases are inside the boundary of the
system.

Fig 5: Use Case Diagram


3.7.4 SEQUENCE DIAGRAM

A sequence diagram in Unified Modelling Language (UML) is a kind of interaction


diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called Event-trace
diagrams, event scenarios, and timing diagrams.

Fig 6: Sequence Diagram

3.7.5 ACTIVITY DIAGRAM

Activity diagrams are a loosely defined diagram technique for showing workflows of
stepwise activities and actions, with support for choice, iteration and concurrency. In the
Unified Modelling Language, activity diagrams can be used to describe the business and
operational step-by-step workflows of components in a system. An activity diagram shows
the overall flow of control.


Fig 7: Activity Diagram

3.7.6 DEPLOYMENT DIAGRAM

A deployment diagram in the Unified Modelling Language models the physical


deployment of artifacts on nodes. To describe a web site, for example, a deployment diagram
would show what hardware components ("nodes") exist (e.g., a web server, an application
server, and a database server), what software components ("artifacts") run on each node (e.g.,
web application, database), and how the different pieces are connected (e.g., JDBC, REST).

Fig 8: Deployment Diagram

3.7.7 Component Diagram


In the Unified Modelling Language, a component diagram depicts how components
are wired together to form larger components and/or software systems. It is used to
illustrate the structure of arbitrarily complex systems.

Fig 9: Component Diagram


3.8 HARDWARE & SOFTWARE REQUIREMENTS

3.8.1 SOFTWARE REQUIREMENTS

Operating System : Windows XP/7/8

Front End : JSP

Database : MYSQL

Programming : Java

3.8.2 HARDWARE REQUIREMENTS

Processor : Pentium Dual Core / Core 2 Duo / Intel Core with minimum 1.2 GHz speed

RAM : 2 GB

Hard Disk : 120 GB


4.IMPLEMENTATION

4.1 Java Technology

Java technology is both a programming language and a platform.

The Java Programming Language:

The Java programming language is a high-level language that can be characterized by


all of the following buzzwords:

➢ Simple

➢ Architecture neutral

➢ Object oriented

➢ Portable

➢ Distributed

➢ High performance

➢ Interpreted

➢ Multi-threaded

➢ Robust

➢ Dynamic

➢ Secure

With most programming languages, you either compile or interpret a program so that
you can run it on your computer. The Java programming language is unusual in that a
program is both compiled and interpreted. With the compiler, you first translate a
program into an intermediate language called Java bytecodes, the platform-independent
codes interpreted by the interpreter on the Java platform.
The interpreter parses and runs each Java bytecode instruction on the computer.
Compilation happens just once; interpretation occurs each time the program is executed.
The following figure illustrates how this works.


Fig 9: Java Execution

Think of Java bytecodes as the machine code instructions for the Java Virtual
Machine (Java VM). Every Java interpreter, whether it's a development tool or a Web
browser that can run applets, is an implementation of the Java VM. Java bytecodes help
make “write once, run anywhere” possible. You can compile your program into bytecodes
on any platform that has a Java compiler. The bytecodes can then be run on any
implementation of the Java VM. That means that as long as a computer has a Java VM,
the same program written in the Java programming language can run on Windows 2000,
a Solaris workstation, or on an iMac.

Fig 10: Java Execution for different connections
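As a concrete example of the compile-then-interpret flow described above, a single source file (a minimal illustration; the class name is arbitrary) is compiled once to bytecode with javac and then run on any platform with a JVM using java:

// HelloCyber.java -- compile once:  javac HelloCyber.java
// run anywhere a JVM exists:        java HelloCyber
public class HelloCyber {
    public static void main(String[] args) {
        System.out.println("Same bytecode, any platform with a Java VM.");
    }
}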


4.1.1 Java Platform

A platform is the hardware or software environment in which a program runs. We’ve


already mentioned some of the most popular platforms like Windows 2000, Linux,
Solaris, and MacOS. Most platforms can be described as a combination of the operating
system and hardware. The Java platform differs from most other platforms in that it’s a
software-only platform that runs on top of other hardware-based platforms.

The Java platform has two components:

• The Java Virtual Machine (Java VM)

• The Java Application Programming Interface (Java API)


You’ve already been introduced to the Java VM. It’s the base for the Java platform and
is ported onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that


provide many useful capabilities, such as graphical user interface (GUI) widgets. The
Java API is grouped into libraries of related classes and interfaces; these libraries are
known as packages. In the next section, What Can Java Technology Do? Highlights
what functionality some of the packages in the Java API provide.

What Can Java Technology Do?

The most common types of programs written in the Java programming language are
applets and applications. If you’ve surfed the Web, you’re probably already familiar
with applets. An applet is a program that adheres to certain conventions that allow it to
run within a Java-enabled browser.
However, the Java programming language is not just for writing cute,
entertaining applets for the Web. The general-purpose, high-level Java programming
language is also a powerful software platform. Using the generous API, you can write
many types of programs.
An application is a standalone program that runs directly on the Java platform. A
special kind of application known as a server serves and supports clients on a network.
Examples of servers are Web servers, proxy servers, mail servers, and print servers.
Another specialized program is a Servlet. A Servlet can almost be thought of as an applet
that runs on the server side. Java Servlets are a popular choice for building interactive


web applications, replacing the use of CGI scripts. Servlets are similar to applets in that
they are runtime extensions of applications. Instead of working in browsers, though,
Servlets run within Java Web servers, configuring or tailoring the server.
How does the API support all these kinds of programs? It does so with packages
of software components that provide a wide range of functionality. Every full
implementation of the Java platform gives you the following features:
 The essentials: Objects, strings, threads, numbers, input and output, data
structures, system properties, date and time, and so on.
 Applets: The set of conventions used by applets.

 Networking: URLs, TCP (Transmission Control Protocol) and UDP (User Datagram
Protocol) sockets, and IP (Internet Protocol) addresses.

 Internationalization: Help for writing programs that can be localized for users
worldwide. Programs can automatically adapt to specific locales and be displayed
in the appropriate language.
 Security: Both low level and high level, including electronic signatures, public
and private key management, access control, and certificates.
 Software components: Known as JavaBeans, these can plug into existing
component architectures.
 Object serialization: Allows lightweight persistence and communication via Remote
Method Invocation (RMI).

 Java Database Connectivity (JDBC): Provides uniform access to a wide range


of relational databases.
The Java platform also has APIs for 2D and 3D graphics, accessibility, servers,
collaboration, telephony, speech, animation, and more. The following figure depicts
what is included in the Java 2 SDK.


4.1.2 ODBC

Microsoft Open Database Connectivity (ODBC) is a standard programming interface for


application developers and database systems providers. Before ODBC became a de facto
standard for Windows programs to interface with database systems, programmers had to
use proprietary languages for each database they wanted to connect to. Now, ODBC has
made the choice of the database system almost irrelevant from a coding perspective,
which is as it should be. Application developers have much more important things to
worry about than the syntax that is needed to port this program from one database to
another when business needs suddenly change.
Through the ODBC Administrator in Control Panel, you can specify the
particular database that is associated with a data source that an ODBC application
program is written to use. Think of an ODBC data source as a door with a name on it.
Each door will lead you to a particular database. For example, the data source named
Sales Figures might be a SQL Server database, whereas the Accounts Payable data
source could refer to an Access database. The physical database referred to by a data
source can reside anywhere on the LAN.

4.1.3 JDBC

In an effort to set an independent database standard API for Java, Sun Microsystems
developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database
access mechanism that provides a consistent interface to a variety of RDBMSs. This
consistent interface is achieved through the use of “plug-in” database connectivity
modules, or drivers. If a database vendor wishes to have JDBC support, he or she must
provide the driver for each platform that the database and Java run on.
To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC.
As you discovered earlier in this chapter, ODBC has widespread support on a variety of
platforms. Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market
much faster than developing a completely new connectivity solution.
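A minimal JDBC usage sketch is shown below (the database name, table, column names and credentials are placeholders, and the MySQL Connector/J driver is assumed to be on the classpath; this is not the project's actual data-access code):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch: query a MySQL database through JDBC.
public class JdbcSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/threatdb"; // placeholder database
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT url, status FROM malware_urls WHERE status = ?")) {
            ps.setString(1, "malicious");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("url") + " -> " + rs.getString("status"));
                }
            }
        }
    }
}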
JDBC Goals
Few software packages are designed without goals in mind, and JDBC is no exception.
Its goals, in conjunction with early reviewer feedback, drove the development of the API
and have finalized the JDBC class library into a solid framework for building database
applications in Java.


The goals that were set for JDBC are important. They will give you some insight as to
why certain classes and functionalities behave the way they do. The design goals for
JDBC are as follows:
SQL Level API

The designers felt that their main goal was to define a SQL interface for Java. Although
not the lowest database interface level possible, it is at a low enough level for higher-
level tools and APIs to be created. Conversely, it is at a high enough level for application
programmers to use it confidently. Attaining this goal allows for future tool vendors to
“generate” JDBC code and to hide many of JDBC’s complexities from the end user.

JDBC must be implementable on top of common database interfaces:

The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal
allows JDBC to use existing ODBC level drivers by the use of a software interface. This
interface would translate JDBC calls to ODBC and vice versa.
Provide a Java interface that is consistent with the rest of the Java system:

Because of Java’s acceptance in the user community thus far, the designers feel that they
should not stray from the current design of the core Java system.

Keep it simple
This goal probably appears in all software design goal listings. JDBC is no exception.
Sun felt that the design of JDBC should be very simple, allowing for only one method of
completing a task per mechanism. Allowing duplicate functionality only serves to
confuse the users of the API.

4.2 Overview of DBMS

A Database Management System (DBMS) is a collection of interrelated data and set of


programs to access those data. The primary goal of DBMS is to provide a way to store
and retrieve database information.
4.2.1 Data Abstraction

Abstraction means to provide necessary information without considering the


background details. There are three levels of abstraction for a DBMS.
• Physical level: It is the lowest level of abstraction, which describes how the data
is actually stored on secondary devices such as disks and tapes.


• Logical level: It is a second level of abstraction, which describes what data are
stored in the database, and what relationships exist among those data. Database
Administrators decide what data is to be kept in the database.
• View level: It is the highest level of abstraction, which describes only a part of
the entire database. The view level of abstraction exists to simplify users'
interaction with the system. The system may provide many views for the same
database.
4.2.2 Instances and Schema

The collection of information stored in a database at a particular moment is called an


instance.

The overall design of a database is called a schema.

4.3 Data Models

A Data Model is a collection of conceptual tools for describing data, data relationship,
data semantic and consistency constraints. Various data models available are discussed
below.
4.3.1 The Entity Relationship Model
E-R model is a data model used to describe the data involved in a real-world enterprise.
It describes the data in the form of entities and relationships. An entity is a ‘thing’ (or
‘object’) in the real world that can be easily distinguishable from other things. A
relationship is an association among several entities.
4.3.2 Relational Model
The Relational Model uses a collection of tables to represent both data and the
relationships among the data. Each table has multiple columns, and each column has a
unique name.
4.4 Database Languages

A database system provides data definition language and data manipulation language.

4.4.1 Data Definition Language

Data Definition Language (DDL) consists of a set of definitions used to specify data
base schema. Execution of DDL statement results in a set of tables. These tables are


stored in a specific area known as data dictionary or data directory. A data directory
contains Meta data.

Meta data is data about data.

4.4.2 Data Manipulation Language

• Retrieval of information stored in the database.

• Insertion of new information into the database.

• Deletion of information from the database.

• Modification of information stored in the database.


Data Manipulation Language (DML) is a language that enables users to access or
manipulate data. There are basically two types.
• Procedural DMLs require a user to specify what data are needed and how to get those
data.
• Declarative DMLs require user to specify what data needed without specifying how to
get those data.
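To illustrate the DDL/DML distinction in this project's stack, the sketch below issues one DDL statement and a few DML statements through JDBC (the table and column names are placeholders, as are the connection details):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: DDL defines the schema; DML inserts, updates and deletes rows.
public class DdlDmlSketch {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/threatdb", "user", "password");
             Statement st = con.createStatement()) {
            // DDL: the resulting table definition is recorded in the data dictionary.
            st.executeUpdate("CREATE TABLE IF NOT EXISTS url_log ("
                    + "id INT PRIMARY KEY AUTO_INCREMENT, url VARCHAR(255), label VARCHAR(20))");
            // DML: manipulate the stored data.
            st.executeUpdate("INSERT INTO url_log (url, label) VALUES ('http://example.com', 'benign')");
            st.executeUpdate("UPDATE url_log SET label = 'malicious' WHERE url = 'http://example.com'");
            st.executeUpdate("DELETE FROM url_log WHERE label = 'benign'");
        }
    }
}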
4.5 MYSQL

MYSQL is a relational database management system, which organizes data in the form
of tables. MYSQL is one of many database servers based on the RDBMS model, which
manages a series of data that attends to three specific things: data structures, data integrity
and data manipulation. With MYSQL cooperative server technology we can realize the
benefits of open, relational systems for all applications. MYSQL makes efficient use
of all system resources, on all hardware architectures, to deliver unmatched performance,
price/performance and scalability. Any DBMS, to be called an RDBMS, has to satisfy
Dr. E. F. Codd's rules.
 MYSQL is portable
The MYSQL RDBMS is available on a wide range of platforms ranging from PCs to
supercomputers, and as a multi-user loadable module for Novell NetWare. If you develop an
application on one system, you can run the same application on other systems without any
modifications.


 MYSQL is compatible
MYSQL commands can be used for communicating with the IBM DB2 mainframe
RDBMS, which is different from MYSQL; that is, MYSQL is compatible with DB2. The MYSQL
RDBMS is a high-performance, fault-tolerant DBMS which is specially designed for
online transaction processing and for handling large database applications.


 Multithreaded server architecture

MYSQL's adaptable multithreaded server architecture delivers scalable high performance


for very large numbers of users on all hardware architectures, including symmetric
multiprocessors (SMPs) and loosely coupled multiprocessors. Performance is achieved
by eliminating CPU, I/O, memory and operating system bottlenecks and by optimizing
the DBMS server code to eliminate all internal bottlenecks.

4.5.1 Features of MYSQL

Most popular RDBMS in the market because of its ease of use

• Client/server architecture.

• Ensuring data integrity and data security.

• Parallel processing support for speed up data entry and online transaction
processing used for applications.
• DB procedures, functions and packages.

MYSQL Supports the following the Codd’s rule:

➢ Rule 1: Information Rule (Representation of information)-YES.

➢ Rule 2: Guaranteed Access-YES.

➢ Rule 3: Systematic treatment of Null values-YES.

➢ Rule 4: Dynamic on-line catalogue-based Relational Model-YES.

➢ Rule 5: Comprehensive data sub language-YES.

➢ Rule 6: View Updating-PARTIAL.

➢ Rule 7: High-level Update, Insert and Delete-YES.

➢ Rule 8: Physical data Independence-PARTIAL.

➢ Rule 9: Logical data Independence-PARTIAL.

➢ Rule 10: Integrity Independence-PARTIAL.

➢ Rule 11: Distributed Independence-YES.

➢ Rule 12: Non-subversion-YES.


5.TESTING

5.1 SOFTWARE TESTING TECHNIQUES

Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. Testing presents an interesting anomaly
for the software engineer.

5.1.1 Testing Objectives

1. Testing is a process of executing a program with the intent of finding an error.

2. A good test case is one that has a probability of finding an as yet


undiscovered error.
3. A successful test is one that uncovers an as yet undiscovered error.
These objectives imply a dramatic change in viewpoint:
testing cannot show the absence of defects, it can only show that software errors are
present.
5.1.2 Test Case Design

Any engineering product can be tested in one of two ways:

White Box Testing

This testing is also called glass box testing. In this testing, knowing the internal
workings of a product, tests can be conducted to ensure that "all gears mesh", that is, the
internal operation performs according to specification and all internal components have been
adequately exercised. It is a test case design method that uses the control structure of the
procedural design to derive test cases. Basis path testing is a white box testing technique.

Basis Path Testing

 Flow graph notation
 Cyclomatic complexity
 Deriving test cases

Control Structure Testing

 Condition testing
 Data flow testing
 Loop testing


Black Box Testing


In this testing, knowing the specified functions that a product has been designed to
perform, tests can be conducted that demonstrate each function is fully operational while at
the same time searching for errors in each function. It fundamentally
focuses on the functional requirements of the software.
The steps involved in black box test case design are:
 Graph based testing methods
 Equivalence partitioning
 Boundary value analysis
 Comparison testing
 graph matrices

5.2 SOFTWARE TESTING STRATEGIES


A strategy for software testing integrates software test case design methods into a series of well-
planned steps that result in the successful construction of software. Software testing is part of a
broader topic that is often referred to as verification and validation. Verification refers to the
set of activities that ensure that the software correctly implements a specific function.
Validation refers to the set of activities that ensure that the software that has been built is
traceable to the customer's requirements.

5.2.1 Unit Testing

Unit testing focuses verification effort on the smallest unit of software design that is
the module. Using procedural design description as a guide, important control paths are
tested to uncover errors within the boundaries of the module. The unit test is normally white
box testing oriented and the step can be conducted in parallel for multiple modules.
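As an illustration of unit testing a single module in isolation, the sketch below uses JUnit 4 to exercise the hypothetical noscript-counting helper sketched earlier in Section 3.3.2 (the test class and expectations are illustrative, not the project's actual test suite):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Sketch: a JUnit 4 unit test exercising one small module in isolation.
public class NoscriptFeatureTest {
    @Test
    public void countsNoscriptTags() {
        String html = "<html><noscript>a</noscript><NOSCRIPT>b</NOSCRIPT></html>";
        assertEquals(2, NoscriptFeature.countNoscript(html));
    }

    @Test
    public void zeroWhenAbsent() {
        assertEquals(0, NoscriptFeature.countNoscript("<html><body>no tags</body></html>"));
    }
}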

5.2.2 Integration Testing

Integration testing is a systematic technique for constructing the program structure,


while conducting test to uncover errors associated with the interface. The objective is to take
unit tested methods and build a program structure that has been dictated by design.


 Top-Down Integration

Top-down integration is an incremental approach to construction of the program
structure. Modules are integrated by moving downward through the control hierarchy,
beginning with the main control module. Modules subordinate to the main module are
incorporated into the structure in either a breadth-first or depth-first manner.

 Bottom-up Integration

This method as the name suggests, begins construction and testing with atomic
modules i.e., modules at the lowest level. Because the modules are integrated in the bottom
up manner the processing required for the modules subordinate to a given level is always
available and the need for stubs is eliminated.

 Regression Testing

In the context of an integration test strategy, regression testing is the re-execution of


some subset of tests that have already been conducted, to ensure that changes have not
propagated unintended side effects.

5.2.3 Validation Testing

At the end of integration testing software is completely assembled as a package.


Validation testing is the next stage, which can be defined as successful when the software
functions in the manner reasonably expected by the customer. Reasonable expectations are
those defined in the software requirements specifications. Information contained in those
sections form a basis for validation testing approach.

Validation Test Criteria

Software validation is achieved through a series of black-box tests that demonstrate


conformity with requirement. A test plan outlines the classes of tests to be conducted, and a
test procedure defines specific test cases that will be used in an attempt to uncover errors in
conformity with requirements. Both the plan and procedure are designed to ensure that all
functional requirements are satisfied, all performance requirements are achieved,
documentation is correct and human-engineered; and other requirements are met.

After each validation test case has been conducted, one of two possible conditions
exists: (1) The function or performance characteristics conform to specification and are
accepted, or (2) a deviation from specification is uncovered and a deficiency list is created.


Deviation or error discovered at this stage in a project can rarely be corrected prior to
scheduled completion. It is often necessary to negotiate with the customer to establish a
method for resolving deficiencies.

Configuration Review

An important element of the validation process is a configuration review. The intent


of the review is to ensure that all elements of the software configuration have been properly
developed, are catalogued, and have the necessary detail to support the maintenance phase of
the software life cycle. The configuration review sometimes called an audit.

Alpha and Beta Testing

It is virtually impossible for a software developer to foresee how the customer will
really use a program. Instructions for use may be misinterpreted. Strange combination of data
may be regularly used; and output that seemed clear to the tester may be unintelligible to a
user in the field.

When custom software is built for one customer, a series of acceptance tests are
conducted to enable the customer to validate all requirements. Conducted by the end user
rather than the system developer, an acceptance test can range from an informal “test drive”
to a planned and systematically executed series of tests. In fact, acceptance testing can be
conducted over a period of weeks or months, thereby uncovering cumulative errors that might
degrade the system over time.

The beta test is conducted at one or more customer sites by the end user of the
software. Unlike alpha testing, the developer is generally not present. Therefore, the beta test
is a “live” application of the software in an environment that cannot be controlled by the
developer. The customer records all problems that are encountered during beta testing and
reports these to the developer at regular intervals. As a result of problems reported during
beta test, the software developer makes modification and then prepares for release of the
software product to the entire customer base.

5.2.4 System Testing

System testing is actually a series of different tests whose primary purpose is to fully
exercise the computer-based system. Although each test has a different purpose, all work to
verify that all system elements have been properly integrated to perform allocated functions.


5.2.5 Security Testing

Attempts to verify the protection mechanisms built into the system.

5.2.6 Performance Testing

This method is designed to test runtime performance of software within the context
of an integrated system.

5.3 TEST CASES

Table 1: Test Case Results

S. No. | TEST CASE          | INPUT                              | EXPECTED RESULT               | ACTUAL RESULT                  | STATUS
1      | User Registration  | Enter all fields                   | User gets registered          | Registration is successful     | Pass
2      | User Registration  | User misses any field              | User not registered           | Registration is unsuccessful   | Fail
3      | Admin Login        | Give the user name and password    | Admin home page should open   | Admin home page has opened     | Pass
4      | User Login         | Give username and password         | User page should open         | User page has opened           | Pass
5      | User Login         | Give username without password     | User page should not open     | User name/password is invalid  | Fail
6      | Upload file        | Select the file to upload          | Upload to the database        | File uploaded successfully     | Pass


6.OUTPUT SCREENS

HOME PAGE:

Screen 1: Home Page of Project


Screen 2: Login Page


Screen 3: Admin Page


Screen 4: Add Indexer


Screen 5: View URL


Screen 6: Attacker Information


Screen 7: view malware URL


Screen 8: Search Page


Screen 9: Search Result


7.CONCLUSION
In this paper, we have proposed the AI-SIEM system using event profiles and artificial neural
networks. The novelty of our work lies in condensing very large-scale data into event profiles
and using deep learning-based detection methods for enhanced cyber-threat detection
ability. The AI-SIEM system enables security analysts to deal with significant security alerts
promptly and cogently by comparing them with long-term security data. By reducing false positive
alerts, it can also help the security analysts to rapidly respond to cyber threats dispersed
across a large number of security events.


8.REFERENCES
[1] Gnu octave: high-level interpreted language. http://www.gnu.org/software/octave/.

[2] hp hosts, a community managed hosts file. http://hphosts.gt500.org/hosts.txt.

[3] Joewein.de LLC blacklist. http://www.joewein.net/dl/bl/dom-bl-base.txt.

[4] Lookout. https://play.google.com/store/apps/details?hl=en&id=com.lookout.

[5] Malware Domains List. http://mirror1.malwaredomains.com/files/domains.txt.

[6] Phish tank. http://www.phishtank.com/.

[7] Pindrop phone reputation service. http://pindropsecurity.com/phone-fraud-solutions/phone-reputation-service-prs/.

[8] Scrapy — an open-source web scraping framework for python. http://scrapy.org/.

[9] Virus Total. https://www.virustotal.com/en/.

[10] Google developers: Safe Browsing API. https://developers.google.com/safe-browsing/, 2012.
