Final Document
Submitted by:
ANDIRAJU KESHAVA KRISHNA (22091F0019)
(ESTD – 1995)
CERTIFICATE
This is to certify that ANDIRAJU KESHAVA KRISHNA (22091F0019), of MCA III
semester, has carried out the mini-project work entitled “CYBER THREAT DETECTION”
under the supervision and guidance of Mr. M. Ravi Kumar, Assistant Professor, MCA
Department, in partial fulfillment of the requirements for the award of the Degree of Master of
Computer Applications from Rajeev Gandhi Memorial College of Engineering &
Technology (Autonomous), Nandyal, and is a bonafide record of the work done by him during
2023-2024.
Assistant Professor, Dept. of MCA
Place: Nandyal
Head of the Department
External Examiner
ACKNOWLEDGEMENT
I express my gratitude to Dr. K. Subba Reddy garu, Head of the Computer Science &
Engineering and MCA departments, and to all the teaching and non-teaching staff of the
Computer Science & Engineering department of Rajeev Gandhi Memorial College of
Engineering and Technology for providing continuous encouragement and cooperation at
various steps of my project.
At the outset, I thank our honorable Chairman Dr. M. SanthiRamudu garu for providing
us with exceptional faculty and moral support throughout the course.
Finally, I extend my sincere thanks to all the Staff Members of MCA & CSE
Departments who have co-operated and encouraged us in making my project successful.
Whatever one does, whatever one achieves, the first credit goes to one's parents; were it
not for their love and affection, nothing would have been possible. I see their love and
blessings in every good that happens to us.
BY
ANDIRAJU KESHAVA KRISHNA (22091F0019)
CONTENTS
CHAPTER
1. INTRODUCTION
2. LITERATURE SURVEY
2.1.1. Disadvantages
2.2.1. Advantages
3. SYSTEM DESIGN
3.4 Algorithms
4. IMPLEMENTATION
4.1.2 ODBC
4.1.3 JDBC
4.5 MYSQL
5. TESTING
6. OUTPUT SCREENS
7. CONCLUSION
ABSTRACT
Mobile specific webpages differ significantly from their desktop counterparts in content,
layout and functionality. Accordingly, existing techniques to detect malicious websites are
unlikely to work for such webpages. In this paper, we design and implement KAYO, a
mechanism that distinguishes between malicious and benign mobile webpages. KAYO makes
this determination based on static features of a webpage, ranging from the number of iframes
to the presence of known fraudulent phone numbers.
First, we experimentally demonstrate the need for mobile specific techniques and then
identify a range of new static features that highly correlate with mobile malicious webpages.
We then apply KAYO to a dataset of over 350,000 known benign and malicious mobile
webpages and demonstrate 90% accuracy in classification. Moreover, we discover,
characterize and report a number of webpages missed by Google Safe Browsing and
VirusTotal, but detected by KAYO. Finally, we build a browser extension using KAYO to protect
users from malicious mobile websites in real-time. In doing so, we provide the first static
analysis technique to detect malicious mobile webpages.
CYBER THREAT DETECTION
CHAPTER-1
INTRODUCTION
Security information and event management (SIEM) systems focus on
collecting and managing the alerts of IPSs. SIEM is the most common and dependable
solution among various security operations solutions for analysing the collected security
events and logs. Moreover, security analysts make an effort to investigate suspicious alerts
against policies and thresholds, and to discover malicious behaviour by analysing
correlations among events, using knowledge related to attacks. To this end, the proposed
AI-SIEM system includes an event pattern extraction method that aggregates events sharing
a concurrency feature and correlates event sets in the collected data. Our event profiles
have the potential to provide concise input data for various deep neural networks. Moreover,
the system enables the analyst to handle all the data promptly and efficiently by comparison
with long-term history data.
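The exact profiling method belongs to the AI-SIEM design itself; purely as an illustration of the aggregation idea, the Java sketch below (the Event record, its fields, and the 60-second window are assumptions, not the system's actual implementation) groups raw alerts that fall in the same time window and counts each signature, yielding one concise count vector per window of the kind a neural network could consume.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: the real AI-SIEM event-profile extraction is
// more involved. Event, its fields, and the window size are assumptions.
// Requires Java 16+ for records.
public class EventProfiler {

    // A raw IPS alert: when it fired and which signature it matched.
    record Event(long epochSeconds, String signatureId) {}

    // Group events into fixed time windows and count each signature,
    // producing one "profile" (signature -> count) per window.
    public static Map<Long, Map<String, Integer>> buildProfiles(
            List<Event> events, long windowSeconds) {
        Map<Long, Map<String, Integer>> profiles = new HashMap<>();
        for (Event e : events) {
            long window = e.epochSeconds() / windowSeconds;   // concurrency bucket
            profiles.computeIfAbsent(window, w -> new HashMap<>())
                    .merge(e.signatureId(), 1, Integer::sum); // count per signature
        }
        return profiles;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(1000, "SIG-SQLI"), new Event(1010, "SIG-SQLI"),
                new Event(1020, "SIG-SCAN"), new Event(2000, "SIG-SCAN"));
        System.out.println(buildProfiles(events, 60));
    }
}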
CHAPTER-2
LITERATURE REVIEW
Traditionally, there are two primary systems for detecting cyber-threats and network
intrusions. An intrusion prevention system (IPS) is installed in the enterprise network and
examines network protocols and flows, primarily with signature-based methods. It
generates appropriate intrusion alerts, called security events, and reports the generated
alerts to another system, such as a SIEM. The security information and event management
(SIEM) system has been focusing on collecting and managing the alerts of IPSs.
2.1.1 DISADVANTAGES
• It is still difficult to recognize and detect intrusions against intelligent network attacks
owing to their high false alerts and the huge amount of security data.
• These learning-based approaches need to learn the attack model from historical
threat data and use the trained models to detect intrusions for unknown cyber threats.
2.2.1 ADVANTAGES
• For cyber-threat detection, the SIEM analysts spend an immense amount of effort and
time to differentiate between true security alerts and false security alerts in collected
events.
• Data security is greater since data co-owners can renew the ciphertexts by
appending their access policies as the dissemination conditions.
• The system is more secure due to continuous policy enforcement, in which the data
owner's access policy is enforced in the initial ciphertext as well as the renewed
ciphertext.
CHAPTER-3
SYSTEM DESIGN
Analysis is a logical process. The objective of this phase is to determine exactly what
must be done to solve the problem. Tools such as class diagrams, sequence diagrams, data
flow diagrams and the data dictionary are used in developing a logical model of the system.
Thus, it is important to pick the right SDLC model based on the specific concerns and
requirements of the project to ensure its success. This section surveys the various kinds of
SDLC models, the advantages and disadvantages of each one, and when to use them.
SDLC models can be thought of as tools for delivering a software project more
effectively. Knowing each model, when to use it, and the advantages and drawbacks of
each one is essential to choosing the one appropriate for the project context.
Waterfall Model
V-Shaped Model
Evolutionary Prototyping Model
Spiral Method (SDM)
Iterative and Incremental Method
Agile development
In the first module, we develop the system environment model. Website providers use
JavaScript or user agent strings to identify mobile users and then redirect them to a mobile
specific version. We note that not all static features used in existing techniques differ when
measured on mobile and desktop webpages. Mobile websites enable access to a user's
personal information and advanced capabilities of mobile devices through web APIs.
Existing static analysis techniques do not consider these mobile specific functionalities in
their feature set. We argue and later demonstrate that accounting for mobile specific
functionalities helps identify new threats specific to the mobile web. For example, the
presence of a known 'bank' fraud number on a website might indicate that the webpage is a
phishing webpage imitating the same bank.
We argue that benign webpage authors take effort to provide a good user experience,
whereas the goal of malicious webpage authors is to trick users into performing
unintentional actions with minimal effort. We therefore examine whether a webpage has
noscript content and measure the number of noscript tags. Intuitively, a benign webpage
author will include more noscript content in the code to ensure a good experience even for a
security savvy user.
We extract static features from a webpage and make predictions about its potential
maliciousness. We first discuss the feature set used in KAYO, followed by the collection
process of the dataset. Structural and lexical properties of a URL have been used to
differentiate between malicious and benign webpages. However, using only URL features for
such differentiation leads to a high false positive rate.
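As a hedged illustration of static feature extraction (KAYO's real feature set is far richer and its implementation is not shown here), the Java sketch below counts two of the features mentioned above, iframe and noscript tags, with simple regular expressions; the patterns and feature names are assumptions made for demonstration only.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: regex-based counting of two static features.
public class StaticFeatures {

    private static int count(Pattern p, String html) {
        Matcher m = p.matcher(html);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    // Extract a tiny feature vector from raw HTML.
    public static Map<String, Integer> extract(String html) {
        Map<String, Integer> features = new LinkedHashMap<>();
        features.put("iframeCount",
                count(Pattern.compile("<iframe", Pattern.CASE_INSENSITIVE), html));
        features.put("noscriptCount",
                count(Pattern.compile("<noscript", Pattern.CASE_INSENSITIVE), html));
        return features;
    }

    public static void main(String[] args) {
        String html = "<html><iframe src='x'></iframe><noscript>JS off</noscript></html>";
        System.out.println(extract(html)); // {iframeCount=1, noscriptCount=1}
    }
}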
Our data gathering process included accumulating labeled benign and malicious mobile
specific webpages. First, we describe an experiment that identifies and defines 'mobile
specific webpages'. We then conduct the data collection process. We use these crawls
specifically because they are closest to the publication of the related work, making them as
close to equivalent as possible.
3.4 Algorithms
3.4.1 Naïve Bayes
• The Naïve Bayes algorithm is a supervised learning algorithm, which is based on
Bayes' theorem and used for solving classification problems.
• It is mainly used in text classification with high-dimensional training datasets.
• The Naïve Bayes classifier is one of the simplest and most effective classification
algorithms; it helps in building fast machine learning models that can make quick
predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of
an object.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment
analysis, and classifying articles.
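The points above can be made concrete with a minimal Bernoulli Naïve Bayes sketch over binary features (for example, the presence or absence of a suspicious tag). The tiny training set and the choice of Laplace smoothing are assumptions for illustration, not the project's actual model.

// Minimal Bernoulli Naive Bayes over binary features; illustrative only.
public class NaiveBayesDemo {

    public static void main(String[] args) {
        // Toy training data (assumed): each row is a binary feature vector,
        // y[i] is 1 for "malicious", 0 for "benign".
        int[][] x = {{1, 1, 0}, {1, 0, 1}, {0, 0, 0}, {0, 1, 0}};
        int[] y = {1, 1, 0, 0};
        int nFeatures = x[0].length;

        // Class counts and per-class counts of each feature being "on".
        int[] classCount = new int[2];
        int[][] featOn = new int[2][nFeatures];
        for (int i = 0; i < x.length; i++) {
            classCount[y[i]]++;
            for (int j = 0; j < nFeatures; j++) featOn[y[i]][j] += x[i][j];
        }

        int[] query = {1, 1, 1};  // features of an unseen page
        double bestScore = Double.NEGATIVE_INFINITY;
        int bestClass = -1;
        for (int c = 0; c < 2; c++) {
            // log P(c) + sum_j log P(x_j | c), with Laplace smoothing.
            double score = Math.log((double) classCount[c] / x.length);
            for (int j = 0; j < nFeatures; j++) {
                double pOn = (featOn[c][j] + 1.0) / (classCount[c] + 2.0);
                score += Math.log(query[j] == 1 ? pOn : 1.0 - pOn);
            }
            if (score > bestScore) { bestScore = score; bestClass = c; }
        }
        System.out.println("Predicted class: "
                + (bestClass == 1 ? "malicious" : "benign"));
    }
}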
Functional requirements describe what the system should do. The functional
requirements can be further categorized as follows:
The input design is the link between the information system and the user. It comprises
developing specifications and procedures for data preparation, and the steps necessary to put
transaction data into a usable form for processing. This can be achieved by having the
computer read data from a written or printed document, or by having people key the data
directly into the system. The design of input focuses on controlling the amount of input
required, controlling errors, avoiding delay, avoiding extra steps and keeping the process
simple. The input is designed in such a way that it provides security and ease of use while
retaining privacy. Input design considered the following things:
4. Methods for preparing input validations and steps to follow when errors occur.
User Interfaces
Software Interfaces
All projects are feasible when provided with unlimited resources and infinite time.
Unfortunately, the development of a computer-based system or product is more likely
plagued by a scarcity of resources and difficult delivery dates. It is both necessary and
prudent to evaluate the feasibility of a project.
Feasibility and risk analysis are related in many ways. If project risk is great, the
feasibility of producing quality software is reduced. During product engineering, however,
we concentrate our attention on four primary areas of interest.
The GUI is developed using HTML to capture information from the customer. HTML
is used to display content in the browser. It uses the TCP/IP protocol, and it is an interpreted
language. It is very easy to develop a page/document using HTML, and some RAD (Rapid
Application Development) tools are provided to quickly design/develop our application.
Many objects such as buttons, text fields, and text areas are provided to capture information
from the customer.
The economic issues that usually arise during the economic feasibility stage are whether
the system will be used if it is developed and implemented, and whether the financial benefits
equal or exceed the costs. The cost of developing the project will include the cost of
conducting a full system investigation, the cost of hardware and software for the class of
system being considered, and the benefits in the form of reduced costs or fewer costly errors.
In our application the front end is developed using a GUI, so it is very easy for the
customer to enter the necessary information. However, the customer must have some
knowledge of using web applications before using our application.
As the strategic value of software increases for many companies, the industry looks
for techniques to automate the production of software and to improve quality and reduce cost
and time-to-market. These techniques include component technology, visual programming,
patterns and frameworks. Businesses also seek techniques to manage the complexity of
systems as they increase in scope and scale. In particular, they recognize the need to solve
recurring architectural problems, such as physical distribution, concurrency, replication,
security, load balancing and fault tolerance. Additionally, the development for the World
Wide Web, while making some things simpler, has exacerbated these architectural problems.
The Unified Modelling Language (UML) was designed to respond to these needs. Simply
put, systems design refers to the process of defining the architecture, components, modules,
interfaces, and data for a system to satisfy specified requirements, which can be done easily
through UML diagrams.
In the project four basic UML diagrams have been explained among the following list:
Class Diagram
Sequence Diagram
Activity Diagram
Deployment Diagram
This is one of the most important diagrams in development. The diagram breaks the
class into three layers: one has the name, the second describes its attributes and the third its
methods. A padlock to the left of a name represents a private attribute.
The relationships are drawn between the classes. Developers use the class diagram to
develop the classes. Analysts use it to show the details of the system. Architects look at class
diagrams to see if any class has too many functions and whether it needs to be split.
Fig 4: Class Diagram
Activity diagrams are a loosely defined diagram technique for showing workflows of
stepwise activities and actions, with support for choice, iteration and concurrency. In the
Unified Modelling Language, activity diagrams can be used to describe the business and
operational step-by-step workflows of components in a system. An activity diagram shows
the overall flow of control.
Fig 8: Deployment Diagram
Database : MYSQL
Programming : Java
RAM : 2 GB
4. IMPLEMENTATION
➢ Simple
➢ Architecture neutral
➢ Object oriented
➢ Portable
➢ Distributed
➢ High performance
➢ Interpreted
➢ Multi-threaded
➢ Robust
➢ Dynamic
➢ Secure
With most programming languages, you either compile or interpret a program so that
you can run it on your computer. The Java programming language is unusual in that a
program is both compiled and interpreted. With the compiler, you first translate a program
into an intermediate language called Java bytecode: platform-independent code interpreted
by the interpreter on the Java platform.
The interpreter parses and runs each Java bytecode instruction on the computer.
Compilation happens just once; interpretation occurs each time the program is executed.
The example below illustrates how this works.
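For instance, the two stages can be seen with a minimal program: javac compiles the source once into platform-independent bytecode (HelloWorld.class), and java launches a VM that interprets that bytecode each time the program runs.

// HelloWorld.java
// Compile once to platform-independent bytecode:  javac HelloWorld.java
// Interpret/run the bytecode on any Java VM:      java HelloWorld
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello from the Java platform!");
    }
}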
Java bytecodes can be thought of as the machine code instructions for the Java Virtual
Machine (Java VM). Every Java interpreter, whether it's a development tool or a Web
browser that can run applets, is an implementation of the Java VM. Java bytecodes help
make "write once, run anywhere" possible. You can compile your program into bytecodes
on any platform that has a Java compiler. The bytecodes can then be run on any
implementation of the Java VM. That means that as long as a computer has a Java VM,
the same program written in the Java programming language can run on Windows 2000,
a Solaris workstation, or an iMac.
The most common types of programs written in the Java programming language are
applets and applications. If you’ve surfed the Web, you’re probably already familiar
with applets. An applet is a program that adheres to certain conventions that allow it to
run within a Java-enabled browser.
However, the Java programming language is not just for writing cute,
entertaining applets for the Web. The general-purpose, high-level Java programming
language is also a powerful software platform. Using the generous API, you can write
many types of programs.
An application is a standalone program that runs directly on the Java platform. A
special kind of application known as a server serves and supports clients on a network.
Examples of servers are Web servers, proxy servers, mail servers, and print servers.
Another specialized program is a Servlet. A Servlet can almost be thought of as an applet
that runs on the server side. Java Servlets are a popular choice for building interactive
web applications, replacing the use of CGI scripts. Servlets are similar to applets in that
they are runtime extensions of applications. Instead of working in browsers, though,
Servlets run within Java Web servers, configuring or tailoring the server.
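As a minimal sketch of that idea, assuming the classic javax.servlet API (Servlet 3.0+ annotations) and a hypothetical /hello mapping, a servlet that tailors the server's response looks like this:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal servlet sketch: responds to GET requests at /hello.
// The mapping and message are illustrative assumptions.
@WebServlet("/hello")
public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        try (PrintWriter out = response.getWriter()) {
            out.println("<html><body><h1>Hello from a servlet</h1></body></html>");
        }
    }
}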
How does the API support all these kinds of programs? It does so with packages
of software components that provide a wide range of functionality. Every full
implementation of the Java platform gives you the following features:
The essentials: Objects, strings, threads, numbers, input and output, data
structures, system properties, date and time, and so on.
Applets: The set of conventions used by applets.
Internationalization: Help for writing programs that can be localized for users
worldwide. Programs can automatically adapt to specific locales and be displayed
in the appropriate language.
Security: Both low level and high level, including electronic signatures, public
and private key management, access control, and certificates.
Software components: Known as JavaBeans™, these can plug into existing
component architectures.
Object serialization: Allows lightweight persistence and communication via
Remote Method Invocation (RMI).
4.1.2 ODBC
4.1.3 JDBC
In an effort to set an independent database standard API for Java, Sun Microsystems
developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database
access mechanism that provides a consistent interface to a variety of RDBMSs. This
consistent interface is achieved through the use of "plug-in" database connectivity
modules, or drivers. If a database vendor wishes to have JDBC support, he or she must
provide the driver for each platform that the database and Java run on.
To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC.
As you discovered earlier in this chapter, ODBC has widespread support on a variety of
platforms. Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market
much faster than developing a completely new connectivity solution.
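A minimal usage sketch follows, assuming the MySQL Connector/J driver is on the classpath and a hypothetical local database named threatdb with an events table; the credentials, schema, and query are illustrative assumptions, not part of the project's actual code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative JDBC sketch; database name, table, and credentials are assumptions.
public class JdbcDemo {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/threatdb";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT id, signature FROM events WHERE severity >= ?")) {
            ps.setInt(1, 3);                       // bind the severity threshold
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " "
                            + rs.getString("signature"));
                }
            }
        }
    }
}

Note the try-with-resources blocks: connections, statements and result sets are closed automatically, which is the idiomatic way to avoid leaking database handles.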
JDBC Goals
Few software packages are designed without goals in mind. JDBC is no exception;
its many goals drove the development of the API. These goals, in conjunction with
early reviewer feedback, have finalized the JDBC class library into a solid framework
for building database applications in Java.
The goals that were set for JDBC are important. They will give you some insight as to
why certain classes and functionalities behave the way they do. The design goals for
JDBC are as follows:
SQL Level API
The designers felt that their main goal was to define a SQL interface for Java. Although
not the lowest database interface level possible, it is at a low enough level for higher-
level tools and APIs to be created. Conversely, it is at a high enough level for application
programmers to use it confidently. Attaining this goal allows for future tool vendors to
“generate” JDBC code and to hide many of JDBC’s complexities from the end user.
The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal
allows JDBC to use existing ODBC level drivers by the use of a software interface. This
interface would translate JDBC calls to ODBC and vice versa.
Provide a Java interface that is consistent with the rest of the Java system:
Because of Java’s acceptance in the user community thus far, the designers feel that they
should not stray from the current design of the core Java system.
Keep it simple
This goal probably appears in all software design goal listings. JDBC is no exception.
Sun felt that the design of JDBC should be very simple, allowing for only one method of
completing a task per mechanism. Allowing duplicate functionality only serves to
confuse the users of the API.
• Physical level: This is the lowest level of abstraction, which describes how the data
are actually stored.
• Logical level: This is the second level of abstraction, which describes what data are
stored in the database, and what relationships exist among those data. Database
administrators decide what data are to be kept in the database.
• View level: This is the highest level of abstraction, which describes only a part of
the entire database. The view level of abstraction exists to simplify users'
interaction with the system. The system may provide many views for the same
database.
4.2.2 Instances and Schema
A data model is a collection of conceptual tools for describing data, data relationships,
data semantics and consistency constraints. The various data models available are discussed
below.
4.3.1 The Entity Relationship Model
The E-R model is a data model used to describe the data involved in a real-world
enterprise. It describes the data in the form of entities and relationships. An entity is a
'thing' (or 'object') in the real world that is easily distinguishable from other things. A
relationship is an association among several entities.
4.3.2 Relational Model
The Relational Model uses a collection of tables to represent both data and the
relationships among the data. Each table has multiple columns, and each column has a
unique name.
4.4 Database Languages
A database system provides a data definition language and a data manipulation
language. The Data Definition Language (DDL) consists of a set of definitions used to
specify the database schema. Execution of a DDL statement results in a set of tables. These
tables are stored in a specific area known as the data dictionary or data directory. A data
directory contains metadata.
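As an illustration, a DDL statement can be executed through JDBC; the events table below is a hypothetical schema used only for demonstration, and the connection details are the same assumed ones as in the earlier JDBC sketch.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Executes one DDL statement; the schema shown is an illustrative assumption.
public class CreateTableDemo {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/threatdb";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement()) {
            // The resulting table definition is recorded by the DBMS
            // in its data dictionary.
            st.executeUpdate("CREATE TABLE IF NOT EXISTS events ("
                    + "id INT PRIMARY KEY AUTO_INCREMENT, "
                    + "signature VARCHAR(64) NOT NULL, "
                    + "severity INT)");
        }
    }
}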
MYSQL is a relational database management system, which organizes data in the form
of tables. MYSQL is one of many database servers based on the RDBMS model, which
manages a series of data that addresses three specific things: data structures, data integrity
and data manipulation. With MYSQL cooperative server technology we can realize the
benefits of open, relational systems for all applications. MYSQL makes efficient use of all
system resources, on all hardware architectures, to deliver unmatched performance, price
performance and scalability. Any DBMS to be called an RDBMS has to satisfy
Dr. E. F. Codd's rules.
MYSQL is portable
The MYSQL RDBMS is available on a wide range of platforms ranging from PCs to
supercomputers, and as a multi-user loadable module for Novell NetWare. If you develop an
application on one system, you can run the same application on other systems without any
modifications.
MYSQL is compatible
MYSQL commands can be used for communicating with the IBM DB2 mainframe
RDBMS; that is, MYSQL is compatible with DB2. The MYSQL RDBMS is a
high-performance, fault tolerant DBMS, which is specially designed for online transaction
processing and for handling large database applications.
• Client/server architecture.
• Parallel processing support to speed up data entry and online transaction
processing for applications.
• DB procedures, functions and packages.
5. TESTING
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. Testing presents an interesting anomaly
for the software engineer.
This testing is also called glass box testing. In this testing, knowing the specified
function that a product has been designed to perform, tests can be conducted that
demonstrate each function is fully operational while at the same time searching for errors in
each function. It is a test case design method that uses the control structure of the procedural
design to derive test cases. Basis path testing is a white box testing technique.
Condition testing
Data flow testing
Loop testing
Unit testing focuses verification effort on the smallest unit of software design, that is,
the module. Using the procedural design description as a guide, important control paths are
tested to uncover errors within the boundaries of the module. The unit test is normally white
box oriented, and the step can be conducted in parallel for multiple modules.
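A minimal unit-test sketch using JUnit 5 follows; the clampSeverity method is a hypothetical stand-in for a module under test, not part of the project's actual code.

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Illustrative JUnit 5 unit test; the method under test is a stand-in module.
class SeverityTest {

    // The "smallest unit" under test: clamps a severity score to 0..10.
    static int clampSeverity(int s) {
        return Math.max(0, Math.min(10, s));
    }

    @Test
    void clampsValuesIntoRange() {
        assertEquals(0, clampSeverity(-5));   // below range
        assertEquals(10, clampSeverity(42));  // above range
        assertEquals(7, clampSeverity(7));    // within range
    }
}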
Top-Down Integration
Bottom-up Integration
This method, as the name suggests, begins construction and testing with atomic
modules, i.e., modules at the lowest level. Because the modules are integrated in a
bottom-up manner, the processing required for the modules subordinate to a given level is
always available and the need for stubs is eliminated.
Regression Testing
After each validation test case has been conducted, one of two possible conditions
exists: (1) the function or performance characteristics conform to specification and are
accepted, or (2) a deviation from specification is uncovered and a deficiency list is created.
Deviations or errors discovered at this stage in a project can rarely be corrected prior to
scheduled completion. It is often necessary to negotiate with the customer to establish a
method for resolving deficiencies.
Configuration Review
It is virtually impossible for a software developer to foresee how the customer will
really use a program. Instructions for use may be misinterpreted, strange combinations of
data may be regularly used, and output that seemed clear to the tester may be unintelligible
to a user in the field.
When custom software is built for one customer, a series of acceptance tests are
conducted to enable the customer to validate all requirements. Conducted by the end user
rather than the system developer, an acceptance test can range from an informal “test drive”
to a planned and systematically executed series of tests. In fact, acceptance testing can be
conducted over a period of weeks or months, thereby uncovering cumulative errors that might
degrade the system over time.
The beta test is conducted at one or more customer sites by the end user of the
software. Unlike alpha testing, the developer is generally not present. Therefore, the beta test
is a “live” application of the software in an environment that cannot be controlled by the
developer. The customer records all problems that are encountered during beta testing and
reports these to the developer at regular intervals. As a result of problems reported during
beta testing, the software developer makes modifications and then prepares for release of the
software product to the entire customer base.
System testing is actually a series of different tests whose primary purpose is to fully
exercise the computer-based system. Although each test has a different purpose, all work to
verify that all system elements have been properly integrated to perform allocated functions.
This method is designed to test runtime performance of software within the context
of an integrated system.
6. OUTPUT SCREENS
HOME PAGE:
7. CONCLUSION
In this paper, we have proposed the AI-SIEM system using event profiles and artificial
neural networks. The novelty of our work lies in condensing very large-scale data into event
profiles and using deep learning-based detection methods for enhanced cyber-threat
detection ability. The AI-SIEM system enables security analysts to deal with significant
security alerts promptly and efficiently by comparing them with long-term security data. By
reducing false positive alerts, it can also help the security analysts to rapidly respond to
cyber threats dispersed across a large number of security events.