Emerging trends in Computer and Information Technology Data Mining
(22618)
A
Micro-Project Report
On
“Credit Card Fraud Detection
using Data Mining”
Submitted By
Jidnyasa Chavan (23)
Under Guidance Of
Mrs. K. G. Raut
Diploma Course in Computer Technology
(As per directives of I Scheme, MSBTE)
Sinhgad Technical Education Society’s
SOU. VENUTAI CHAVAN POLYTECHNIC, PUNE -411046
The Academic Year 2022– 2023
MAHARASHTRA STATE BOARD OF
TECHNICAL EDUCATION
Certificate
This is to certify that Mast /Ms. Jidnyasa Chavan Roll No. 23 of Semester VI of Diploma
in Computer Technology of Institute STES’s Sou. Venutai Chavan Polytechnic (Code:
0040) has completed the Micro Project satisfactorily in Subject Emerging Trends in
Computer and Information Technology (22618) for the academic year 2022 – 2023 as
prescribed in the curriculum.
Program Code: CM Course Code: CM/6/I
Place: Pune Enrollment No: 2000400244
Date: Exam. Seat No:
(Mrs. K. G. Raut) (Mrs.A.V.Kurkute) (Dr.M.S.Jadhav)
Subject Teacher Head of Department Principal
Sr. No.  Contents
1.       Aim of the Micro-Project
2.       Rationale
3.       Course Outcomes achieved
4.       Literature Review
5.       Actual Methodology Followed
6.       Introduction
7.       Skills Developed
8.       Applications
Annexure – I
Micro-Project Proposal
Credit-Card Fraud Detection using Data Mining
1.0 Aim of the Micro – Project:
This Micro-Project aims to develop a case study on “Credit card fraud
detection” using data mining.
2.0 Intended Course Outcomes:
a. Develop programs using GUI framework (AWT and Swing).
b. Handle events of AWT and Swing components
c. Develop programs to handle events in Java programming.
d. Develop programs using database.
3.0 Proposed Methodology:
This Micro-Project aims to develop a case study on “Credit card fraud detection”
using data mining.
1. Study all the concepts of Data Mining
2. Identify the requirements of project.
3. Study how data mining techniques are applied for credit card fraud detection.
4. Prepare the final report.
4.0 Action Plan:
Sr. No   Details of Activity                                   Planned start date   Planned finish date       Name of responsible team members
1        Identify the aim of the project topic                 08/02/2023           22/02/2023                Jidnyasa Chavan
2        Understand which tools or resources are required      01/03/2023           09/03/2023                Jidnyasa Chavan
3        Study all concepts of data mining                     13/03/2023           20/03/2023                Jidnyasa Chavan
4        Study how data mining techniques are applied          03/04/2023           06/04/2023 / 12/04/2023   Jidnyasa Chavan
         for credit card fraud detection
5        Prepare final report                                  10/04/2023           12/04/2023                Jidnyasa Chavan
6.0 Team Members:
Sr. No Roll. No Name of Student
01 23 Jidnyasa Chavan
Annexure – II
Micro-Project Report
Credit Card Fraud Detection using Data Mining
1.0 Rationale:
Advancements in and applications of Computer Engineering and Information Technology
are ever changing. The Emerging Trends course aims to create awareness about the major
trends that will define technological disruption in the upcoming years in the field of
Computer Engineering and Information Technology. These emerging areas are expected to
generate revenue, increase demand for IT professionals and open avenues of
entrepreneurship.
2.0 Aim of the Micro – Project:
This Micro-Project aims at developing case-study ‘Credit card fraud detection’ using data
mining.
3.0 Course Outcomes Addressed:
a. Develop programs using GUI framework (AWT and Swing).
b. Handle events of AWT and Swing components
c. Develop programs to handle events in Java programming.
d. Develop programs using database.
4.0 Literature Review:
Data mining refers to extracting or “mining” knowledge from large amounts of data. Fraudulent
electronic transactions are already a significant problem, one that will grow in importance as
the number of access points in the nation’s financial information system grows. Besides
scalability and efficiency, the fraud-detection task exhibits technical problems that include
skewed distributions of training data and non-uniform cost per error, neither of which has
been widely studied in the knowledge-discovery and data mining community. In this report, we
survey a number of techniques that address these three main issues concurrently.
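The effect of a skewed class distribution and a non-uniform cost per error can be illustrated with a small sketch. The data, the function name `evaluate` and the cost values (a missed fraud costing 100 units, a false alarm 1 unit) are assumptions chosen for illustration, not figures from any study.

```python
def evaluate(labels, predictions, cost_missed_fraud=100.0, cost_false_alarm=1.0):
    """Return (accuracy, total_cost) for binary labels where 1 means fraud.

    The two cost parameters are hypothetical; in practice they would come
    from the business (e.g. average fraud loss vs. cost of a manual review).
    """
    correct = missed_fraud = false_alarm = 0
    for actual, predicted in zip(labels, predictions):
        if actual == predicted:
            correct += 1
        elif actual == 1:
            missed_fraud += 1   # fraud predicted as legitimate
        else:
            false_alarm += 1    # legitimate predicted as fraud
    accuracy = correct / len(labels)
    total_cost = missed_fraud * cost_missed_fraud + false_alarm * cost_false_alarm
    return accuracy, total_cost


# Skewed data: 2 frauds among 1000 transactions (made-up numbers).
labels = [1] * 2 + [0] * 998

# Model A never flags anything; model B catches both frauds but raises 20 false alarms.
always_legitimate = [0] * 1000
cautious = [1] * 2 + [1] * 20 + [0] * 978

accuracy_a, cost_a = evaluate(labels, always_legitimate)
accuracy_b, cost_b = evaluate(labels, cautious)
print(accuracy_a, cost_a)
print(accuracy_b, cost_b)
```

Under these assumed costs, the model that never flags fraud has the higher accuracy but the higher total cost, which is why accuracy alone is a poor measure on skewed fraud data.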
5.0 Actual Methodology Followed:
This Micro-Project aims at developing a case study for ‘Credit card fraud
detection’ using Data Mining.
1. Study all the concepts of Data Mining
2. Identify the requirements of project.
3. Study how data mining techniques are applied for credit card fraud detection.
4. Prepare the final report.
6.0 Introduction:
An early use of Data Mining comes from service providers in the mobile phone and utilities
industries. Mobile phone and utilities companies use Data Mining and Business Intelligence to
predict ‘churn’, the term they use for when a customer leaves their company to get their
phone/gas/broadband from another provider. They collate billing information, customer service
interactions, website visits and other metrics to give each customer a probability score, then
target offers and incentives at customers whom they perceive to be at a higher risk of churning.
Retailers segment customers into ‘Recency, Frequency, Monetary’ (RFM) groups and target
marketing and promotions at those different groups. A customer who spends little but often, and
last did so recently, will be handled differently from a customer who spent big but only once,
and some time ago. The former may receive loyalty, upsell and cross-sell offers, whereas the
latter may be offered a win-back deal, for instance.
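The RFM segmentation described above can be sketched as follows. This is a minimal illustration: the transactions, the function names `rfm` and `suggest_offer`, and the 30-day and 3-purchase cut-offs are all assumptions.

```python
from datetime import date


def rfm(transactions, today):
    """Summarise (customer_id, purchase_date, amount) rows per customer.

    Returns {customer_id: (recency_in_days, frequency, monetary_total)}.
    """
    summary = {}
    for customer, when, amount in transactions:
        record = summary.setdefault(
            customer, {"last": when, "frequency": 0, "monetary": 0.0}
        )
        record["last"] = max(record["last"], when)
        record["frequency"] += 1
        record["monetary"] += amount
    return {
        customer: ((today - r["last"]).days, r["frequency"], r["monetary"])
        for customer, r in summary.items()
    }


def suggest_offer(recency_days, frequency, recent_days=30, frequent=3):
    """Map an RFM profile to an offer type; the cut-offs are hypothetical."""
    if recency_days > recent_days:
        return "win-back"
    if frequency >= frequent:
        return "loyalty / upsell / cross-sell"
    return "standard"


# Made-up purchase history: A spends little but often, B spent big once, long ago.
transactions = [
    ("A", date(2023, 3, 5), 10.0),
    ("A", date(2023, 3, 15), 10.0),
    ("A", date(2023, 3, 25), 10.0),
    ("A", date(2023, 4, 1), 10.0),
    ("B", date(2022, 10, 1), 500.0),
]
profiles = rfm(transactions, today=date(2023, 4, 10))
for customer, (recency, frequency, monetary) in profiles.items():
    print(customer, recency, frequency, monetary, suggest_offer(recency, frequency))
```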
7.0 Types of Data Mining:
1. Data stored in the database:
A database is managed by a database management system (DBMS). Every DBMS stores data
that are related to each other in one way or another. It also has a set of software programs that
are used to manage the data and provide easy access to it. These software programs serve many
purposes, including defining the structure of the database, making sure that the stored
information remains secure and consistent, and managing different types of data access, such as
shared, distributed, and concurrent access. A relational database has tables that have different
names and attributes and can store rows or records of large data sets. Every record stored in a
table has a unique key. An entity-relationship model is created to provide a representation of a
relational database that features entities and the relationships that exist between them.
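As a small illustration of tables, unique keys and a relationship between two entities, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two entities: a cardholder and the transactions made on that card.
conn.execute(
    "CREATE TABLE cardholder (card_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE card_transaction (
           txn_id  INTEGER PRIMARY KEY,
           card_id INTEGER NOT NULL REFERENCES cardholder(card_id),
           amount  REAL NOT NULL
       )"""
)
conn.executemany("INSERT INTO cardholder VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany(
    "INSERT INTO card_transaction VALUES (?, ?, ?)",
    [(101, 1, 250.0), (102, 1, 90.0), (103, 2, 40.0)],
)

# The relationship lets us join the two tables and summarise per cardholder.
rows = conn.execute(
    """SELECT c.name, COUNT(*), SUM(t.amount)
       FROM cardholder c JOIN card_transaction t ON t.card_id = c.card_id
       GROUP BY c.card_id ORDER BY c.card_id"""
).fetchall()
print(rows)

# The unique key is enforced: a second record with card_id 1 is rejected.
try:
    conn.execute("INSERT INTO cardholder VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```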
2. Data Warehouse:
A data warehouse is a single data storage location that collects data from multiple sources and
then stores it under a unified schema. Before data is stored in a data warehouse, it undergoes
cleaning, integration, loading, and refreshing. Data stored in a data warehouse is organized in
several parts. For example, data that was stored 6 or 12 months ago is typically available in
the form of a summary.
3. Transactional data:
A transactional database stores records that are captured as transactions. These transactions
include flight bookings, customer purchases, clicks on a website, and others. Every transaction
record has a unique ID. It also lists all the items that made up the transaction.
4. Other types of data:
There are many other types of data as well, known for their structure, semantic
meanings, and versatility. They are used in many applications. A few of those data
types are data streams, engineering design data, sequence data, graph data, spatial data,
multimedia data, and more.
8.0 Data Mining techniques:
1. Association:
It is one of the most widely used data mining techniques. In this technique, a
transaction and the relationships between its items are used to identify a pattern. This is the
reason this technique is also referred to as a relation technique. It is used to conduct market
basket analysis, which is done to find all those products that customers buy together on a regular
basis. This technique is very helpful for retailers, who can use it to study the buying habits of
different customers. Retailers can study past sales data and then look out for products that
customers buy together. They can then place those products close to each other in their
retail stores to help customers save time and to increase sales.
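Market basket analysis is usually expressed through two measures: support (the fraction of baskets that contain both items) and confidence (of the baskets containing the first item, the fraction that also contain the second). The sketch below computes both for item pairs; the baskets, the function name `pair_rules` and the 0.5 support threshold are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return rules (if_item, then_item, support, confidence) for frequent pairs."""
    total = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), together in pair_counts.items():
        support = together / total
        if support >= min_support:
            rules.append((a, b, support, together / item_counts[a]))
            rules.append((b, a, support, together / item_counts[b]))
    return rules


# Hypothetical sales data: each inner list is one customer's basket.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
]
rules = pair_rules(baskets)
for if_item, then_item, support, confidence in rules:
    print(f"{if_item} -> {then_item}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket containing milk also contains butter, so a retailer might shelve the two together.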
2. Clustering:
This technique creates meaningful clusters of objects that share the same characteristics. People
often confuse it with classification, but the difference is clear once both techniques are
understood. Unlike classification, which puts objects into predefined classes,
clustering groups objects into classes that it discovers from the data itself. Let us take an
example. A library is full of books on different topics. The challenge is to organize those books
so that readers have no problem finding books on a particular topic. We can use clustering to keep
books with similarities on one shelf and then give each shelf a meaningful name. Readers looking
for books on a particular topic can go straight to that shelf. They are not required to roam the
entire library to find their book.
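One common clustering algorithm is k-means, which is swapped in here purely as an example since the text does not name a specific algorithm. The sketch below is a minimal version that uses the first k points as starting centroids (a simplifying assumption; real implementations usually start randomly). The points are hypothetical two-feature descriptions of items.

```python
def kmeans(points, k, iterations=10):
    """Group points (tuples of numbers) into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # deterministic start, chosen for simplicity
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
            )
            clusters[nearest].append(point)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up data with two obvious groups and no labels supplied.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

No class names are given to the algorithm; the two groups emerge from the data, which is the difference from classification noted above.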
3. Classification:
This technique finds its origins in machine learning. It classifies items or variables in a data
set into predefined groups or classes. It uses linear programming, statistics, decision trees, and
artificial neural networks, amongst other techniques. Classification is used to
develop software that can be trained to classify the items in a
data set into different classes. For instance, we can use it to classify all the candidates who
attended an interview into two groups: the first group is the list of candidates who were
selected, and the second is the list of candidates who were rejected. Data mining software
can be used to perform this classification job.
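The candidate example can be sketched with the simplest possible decision tree, a single-threshold "decision stump" learned from labelled data. The interview scores, labels and function names are hypothetical.

```python
def train_stump(scores, labels):
    """Pick the score threshold that misclassifies the fewest training examples.

    A candidate is predicted "selected" when score >= threshold.
    """
    best_threshold, best_errors = None, len(scores) + 1
    for threshold in sorted(set(scores)):
        errors = sum(
            (score >= threshold) != (label == "selected")
            for score, label in zip(scores, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def classify(score, threshold):
    """Assign a new candidate to one of the two predefined classes."""
    return "selected" if score >= threshold else "rejected"


# Made-up training data: past candidates with known outcomes.
scores = [35, 48, 52, 61, 70, 88]
labels = ["rejected", "rejected", "rejected", "selected", "selected", "selected"]

threshold = train_stump(scores, labels)
print(threshold, classify(65, threshold), classify(50, threshold))
```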
4. Prediction:
This technique predicts the relationship that exists between independent and dependent
variables, as well as among independent variables alone. It can be used to predict future profit
depending on sales. Let us assume that profit and sales are the dependent and independent
variables, respectively. Based on past sales data, we can then predict future profit
using a regression curve.
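For a straight-line regression of profit on sales, the least-squares slope and intercept are

$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x},$$

where $x_i$ are past sales and $y_i$ the corresponding profits. The sketch below applies these formulas to made-up figures; the numbers and function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    return intercept + slope * x


# Hypothetical past data: sales (independent) and profit (dependent).
sales = [100, 200, 300, 400]
profit = [12, 22, 32, 42]

slope, intercept = fit_line(sales, profit)
print(slope, intercept, predict(500, slope, intercept))
```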
5. Sequential Patterns:
This technique uses transaction data to identify similar trends, patterns, and
events in it over a period of time. Historical sales data can be used to discover items
that buyers bought together at different times of the year. Businesses can act on this
information by recommending that customers buy those products at times when the historical
data does not suggest they would. Businesses can use attractive deals and discounts to support
these recommendations.
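One simple form of sequential pattern is "customers who bought item A later bought item B". The sketch below counts such ordered pairs across customers; the purchase records, the function name `sequential_pairs` and the minimum of two customers are assumptions for illustration.

```python
from collections import Counter


def sequential_pairs(purchases, min_customers=2):
    """Count ordered pairs (earlier_item, later_item) per customer.

    purchases: list of (customer_id, day_number, item).
    Returns {(earlier_item, later_item): number_of_customers} for pairs seen
    in at least min_customers purchase histories.
    """
    history = {}
    for customer, day, item in sorted(purchases, key=lambda row: (row[0], row[1])):
        history.setdefault(customer, []).append((day, item))

    counts = Counter()
    for events in history.values():
        seen = set()  # count each pair once per customer
        for i, (day_a, item_a) in enumerate(events):
            for day_b, item_b in events[i + 1:]:
                if day_b > day_a and item_a != item_b:
                    seen.add((item_a, item_b))
        counts.update(seen)
    return {pair: n for pair, n in counts.items() if n >= min_customers}


# Made-up purchase log.
purchases = [
    ("c1", 1, "phone"), ("c1", 5, "case"),
    ("c2", 2, "phone"), ("c2", 9, "case"),
    ("c3", 3, "case"), ("c3", 4, "phone"), ("c3", 10, "charger"),
]
patterns = sequential_pairs(purchases)
print(patterns)
```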
9.0 Data Mining on Credit Card Fraud Detection:
How is it used for credit card fraud detection?
This system implements a supervised anomaly detection algorithm from data mining to detect fraud
in real-time transactions on the internet, thereby classifying each transaction as legitimate,
suspicious, or illegitimate. The anomaly detection algorithm is built on
Neural Networks, which follow the working principle of the human brain (as humans, we learn
from past experience and then make our present-day decisions based on what we have learned).
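The three-way outcome can be sketched by passing transaction features through a single artificial neuron (the basic building block of a neural network) and thresholding its output. This is only an illustration, not the system's actual model: the weights, bias, feature meanings and the 0.5 and 0.9 thresholds are all hypothetical, and a real network would learn its weights from labelled past transactions.

```python
import math


def fraud_score(features, weights, bias):
    """One sigmoid neuron: weighted sum of features squashed into (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def label(score, suspicious_at=0.5, illegitimate_at=0.9):
    """Map a score to the three classes; both thresholds are assumptions."""
    if score >= illegitimate_at:
        return "illegitimate"
    if score >= suspicious_at:
        return "suspicious"
    return "legitimate"


# Hypothetical features: [amount relative to the customer's average, new-device flag].
weights = [1.2, 2.0]   # made-up values standing in for learned weights
bias = -4.0

for features in ([1.0, 0], [2.5, 1], [5.0, 1]):
    score = fraud_score(features, weights, bias)
    print(features, round(score, 3), label(score))
```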
Data mining techniques for fraud detection
The most cost-effective approach to fraud detection is to “tease out possible evidences of
fraud from the available data using mathematical algorithms”. Data mining techniques, which
make use of advanced statistical methods, are divided into two main approaches: supervised and
unsupervised methods. Both approaches are based on training an algorithm with a
record of observations from the past. Supervised methods require that each of the
observations used for learning has a label stating which class it belongs to. In the context of
fraud detection, this means that for each observation we know whether it belongs to the class
“fraudulent” or to the class “legitimate”.

Often we do not know which class an observation belongs to. For example, take the case of an
online order whose payment was rejected. One will never know whether this was a legitimate order
or whether it had been correctly rejected. Such occurrences favour the use of unsupervised
methods, which do not require data to be labelled. These methods look for extreme data
occurrences or outliers. In order to get the best of both worlds, some solutions combine
supervised and unsupervised techniques.

A few authors have studied unsupervised methods for fraud detection. Graph analysis has been
explored for fraud detection in a telecommunications setting. A mixed approach has been proposed
in which a self-organising map feeds a Neural Network if a transaction does not fall into an
identified normal behaviour for the given customer. In an experiment comparing supervised and
unsupervised Neural Networks, the unsupervised method performed far below the supervised one.

Supervised methods have dominated the fraud detection literature. In general,
the emphasis of research in the late 1990s and early 2000s was on Neural Networks. Published work
includes the use of a Neural Network for fraud detection at a commercial bank, a profiling
approach to telecommunications fraud, and the combination of multiple classifiers in
an attempt to create scalable systems able to deal with large volumes of data.

More recently, other works have made use of newer classification
techniques. One built a model based on a Hidden Markov Model, with a focus on fraud detection for
credit-card issuing banks. Another worked on credit-card fraud detection with data from a bank,
in particular addressing the way the data is pre-processed; it studied the use of aggregation
of transactions when using Random Forests, Support Vector Machines, Logistic Regression
and K-Nearest Neighbour techniques. A further study compared the performance of Random Forests,
Support Vector Machines and Logistic Regression for detecting fraud in credit-card transactions
in an international financial setting.

Two criticisms of the data mining studies of fraud detection have been pinpointed: the lack of
publicly available data and the lack of published literature on the topic. Most literature on
credit-card fraud detection has focused on classification models with data from banks. Such data
invariably consists of transaction registries, where it is possible to find fraud evidence such as
“collision” or “high velocity” events, i.e. transactions happening at the same time in different
locations.
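A “collision” check on a transaction registry can be sketched as follows: for each card, look for two transactions close together in time but at different locations. The record layout, the card identifiers and the 60-second window are assumptions for illustration.

```python
def find_collisions(transactions, window_seconds=60):
    """Flag same-card transactions within window_seconds at different locations.

    transactions: list of (card_id, timestamp_seconds, location).
    Returns a list of (card_id, time_a, location_a, time_b, location_b).
    """
    by_card = {}
    for card, timestamp, location in sorted(transactions, key=lambda t: (t[0], t[1])):
        by_card.setdefault(card, []).append((timestamp, location))

    flagged = []
    for card, events in by_card.items():
        for i, (time_a, location_a) in enumerate(events):
            for time_b, location_b in events[i + 1:]:
                if time_b - time_a > window_seconds:
                    break  # events are time-ordered, so later ones are further away
                if location_a != location_b:
                    flagged.append((card, time_a, location_a, time_b, location_b))
    return flagged


# Made-up registry: card-A is used in two cities 30 seconds apart.
registry = [
    ("card-A", 1000, "Pune"),
    ("card-A", 1030, "London"),
    ("card-A", 5000, "Pune"),
    ("card-B", 1000, "Mumbai"),
    ("card-B", 1020, "Mumbai"),
]
collisions = find_collisions(registry)
print(collisions)
```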
Some authors have also addressed techniques for finding the best derived features. Transaction
aggregation has been shown to improve performance in some situations, with the aggregation
period being an important parameter. However, none of these particularities seems to apply to
the case of detecting fraud with data from a single merchant, as in our case. In this study, we
chose to use supervised learning methods for the classification problem, because it is common
for fraud detection applications to have labelled data for training. We chose to test three
different models: Logistic Regression because of its popularity, and Random Forests and Support
Vector Machines, which have shown superior performance in a variety of applications.
Support Vector Machines in particular have been shown to perform well in classification problems.
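Transaction aggregation, as mentioned above, derives features by summarising each card's recent history over an aggregation period. The sketch below adds two such features to every transaction: the count and total amount of earlier transactions on the same card within the period. The record layout, the function name `add_aggregates` and the 24-hour period are assumptions; the resulting rows would then be fed to a classifier such as those named above.

```python
def add_aggregates(transactions, period_seconds=86400):
    """Attach per-card history features to each transaction.

    transactions: list of (card_id, timestamp_seconds, amount), in any order.
    Returns a list of dicts in time order with prior_count and prior_total,
    computed over earlier same-card transactions within period_seconds.
    """
    rows = []
    history = {}
    for card, timestamp, amount in sorted(transactions, key=lambda t: (t[1], t[0])):
        recent = [
            (t, a) for t, a in history.get(card, [])
            if timestamp - t <= period_seconds
        ]
        rows.append({
            "card": card,
            "timestamp": timestamp,
            "amount": amount,
            "prior_count": len(recent),
            "prior_total": sum(a for _, a in recent),
        })
        history.setdefault(card, []).append((timestamp, amount))
    return rows


# Made-up transactions (timestamps in seconds).
transactions = [
    ("card-A", 0, 20.0),
    ("card-B", 50, 10.0),
    ("card-A", 3600, 30.0),
    ("card-A", 80000, 500.0),
    ("card-A", 200000, 15.0),
]
features = add_aggregates(transactions)
for row in features:
    print(row)
```

Changing `period_seconds` changes which past transactions count, which is why the aggregation period is an important parameter.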
10.0 Skills Developed:
a) While developing this Micro-Project, we learned many practical concepts of
emerging trends in computer technology, as well as the underlying theory.
b) We learned to apply the latest trends in technology in different fields.
c) We learned about new computer science technologies such as artificial intelligence, data
mining, the Internet of Things, data analytics and much more.
11.0 Applications of this Project:
Data mining finds application and significance in various fields, such as:
a) Credit ratings and anti-fraud systems
b) Financial Analysis
c) Telecommunication Industry
d) Intrusion Detection
e) Spatial Data mining
f) Biological Data mining
12.0 Conclusion:
Thus, we prepared a report on credit card fraud detection using data mining techniques with
implementation of emerging trends in computer technology.
13.0 Reference:
www.iberdola.com
www.tutorialspoint.com
www.simplilearn.com