Credit Card Fraud Detection using
Machine Learning and Data Science
In credit card transactions, fraud occurs when someone other than the account owner uses a card without authorization or permission.
Fraud detection involves monitoring the activities of populations of users in order to estimate, detect, or avoid objectionable behaviour such as fraud, intrusion, and defaulting.
This is a highly pertinent problem for fields like data science and machine learning to address, since these fields can automate its solution.
From a learning standpoint, this problem is especially difficult because it is characterized by several factors, chief among them class imbalance: there are far more legitimate transactions than fraudulent ones. Furthermore, the statistical properties of transaction patterns frequently change over time.
However, these are not the only difficulties in putting a fraud detection system into practice. In real-world scenarios, automated systems must swiftly sort through an enormous volume of payment requests to decide which ones to approve.
Machine learning algorithms are used to analyze all approved transactions and flag any that seem suspicious. Experts investigate these alerts and contact the cardholders to verify whether or not the transaction was fraudulent.
The investigators' feedback is returned to the automated system, where it is used to train and update the algorithm, gradually improving fraud-detection performance over time.
Fraud detection techniques are continuously improved to keep pace with criminals who adapt their fraudulent tactics. These frauds are commonly categorized as:
• Offline and online credit card fraud
• Card theft
• Device intrusion
• Application fraud
• Counterfeit cards
• Account bankruptcy
• Telecommunication fraud
The following are a few methods currently in use to identify this kind of fraud:
• Artificial Neural Networks
• Fuzzy Logic
• Bayesian Networks
• Decision Trees
• Genetic Algorithms
• Logistic Regression
• Support Vector Machines
• Hidden Markov Models
• K-Nearest Neighbors
II. REVIEW OF LITERATURE
Fraud is defined as the illegal or criminal use of deception for personal or financial gain. It is an intentional violation of a law, regulation, or policy, committed with the aim of obtaining unauthorized financial benefit.
A wealth of literature on anomaly and fraud detection in this field has already been published and is publicly accessible. A comprehensive survey by Clifton Phua and his colleagues covers techniques used in this area, including adversarial detection, automated fraud detection, and data mining applications. In another paper, Suman, a research scholar at GJUS&T, Hisar, described supervised and unsupervised learning methods for detecting credit card fraud. Although these techniques and algorithms achieved unexpectedly good results in certain domains, they fell short in others.
Wen-Fang Yu and Na Wang presented a related study in which they employed outlier mining, outlier detection mining, and distance-sum algorithms to accurately forecast fraudulent transactions in an emulation experiment on a credit card transaction data set from a commercial bank. Outlier mining is a branch of data mining used primarily in the financial and internet domains; its task is to identify objects, here fraudulent transactions, that are detached from the main system. Taking attributes of their customers' behaviour and computing the values of those attributes, they measured the distance between an attribute's observed value and its predefined value.
Unconventional methods, like hybrid data mining and complex network classification
algorithms, have shown promise in identifying illicit instances within real card transaction
data sets. These methods are based on network reconstruction algorithms, which enable the
creation of representations of an instance's deviation from a reference group. These
techniques have been effective, on average, with medium-sized online transactions.
Attempts have also been made to advance the field from an entirely new angle, by improving the alert-feedback interaction in the event of a fraudulent transaction. Whenever a fraudulent transaction is detected, the authorized system is notified and feedback is sent to cancel the ongoing transaction.
One method that provided fresh insight into this area was the Artificial Genetic Algorithm, which tackled fraud from a different angle.
It proved accurate in identifying fraudulent transactions and in reducing the number of false alarms, but it came with a classification problem of variable misclassification costs.
III. METHODS
The method proposed in this study detects unusual activity, i.e., outliers, using recent machine learning techniques.
The following figure represents the basic rough architecture diagram:
Upon closer inspection, with real-world components added, the complete architecture diagram can be represented as follows:
Initially, our dataset was acquired from Kaggle, a website that hosts datasets for data analysis. The dataset contains 31 columns, 28 of which are labeled V1-V28 to protect sensitive information.
The remaining columns are Time, Amount, and Class. Time indicates the seconds elapsed between the first transaction and each subsequent one. Amount is the amount of money transacted. Class 0 denotes a legitimate transaction, while Class 1 denotes a fraudulent one.
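As a brief illustration, the dataset can be loaded and inspected with pandas; the file name creditcard.csv reflects the standard Kaggle download and is an assumption here:

import pandas as pd

# Load the Kaggle credit card fraud dataset; "creditcard.csv" is the
# file name of the standard Kaggle download (an assumption).
df = pd.read_csv("creditcard.csv")

print(df.shape)                    # expected: (284807, 31)
print(df["Class"].value_counts())  # 0 = legitimate, 1 = fraudulent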
We plot several graphs to visually inspect the values in the dataset and check for inconsistencies. This ensures that the machine learning algorithms can process the dataset without any missing-value imputation.
Following this analysis, we create a heatmap to visualize the data in color and examine the relationship between the class variable and our predictor variables; the resulting heatmap is shown in the corresponding figure.
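A minimal sketch of this step, assuming the DataFrame df loaded above; seaborn is one common choice for drawing such a heatmap, not necessarily the library behind the original figure:

import matplotlib.pyplot as plt
import seaborn as sns

# Correlation matrix over all columns, including Class, drawn as a heatmap.
plt.figure(figsize=(12, 9))
sns.heatmap(df.corr(), vmax=0.8, square=True)
plt.show()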
The dataset is then processed and made into a usable format. To keep the evaluation fair, the Class column is separated out and the Time and Amount columns are standardized. The data is then processed by the algorithms of a collection of modules, whose interaction is illustrated in the module diagram that follows. Once the data has been fitted into a model, the following outlier detection modules are applied to it:
• Isolation Forest
• Local Outlier Factor
These algorithms are part of the sklearn (scikit-learn) library. Isolation Forest is found in sklearn's ensemble module, which contains ensemble-based methods for classification, regression, and outlier detection, while Local Outlier Factor is found in the neighbors module.
Built on the NumPy, SciPy, and matplotlib modules, this free and open-source Python library offers a variety of simple and efficient tools for data analysis.
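A rough sketch of this preprocessing, assuming the df from earlier; the study does not specify the exact scaler, so StandardScaler is an assumption:

from sklearn.preprocessing import StandardScaler

# Standardize Time and Amount so they are on a scale comparable to V1-V28.
scaler = StandardScaler()
df[["Time", "Amount"]] = scaler.fit_transform(df[["Time", "Amount"]])

# Separate the predictors from the Class label.
X = df.drop(columns=["Class"])
y = df["Class"]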
This graph demonstrates how few fraudulent transactions there are compared to legitimate ones.
This graph displays the times at which transactions were completed over a span of two days. It is evident that the most transactions were made during the day and the fewest during the night.
This graph shows the amounts transacted. The majority of transactions are quite small, and fewer than 5% approach the maximum amount transacted.
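For reference, a sketch that would produce plots of this kind, assuming the raw (not yet standardized) df:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 1, figsize=(8, 10))
df["Class"].value_counts().plot(kind="bar", ax=axes[0], title="Class distribution")
df["Time"].plot(kind="hist", bins=48, ax=axes[1], title="Transaction times")
df["Amount"].plot(kind="hist", bins=50, ax=axes[2], title="Transaction amounts")
plt.tight_layout()
plt.show()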
Once the dataset has been examined, we plot a histogram for every column. This provides a graphical representation of the dataset and is used to confirm that it contains no missing values, so the machine learning algorithms can process it without imputation.
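A minimal sketch of this check, again assuming the df loaded earlier:

import matplotlib.pyplot as plt

# One histogram per column, plus an explicit count of missing values.
df.hist(figsize=(20, 20), bins=50)
plt.show()

print(df.isnull().sum().max())  # expected 0: nothing to impute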
We have developed a Python program on the Jupyter Notebook platform to illustrate the methodology proposed in this work. This program can also be run in the cloud on the Google Colab platform, which supports all Python notebook files.
The following modules are explained in detail, along with pseudocodes for their algorithms
and output graphs:
A. Local Outlier Factor
This is an unsupervised outlier detection algorithm. The "Local Outlier Factor" is the anomaly score of each sample; it measures the local deviation of a sample with respect to its neighbors.
More precisely, locality is given by the k-nearest neighbors, whose distances are used to estimate the local density.
This algorithm can be expressed in code roughly as follows.
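The following is a minimal sketch using scikit-learn's LocalOutlierFactor, assuming the X and y prepared earlier; n_neighbors=20 and the contamination estimate are illustrative choices, not values fixed by this study.

from sklearn.neighbors import LocalOutlierFactor

# Expected fraction of outliers, estimated from the labels.
outlier_fraction = y.sum() / len(y)

lof = LocalOutlierFactor(n_neighbors=20, contamination=outlier_fraction)

# fit_predict returns +1 for inliers and -1 for outliers;
# map this to the dataset's 0/1 labels.
y_pred = (lof.fit_predict(X) == -1).astype(int)

print("Misclassified transactions:", (y_pred != y).sum())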
On plotting the results of the Local Outlier Factor algorithm, we get the following figure:
Once anomalies are detected, the system can be used to report them to the relevant authorities. For testing purposes, we compare the results of these algorithms to assess the precision and accuracy of the system.
B. Isolation Forest Algorithm
By choosing a feature at random and then selecting a split value between the maximum and
minimum values of the chosen feature, the Isolation Forest "isolates" observations.
Recursive partitioning can be illustrated by a tree, in which the path length from the root node to the terminating node represents the number of splits needed to isolate a sample.
The mean of this path length over a forest of such random trees is a measure of normality and serves as the decision function. Random partitioning produces noticeably shorter paths for anomalies; hence, when a forest of random trees collectively yields shorter path lengths for particular samples, those samples are very likely to be anomalies.
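A corresponding sketch with scikit-learn's IsolationForest, under the same assumptions (the X, y, and outlier_fraction from the sketches above; the hyperparameters are illustrative):

from sklearn.ensemble import IsolationForest

iso = IsolationForest(n_estimators=100,
                      contamination=outlier_fraction,
                      random_state=42)
iso.fit(X)

# predict returns +1 for inliers and -1 for outliers, as above.
y_pred = (iso.predict(X) == -1).astype(int)

print("Misclassified transactions:", (y_pred != y).sum())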
On plotting the results of the Isolation Forest algorithm, we get the following figure:
Results when the complete dataset is used:
IV. IMPLEMENTATION
This concept is challenging to put into practice because it requires the cooperation of banks, which are reluctant to share information owing to market competition, legal concerns, and the protection of user data. We therefore searched for reference papers that applied comparable methodologies and reported findings. According to one of these cited papers:
This technique was applied to a complete application data set supplied by a German bank in 2006. Owing to banking confidentiality, only a summary of the results is presented below. After the method was applied, the level 1 list contained a small number of cases with a high probability of being fraudulent, and the cards of every person on that list were closed.
V. RESULTS
The code prints the number of false positives it detected and compares it with the actual values, and this is used to calculate the precision and accuracy score of the algorithms.
For expedited testing, we used only 10% of the full dataset; the complete dataset is then used as well, and both sets of results are printed.
The output shows these results together with the classification report for each algorithm, where class 0 denotes a legitimate transaction and class 1 a fraudulent one. This result was checked against the class values to rule out false positives.
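This evaluation can be sketched with scikit-learn's metrics, assuming the y and y_pred produced above; a 10% subsample for the expedited test could be drawn first with df.sample(frac=0.1):

from sklearn.metrics import accuracy_score, classification_report

# Accuracy plus the per-class precision/recall report described above;
# class 0 is legitimate, class 1 fraudulent.
print("Accuracy:", accuracy_score(y, y_pred))
print(classification_report(y, y_pred))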
Results when 10% of the dataset is used:
Results when the complete dataset is used:
VII. FUTURE ENHANCEMENTS
Although we did not reach our goal of 100% accuracy in fraud detection, we did build a system that, given enough time and data, can come very close to it. As with any project of this kind, there is room for improvement.
The project is designed so that several algorithms can be integrated as modules and their results combined to increase the accuracy of the final output.
Adding more algorithms will improve the model further; these algorithms must, however, produce output in the same format as the others. Once that requirement is met, the modules are simple to add, as the code demonstrates.
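A hypothetical sketch of that modular contract, reusing the iso and lof models built earlier: each module is a name mapped to a callable that must return 0/1 labels, so a new algorithm is added simply by extending the dictionary.

# Hypothetical module registry; every entry must emit 0/1 labels.
classifiers = {
    "Isolation Forest": lambda X: (iso.fit(X).predict(X) == -1).astype(int),
    "Local Outlier Factor": lambda X: (lof.fit_predict(X) == -1).astype(int),
    # "New Algorithm": a callable returning 0/1 labels like the others.
}

for name, predict in classifiers.items():
    y_pred = predict(X)
    print(name, "- misclassified:", (y_pred != y).sum())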
REFERENCES
[1] John Richard D. Kho and Larry A. Vea, "Credit Card Fraud Detection Based on Transaction Behavior," in Proceedings of the 2017 IEEE Region 10 Conference (TENCON), Malaysia, November 5-7, 2017.
[2] Clifton Phua, Vincent Lee, Kate Smith, and Ross Gayler, "A Comprehensive Survey of Data Mining-based Fraud Detection Research," School of Business Systems, Faculty of Information Technology, Monash University, Wellington Road, Clayton, Victoria 3800, Australia.
[3] Suman, Research Scholar, GJUS&T Hisar, HCE Sonepat, "Survey Paper on Credit Card Fraud Detection," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 3, Issue 3, March 2014.
[4] Wen-Fang Yu and Na Wang, "Research on Credit Card Fraud Detection Model Based on Distance Sum," 2009.
Python
Python is a high-level, object-oriented, interpreted programming language. It was created by Guido van Rossum in the late 1980s and first released in 1991.
Python's syntax is close to the English language, allowing developers to write programs in fewer lines than many other programming languages require. Because Python is interpreter-based, code can be executed as soon as it is written, so prototyping can be done quickly.
Characteristics of Python:
The following are important characteristics of Python programming:
• Python is a dynamic, high-level, free, open-source, and interpreted programming language.
• It supports object-oriented programming as well as procedural programming.
• It can be used as a scripting language or can be compiled to byte-code for building large
applications.
• It provides very high-level dynamic data types and supports dynamic type checking.
• It supports automatic garbage collection.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.