A3 (16063620)

This research project focuses on enhancing credit card fraud detection using advanced machine learning techniques and data balancing methods to address the challenges posed by evolving fraud patterns and imbalanced datasets. It evaluates various classifiers on a Kaggle dataset while discussing the limitations of oversampling methods and exploring future directions like federated learning and multi-modal data integration. The aim is to develop a robust and efficient framework for detecting fraud in the context of increasing digital payment threats.

Project title : Credit Card Fraud Detection: An Enhanced Machine Learning Approach with Advanced Data Balancing and Future Directions in Digital Payment Security

Student : Yamunaa Ratakrishnan

Supervisor : Dr Sathishkumar Veerappampalayam Easwaramoorthy

Abstract

The expansion of online commerce and electronic payment systems has intensified the risk of
credit card fraud on a global scale, with severe financial and reputational impacts on cardholders
and institutions. Conventional rule-based fraud detection methods are no longer effective because
fraud patterns keep changing and datasets are highly imbalanced. This research aims to enhance
the credit card fraud detection framework by using supervised learning approaches with
oversampling of the rare fraud instances (Visa, 2023).

This research uses a highly imbalanced Kaggle credit card fraud dataset (Credit Card Fraud
Detection, 2018) and compares several supervised classifiers such as SVM, Logistic Regression,
Decision Trees, Random Forests and Artificial Neural Networks. To handle the imbalanced
class distribution, oversampling algorithms (Random Oversampling, SMOTE and ADASYN)
are used (Sundaravadivel et al., 2025). Recall, F1-score and AUC are used to evaluate model
performance, because they emphasize how well a model captures minority-class instances
(Adepoju et al., 2019).

This project also incorporates a theoretical discussion of the limitations of oversampling and
examines the role of mobile payment platforms in fostering new types of fraud. It also reviews
recent deep learning progress, focusing on RNNs and Transformers, and discusses federated
learning as a promising privacy-preserving technique (Opena, 2025). The results are intended to
guide future anti-fraud mechanisms towards greater flexibility, cost efficiency and security in
the face of ever-evolving digital threats.
Contents
1. Introduction
1.1 Theoretical Background and Problem Statement
1.2 Research Questions
1.3 Aim and Objective
1.4 Scope of Work
2. Literature Review
2.1 Evolution of Credit Card Fraud and Impact of Technological Advancements
2.1.1 Traditional Fraud Detection
2.1.2 Impact of Online Payments and E-commerce
2.1.3 Evolving Fraudster Tactics
2.2 Machine Learning in Fraud Detection: A Critical Overview
2.2.1 Supervised Learning Algorithms
2.2.2 Deep Learning Advancements
2.2.3 Ensemble Learning Methods
2.3 Data Balancing Techniques: Addressing Class Imbalance
2.3.1 The Challenge of Class Imbalance
2.3.2 Overview of Oversampling Methods
2.3.3 Potential Drawbacks of Oversampling
2.3.4 Comparative Effectiveness and Limitations
2.4 Identified Gaps in the Literature
3. Methodology
3.1 Dataset Description
3.2 Data Balancing Techniques
3.3 ML Algorithms
3.4 Experimental Design and Metrics of Evaluation
4. Work Plan and Timeline
5. Future Research Directions
5.1 Fusion of Multi-Modal Data for Improved Fraud Detection
5.2 Exploring Federated Learning for Privacy Preserving Fraud Detection
6.0 Conclusion

1. Introduction

1.1 Theoretical Background and Problem Statement


The global e-commerce market has experienced significant growth, with transactions exceeding
USD 4.2 trillion in 2020. This shift towards cashless and contactless transactions has made
payments more convenient but has also opened new avenues for financial fraud (Verdon, 2021;
Adepoju et al., 2019).

Credit card fraud, a significant sub-class of financial fraud, includes card-not-present,
counterfeit, account takeover, merchant collusion, synthetic ID, and digital wallet fraud. New
threats like deepfake-enabled bypassing of identity checks, SIM swapping, and app spoofing pose
additional challenges. Global losses are estimated at $33.5 billion in 2022 and are projected to
rise to $343 billion by 2027 (Webb, 2024; Juniper Research, 2022).

Between 2019 and 2022, digital fraud attempts increased by 80% due to post-pandemic growth
in digital payments. Although the fraud rate has remained constant at 4.6%, the growing volume
of digital transactions has increased losses, with e-businesses losing $207 for every $100 in
fraudulent orders. "Friendly fraud", in which consumers illegitimately dispute legitimate
transactions, costs $35.00 per $100.00 disputed (Anchin, 2025). Account Takeover (ATO) attacks
rose by 24% in 2024 and caused $13 billion in losses in 2023, while fake identity fraud has
become the fastest-growing financial crime in 2025 (Bondar, 2025).

Traditional fraud detection systems are falling short in detecting evolving threats, particularly in
highly imbalanced datasets where fraudulent cases rarely account for more than 0.2% of
transaction volume, and these failures undermine trust and cause financial harm (Opena, 2025).
Supervised machine learning models offer a powerful alternative to traditional fraud detection
methods because they model complex nonlinear patterns and adapt to emerging fraudulent
behavior. However, strong class imbalance often hinders their ability to recognize fraud, reducing
the effectiveness of these models (Gupta et al., 2025).

Resampling methods like SMOTE, ADASYN, and Random Oversampling are used to increase
minority-class representation, but they introduce difficulties such as overfitting, computational
cost, and noise in generated samples, particularly in real-time scenarios (Al Balawi &
Aljohani, 2023; Halim et al., 2023).

Finally, the combination of class imbalance, the adaptive nature of fraud, operational latency
requirements and data privacy concerns makes a strong case for a unified, scalable and
future-proof fraud detection solution (Imani et al., 2025).

1.2 Research Questions


This project will address the challenges and the gaps through the following research questions:

1. How have recent technological advances such as mobile payments and digital wallets
affected the patterns of credit card fraud, and how can advanced deep learning models
such as RNNs and Transformers be used to detect the new forms of fraud?
2. What are the disadvantages of applying oversampling algorithms to credit card fraud
detection, particularly overfitting and computational cost, and how can they be resolved
while still treating class imbalance?
3. What are the most promising future research directions in credit card fraud detection,
in particular the combination of multi-modal data and the use of federated learning for
privacy-preserving solutions?

1.3 Aim and Objective


Aim:

To develop and evaluate an enhanced machine learning methodology for credit card fraud
detection that handles the challenges of dynamic fraud patterns, imbalanced learning tasks, and
real-time application, as well as exploring advanced deep learning models and privacy-
preserving techniques.

Objectives

1. To critically review existing literature related to credit card fraud detection, focusing on
technological breakthroughs and the usage of advanced deep learning techniques such
as RNNs and Transformers.

2. To analyze the efficiency of the different ML classifiers on highly imbalanced data by
comparing their performance with and without oversampling techniques (Random
Oversampling, SMOTE and ADASYN).

3. To investigate the drawbacks of oversampling methods, such as overfitting,
computational cost, noise, and lack of diversity, and present potential methods to attain
balanced and generalized classification performance (Zongkai & Wei, 2024).

4. To propose future research pathways, including multi-modal data fusion and federated
learning for enhanced and privacy-preserving fraud detection.

1.4 Scope of Work


This project emphasizes the development and evaluation of machine learning models for credit
card fraud detection. The scope includes:

● Data Source: Utilization of the Kaggle Credit Card Fraud Detection dataset, which
contains normalized transaction data.

● Machine Learning Algorithms: Implementation and cross-comparison of Support
Vector Machines, Decision Trees, Logistic Regression, K-Nearest Neighbours, Random
Forest and Artificial Neural Networks (Gupta et al., 2025).

● Data Balancing Techniques: Application and evaluation of Random Oversampling,
Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic
Sampling (ADASYN) (Halim et al., 2023).

● Performance Metrics: Emphasis on F1-score, Recall, and AUC, alongside accuracy, to
provide a complete evaluation tailored to imbalanced datasets.

● Discussion of Advanced Concepts: Rigorous discussion of the implications of recent
technological improvements for fraud, the role of advanced deep learning architectures
(RNNs, LSTMs, Transformers, GNNs), and the potential drawbacks of oversampling.

● Future Research Exploration: Detailed conceptualization of integrating multi-modal
data and federated learning as promising avenues for future fraud detection systems.

The project does not involve the collection of new, proprietary financial transaction data or the
deployment of a real-world credit card fraud detection system in a live financial environment.
The focus is on a robust methodological exploration and conceptual advancement within a
controlled experimental setting.
2. Literature Review
This section lays the foundation for this research by reviewing the credit card fraud detection
domain and positioning the project within the existing body of work. It analyzes the existing
evidence, assesses its robustness, identifies gaps, and pinpoints what new studies need to
consider (Adepoju et al., 2019).

2.1 Evolution of Credit Card Fraud and Impact of Technological Advancements

2.1.1 Traditional Fraud Detection


In the past, credit card fraud mostly involved physical theft or card counterfeiting. Early
detection depended on static rule-based systems built on thresholds such as location or
transaction value. These systems were simple and could not adapt, resulting in high false-positive
rates or missed new fraud patterns (Saad et al., 2024).

2.1.2 Impact of Online Payments and E-commerce


The rise of digital payment solutions and e-commerce has fundamentally transformed the
financial sector and, with it, created new opportunities for fraudsters. The convenience
and speed of online transactions have driven a surge in digital financial activity, a
development that offers undeniable advantages to both users and commerce but has also
widened the reach of fraud (Mastercard, 2024).

Mobile Payments and Digital Wallets

Mobile payments and digital wallets offer convenience and contactless transactions, but they also
raise the risk of sophisticated fraud. Fraud targeting digital wallet mobile apps causes revenue
loss and compliance issues (Sunderajulu, 2024). Scammers use social engineering, forged
websites, and malicious scripts to steal payment information.

Specific mechanisms of digital wallet fraud include:

● Click Fraud: Bots simulate user behaviour by repeatedly clicking on ads on mobile
platforms to generate fraudulent revenue or deplete advertising budgets. This distorts
analytics and leads to financial loss without genuine customer engagement (Visa, 2023).

● SMS Fraud: SMS messages are used to deceive users into disclosing confidential
information or carrying out unauthorized transactions (Goel & Jain, 2018).

● In-app Purchase Fraud: Attackers exploit vulnerabilities within mobile applications to
make unauthorized purchases (Evgeny, 2025).

● Account Takeover (ATO): Fraudsters access user accounts via phishing, credential
stuffing, or brute-force attacks and misuse stored payment details. ATO attacks
increased by 24% year-over-year in 2024 and caused around $13 billion in losses in
2023 (Thies & Thies, 2025). This type of fraud is particularly challenging as
scammers become more sophisticated, mimicking legitimate user behaviour to bypass
defences.

● Card-Not-Present (CNP) Fraud: Fraudsters use stolen card details for online or mobile
purchases, leading to significant financial losses and compliance risks for merchants
(TransUnion, 2023).

● Authorized Push Payment (APP) Fraud: Fraudsters use social engineering to
manipulate victims into making electronic payments, exposing financial institutions to
reputational risks and compliance scrutiny for inadequate detection (Visa, 2023).

Contactless Payments

The rise of contactless digital payments, driven by Near Field Communication (NFC)
technology, has brought new levels of ease and speed to payments but also new fraud
risks. By 2022, 64% of companies conducted more than half of their
transactions electronically, underscoring the shift to digital commerce (Mastercard, 2024).
Despite improved security measures, digital payment fraud remains a growing concern, with
Visa blocking $30 billion in attempted fraudulent transactions in the first half of 2023.
Alarmingly, 90% of businesses expect fraud incidents to increase (Visa, 2023).

One of the biggest new threats is friendly fraud, in which consumers initiate chargebacks on
legitimate transactions. This type of abuse accounted for as much as 75% of
chargebacks in 2022, causing significant harm to merchants in terms of revenue, transaction
fees, and regulatory overhead (Visa, 2023). According to Opena (2025), businesses lose $207 for
every $100 in fraudulent orders and are projected to face over $100 billion in chargeback-related
costs in 2025, with 61% linked to friendly fraud.

Emergence of New Fraud Vectors

Apart from traditional and digital payment-specific fraud, the evolving technological landscape
has given rise to entirely new and sophisticated fraud vectors:

● Fake Identity Fraud: This has become the fastest-growing financial crime in 2025.
False identities are built from a combination of hijacked and manufactured data,
including Social Security Numbers taken from vulnerable individuals
(Costello, 2025). These identities can clear credit checks and, in some cases, establish
good credit, making them especially hard to catch with traditional verification methods.

● AI-driven Fraud: According to Opena (2025), the use of artificial intelligence by
fraudsters is rising rapidly, with a 28% rise in deepfake-based scams and a
31% increase in AI-augmented synthetic identity fraud in 2024. Fraudsters use deepfakes
and fake AI-powered mobile apps to bypass biometric verification and extract payment
data via phishing, intensifying the arms race between attackers and security systems.

● Ransom Attacks: Ransomware is a form of malicious software that encrypts user and
organizational data. It often targets mobile platforms through phishing schemes, causing
operational disruptions, data loss, and significant financial and reputational risks (Sillam,
2023).

2.1.3 Evolving Fraudster Tactics


Fraudsters continually change their strategies to capitalize on newly exposed and existing
flaws, making traditional static, rule-based techniques less effective
(Adepoju et al., 2019; Al Balawi & Aljohani, 2023). A significant shift has also been
observed in the cybercrime landscape towards more sophisticated and well-organized
attacks, for example account takeover (ATO) fraud, in which attackers compromise end-user
accounts through phishing, malware and credential stuffing and then perform fraudulent
transactions undetected (Goswami, 2025).

Recent studies highlight the rise of collusive fraud networks in which fraudsters cooperate to
compromise nodes in the financial network and avoid detection by mimicking
legitimate participation (Imani et al., 2025). These organized tactics, along with the
multifaceted nature of cross-border e-commerce and the variety of digital payment systems, make
fraud more challenging to detect (Opena, 2025).

2.2 Machine Learning in Fraud Detection: A Critical Overview


Machine learning (ML) models are increasingly used for credit card fraud detection because they
can identify complex, non-linear relationships in large datasets and adapt to evolving fraud
behaviors over time (Adepoju et al., 2019).

2.2.1 Supervised Learning Algorithms


Various supervised ML methods have demonstrated promising performance in fraud detection:

● Logistic Regression (LR): Simple and interpretable, but struggles with imbalanced
data and non-linear patterns (Hmidy & Mabrouk, 2024).
● K-Nearest Neighbors (KNN): Detects local anomalies but is computationally expensive
on large datasets (Upadhyaya & Singh, 2025).
● Support Vector Machines (SVM): Able to handle high-dimensional data but requires
tuning and does not scale well (Sahin & Duman, 2021).
● Decision Trees (DT): Simple to implement, but prone to overfitting (Najadat et
al., 2020).
● Random Forests (RF): Ensemble-based and more robust, but less interpretable and
resource-heavy (Gupta et al., 2025).
● Naïve Bayes (NB): Fast and simple, though less accurate with correlated features (Gupta
et al., 2021).
● Artificial Neural Networks (ANNs): Able to capture complex patterns with high
accuracy, but require more data and computation (Wu et al., 2025).
2.2.2 Deep Learning Advancements
Deep learning models, a branch of ANNs with multiple hidden layers, have made significant
strides in fraud detection due to their ability to learn features from raw data and handle complex,
high-dimensional datasets well.

Long Short-Term Memory (LSTMs) and Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) variants,
are highly effective for modelling sequential transaction data and are well suited to identifying
fraud patterns that evolve over time. Such temporal patterns include repeated small
transactions that form part of a larger fraud scheme (Opena, 2025).

LSTMs outperform traditional ML models in fraud detection by learning patterns across
transaction histories. Hybrid architectures combining RNNs with CNNs enhance real-time
detection by leveraging temporal and spatial feature extraction. Attention mechanisms improve
accuracy and interpretability, with attention-based ensemble models consistently outperforming
standalone deep learning models, particularly in recall.

Transformers

Transformer models, a natural language processing breakthrough, have been applied to
structured data, including financial transactions, thanks to their self-attention mechanism, which
allows for deeper analysis of interrelated features (Opena, 2025). Transformers excel in
large-scale fraud detection due to their parallel processing and scalability, which enable efficient
training and the detection of complex, non-obvious patterns often missed by traditional models.
The hybrid BiLSTM-Transformer model handles both temporal and contextual relationships in
transaction data, improving accuracy, recall, AUC, and F1-score in complex fraud detection
scenarios (K et al., 2025).
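
For reference, the self-attention mechanism mentioned above is usually written in its scaled dot-product form. The symbols here are assumptions introduced for illustration and are not notation from the cited studies: Q, K and V are the query, key and value matrices derived from the transaction feature embeddings, and d_k is the dimension of the keys.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Each row of the softmax output is a set of weights describing how strongly one element of the input attends to every other element, which is how interrelated features can be analyzed jointly.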

Graph Neural Networks (GNNs)

Graph Neural Networks (GNNs) enhance fraud detection by analyzing relational data, revealing
hidden patterns across linked entities, such as accounts connected to known fraudsters, rather
than isolated records (Imani et al., 2025). By modeling complex interactions within
transactional networks, GNNs excel at detecting collusive behavior in organized fraud rings,
reducing false positives and improving accuracy and trust in the system (Gomes et al., 2024).
When combined with traditional classifiers like XGBoost, GNNs enhance performance and reduce
costs. They can also tackle fraud graph issues like class imbalance and heterophily.

2.2.3 Ensemble Learning Methods


Ensemble learning techniques like Voting Classifiers, Bagging, and Boosting (AdaBoost,
XGBoost) improve model robustness and generalizability by combining multiple base classifiers
(Adepoju et al., 2019). Hybrid models combining ensemble methods and oversampling
techniques show superior performance in recall and F1-score, making them ideal for fraud
detection in noisy or imbalanced datasets, improving accuracy and stability (Sundaravadivel &
Isaac, 2025).

2.3 Data Balancing Techniques: Addressing Class Imbalance

2.3.1 The Challenge of Class Imbalance


Class imbalance hinders accurate credit card fraud detection because classifiers become biased
toward the majority (legitimate) class (Imani et al., 2025). This results in high accuracy
but poor fraud recall, leading to high false negative rates and financial losses. Insufficient
minority class representation prevents models from learning meaningful fraud patterns.
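
The gap between accuracy and recall described above can be illustrated with a small worked example. This is a minimal sketch on made-up labels, not data from the study: the sample of 1,000 transactions with 2 fraud cases is hypothetical and simply mirrors the roughly 0.2% fraud share mentioned earlier, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud, 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all transactions that were labelled correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual fraud cases that were caught."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical sample: 1,000 transactions, 2 of them fraudulent (0.2%).
y_true = [1, 1] + [0] * 998
# A degenerate "classifier" that labels every transaction as legitimate.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.3f}, recall={rec:.3f}")  # accuracy=0.998, recall=0.000
```

A model that never flags fraud scores 99.8% accuracy on this sample while catching no fraud at all, which is why recall-oriented metrics are preferred in this setting.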

2.3.2 Overview of Oversampling Methods


Oversampling is a common preprocessing strategy for addressing class imbalance by increasing the
representation of the minority (fraudulent) class. This helps machine learning models better learn
rare fraud patterns (Adepoju et al., 2019).

● Random Oversampling replicates minority class samples to achieve balance. Though
simple, it risks overfitting due to duplication of identical instances.

● SMOTE (Synthetic Minority Oversampling Technique) creates new synthetic samples
by interpolating between existing minority instances, helping models form more generalizable
decision boundaries and reducing overfitting (Halim et al., 2023).

● ADASYN (Adaptive Synthetic Sampling) builds on SMOTE by focusing synthetic
generation on harder-to-learn cases, improving F1-score and precision in complex or
noisy datasets.
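
The difference between duplicating and interpolating minority samples can be sketched in a few lines. This is a simplified illustration using only the Python standard library, not the implementation used in this project: the function names, the toy two-feature fraud rows and the choice of k are all assumptions, and ADASYN's adaptive weighting is not shown.

```python
import random


def random_oversample(minority, n_new, rng):
    """Random oversampling: draw exact copies of existing minority rows."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(minority, n_new, k, rng):
    """SMOTE-style sampling: place each new point on the line segment between
    a minority row and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: squared_distance(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical minority (fraud) rows with two numeric features each.
fraud_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]

duplicates = random_oversample(fraud_rows, 3, rng)
synthetic = smote_like(fraud_rows, 3, k=2, rng=rng)
print(duplicates)
print(synthetic)
```

Every row returned by `random_oversample` already exists in the minority set, whereas the rows from `smote_like` are new points that stay inside the region spanned by existing fraud cases.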

2.3.3 Potential Drawbacks of Oversampling


While oversampling techniques help address class imbalance, they come with several limitations
that can impact model performance and practical deployment.

● Overfitting: Replicating or generating synthetic minority instances may cause the model
to memorize noise rather than learn generalizable patterns, resulting in poor real-world
performance.

● Computational Overhead: Oversampling increases dataset size, leading to longer
training times and higher memory consumption, which is an issue in latency-sensitive
fraud detection systems (Adepoju et al., 2019).

● Noise and Class Overlap: Synthetic data may inaccurately represent the minority class
or blend with majority class features, causing misclassifications (Al Balawi & Aljohani,
2023).

● Lack of Diversity: If synthetic samples are too similar to existing ones, they fail to
provide new learning signals, weakening model generalizability to unseen fraud patterns
(Opena, 2025).

● Bias Amplification: Oversampling may reinforce existing biases in the minority class,
compromising fairness and skewing detection results, which is especially problematic in
financial contexts (Opena, 2025).

2.3.4 Comparative Effectiveness and Limitations


Studies show that SMOTE and ADASYN outperform simple random oversampling, particularly
when paired with ensemble models like Random Forest or XGBoost. These methods improve
recall, AUC, and classifier stability in high-volume fraud detection settings (Sundaravadivel &
Isaac, 2025).

However, their effectiveness diminishes under extreme imbalance ratios, where synthetic
samples may compromise precision. Advanced methods like GANs struggle to generate diverse
minority data without overfitting, highlighting the need for more adaptive and intelligent
resampling strategies (Imani et al., 2025).

2.4 Identified Gaps in the Literature


Despite extensive research on machine learning for credit card fraud detection, several critical
challenges and research gaps persist, hindering the deployment, effectiveness, and
generalizability of fraud detection systems in real-world, dynamic financial environments
(Opena, 2025).

1. Real-time Constraints and Offline Evaluation: Most ML-based fraud detection studies
are conducted offline and neglect real-time constraints like latency and throughput, so the
resulting models often fail to meet the processing requirements of live financial systems
(Oter et al., 2025).

2. Inadequate Performance Metric Focus: Research often overemphasizes overall
accuracy and neglects critical fraud detection metrics like recall, F1-score, and AUC,
producing models that are ineffective under class imbalance (Sundaravadivel & Isaac,
2025).

3. Limited Integration of Oversampling in Adaptive Frameworks: Oversampling
methods like SMOTE and ADASYN are often used offline and not integrated into
streaming or adaptive learning frameworks, leaving concept drift and evolving fraud
patterns in real-world financial data unaddressed (Halim et al., 2023).

4. Overfitting and Generalizability Issues with Simple Oversampling: Random
duplication oversampling enhances recall in evaluation but often leads to overfitting, high
false positives, and reduced generalizability to unseen data (Imani et al., 2025).

5. Breakdown of Advanced Oversampling under Extreme Imbalance: Advanced
synthetic oversampling methods, like SMOTE or ADASYN, can fail under high
imbalance ratios, generating overlapping or spurious minority samples that can reduce
model precision.

6. Over-reliance on a Single Public Dataset: Many studies overly rely on the Kaggle
credit-card dataset, which lacks diversity in geography, transaction type, and time span,
raising doubts about the generalizability of models to other financial contexts.

7. Lack of Interpretability in Complex Models: Complex models like ensemble methods
and deep neural networks, while achieving high detection performance, lack
interpretability, making it challenging to justify flagged transactions to stakeholders and
regulatory bodies.

8. Low Recall in Simple, Interpretable Models: Simpler models such as logistic
regression and Naive Bayes are fast and easy to interpret but often have low fraud recall,
missing many fraudulent transactions, which limits their practical utility in fraud-sensitive
domains (Gupta et al., 2025).

9. High Computational Cost of High-Performing Models: High-performing models like
neural networks and random forests, though effective in detecting fraud, can be
impractical for time-sensitive settings like point-of-sale transactions due to high
computational cost and latency (Opena, 2025).

10. Limited Evaluation of Hybrid Approaches under Streaming Conditions: Few studies
evaluate hybrid or ensemble approaches under streaming conditions, raising uncertainty
about their adaptability to continuous, sequential fraud data with evolving patterns in
real-time scenarios (Imani et al., 2025).

11. Reproducibility and Practical Validation Issues: Many published results lack
reproducibility and practical validation due to reliance on proprietary or simplistic
datasets, limiting confidence in proposed models (Oter et al., 2025).

12. Unsolved Trade-off between Performance and Operational Cost/Interpretability:
The challenge of achieving high fraud recall without excessive computation or opaque
models remains unresolved. Existing work often fails to balance performance against
operational cost and interpretability, which creates conflicts with real-time or resource
constraints (Olushola & Mart, 2024).

The identified gaps underscore the need for intelligent fraud detection systems that balance
accuracy, operational feasibility, interpretability, and privacy preservation in dynamic financial
environments.

3. Methodology

This study introduces a machine learning framework for credit card fraud detection based
on a standardized, imbalanced public dataset. It compares the results of several supervised
machine learning algorithms under different data balancing methods and concentrates its
analysis on recall, F1-score, and AUC to better measure the success of fraud detection
in imbalanced data (Opena, 2025).

3.1 Dataset Description

This study uses the Kaggle Credit Card Fraud Detection dataset, which comprises 284,807
transactions, of which 492 are fraud cases (Credit Card Fraud Detection, 2018). We select the
dataset for its real-world relevance, availability of ground truth, and inclusion in previous
benchmarking work. All the features are numerical, and all except the transaction time and
amount have been PCA transformed for privacy. The anonymized features capture aspects of a
transaction, such as the time at which it takes place or its geographical context, without
giving away sensitive raw data. The dataset is normalized prior to training; hence, the
integrity of the PCA-transformed values is maintained in the training and test sets
(Adepoju et al., 2019).
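To make the degree of imbalance concrete, the short calculation below uses only the two counts quoted above. The variable names are illustrative, and the "always legitimate" baseline is a hypothetical trivial model included only to show why accuracy alone is uninformative here.

```python
# Counts quoted in the dataset description.
total_transactions = 284_807
fraud_cases = 492

legit_cases = total_transactions - fraud_cases
fraud_rate = fraud_cases / total_transactions
imbalance_ratio = legit_cases / fraud_cases  # legitimate transactions per fraud case

# Accuracy of a trivial model that labels every transaction as legitimate.
trivial_accuracy = legit_cases / total_transactions

print(f"fraud rate: {fraud_rate:.4%}")
print(f"legitimate transactions per fraud case: {imbalance_ratio:.0f}")
print(f"accuracy of 'always legitimate': {trivial_accuracy:.4%}")
```

Such a trivial model catches no fraud at all yet scores very high accuracy, which motivates the balancing techniques and recall-oriented metrics described in the following sections.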

3.2 Data Balancing Techniques

Three widely used oversampling methods will be applied to the training data to address the
class imbalance. These methods are popular in the literature on improving model
performance in the fraud detection domain (Sundaravadivel et al., 2025).

• Random Oversampling: This naive approach duplicates instances of the minority
class until the imbalance is resolved. It serves as a baseline oversampling method
for evaluating the core effect of increasing minority class representation (Adepoju et
al., 2019).
• Synthetic Minority Oversampling Technique (SMOTE): SMOTE creates
artificial instances between existing minority class neighbors. The goal of this approach
is to make the model's decision boundaries more generalizable by ensuring that the
model learns beyond rote replication (Halim et al., 2023).
• ADASYN (Adaptive Synthetic Sampling): ADASYN extends SMOTE by
concentrating synthetic generation on the minority examples that are more difficult to
learn. This adaptive process places synthetic data in harder
areas of the feature space, which could improve the model's sensitivity to complex and
non-linear fraud patterns (Halim et al., 2023).

All machine learning models will be tested with and without these oversampling techniques to
assess how much each technique adds to fraud detection efficacy.
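The difference between duplicating minority rows and interpolating between them can be sketched in a few lines of standard-library Python. This is a toy illustration of the idea only, not the implementation in any particular library; the function names, the four example points, and the choice of k are assumptions made for the example. ADASYN is not implemented here: it would additionally bias the choice of the base row toward minority points surrounded by more majority neighbors.

```python
import random

def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority rows until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic rows by interpolating from a minority row
    toward one of its k nearest minority neighbors (the core idea of SMOTE)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (row for row in minority if row is not base),
            key=lambda row: squared_distance(base, row),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

rng = random.Random(0)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy fraud rows
duplicated = random_oversample(minority, target_size=10, rng=rng)
synthetic = smote_like(minority, n_new=6, k=2, rng=rng)
```

Every row in `duplicated` is an exact copy of an existing fraud row, whereas the rows in `synthetic` lie on line segments between neighboring fraud rows, which is why interpolation can widen the region a classifier associates with fraud.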

3.3 ML Algorithms

The following supervised machine learning models will be employed to cover a range of
model complexity and properties:

• Logistic Regression (LR): A simple and interpretable baseline model with low
computational cost.
• K-Nearest Neighbors (KNN): Chosen for its non-parametric nature and its ability to detect
local anomaly patterns, although it scales poorly to large datasets.
• Support Vector Machines (SVM): Chosen for their ability to process
high-dimensional, nonlinear data by finding the best separating hyperplanes.
• Decision Trees (DT): Provide an interpretable, rule-based breakdown of the
importance of each feature.
• Random Forest (RF): An ensemble of decision trees that is usually
more accurate and robust, especially when combined with oversampling in imbalanced
settings.
• Artificial Neural Networks (ANN): Chosen for their ability to find complex
nonlinear patterns and their encouraging performance in detecting subtle fraud
patterns (Opena, 2025).

3.4 Experimental Design and Metrics of Evaluation

The experimental design aims for robust, generalizable, and accurate fraud detection, targeting
a reduction in false negatives of around 1.5% while keeping computation efficient
(Opena, 2025).

• Cross-Validation: Performance will be assessed using 10-fold cross-validation,
training and testing on different data subsets in order to obtain robust and
generalizable results.
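One way to organize 10-fold cross-validation on imbalanced data is sketched below. It assumes stratified folds (each fold receives a proportional share of fraud cases) and assumes that oversampling is applied to the training portion only, which is consistent with balancing "the training data" but is not spelled out in the proposal. The helper name and the toy labels are hypothetical.

```python
import random

def stratified_folds(labels, n_folds, seed=0):
    """Return n_folds lists of row indices with each class spread evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % n_folds].append(i)
    return folds

labels = [1] * 20 + [0] * 980  # toy data: 2% fraud
folds = stratified_folds(labels, n_folds=10)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Oversampling would be applied to train_idx only, so duplicated or
    # synthetic fraud rows never leak into the held-out fold.
    fraud_in_test = sum(labels[i] for i in test_idx)
```

Keeping the held-out fold untouched means the reported recall and F1-score reflect the original class distribution rather than the artificially balanced one.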

Metrics for Evaluation:

• Recall: The share of actual fraud cases that the model detects. This metric is weighted
heavily because a false negative (a missed fraud) is typically far more costly than falsely
flagging a legitimate payment.
• F1-score: The harmonic mean of precision and recall. It summarizes the trade-off between
false positives (which lower precision) and false negatives (which lower recall) and is
especially informative on skewed datasets.
• AUC (Area Under the Receiver Operating Characteristic Curve): AUC summarizes
model performance across all classification thresholds, so it does not depend on the
choice of a single decision threshold.
• Accuracy: Although accuracy is not the target metric because of the class imbalance,
we will report it and compare it with other studies, as it
gives an overall idea of model performance.
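The four metrics can be computed from a confusion matrix and a list of scores as in the following minimal sketch. The labels and scores are made-up toy values, the 0.5 threshold is an assumption, and AUC is computed through its rank interpretation (the probability that a randomly chosen fraud row receives a higher score than a randomly chosen legitimate row, with ties counted as one half).

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def roc_auc(y_true, scores):
    """Fraction of (fraud, legitimate) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]            # 1 = fraud
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed threshold

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn) if tp + fn else 0.0
precision = tp / (tp + fp) if tp + fp else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
accuracy = (tp + tn) / len(y_true)
auc = roc_auc(y_true, scores)
```

In this toy case accuracy is 0.8 even though half of the fraud cases are missed (recall 0.5), which illustrates why accuracy is reported only for comparison.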

Computational complexity:

We will log training and inference time and simulate a real-time fraud detection use case to
assess the applicability of each model and oversampling method under latency constraints.
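A simple way to log inference time is sketched below. The `score_transaction` function is a hypothetical stand-in for a trained model (a logistic score with made-up weights), and the row count and number of repeats are arbitrary; only `time.perf_counter` from the standard library is relied upon for timing.

```python
import math
import time

def score_transaction(features, weights, bias):
    """Hypothetical stand-in model: a logistic regression score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def measure_latency(predict, rows, repeats=5):
    """Return the best observed mean per-row latency in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for row in rows:
            predict(row)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(rows))
    return best

weights = [0.1] * 30                      # toy weights, not a fitted model
rows = [[0.5] * 30 for _ in range(1000)]  # toy transactions
per_row = measure_latency(lambda r: score_transaction(r, weights, 0.0), rows)
print(f"mean per-transaction latency: {per_row * 1e6:.1f} microseconds")
```

Taking the best of several repeats reduces the influence of unrelated system activity; the same harness could wrap each candidate model so that latency is compared on equal terms.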

4. Work Plan and Timeline

The project will follow a work schedule to ensure that its outputs are delivered within the given
time frame. Key dates and milestones are shown in the table below.

Table 1: Timeline of the project

The schedule organizes the project from the literature expansion through to a high-quality
research proposal and its future directions. It mitigates possible issues by allotting adequate
time to each phase and reserving time for review.

5. Future Research Directions

Financial ecosystems and fraud techniques are constantly evolving and therefore require
continued innovation in fraud detection. Today's machine learning and deep learning algorithms
represent a leap forward, but there is still potential to integrate data from multiple modalities
and to explore federated learning.

5.1 Fusion of Multi-Modal Data for Improved Fraud Detection

Concept

Traditional fraud detection platforms are built around structured, transactional data, and
fraudsters exploit the gaps created by these restrictive boundaries to complete transactions across
channels. Multi-modal AI is a novel approach that combines structured, unstructured, and
semi-structured data collected from various sources to correlate suspicious activity more
accurately.

The types of multi-modal data that can be integrated include:

• Textual Data: This can include transcripts of customer support chats, emails, social
media exchanges, and other forms of text conversation. Natural Language Processing
(NLP) models can learn statistical word patterns from this unstructured text
and help flag suspicious phrases or communication patterns (Akintan, 2025).
• Image Data: This might include product images captured for a transaction, scans of
IDs, or live selfie videos for account enrollment. Computer vision models can then
check these images for tampering or discrepancies, or use them for biometric validation.
• Voice Data: Voice biometrics gathered in customer service calls can be employed for
second-factor authentication and anomaly detection.
• Behavioral Data: This includes the user's GPS geolocation,
browser and device fingerprints (including metadata about the browser or device),
typing patterns during checkout, and general usage patterns of the mobile app. Behavioral
analytics can identify activity that does not conform to what is normal, such as a
surge in the number of transactions being processed or abnormal transaction amounts
(Visa, 2023).
• Transaction Logs: These are the typical relational records of transaction data that are
at the heart of the system (TransUnion, 2023).

Integrating multi-modal data offers several substantial advantages for fraud detection systems:

• Improved Detection Accuracy: By cross-referencing diverse data sources such as
transaction amounts, geolocation, device fingerprints, and biometrics, multi-modal
systems can identify inconsistencies that single-data models often overlook. For instance,
a transaction from an unfamiliar device in a foreign location, paired with familiar voice
authentication, could trigger further review (Bello & Olufemi, 2024).
• Detection of Complex, Cross-Channel Fraud: Sophisticated fraud often spans multiple
platforms (e.g., social engineering combined with transactional fraud). Multi-modal AI
captures a holistic view of user behaviour by integrating structured (e.g., logs) and
unstructured (e.g., chat transcripts) data, enabling detection of intricate,
context-dependent fraud patterns (Bello & Olufemi, 2024).
• Reduced False Positives: By synthesizing signals from multiple sources, these systems
can more accurately differentiate between legitimate anomalies, such as a user traveling, and
actual fraudulent behavior. This reduces unnecessary alerts and improves customer
experience (Visa, 2023).
• Enhanced Due Diligence & AML Support: Behavioural data, when combined with
traditional Know Your Customer (KYC) information, enriches customer profiling. This
holistic risk assessment supports stronger Anti-Money Laundering (AML) practices
(Fraudcom International, 2025).
• Dynamic Risk Profiling: Multi-modal systems enable real-time risk updates by
integrating historical data with live transaction streams. This facilitates proactive fraud
prevention through early warnings of emerging threats (TransUnion, 2023).

Challenges

Despite its potential, applying multi-modal AI to fraud detection presents several challenges:

• Data Heterogeneity and Fusion: Processing and fusing heterogeneous data,
whether structured, unstructured, or semi-structured, requires
complex data pipelines and techniques such as neural network ensembles or
cross-modal attention mechanisms.
• Low-latency Inference: The real-time component must process
and analyze a high volume of varied data very quickly in order to
detect fraud. This requires efficient model architectures and computational
systems.
• Data Privacy Management: The aggregation and analysis of diverse types of sensitive
user data, e.g., biometrics and behaviors, raises important data privacy issues. Strong
privacy-preserving methodologies and adherence to regulations are crucial.
• Dynamic Updates and Concept Drift: Fraud patterns change constantly.
Systems should be designed to update their models dynamically as new
types of data are added (e.g., blockchain transaction logs, signals from
IoT devices), so that they remain resilient against emerging threats without
being bound by a set of pre-defined rules.
• Interpretability: As multi-modal models become increasingly complex, it becomes difficult to
retain the interpretability of decisions that is important for regulatory compliance and
for building trust among stakeholders of financial systems (Imani et al., 2025).

Potential Applications

Future research could investigate applications such as:

• Improved Payment Security: Cross-checking the card billing address against the user's IP
location, checking for discrepancies in product images uploaded during a transaction, and
analyzing typing patterns during checkout.
• Enhanced Banking Fraud Detection: Detecting social engineering attacks by combining
transaction history with customer support chat logs, where NLP models highlight
malicious phrases and computer vision models search for tampered IDs, although this
remains an ambitious goal.
• Proactive Risk Management: Using behavioral analytics for proactive detection of
unusual activity (e.g., a client who regularly pays small invoices suddenly makes a large
payment to an unfamiliar account).

5.2 Exploring Federated Learning for Privacy Preserving Fraud Detection

Concept

Banks could enhance fraud detection by pooling their private transactional data, but this is
limited by privacy legislation and competitive concerns. Federated Learning (FL) allows
collaborative model training without sharing raw data. Each organization trains a model locally
on its own data and sends only model updates to a central aggregator (Opena, 2025).

Benefits

Using FL for fraud detection has several advantages:

• Privacy and Security: FL keeps raw transaction data on local systems, protecting
it from data breaches and privacy leaks and keeping sensitive customer
data hidden, which makes FL a secure protocol for collaborative learning.
• Regulatory Compliance: By keeping data private, FL helps financial organizations
meet the requirements of strict data protection laws, which is an essential operational
need.
• Improved Accuracy of Fraud Detection: Models trained jointly on a variety of data
from different entities achieve better fraud detection. This approach allows global
models to learn a larger variety of frauds and to generalize better to
new schemes. Empirical results suggest that FL models can surpass
centrally trained models in certain scenarios, especially on imbalanced data.
• Scalability: The decentralized nature of FL enables deployment at scale across a
large financial ecosystem comprising many banks and payment networks.
• Low-Latency Intelligent Models: FL can help to develop low-latency intelligent
models that are critical for real-time fraud prevention.

Technical Mechanisms

The technical realization of FL involves several mechanisms:

• Model Initialization and Distribution: A central server initializes the parameters of the
global model and distributes them to the selected participating clients.
• Local Model Training: Clients train models on their local datasets and can use
deep learning architectures such as LSTMs and GNNs to capture temporal and relational
fraud patterns in the local data (Opena, 2025).
• Secure Aggregation: The updated model weights are returned to the central server and
aggregated with Federated Averaging, optionally under secure aggregation protocols such as
homomorphic encryption that keep individual local model updates confidential.
• Differential Privacy: DP-based approaches can be embedded to preserve privacy and
avoid exposing sensitive transaction patterns by injecting noise into model updates, ideally
with minimal effect on performance (Imani et al., 2025).
• Iterative Refinement: The distribution, local training, and secure aggregation process is
repeated iteratively until the performance of the model converges or reaches an acceptable
level (Opena, 2025).
• Integration with Explainable AI (XAI): The "black box" nature of deep learning
models in FL can be addressed by integrating Explainable AI (XAI) methods (e.g., SHAP, LIME).
This promotes interpretability, builds trust, helps fine-tune classification, and supports
compliance in fraud analysis (Imani et al., 2025).
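The aggregation step of Federated Averaging can be sketched as a weighted mean of client parameters, with each client weighted by its number of local training examples. The example below assumes each client's model is a flat list of numbers and uses three hypothetical banks with made-up values; encryption, differential privacy noise, and the local training loop are deliberately left out.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg aggregation step)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]

# Three hypothetical banks with different amounts of local data.
client_weights = [[0.2, -1.0], [0.4, -0.8], [0.9, 0.0]]
client_sizes = [1000, 3000, 1000]

global_weights = federated_average(client_weights, client_sizes)
print(global_weights)
```

Only the parameter vectors and example counts reach the server in this sketch; the raw transactions stay with each bank, which is the property the mechanisms above rely on.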

Challenges

Despite its advantages, FL faces specific challenges in financial fraud detection:

• Heterogeneous Data: Heterogeneity in the distribution of data across financial institutions
(FIs) may make it difficult to train a single generalized model that serves all of them.
• Communication Overhead: Efficient model aggregation and compression
techniques are needed to minimize communication overhead in large-scale financial networks
and to reduce the cost of sharing encrypted model updates.
• Adversarial Attacks: FL frameworks are susceptible to model poisoning and therefore
require techniques such as anomaly detection algorithms and Byzantine-robust
aggregation to prevent degradation of the model's performance (Imani et al., 2025).
• Standardization: Although FL principles have gained traction, practical challenges remain in
standardizing data structures and feature engineering across institutions.
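One simple example of Byzantine-robust aggregation is the coordinate-wise median, shown below next to a plain mean. This is an illustrative choice rather than a method named in the cited work, and the client updates, including the single poisoned one, are made-up values.

```python
from statistics import median

def coordinate_median(client_updates):
    """Aggregate by taking the median of each parameter across clients."""
    return [median(column) for column in zip(*client_updates)]

def plain_mean(client_updates):
    """Aggregate by taking the unweighted mean of each parameter."""
    return [sum(column) / len(column) for column in zip(*client_updates)]

honest = [[0.10, -0.20], [0.12, -0.18], [0.09, -0.22], [0.11, -0.21]]
poisoned = [[50.0, 50.0]]  # one hypothetical malicious client
updates = honest + poisoned

robust = coordinate_median(updates)
naive = plain_mean(updates)
print("median:", robust)
print("mean:  ", naive)
```

The single poisoned update drags the mean far away from the honest clients, while the median stays within their range; the cost is that the median ignores how much data each client holds.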

Potential Impact

Federated learning offers a promising future for fraud detection through collaborative detection
across domains, uncovering complex fraud hidden in seemingly inconspicuous isolated
datasets. This method enables ongoing updates to counter new attacks while ensuring data
privacy, which is a first step towards a secure global financial system (Opena, 2025).

6. Conclusion

This paper discusses the potential of machine learning to detect fraudulent credit card activity,
focusing on its capacity to scale and adapt in the face of challenges such as class imbalance and
evolving fraud techniques. New technologies such as mobile payments and digital wallets have
changed the face of fraud, bringing additional attack vectors like account takeovers and
AI-driven scams, while sophisticated deep learning models have enhanced detection accuracy.
The paper illustrates the benefits of oversampling techniques, like ADASYN and SMOTE, for
detecting rare events, along with their limitations, which include overfitting, computational
cost, noise interference, and sampling bias. Using the Kaggle credit card fraud dataset, six
supervised learning algorithms are investigated, with recall, F1-score, and AUC as the primary
evaluation criteria for determining the most suitable approach to real-time fraud detection
(Gupta et al., 2025). Gaps in previous work are identified to guide future research, in
particular combining information from multi-modal data for enhanced insight and federated
learning for privacy-preserving model development. This work highlights the significance of
machine learning-based systems in current fraud detection and the need for a robust, adaptive,
interpretable, and privacy-preserving system to support the financial ecosystem.

References

1. Adepoju, O., Wosowei, J., Lawte, S., & Jaiman, H. (2019). Comparative evaluation of
credit card fraud detection using machine learning techniques. 2019 Global Conference
for Advancement in Technology (GCAT), 1–6.
https://doi.org/10.1109/gcat47503.2019.8978372

2. Al Balawi, M., & Aljohani, N. (2023). Evaluating oversampling techniques for handling
imbalanced data in fraud detection. Journal of Computer Science and Cybersecurity,
6(2), 45–58.

3. Akintan, A. A. F. (2025, May). Integrating Natural Language Processing (NLP) for real-
time fraud detection in financial transactions using AI and cloud technologies.
ResearchGate.
https://www.researchgate.net/publication/391778290_Integrating_Natural_Language_Pro
cessing_NLP_for_RealTime_Fraud_Detection_in_Financial_Transactions_Using_AI_an
d_Cloud_Technologies

4. Bello, N. O. A., & Olufemi, N. K. (2024). Artificial intelligence in fraud prevention:
Exploring techniques and applications challenges and opportunities. Computer Science &
IT Research Journal, 5(6), 1505–1520. https://doi.org/10.51594/csitrj.v5i6.1252

5. Bondar, I. (2025, April 16). Deep dive: Exploring the latest account takeover fraud
statistics [2025]. Veriff. https://www.veriff.com/fraud/news/account-takeover-fraud-
statistics

6. Costello, M. (2025, February 19). Synthetic Identity Fraud: The Fastest-Growing
Financial Crime of 2025 - RCB Bank. RCB Bank. https://rcbbank.bank/learn-synthetic-
identity-fraud-the-fastest-growing-financial-crime-of-2025/

7. Credit card fraud Detection. (2018, March 23). Kaggle.
https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

8. Evgeny. (2025, June 5). Common mobile application vulnerabilities to know in 2025.
Touchlane. https://touchlane.com/common-mobile-application-vulnerabilities-2025/

9. Five fraud trends for financial institutions. (n.d.). Visa.
https://corporate.visa.com/en/solutions/commercial-solutions/knowledge-hub/fraud-as-a-
service-trends.html

10. Fraudcom International. (2025, March 23). KYC compliance – The key to effective fraud
prevention. Fraud.com. https://www.fraud.com/post/kyc-compliance#:~:text=Real
%2Dtime%20risk%20scoring:%20By%20combining%20KYC%20data,streamline
%20the%20verification%20process%20for%20low%2Drisk%20customers.

11. Goel, D., & Jain, A. K. (2018). Smishing-Classifier: A novel framework for detection of
smishing attack in mobile environment. In Communications in computer and information
science (pp. 502–512). https://doi.org/10.1007/978-981-10-8660-1_38

12. Gomes, L., Kueck, J., Mattes, M., Spindler, M., & Zaytsev, A. (2024). Collusion
Detection with Graph Neural Networks. arXiv (Cornell University).
https://doi.org/10.48550/arxiv.2410.07091

13. Gupta, R. K., Hassan, A., Majhi, S. K., Parveen, N., Zamani, A. T., Anitha, R., Ojha, B.,
Singh, A. K., & Muduli, D. (2025). Enhanced framework for credit card fraud detection
using robust feature selection and a stacking ensemble model approach. Results in
Engineering, 105084. https://doi.org/10.1016/j.rineng.2025.105084

14. Gupta, R., Singh, S., & Bansal, A. (2025). Evaluating machine learning models for
imbalanced fraud detection datasets. Computational Intelligence and Neuroscience, 2025,
1–14. https://doi.org/10.1155/2025/7634932

15. Goswami, R. (2025). Evolution of fraud patterns in mobile apps and digital payment
systems. Journal of Financial Technology and Security, 11(1), 56–72.

16. Halim, A. M., Dwifebri, M., & Nhita, F. (2023). Handling imbalanced data sets using
SMOTE and ADASYN to improve classification performance of Ecoli data sets.
Building of Informatics Technology and Science (BITS), 5(1).
https://doi.org/10.47065/bits.v5i1.3647

17. Hmidy, Y., & Mabrouk, M. B. (2024). Reliable logistic regression for credit card fraud
detection. International Journal of Advanced Computer Science and Applications,
15(11). https://doi.org/10.14569/ijacsa.2024.0151107

18. Howarth, J. (2025, June 6). 23+ eCommerce Fraud Statistics (2024). Exploding Topics.
https://explodingtopics.com/blog/ecommerce-fraud-stats

19. Imani, M., Beikmohammadi, A., & Arabnia, H. R. (2025). Comprehensive Analysis of
Random Forest and XGBoost Performance with SMOTE, ADASYN, and GNUS Under
Varying Imbalance Levels. Technologies, 13(3), 88.
https://doi.org/10.3390/technologies13030088

20. Juniper Research. (2022, July 11). Online payment fraud losses to exceed
USD 343 billion by 2027. The Paypers. https://thepaypers.com/digital-identity-security-
online-fraud/online-payment-fraud-losses-to-exceed-usd-343-billion-by-2027-juniper-
research--1257463

21. K, S., S, V., S, R., D, R., & R, S. (2025). Financial Transactional Fraud Detection using a
Hybrid BiLSTM with Attention-Based Autoencoder. International Research Journal of
Multidisciplinary Technovation, 135–147. https://doi.org/10.54392/irjmt25211

22. Mastercard. (2024, January 24). Ecommerce Fraud Trends and Statistics Merchants need
to know in 2024. Article at a Glance.
https://b2b.mastercard.com/news-and-insights/blog/ecommerce-fraud-trends-and-
statistics-merchants-need-to-know-in-2024/

23. Najadat, H., Altiti, O., & Aqouleh, A. A. (2020). Credit card fraud detection based on
machine and deep learning. Proceedings of the 11th International Conference on
Information and Communication Systems (ICICS), 204–208.
https://doi.org/10.1109/icics49469.2020.239524

24. Olushola, A., & Mart, J. (2024, January 15). Fraud Detection using Machine Learning
(Preprint). ScienceOpen Preprints. https://doi.org/10.14293/PR2199.000647.v1

25. Opena, A. (n.d.). 70+ eCommerce Fraud Statistics [2025]: Trends, Data, & Facts.
Feedink Sp. Z o.o. https://cropink.com/ecommerce-fraud-statistics

26. Oter, P., Gaynor, K., Elly, B., & Smith, B. (2025). Assessing the challenges of
implementing real-time fraud detection solutions.
https://www.researchgate.net/publication/388221452

27. Saad, S., Nadher, I., & Hameed, S. M. (2024). Credit Card Fraud Detection Challenges
and Solutions: A review. Iraqi Journal of Science, 2287–2303.
https://doi.org/10.24996/ijs.2024.65.4.42

28. Sahin, Y., & Duman, E. (2011). Detecting credit card fraud by decision trees and support
vector machines. Proceedings of the International MultiConference of Engineers and
Computer Scientists (IMECS), 1, 442–447.
https://doi.org/10.1109/IMECS.2011.5873802

29. Sillam, Y. (2023, December 20). What is Ransomware | Attack Types, Protection &
Removal | Imperva. Learning Center. https://www.imperva.com/learn/application-
security/ransomware/#:~:text=Ransomware%20is%20a%20type%20of,)%20terminal
%2C%20or%20other%20endpoint.

30. Sundaravadivel, P., Isaac, R. A., Elangovan, D., KrishnaRaj, D., Rahul, V. V. L., & Raja,
R. (2025). Optimizing credit card fraud detection with random forests and SMOTE.
Scientific Reports, 15(1). https://doi.org/10.1038/s41598-025-00873-y

31. Sunderajulu, K. B. (2024). eCommerce & Digital Wallet Payment Fraud. Deleted
Journal, 2(6). https://doi.org/10.62127/aijmr.2024.v02i06.1111

32. Thies, B., & Thies, B. (2025, April 25). Cybersecurity Industry Statistics: ATO,
ransomware, breaches & Fraud. SpyCloud. https://spycloud.com/blog/cybersecurity-
industry-statistics-account-takeover-ransomware-data-breaches-bec-fraud/

33. TransUnion. (2023, March 21). TransUnion report finds digital fraud attempts spike 80%
globally from Pre-Pandemic levels. TransUnion Report Finds Digital Fraud Attempts
Spike 80% Globally from Pre-Pandemic Levels.
https://newsroom.transunion.com/transunion-report-finds-digital-fraud-attempts-spike-
80-globally-from-pre-pandemic/

34. Upadhyaya, A., & Singh, R. (2025). Enhancing credit card fraud detection with K-
Nearest Neighbours (KNN): A machine learning approach. Journal of Information
Science and Engineering, 41(2), 123-135.
https://jisem-journal.com/39_Enhancing+Credit+Card+Fraud+Detection+with+K-
Nearest+Neighbours+(KNN)+A+Machine+Learning+Approach.pdf

35. Verdon, J. (2021, April 27). Global E-Commerce sales to hit $4.2 trillion as online surge
continues, Adobe reports. Forbes.
https://www.forbes.com/sites/joanverdon/2021/04/27/global-ecommerce-sales-to-hit-42-
trillion-as-online-surge-continues-adobe-reports/

36. Visa. (n.d.). Fraud-as-a-Service Trends. Visa Corporate. Retrieved June 22, 2025, from
https://corporate.visa.com/en/solutions/commercial-solutions/knowledge-hub/fraud-as-a-
service-trends.html

37. Webb, M. (2024, June 23). 60+ global credit card fraud statistics you need to know in
2025. Techopedia. https://www.techopedia.com/credit-card-fraud-statistics

38. Wu, Y., Wang, L., Li, H., & Liu, J. (2025). A deep learning method of credit card fraud
detection based on Continuous-Coupled Neural Networks. Mathematics, 13(5), 819.
https://doi.org/10.3390/math13050819

39. Zongkai, L., & Wei, Y. (2024). Comparative analysis of SMOTE, ADASYN, and GANs
for credit card fraud detection. Journal of Artificial Intelligence Research, 78, 145–162.
