Machine Learning Project – Catching fraudsters one transaction at a time! 🚀
Every day, thousands of transactions flow through financial systems – but some of them are sneaky fraud attempts 🥷. This project builds a smart system to spot those fraudulent transactions before they cause any damage.
✅ Built for Data Science Interns to showcase their skills.
✅ Uses real-world style data (~6.3 million rows!) to train machine learning models.
| Feature | Description |
|---|---|
| `step` | Time step in hours (1 = first hour, 2 = second hour, ...) |
| `type` | Transaction type: CASH-IN, CASH-OUT, TRANSFER, etc. |
| `amount` | Transaction amount. |
| `oldbalanceOrg` | Sender's balance before the transaction. |
| `newbalanceOrig` | Sender's balance after the transaction. |
| `oldbalanceDest` | Recipient's balance before the transaction. |
| `newbalanceDest` | Recipient's balance after the transaction. |
| `isFraud` | 🚨 Target: 1 = fraud, 0 = legit. |
| `isFlaggedFraud` | Flag for illegal attempts (>200,000 units). |
📢 Class Imbalance Alert: Fraudulent transactions make up only 0.17% of the dataset!
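To make that imbalance concrete, here is a small sketch that turns the 0.17% figure into an approximate fraud row count and a legit-to-fraud ratio. Such a ratio is one common starting point for weighting the rare class (for example via the `scale_pos_weight` parameter in XGBoost or LightGBM). Whether this project used that exact weight is an assumption; `positive_class_weight` is a hypothetical helper, not code from the notebook.

```python
# Figures quoted in this README; the row count is approximate (~6.3 million).
FRAUD_RATE = 0.0017      # 0.17% of transactions are fraud
TOTAL_ROWS = 6_300_000


def positive_class_weight(fraud_rate):
    """Return the ratio of legit rows to fraud rows for a given fraud rate."""
    if not 0.0 < fraud_rate < 1.0:
        raise ValueError("fraud_rate must be strictly between 0 and 1")
    return (1.0 - fraud_rate) / fraud_rate


approx_fraud_rows = round(TOTAL_ROWS * FRAUD_RATE)
weight = positive_class_weight(FRAUD_RATE)

print(f"~{approx_fraud_rows:,} fraud rows, ~{weight:.0f} legit rows per fraud row")
```

With so few positives, plain accuracy says very little: a model that labels everything as legit would still be right about 99.83% of the time.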
- Data Cleaning & Preprocessing
  - Handled missing values, outliers, and encoded transaction types.
  - Added engineered features like `balanceDiff` to improve detection.
- Exploratory Data Analysis (EDA)
  - Visualized fraud patterns across transaction types and amounts.
  - Spotted class imbalance early.
- Model Building
  - Baseline: Logistic Regression.
  - Advanced: Random Forest, XGBoost, and LightGBM.
  - Used weighted losses & tuning for handling rare fraud cases.
- Evaluation
  - Metrics: ROC-AUC, Precision, Recall, F1-Score.
  - Feature importance plots to explain predictions.
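The preprocessing step mentions encoding transaction types and adding `balanceDiff`, but not how either was computed. As a minimal sketch, assuming `balanceDiff` means the drop in the sender's balance (`oldbalanceOrg - newbalanceOrig`) and that types are one-hot encoded, it could look like this. The `engineer_row` helper, the `KNOWN_TYPES` list, and the two sample rows are made up for illustration; the notebook may do this differently.

```python
import csv
import io

# Types named in the feature table. The table ends with "etc.", so this
# list is an assumption and is probably incomplete.
KNOWN_TYPES = ["CASH-IN", "CASH-OUT", "TRANSFER"]


def engineer_row(row):
    """Build numeric features from one transaction row (a dict of strings)."""
    old_balance = float(row["oldbalanceOrg"])
    new_balance = float(row["newbalanceOrig"])
    features = {
        "amount": float(row["amount"]),
        # Assumed definition: how much the sender's balance dropped.
        "balanceDiff": old_balance - new_balance,
    }
    # One-hot encode the transaction type: one 0/1 column per known type.
    for name in KNOWN_TYPES:
        features["type_" + name] = 1 if row["type"] == name else 0
    return features


# Two invented rows standing in for Fraud.csv.
sample_csv = io.StringIO(
    "step,type,amount,oldbalanceOrg,newbalanceOrig\n"
    "1,TRANSFER,500.0,500.0,0.0\n"
    "1,CASH-IN,200.0,1000.0,1200.0\n"
)
rows = [engineer_row(r) for r in csv.DictReader(sample_csv)]

for features in rows:
    print(features)
```

On the full ~6.3 million rows, a vectorized pandas column subtraction would be the more practical way to compute the same feature.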
| Model | ROC-AUC | Precision | Recall |
|---|---|---|---|
| Logistic Regression | 0.92 | 12% | 78% |
| XGBoost | 0.991 | 89% | 95% |
| LightGBM | 🏅 0.993 | 91% | 96% |
🎉 Winner: LightGBM – lightweight, fast, and the top scorer on ROC-AUC, precision, and recall!
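F1-Score is listed among the metrics but does not appear in the table. Since F1 is the harmonic mean of precision and recall, it can be derived from the reported numbers. The sketch below does only that arithmetic; it does not re-run any model, and the `reported` dictionary simply copies the table values.

```python
# (precision, recall) copied from the results table, as fractions.
reported = {
    "Logistic Regression": (0.12, 0.78),
    "XGBoost": (0.89, 0.95),
    "LightGBM": (0.91, 0.96),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_by_model = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1:.3f}")
```

This makes the baseline's weakness easier to see: Logistic Regression has a decent ROC-AUC, yet its 12% precision drags F1 down to roughly 0.21, while both boosted models land above 0.9.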
fraud-detection-project/
├── FraudDetection.ipynb # Full notebook: EDA + modeling
├── FraudDetection_Technical.md # Technical report
├── FraudDetection_Business.md # Stakeholder-friendly report
├── generate_charts.py # Script to make charts
├── charts/ # Saved chart images
├── README.md # This file
└── Fraud.csv # The dataset
- Clone this repo: `git clone https://github.com/hi-riddhi/fraud-detection-project.git`
- Install dependencies: `pip install -r requirements.txt`
- Add your dataset: Drop `Fraud.csv` into the project folder.
- Run the notebook: `jupyter notebook FraudDetection.ipynb`
- Make charts: `python generate_charts.py`
- Tackled a real-world imbalanced classification problem.
- Built smart ML models (LightGBM, XGBoost) to detect fraud.
- Delivered technical & business reports for different audiences.
If you spot a bug, submit an issue! If you catch a crook, grab a coffee and celebrate. This repo is open-source — so join the squad and help make the world a little safer (and more fun).
Made with 💜 and a slightly paranoid mind.
Hi, I’m Riddhi! This project was part of my Data Science Internship Portfolio.
💻 Skills used: Python, Pandas, Scikit-learn, LightGBM, Data Visualization.