This repository implements a Credit Card Fraud Detection pipeline using XGBoost, SHAP for model interpretation, and Streamlit for interactive demonstration. It is organized into:
- notebooks/: Jupyter notebooks covering end-to-end data exploration, feature engineering, model training, tuning, and interpretation.
- scripts/: Python utility modules:
  - process_final.py: polprocess(df, cat_rates) for consistent feature engineering.
- artifacts/: Serialized model & metadata for deployment:
  - fraud_slim.json: trained slim XGBoost Booster.
  - slim_features.joblib: list of features expected by the slim model.
  - category_rates.joblib: mapping of category → historical fraud rate.
  - le_category.joblib: LabelEncoder for transaction categories.
  - uf_names.joblib: user‑friendly category names.
- streamlit_app/: a polished Streamlit application (pol_app.py) that loads artifacts and lets you simulate new transactions.
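
The category → historical fraud rate mapping stored in category_rates.joblib can be illustrated with a minimal sketch. This is not the repository's polprocess(); the function names (build_category_rates, encode_category), the toy data, and the fallback to the global fraud rate for unseen categories are all assumptions made for illustration.

```python
from collections import defaultdict


def build_category_rates(rows):
    """Return {category: fraud rate} from (category, is_fraud) pairs."""
    counts = defaultdict(lambda: [0, 0])  # category -> [fraud count, total count]
    for category, is_fraud in rows:
        counts[category][0] += int(is_fraud)
        counts[category][1] += 1
    return {cat: fraud / total for cat, (fraud, total) in counts.items()}


def encode_category(category, cat_rates, default_rate):
    """Look up a category's historical fraud rate.

    Falling back to default_rate for unseen categories is an assumption,
    not necessarily what the repository does.
    """
    return cat_rates.get(category, default_rate)


# Toy training rows, made up for illustration: (category, is_fraud)
train_rows = [
    ("grocery", 0), ("grocery", 0), ("grocery", 1), ("grocery", 0),
    ("travel", 1), ("travel", 0),
]
cat_rates = build_category_rates(train_rows)
global_rate = sum(label for _, label in train_rows) / len(train_rows)

print(encode_category("grocery", cat_rates, global_rate))  # 0.25
print(encode_category("travel", cat_rates, global_rate))   # 0.5
print(encode_category("unseen", cat_rates, global_rate))   # falls back to global_rate
```

Computing the rates on the training split only and reusing the saved mapping at inference time keeps the encoding consistent between the notebook and the app.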
- Clone the repo:
    git clone https://github.com/<your‑username>/CC_Fraud_Detection.git
    cd CC_Fraud_Detection
- Create & activate a Python environment (conda or venv):
    conda create -n fraud-detect python=3.11
    conda activate fraud-detect
- Install dependencies:
    pip install -r requirements.txt
- Open notebooks/CC_Fraud_polished.ipynb to step through:
  - Setup & Data Load: imports, Kaggle download, preview.
  - Preprocessing & Feature Engineering: apply polprocess() to train/test splits.
  - Hyperparameter Tuning & Training: train slim XGBoost with early stopping.
  - Model Interpretation: SHAP summary & beeswarm plots.
  - Export Artifacts: save model, feature list, encoders to artifacts/.
Run all cells to reproduce results and regenerate the artifacts/ files.
- Launch:
    streamlit run streamlit_app/pol_app.py
- Simulate new transactions by adjusting amount, category, hour, and population.
- The threshold slider dynamically changes the cutoff between fraud and legitimate decisions.
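
How a decision cutoff turns a model score into a label can be sketched as follows. The function name classify, the use of >= at the boundary, and the example score are assumptions for illustration; pol_app.py may implement this differently.

```python
def classify(fraud_probability, threshold=0.70):
    """Label a transaction as fraud when its score meets the cutoff."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return "fraud" if fraud_probability >= threshold else "legitimate"


score = 0.62  # hypothetical model output for one simulated transaction
for cutoff in (0.50, 0.70, 0.90):
    # The same score flips from fraud to legitimate as the cutoff rises.
    print(cutoff, classify(score, cutoff))
```

Lowering the cutoff flags more transactions (higher recall, more false alarms); raising it does the opposite.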
├── artifacts/ # trained models + metadata
│ ├── fraud_slim.json
│ ├── slim_features.joblib
│ └── ...
├── notebooks/
│ ├── CC_Fraud_polished.ipynb
│ └── NB_archive/...
├── scripts/
│ ├── process_final.py
│ └── booster_wrapper.py
├── streamlit_app/
│ └── pol_app.py
├── requirements.txt
└── README.md
- Slim Model: 12‑feature XGBoost, AUC ≈ 0.99, PR‑AUC ≈ 0.80, F1 optimized at threshold ≈ 0.70.
- Interpretation: SHAP identifies top drivers (amount, category target encoding (TE), hour patterns, etc.).
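
One way to arrive at an F1-optimized threshold is to sweep candidate cutoffs and keep the one with the highest F1 score. The sketch below uses made-up labels and scores and hypothetical helper names (f1_at_threshold, best_f1_threshold); it is not the notebook's code and does not reproduce the reported threshold of ≈ 0.70.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 score when every score >= threshold is predicted as fraud."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    if tp == 0:
        return 0.0  # no true positives: treat F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(y_true, scores, candidates):
    """Return the first candidate threshold that maximizes F1."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# Made-up validation labels (1 = fraud) and model scores
y_true = [0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.20, 0.40, 0.55, 0.65, 0.80, 0.95]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

best = best_f1_threshold(y_true, scores, candidates)
print(best, f1_at_threshold(y_true, scores, best))
```

On heavily imbalanced data such as fraud, the F1-optimal cutoff often differs from 0.5, so it is worth choosing on a held-out validation split.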
All required artifacts are in artifacts/. To deploy:
- Ensure artifacts/ is alongside streamlit_app/.
- Run the app as above.
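
Before launching, a small pre-flight check can confirm that the artifact files are in place. This is a sketch, not part of the repository: the helper name missing_artifacts is hypothetical, the file names are taken from the artifacts/ listing in this README, and it assumes the app needs all five files.

```python
from pathlib import Path

# File names taken from the artifacts/ listing in this README.
REQUIRED_ARTIFACTS = (
    "fraud_slim.json",
    "slim_features.joblib",
    "category_rates.joblib",
    "le_category.joblib",
    "uf_names.joblib",
)


def missing_artifacts(artifacts_dir):
    """Return the required artifact file names not found in artifacts_dir."""
    root = Path(artifacts_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root so that "artifacts" resolves correctly.
    missing = missing_artifacts("artifacts")
    if missing:
        print("Missing artifacts:", ", ".join(missing))
    else:
        print("All artifacts present.")
```

If anything is reported missing, re-running the notebook's export step should regenerate the files.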
This project is MIT‑licensed; feel free to reuse and adapt it in your portfolio or demos.