Credit Card Fraud Detection

This repository implements a Credit Card Fraud Detection pipeline using XGBoost, SHAP for model interpretation, and Streamlit for interactive demonstration. It is organized into:

notebooks/: Jupyter notebooks covering end-to-end data exploration, feature engineering, model training, tuning, and interpretation.
scripts/: Python utility modules:
- process_final.py: polprocess(df, cat_rates) for consistent feature engineering.
artifacts/: Serialized model & metadata for deployment:
- fraud_slim.json: trained slim XGBoost Booster.
- slim_features.joblib: list of features expected by the slim model.
- category_rates.joblib: mapping of category → historical fraud rate.
- le_category.joblib: LabelEncoder for transaction categories.
- uf_names.joblib: user‑friendly category names.
streamlit_app/: a polished Streamlit application (pol_app.py) that loads artifacts and lets you simulate new transactions.

🔧 Installation

Clone the repo

git clone https://github.com/<your‑username>/CC_Fraud_Detection.git
cd CC_Fraud_Detection

Create & activate a Python environment (conda or venv):

conda create -n fraud-detect python=3.11
conda activate fraud-detect

Install dependencies
```
pip install -r requirements.txt
```

📓 Jupyter Notebook

Open notebooks/CC_Fraud_polished.ipynb to step through:
1. Setup & Data Load: imports, Kaggle download, preview.
2. Preprocessing & Feature Engineering: apply polprocess() to train/test splits.
3. Hyperparameter Tuning & Training: train slim XGBoost with early stopping.
4. Model Interpretation: SHAP summary & beeswarm plots.
5. Export Artifacts: save model, feature list, encoders to artifacts/.

Run all cells to reproduce results and regenerate the artifacts/ files.

🚀 Streamlit App

Launch:
```
streamlit run streamlit_app/pol_app.py
```
Simulate new transactions by adjusting amount, category, hour, and population.
Threshold slider dynamically changes fraud vs. legitimate decision cutoff.

🗂️ Folder Structure

├── artifacts/               # trained models + metadata
│   ├── fraud_slim.json
│   ├── slim_features.joblib
│   └── ...
├── notebooks/
│   ├── CC_Fraud_polished.ipynb
│   └── NB_archive/...
├── scripts/
│   ├── process_final.py
│   └── booster_wrapper.py
├── streamlit_app/
│   └── pol_app.py
├── requirements.txt
└── README.md

📈 Results Summary

Slim Model: 12‑feature XGBoost, AUC ≈ 0.99, PR‑AUC ≈ 0.80, F1 optimized at threshold ≈ 0.70.
Interpretation: SHAP identifies top drivers (amount, category TE, hour patterns, etc.).

📂 Deployment

All required artifacts are in artifacts/. To deploy:

Ensure artifacts/ is alongside streamlit_app/.
Run the app as above.

⚖️ License

This project is MIT‑licensed—feel free to reuse and adapt in your portfolio or demos.

Name		Name	Last commit message	Last commit date
Latest commit History 78 Commits
.vscode		.vscode
artifacts		artifacts
notebooks		notebooks
scripts		scripts
streamlit_app		streamlit_app
.DS_Store		.DS_Store
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Credit Card Fraud Detection

🔧 Installation

📓 Jupyter Notebook

🚀 Streamlit App

🗂️ Folder Structure

📈 Results Summary

📂 Deployment

⚖️ License

About

Uh oh!

Releases

Packages

Languages

a-dangelo/CC_Fraud_Detection

Folders and files

Latest commit

History

Repository files navigation

Credit Card Fraud Detection

🔧 Installation

📓 Jupyter Notebook

🚀 Streamlit App

🗂️ Folder Structure

📈 Results Summary

📂 Deployment

⚖️ License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages