This is a proof‑of‑concept AI/ML project that trains a simple text classifier (in the spirit of support-ticket triage) on a subset of the 20 Newsgroups dataset, then serves predictions via a FastAPI HTTP service.
- Trains a TF‑IDF + Logistic Regression model to classify text into one of 4 categories:
  - comp.sys.mac.hardware
  - rec.autos
  - sci.med
  - talk.politics.misc
- Exposes a POST /predict endpoint that takes free text and returns the predicted label and class probabilities.
This is intended as a compact, inspectable PoC you can extend (swap datasets, add pre/post‑processing, log metrics, etc.).
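
The TF‑IDF + Logistic Regression approach described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual `src/model.py`: the toy corpus, `build_pipeline`, and `predict_with_scores` are assumptions made up for this example, and the real project trains on a 20 Newsgroups subset instead of inline strings.

```python
from typing import Dict, Tuple

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-in corpus (an assumption for illustration only).
TRAIN_TEXTS = [
    "my mac will not boot after the logic board upgrade",
    "external display flickers on the powerbook",
    "the car engine stalls when the brakes are applied",
    "looking for a used sedan with low mileage",
    "the doctor prescribed antibiotics for the infection",
    "symptoms include fever and chronic headaches",
    "the senate voted on the new tax bill",
    "the election campaign focused on government policy",
]
TRAIN_LABELS = [
    "comp.sys.mac.hardware", "comp.sys.mac.hardware",
    "rec.autos", "rec.autos",
    "sci.med", "sci.med",
    "talk.politics.misc", "talk.politics.misc",
]


def build_pipeline() -> Pipeline:
    """Hypothetical factory: TF-IDF features feeding a logistic regression."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def predict_with_scores(model: Pipeline, text: str) -> Tuple[str, Dict[str, float]]:
    """Return the most probable label plus a probability for every class."""
    proba = model.predict_proba([text])[0]
    scores = {str(label): float(p) for label, p in zip(model.classes_, proba)}
    label = max(scores, key=scores.get)
    return label, scores


model = build_pipeline().fit(TRAIN_TEXTS, TRAIN_LABELS)
label, scores = predict_with_scores(
    model, "My Mac keeps freezing when I plug in an external display"
)
print(label, scores)
```

Wrapping the vectorizer and classifier in a single `Pipeline` means one object handles both feature extraction and prediction, which keeps serving code simple.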
Create a virtual environment and install the dependencies:

    python -m venv .venv
    source .venv/bin/activate  # Windows: .venv\Scripts\activate
    pip install -r requirements.txt

Train the model:

    python -m src.train

This downloads (via scikit‑learn) a subset of 20 Newsgroups, trains a model, prints a report, and writes:

- artifacts/model.joblib
- artifacts/labels.json
- artifacts/metrics.json
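
Writing the three artifacts listed above might look something like the sketch below. The function name `save_artifacts`, the contents of `labels.json` (a list of class names), and the `train_accuracy` metric are all assumptions for illustration; the source only states the file names. A temporary directory is used so the example does not touch a real `artifacts/` folder.

```python
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline


def save_artifacts(model, labels, metrics, out_dir):
    """Hypothetical helper: persist the model, label list, and metrics."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, out / "model.joblib")
    (out / "labels.json").write_text(json.dumps(labels, indent=2))
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return out


# Tiny stand-in training set (assumption; the project uses 20 Newsgroups).
texts = ["mac display problem", "car engine noise", "mac keyboard broken", "car brake repair"]
targets = ["comp.sys.mac.hardware", "rec.autos", "comp.sys.mac.hardware", "rec.autos"]
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)).fit(texts, targets)
metrics = {"train_accuracy": float(accuracy_score(targets, model.predict(texts)))}

with tempfile.TemporaryDirectory() as tmp:
    artifacts = save_artifacts(
        model, [str(c) for c in model.classes_], metrics, Path(tmp) / "artifacts"
    )
    written = sorted(p.name for p in artifacts.iterdir())
    # Reload to confirm the saved model round-trips.
    reloaded = joblib.load(artifacts / "model.joblib")
    reloaded_labels = json.loads((artifacts / "labels.json").read_text())

print(written)
```

Keeping labels and metrics as plain JSON next to the serialized model makes each training run easy to inspect without loading the model itself.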
Start the API server:

    uvicorn src.serve:app --reload

Now send a request:

    curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"text":"My Mac keeps freezing when I plug in an external display"}'

Example response:

    {
      "label": "comp.sys.mac.hardware",
      "scores": {"comp.sys.mac.hardware": 0.62, "rec.autos": 0.08, "sci.med": 0.05, "talk.politics.misc": 0.25}
    }

Run the tests:

    pytest -q

Build and run the Docker image:

    docker build -t poc-ai .
    docker run -p 8000:8000 poc-ai

Project layout:

poc-ai-ml/
├─ src/
│ ├─ train.py # train + evaluate + save artifacts
│ ├─ serve.py # FastAPI inference app
│ ├─ model.py # model/pipeline factory
│ ├─ schema.py # pydantic request/response models
│ └─ __init__.py
├─ tests/
│ └─ test_predict.py # quick smoke test for the trained model
├─ requirements.txt
├─ Dockerfile
├─ .gitignore
└─ README.md
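
The layout above mentions pydantic request/response models in `schema.py`. A plausible shape, inferred only from the example request and response in this README, is sketched below. The class names `PredictRequest` and `PredictResponse` are hypothetical, and the actual file may differ.

```python
from typing import Dict

from pydantic import BaseModel, ValidationError


class PredictRequest(BaseModel):  # hypothetical name
    text: str


class PredictResponse(BaseModel):  # hypothetical name
    label: str
    scores: Dict[str, float]


# The example response from this README, parsed into the response model.
example = {
    "label": "comp.sys.mac.hardware",
    "scores": {
        "comp.sys.mac.hardware": 0.62,
        "rec.autos": 0.08,
        "sci.med": 0.05,
        "talk.politics.misc": 0.25,
    },
}
response = PredictResponse(**example)
best = max(response.scores, key=response.scores.get)

# A request without the required "text" field is rejected by pydantic.
try:
    PredictRequest()
    missing_text_rejected = False
except ValidationError:
    missing_text_rejected = True

print(best, missing_text_rejected)
```

Declaring the shapes as models lets FastAPI validate incoming JSON and document the endpoint automatically, so the handler only sees well-formed input.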
- The project keeps dependencies light and uses plain Python throughout for clarity.
- You can switch to other datasets (e.g., SMS spam) or add a database/logger with minimal code changes.
- For production use, add robustness (input validation, monitoring, retries, CI/CD, etc.).
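
As one small example of the input validation mentioned above, a pre-processing step could normalize and bound the incoming text before it reaches the model. This is only a sketch under assumptions: `clean_text` and the `MAX_CHARS` limit are hypothetical and not part of the project.

```python
MAX_CHARS = 5000  # assumed limit; tune for your workload


def clean_text(text: str, max_chars: int = MAX_CHARS) -> str:
    """Hypothetical validator: collapse whitespace, reject empty or oversized input."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    cleaned = " ".join(text.split())  # collapse runs of whitespace and newlines
    if not cleaned:
        raise ValueError("text must not be empty")
    if len(cleaned) > max_chars:
        raise ValueError(f"text exceeds {max_chars} characters")
    return cleaned


print(clean_text("  My Mac   keeps\nfreezing "))
```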