This project fine-tunes a language model entirely within a single Jupyter Notebook to extract structured fields (name, category, manufacturer, and price) from product descriptions.
All logic (dataset prep, model training, and inference) is inside product_info_finetune.ipynb.
Input:
Maggi noodles pack of 2 by Nestlé for $0.50
Expected Output:
{
  "name": "Maggi noodles",
  "category": "",
  "manufacturer": "Nestlé",
  "price": "$0.50"
}

What the notebook does:
- Prepares a noisy, realistic dataset of product descriptions
- Handles missing fields gracefully by returning ""
- Fine-tunes a model (e.g., Mistral or LLaMA) using TRL or Unsloth
- Runs inference to verify accuracy
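
The missing-field handling and the accuracy check can be illustrated with a small sketch. It is not taken from the notebook: the helper names `normalize_prediction` and `field_accuracy` are hypothetical, and it assumes the model emits a JSON object as plain text and that accuracy is measured as exact string match per field.

```python
import json

# The four fields the model is expected to extract.
FIELDS = ("name", "category", "manufacturer", "price")


def normalize_prediction(raw_text):
    """Parse model output as JSON; any missing or non-string field becomes ""."""
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    result = {}
    for field in FIELDS:
        value = parsed.get(field)
        result[field] = value.strip() if isinstance(value, str) else ""
    return result


def field_accuracy(predictions, references):
    """Fraction of (example, field) pairs where the prediction matches exactly."""
    total = 0
    correct = 0
    for pred, ref in zip(predictions, references):
        for field in FIELDS:
            total += 1
            correct += int(pred[field] == ref[field])
    return correct / total if total else 0.0


# Hypothetical raw model outputs: one omits "category", one is not JSON at all.
raw_outputs = [
    '{"name": "Maggi noodles", "manufacturer": "Nestlé", "price": "$0.50"}',
    "not valid json",
]
references = [
    {"name": "Maggi noodles", "category": "", "manufacturer": "Nestlé", "price": "$0.50"},
    {"name": "Colgate toothpaste", "category": "", "manufacturer": "Colgate-Palmolive", "price": "$3"},
]

predictions = [normalize_prediction(text) for text in raw_outputs]
accuracy = field_accuracy(predictions, references)
print(predictions[0])
print("field-level accuracy:", accuracy)
```

Exact match is a strict metric; a looser comparison (for example, case-insensitive or whitespace-normalized) may suit noisy descriptions better.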

Tech stack:
- Python
- Hugging Face Transformers + Datasets
- Unsloth (for fast fine-tuning)
- TRL / SFTTrainer
- JSONL formatted training data
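
As a rough illustration of the JSONL training data, the sketch below writes one JSON object per line and reads the file back. The `prompt` and `completion` keys and the `make_record` helper are assumptions made for this example; the notebook's actual schema may differ. The file is written to a temporary directory so an existing Data.jsonl is not overwritten.

```python
import json
import os
import tempfile

# The four target fields; any field not supplied is stored as "".
FIELDS = ("name", "category", "manufacturer", "price")


def make_record(description, **known_fields):
    """Build one training example (hypothetical schema: prompt + JSON completion)."""
    target = {field: known_fields.get(field, "") for field in FIELDS}
    return {
        "prompt": description,
        "completion": json.dumps(target, ensure_ascii=False),
    }


records = [
    make_record(
        "Maggi noodles pack of 2 by Nestlé for $0.50",
        name="Maggi noodles", manufacturer="Nestlé", price="$0.50",
    ),
    make_record(
        "Colgate toothpaste 200g by Colgate-Palmolive for $3",
        name="Colgate toothpaste", manufacturer="Colgate-Palmolive", price="$3",
    ),
]

# JSONL: one JSON object per line, UTF-8 encoded.
path = os.path.join(tempfile.mkdtemp(), "Data.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read the file back, skipping any blank lines.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded), "records written to", path)
```

Storing the target as a JSON string keeps every example in the same shape, with "" marking a field that the description does not mention.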
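
Inference example:
from transformers import pipeline

# "your-model-name" is a placeholder for the fine-tuned checkpoint path or Hub ID.
# Decoder-only models such as Mistral or LLaMA load under the "text-generation" task.
pipe = pipeline("text-generation", model="your-model-name")
# return_full_text=False keeps the prompt out of the generated text.
res = pipe("Colgate toothpaste 200g by Colgate-Palmolive for $3", max_new_tokens=128, return_full_text=False)
print(res[0]["generated_text"])
Expected: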
{
"name": "Colgate toothpaste",
"category": "",
"manufacturer": "Colgate-Palmolive",
"price": "$3"
}

Project files:
├── Fine-Tune.ipynb   # Everything is in here
├── Data.jsonl        # (optional) saved dataset
└── README.md         # you're reading this

MIT License: free to use, modify, and share.
Built by Puneet Rawat.
Feel free to fork, star, or open issues for suggestions or improvements.