This project builds a fully autonomous AI system that scans the web for deals, estimates the true market price using an ensemble of models, and sends real-time push notifications when something is genuinely undervalued. It combines LLMs, classical ML, vector search, fine-tuning, cloud GPUs, a Gradio-based UI, and agentic workflows into a cohesive end-to-end system.
- Autonomous Deal Scanning: Constantly monitors RSS feeds for new listings.
- Fair Price Estimation: Ensemble of LLM and ML models to infer realistic market value.
- Push Notifications: Alerts when significant undervaluations are detected.
- Gradio Dashboard: Interactive dashboard for viewing deals, model estimates, and alerts.
- Multi-Agent Reasoning: Modular design with orchestrated agent collaboration.
| Agent | Purpose |
|---|---|
| Scanner Agent | Scrapes RSS feeds in real time for new deals |
| Frontier Agent (RAG) | Retrieves similar items via embeddings, then uses a frontier LLM (GPT-4o-mini / DeepSeek) to estimate the price |
| Specialist Agent (Fine-Tuned LLM) | QLoRA fine-tuned model deployed on Modal that predicts prices |
| Random Forest Agent | Traditional ML model predicting price, trained on sentence-transformer embeddings |
| Ensemble Agent | Linear model combining all price predictions |
| Planning Agent | Central orchestrator that manages agent workflows and decision logic (picks best deal, calculates discount, triggers alerts) |
| Messaging Agent | Sends Pushover alerts for high-value opportunities |
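
The Messaging Agent's alert step could look roughly like the sketch below. It assumes Pushover's messages endpoint (`https://api.pushover.net/1/messages.json`) with `token`, `user`, and `message` form fields, and reads the `PUSHOVER_TOKEN` and `PUSHOVER_USER` environment variables. The function names and the message wording are hypothetical and are not taken from the repository.

```python
import os
import urllib.parse
import urllib.request

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def build_alert(description: str, price: float, estimate: float) -> str:
    """Format a short alert message (wording is illustrative only)."""
    discount = estimate - price
    return (
        f"Deal: price=${price:.2f}, estimate=${estimate:.2f}, "
        f"discount=${discount:.2f} | {description[:80]}"
    )


def send_alert(message: str, timeout: float = 10.0) -> int:
    """POST the message to Pushover and return the HTTP status code."""
    payload = urllib.parse.urlencode(
        {
            "token": os.environ["PUSHOVER_TOKEN"],
            "user": os.environ["PUSHOVER_USER"],
            "message": message,
        }
    ).encode("utf-8")
    request = urllib.request.Request(PUSHOVER_URL, data=payload, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping message formatting separate from the network call makes the formatting easy to test without sending a real notification.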
- Data Collection
- Curated Pricing Dataset (Hugging Face):
Loaded Amazon product metadata from McAuley-Lab/Amazon-Reviews-2023 across 8 categories:
Automotive, Electronics, Office Products, Tools & Home Improvement, Cell Phones & Accessories, Toys & Games, Appliances, Musical Instruments.
These entries provide product descriptions + prices used to train all pricing models.
- Live Deal Scraping:
RSS feeds (e.g., SlickDeals, HotUKDeals) supply real-time deal descriptions and prices for inference.
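
As a rough illustration of the live scraping step, the sketch below parses RSS 2.0 `<item>` elements with the standard library and filters out links that were already seen. The sample feed, the field names, and both helper functions are assumptions made for this example; the repository's Scanner Agent may work differently.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in feed; a real feed would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example deals feed</title>
    <item>
      <title>Cordless drill kit for $59</title>
      <link>https://example.com/deals/1</link>
      <description>18V drill with two batteries</description>
    </item>
    <item>
      <title>USB-C hub for $19</title>
      <link>https://example.com/deals/2</link>
      <description>7-in-1 hub with HDMI</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml: str) -> list[dict[str, str]]:
    """Extract title, link, and description from every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": (item.findtext("title") or "").strip(),
                "url": (item.findtext("link") or "").strip(),
                "summary": (item.findtext("description") or "").strip(),
            }
        )
    return items


def new_items(items: list[dict[str, str]], seen_urls: set[str]) -> list[dict[str, str]]:
    """Return items whose URL has not been seen yet, and remember them."""
    fresh = [item for item in items if item["url"] not in seen_urls]
    seen_urls.update(item["url"] for item in fresh)
    return fresh
```

Tracking seen URLs lets a polling loop call `new_items` repeatedly without processing the same deal twice.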
- Data Cleaning & Transformation
- Normalised product descriptions (titles + bullet points → clean text)
- Extracted & validated pricing information
- Removed duplicates and outliers
- Result: a consistent price–description dataset suitable for model training and evaluation.
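
The cleaning steps above can be sketched as follows. The record layout (`title`, `bullets`, `price`) and the price bounds used as an outlier filter are assumptions for illustration; the actual pipeline may use different fields and thresholds.

```python
import re


def normalise_text(title: str, bullets: list[str]) -> str:
    """Join title and bullet points into one whitespace-normalised string."""
    combined = " ".join([title, *bullets])
    return re.sub(r"\s+", " ", combined).strip()


def clean_records(records, min_price=1.0, max_price=999.0):
    """Validate prices, drop out-of-range values, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Skip records whose price is missing or not numeric.
        try:
            price = float(record["price"])
        except (KeyError, TypeError, ValueError):
            continue
        # Hypothetical outlier rule: keep prices inside a fixed range.
        if not (min_price <= price <= max_price):
            continue
        text = normalise_text(record.get("title", ""), record.get("bullets", []))
        key = text.lower()
        # Skip empty descriptions and case-insensitive duplicates.
        if not text or key in seen:
            continue
        seen.add(key)
        cleaned.append({"text": text, "price": price})
    return cleaned
```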
- Embeddings & Storage
- Embedded all product descriptions using sentence-transformers/all-MiniLM-L6-v2
- Stored vectors + metadata in ChromaDB
→ Enables similarity search for the frontier RAG model
→ Provides neighbourhood price statistics (min/max) used in the ensemble
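
To make the similarity search concrete, here is a pure-Python stand-in for what the vector store does: rank stored items by cosine similarity to a query vector, then read the price range of the nearest neighbours. It deliberately uses neither ChromaDB nor sentence-transformers; the 3-dimensional toy vectors, the `STORE` list, and the function names are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_items(query, store, k=2):
    """Return the k stored items most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query, item["vector"]),
        reverse=True,
    )
    return ranked[:k]


def neighbour_price_range(neighbours):
    """Min and max price among the retrieved neighbours."""
    prices = [n["price"] for n in neighbours]
    return min(prices), max(prices)


# Toy data: real embeddings would have hundreds of dimensions.
STORE = [
    {"text": "cordless drill", "vector": [0.9, 0.1, 0.0], "price": 79.0},
    {"text": "impact driver", "vector": [0.8, 0.2, 0.1], "price": 99.0},
    {"text": "usb-c hub", "vector": [0.0, 0.1, 0.9], "price": 25.0},
]

neighbours = nearest_items([1.0, 0.0, 0.0], STORE, k=2)
low, high = neighbour_price_range(neighbours)
```

With this toy data the two tool listings are retrieved and the hub is ignored, so the neighbourhood price range runs from 79.0 to 99.0.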
- Model Training
- Specialist LLM: fine-tuned with QLoRA on curated dataset for price prediction
- Frontier RAG Model: retrieves nearest embeddings → frontier LLM estimates fair value
- Random Forest Baseline: trained on embeddings to provide a stable numeric estimate
- The three predictions are combined through a linear ensemble whose coefficients are learned from data rather than set by hand.
- Real-Time Deal Scoring
For every incoming deal:
- Embed description
- Retrieve similar items from ChromaDB
- Generate three independent predictions
- Combine via ensemble to compute fair market value
- Compare against scraped price to compute discount
- If discount exceeds threshold → push notification
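
The scoring steps above can be sketched as a small selection function. The sketch assumes the discount is the absolute difference between the estimated value and the scraped price, and that `estimate_fn` stands in for the full embed, retrieve, predict, and ensemble chain; the `Opportunity` class, the function name, and the threshold of 50 are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Opportunity:
    description: str
    price: float      # scraped deal price
    estimate: float   # fair value from the pricing models

    @property
    def discount(self) -> float:
        """How far the scraped price sits below the estimated value."""
        return self.estimate - self.price


def best_opportunity(
    deals: Iterable[tuple[str, float]],
    estimate_fn: Callable[[str], float],
    threshold: float = 50.0,
) -> Optional[Opportunity]:
    """Score every deal and return the best one if it beats the threshold."""
    scored = [Opportunity(desc, price, estimate_fn(desc)) for desc, price in deals]
    if not scored:
        return None
    best = max(scored, key=lambda opp: opp.discount)
    return best if best.discount > threshold else None
```

Returning `None` when nothing clears the threshold gives the caller a simple signal that no notification should be sent.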
The system doesn't rely on one model.
It learns how to weight the individual predictions using a trained linear regression:
FinalPrice =
0.73 * SpecialistLLM + 1.03 * FrontierLLM + 0.44 * RandomForest - 0.64 * MinModel - 0.60 * MaxModel + 26.47
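
As a worked example, the weighted sum can be written as a plain function using the coefficients quoted above. How `MinModel` and `MaxModel` are computed is not spelled out here, so the sketch simply takes them as inputs; the function and argument names are illustrative.

```python
# Coefficients and intercept copied from the formula in this README.
WEIGHTS = {
    "specialist": 0.73,
    "frontier": 1.03,
    "random_forest": 0.44,
    "min_model": -0.64,
    "max_model": -0.60,
}
INTERCEPT = 26.47


def ensemble_price(
    specialist: float,
    frontier: float,
    random_forest: float,
    min_model: float,
    max_model: float,
) -> float:
    """Linear combination of the five features plus the intercept."""
    return (
        WEIGHTS["specialist"] * specialist
        + WEIGHTS["frontier"] * frontier
        + WEIGHTS["random_forest"] * random_forest
        + WEIGHTS["min_model"] * min_model
        + WEIGHTS["max_model"] * max_model
        + INTERCEPT
    )
```

Because the five weights sum to 0.96, identical inputs of 100 give 0.96 * 100 + 26.47 = 122.47.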
- QLoRA fine-tuned on ~400k product descriptions
- Runs in 4-bit quantized mode
- Deployed to Modal as a GPU-backed inference service
- Stateless, fast cold starts, cached weights
The specialist model is exposed via:
Pricer = modal.Cls.lookup("pricer-service", "Pricer")
pricer = Pricer()
pricer.price.remote("product description")

| Layer | Technology |
|---|---|
| Language | Python |
| LLM & Fine-tuning | QLoRA, Transformers, PEFT |
| Embeddings | SentenceTransformers / OpenAI embeddings |
| Vector DB | ChromaDB |
| Frontend / UI | Gradio |
| Agents / Orchestration | LangChain / custom planning logic |
| Notifications | Pushover.net API |
| Deployment | Modal (GPU service) / Localhost |
The UI includes:
- a table of all discovered deals
- real-time agent logs
- 3D embedding visualization (vector DB)
- automatic refresh (300s)
- Clone the repository
git clone https://github.com/laumek/pricer_agentic_ai.git
cd pricer_agentic_ai
- Install dependencies from the pyproject.toml file
pip install -e .
- Set up environment variables
Use .env.example to create a .env file with your API keys and configuration:
HF_TOKEN=...
PUSHOVER_USER=...
PUSHOVER_TOKEN=...
etc.
- Run the system
python src/price_intel/agents/main.py
- Launch the Gradio UI
python src/price_intel/interface/gradio_app.py
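
Before starting either entry point, a small pre-flight check can confirm that the environment variables from the setup steps are present. This is a sketch only: it covers the three keys named in this README, the real `.env.example` may list more, and `missing_keys` is a hypothetical helper rather than part of the repository.

```python
import os

# Only the keys explicitly named in this README; extend as needed.
REQUIRED_KEYS = ("HF_TOKEN", "PUSHOVER_USER", "PUSHOVER_TOKEN")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("Environment looks complete.")
```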
- Hugging Face datasets for curated product data.
- SentenceTransformers for embeddings.
- Modal for deployment and LLM inference.
- Gradio for rapid UI prototyping.
- OpenAI / DeepSeek models for RAG and reasoning layers.
- This project builds on code from Ed Donner (https://github.com/ed-donner/llm_engineering) under the MIT License. Significant modifications, enhancements, and additional agents have been implemented independently.