A complete Retrieval-Augmented Generation (RAG) system built with LlamaStack, demonstrating semantic search and AI-powered question answering.
This project showcases a fully functional RAG system that:
- ✅ Connects to LlamaStack server with Ollama backend
- ✅ Creates vector databases for document storage
- ✅ Performs semantic search on documents
- ✅ Includes an AI agent for interactive Q&A
- ✅ Provides both notebook and script interfaces
Prerequisites:

- Ollama running on port 11434 with the `llama3.2:3b` model
- LlamaStack server running on port 8321
Setup:

```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the required model
ollama pull llama3.2:3b

# Install project dependencies
uv sync
```

Start the LlamaStack server:

```bash
INFERENCE_MODEL=llama3.2:3b uv run --with llama-stack llama stack build --template ollama --image-type venv --run
```

Option A: Jupyter Notebook (Interactive)
```bash
uv run jupyter notebook
# Open app.ipynb and run all cells
```

Option B: Python Script (Automated)
```bash
uv run python test_rag.py
```

Project structure:

```
AI-Goal/
├── app.ipynb        # 📓 Interactive Jupyter notebook demo
├── test_rag.py      # 🐍 Standalone Python script demo
├── pyproject.toml   # 📦 Project configuration
├── uv.lock          # 🔒 Dependency lock file
├── .python-version  # 🐍 Python version specification
└── README.md        # 📖 This file
```
Vector database:

- Uses Faiss for vector storage
- 384-dimensional embeddings via `all-MiniLM-L6-v2`
- Automatic document chunking and indexing (see the sketch below)
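For reference, creating the vector database and indexing a document through the llama-stack-client Python API looks roughly like this. This is a minimal sketch, not the project's exact code: the `vector_db_id`, `provider_id`, and document contents are illustrative, and signatures may vary by client version (newer releases rename `Document` to `RAGDocument`).

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import Document  # RAGDocument in newer client versions

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a Faiss-backed vector database using 384-dim MiniLM embeddings
client.vector_dbs.register(
    vector_db_id="my_documents",          # illustrative name
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id="faiss",
)

# Chunk, embed, and index a document into the vector database
client.tool_runtime.rag_tool.insert(
    documents=[
        Document(
            document_id="doc-rag",        # illustrative ID and content
            content="RAG combines retrieval over a document store with LLM generation.",
            mime_type="text/plain",
            metadata={"topic": "rag"},
        )
    ],
    vector_db_id="my_documents",
    chunk_size_in_tokens=512,
)
```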
Semantic search:

- Converts queries to embeddings
- Finds semantically similar documents
- Returns ranked results with metadata
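A direct semantic query can then be issued through the `vector_io` API. A sketch, assuming the `my_documents` database registered above; the response is expected to carry parallel `chunks` and `scores` lists, though attribute names may differ across client versions:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Embed the query and retrieve the most similar chunks with scores
results = client.vector_io.query(
    vector_db_id="my_documents",
    query="What is Retrieval-Augmented Generation?",
)

# Print ranked results: higher score means more semantically similar
for chunk, score in zip(results.chunks, results.scores):
    print(f"score={score:.3f}  {str(chunk.content)[:80]}")
```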
AI agent for Q&A:

- Creates an AI agent with access to the knowledge base (see the sketch below)
- Uses the `builtin::rag/knowledge_search` tool
- Provides a conversational interface
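Wiring the agent to the knowledge base follows the usual llama-stack-client agent pattern. A sketch, assuming the `my_documents` database from above; the instructions string is illustrative:

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent

client = LlamaStackClient(base_url="http://localhost:8321")

# Give the agent the built-in RAG tool, scoped to our vector database
agent = Agent(
    client,
    model="llama3.2:3b",
    instructions="Answer questions using the knowledge base when relevant.",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {"vector_db_ids": ["my_documents"]},
        }
    ],
)
session_id = agent.create_session("rag-demo")
```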
The system includes sample documents about:
- RAG (Retrieval-Augmented Generation) concepts
- LlamaStack platform overview
- Vector databases and semantic search
- "What is RAG?"
- "Tell me about LlamaStack"
- "How do vector databases work?"
Configuration:

- `LLAMA_STACK_PORT=8321` - LlamaStack server port
- LLM: `llama3.2:3b` (via Ollama)
- Embeddings: `all-MiniLM-L6-v2` (via LlamaStack)
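To avoid hard-coding these values, the client can read them from the environment. A small sketch; the default fallbacks are assumptions:

```python
import os

from llama_stack_client import LlamaStackClient

# Fall back to the documented defaults if the variables are unset
port = os.environ.get("LLAMA_STACK_PORT", "8321")
model = os.environ.get("INFERENCE_MODEL", "llama3.2:3b")

client = LlamaStackClient(base_url=f"http://localhost:{port}")
# Pass `model` wherever a model name is needed, e.g. when creating the agent
```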
Troubleshooting:

Import Error: `llama_stack_client`
```bash
# Make sure you're using the uv environment
uv run python your_script.py

# Or for Jupyter
uv run jupyter notebook
```

Connection Error to LlamaStack
- Ensure LlamaStack server is running on port 8321
- Check that Ollama is running with the `llama3.2:3b` model
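One quick way to verify both from Python is to list the models the server exposes. A sketch; if the call raises, the server (or its Ollama backend) is not reachable, and the `identifier` attribute is assumed from current client versions:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

try:
    # A successful listing confirms the server is up and serving models
    models = client.models.list()
    print("Connected. Models:", [m.identifier for m in models])
except Exception as exc:
    print("LlamaStack server not reachable on port 8321:", exc)
```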
Kernel Issues in Jupyter
- Use the "AI Goal RAG Environment" kernel if available
- Or run `uv run jupyter notebook` to use the correct environment
- Vector Database Management - Create, populate, query, cleanup
- Semantic Search - Meaning-based document retrieval
- Error Handling - Robust error handling and debugging
- Multiple Interfaces - Both notebook and script versions
- AI Agent Integration - Conversational RAG interface
- Production Ready - Proper dependency management with uv
This project demonstrates RAG concepts with LlamaStack. Feel free to:
- Add more document types
- Experiment with different embedding models
- Extend the AI agent capabilities
- Improve the user interface
This project is open source and available under the MIT License.