jmoliugp/rag-playground

πŸ€– RAG Playground

An educational Python project to demonstrate how RAG (Retrieval-Augmented Generation) systems work.

πŸ“‹ Description

This project implements a complete RAG system from scratch, showing each component of the process:

  • Document loading
  • Processing and chunking
  • Embedding generation
  • Vector storage
  • Information retrieval
  • Answer generation with LLMs

πŸš€ Quick Start

Prerequisites

  • Python 3.10 or higher
  • Node.js 18+ (for the frontend)
  • OpenAI API Key (to use GPT models)

Backend Installation

  1. Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies
pip install -r requirements.txt
pip install -r backend/requirements.txt
  3. Configure environment variables

Copy the .env.example file to .env and configure your API key:

cp .env.example .env

Edit .env and add your OpenAI API Key:

OPENAI_API_KEY=your_api_key_here
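A quick sanity check that the key is actually picked up can save a confusing API error later. This is a minimal sketch; `get_openai_key` is a hypothetical helper, not part of the project:

```python
import os

def get_openai_key() -> str:
    # Fail fast with a helpful message instead of an opaque API error later.
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; copy .env.example to .env and add your key"
        )
    return key
```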

Frontend Installation

cd frontend
npm install

🎯 Usage

Option 1: Web Frontend (Recommended)

  1. Start Backend:
# In one terminal
source venv/bin/activate
cd backend
python main.py

The backend will be available at http://localhost:8000

  2. Start Frontend:
# In another terminal
cd frontend
npm run dev

The frontend will be available at http://localhost:5173

Option 2: Web Interface (Streamlit)

Run the Streamlit application:

streamlit run app/main.py

Option 3: Jupyter Notebooks

Explore the educational notebooks in the notebooks/ folder:

  1. 01_data_loading.ipynb - Document loading
  2. 02_embeddings.ipynb - Embedding generation
  3. 03_retrieval.ipynb - Retrieval system
  4. 04_full_rag.ipynb - Complete RAG system

Option 4: Programmatic Usage

from src.document_loader import DocumentLoader
from src.text_splitter import TextSplitter
from src.embeddings import EmbeddingGenerator
from src.vector_store import VectorStore
from src.retriever import Retriever
from src.rag_chain import RAGChain

# 1. Load documents
loader = DocumentLoader()
documents = loader.load_directory("./data/documents")

# 2. Split into chunks
splitter = TextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# 3. Generate embeddings
embedding_gen = EmbeddingGenerator()
chunks_with_embeddings = embedding_gen.generate_embeddings_for_chunks(chunks)

# 4. Store in vector store
vector_store = VectorStore(collection_name="my_rag", reset=True)
vector_store.add_chunks(chunks_with_embeddings)

# 5. Create retriever and RAG chain
retriever = Retriever(vector_store, embedding_gen, top_k=3)
rag_chain = RAGChain(retriever, model_name="gpt-3.5-turbo")

# 6. Ask a question
result = rag_chain.query("What is RAG?")
print(result['answer'])
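Under the hood, the retriever ranks stored chunks by vector similarity to the query embedding. Here is a minimal sketch of that idea using plain cosine similarity (illustrative only; the project's `Retriever` delegates this to ChromaDB):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, chunk_vecs, k=3):
    # Return indices of the k chunks most similar to the query vector.
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```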

πŸ“ Project Structure

rag-playground/
β”œβ”€β”€ src/                    # Main source code
β”‚   β”œβ”€β”€ document_loader.py  # Document loading
β”‚   β”œβ”€β”€ text_splitter.py    # Chunk splitting
β”‚   β”œβ”€β”€ embeddings.py       # Embedding generation
β”‚   β”œβ”€β”€ vector_store.py     # Vector database
β”‚   β”œβ”€β”€ retriever.py        # Retrieval system
β”‚   β”œβ”€β”€ rag_chain.py        # Complete RAG chain
β”‚   └── utils.py            # Utilities
β”œβ”€β”€ backend/                # FastAPI Backend
β”‚   β”œβ”€β”€ main.py             # API server
β”‚   └── requirements.txt    # Backend dependencies
β”œβ”€β”€ frontend/               # React Frontend
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ components/     # React components
β”‚   β”‚   └── App.jsx         # Main app
β”‚   └── package.json        # Frontend dependencies
β”œβ”€β”€ app/                    # Streamlit application
β”‚   └── main.py             # Web interface
β”œβ”€β”€ data/                   # Data
β”‚   β”œβ”€β”€ documents/          # Example documents
β”‚   └── vector_db/          # Vector database (generated)
β”œβ”€β”€ notebooks/              # Educational notebooks
β”œβ”€β”€ tests/                  # Unit tests
└── requirements.txt        # Python dependencies

πŸ”§ Configuration

You can configure various aspects of the system by editing the .env file:

  • OPENAI_API_KEY: Your OpenAI API key
  • LLM_MODEL: Model to use (default: gpt-3.5-turbo)
  • LLM_TEMPERATURE: Temperature for generation (0.0-1.0)
  • EMBEDDING_MODEL: Embedding model (default: sentence-transformers/all-MiniLM-L6-v2)
  • CHUNK_SIZE: Chunk size (default: 1000)
  • CHUNK_OVERLAP: Overlap between chunks (default: 200)
  • TOP_K: Number of chunks to retrieve (default: 3)
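A hedged sketch of how these variables might be read, using the documented defaults (`load_settings` is a hypothetical helper; the actual project may wire configuration differently):

```python
import os

def load_settings() -> dict:
    # Fall back to the documented defaults when a variable is unset.
    return {
        "llm_model": os.getenv("LLM_MODEL", "gpt-3.5-turbo"),
        "embedding_model": os.getenv(
            "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        "chunk_size": int(os.getenv("CHUNK_SIZE", "1000")),
        "chunk_overlap": int(os.getenv("CHUNK_OVERLAP", "200")),
        "top_k": int(os.getenv("TOP_K", "3")),
    }
```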

πŸ§ͺ Testing

Run tests with pytest:

pytest tests/

πŸ“š Example Documents

The project includes example documents in data/documents/:

  • sample1.txt - Introduction to RAG
  • sample2.txt - Embeddings and vector representations
  • sample3.md - Chunking and processing

You can add your own documents to experiment.

πŸŽ“ Concepts Learned

This project demonstrates:

  1. Document Loading: How to load different formats (PDF, TXT, MD, DOCX)
  2. Chunking: Strategies for splitting documents into manageable fragments
  3. Embeddings: How to convert text into vector representations
  4. Vector Stores: Storage and search of embeddings
  5. Retrieval: How to find relevant information using similarity
  6. RAG Chain: Combination of retrieval and generation with LLMs
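Chunking with overlap (concept 2) can be sketched in a few lines. This sliding-window version is illustrative, not the project's actual `TextSplitter`:

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    # Slide a window of chunk_size characters, stepping forward by
    # chunk_size - chunk_overlap so adjacent chunks share boundary context.
    chunks = []
    start = 0
    step = chunk_size - chunk_overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```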

πŸ› οΈ Technologies Used

Backend

  • Python 3.10+
  • LangChain - Framework for LLM applications
  • ChromaDB - Vector database
  • Sentence Transformers - Embedding models
  • OpenAI API - Language models
  • FastAPI - REST API

Frontend

  • React 19 - UI framework
  • Tailwind CSS - Styling
  • Vite - Build tool

Others

  • Streamlit - Alternative web interface
  • PyPDF2 - PDF processing

πŸ“ License

This is an open-source educational project. Feel free to use and modify it.

🀝 Contributions

Contributions are welcome! If you find bugs or have suggestions, please open an issue or submit a pull request.

⚠️ Notes

  • An OpenAI API key is required for the answer-generation step
  • Embedding models are downloaded automatically on first run
  • The vector database is created automatically in data/vector_db/

Enjoy exploring RAG! πŸš€
