A powerful Retrieval-Augmented Generation (RAG) system built with modern AI technologies, featuring hybrid search capabilities and Thai language support for health-related information.
- 🔍 Hybrid Search: Combines semantic search (vector) with keyword search (BM25) for optimal results
- 🧠 AI-Powered Q&A: Uses Large Language Models for intelligent question answering
- 🌐 Web Interface: User-friendly Streamlit interface with chat-like experience
- 🏥 Health Domain: Pre-loaded with Thai medical knowledge base
- ⚡ Real-time: FastAPI backend with async processing
- 🐳 Containerized: Easy deployment with Docker
- 🔧 Extensible: Modular design for easy customization
- 💾 HuggingFace Embeddings: Uses BAAI/bge-m3 model for high-quality embeddings
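
To make the hybrid search idea concrete, here is a minimal sketch of how keyword (BM25) and vector scores might be fused with min-max normalization and a weighted arithmetic mean, the two techniques named in the search pipeline configuration later in this README. The function names, document scores, and the mapping of weight 0.3 to keyword results and 0.7 to vector results are assumptions for illustration; in OpenSearch the weight order follows the order of the sub-queries in the hybrid query, and its implementation differs in edge cases.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]. If all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine_weighted_mean(
    result_sets: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Weighted arithmetic mean of normalized scores.

    A document missing from one result set contributes 0 for that set
    (a simplifying assumption of this sketch).
    """
    if len(result_sets) != len(weights):
        raise ValueError("one weight is required per result set")
    normalized = [min_max_normalize(r) for r in result_sets]
    doc_ids = set().union(*normalized)
    total_weight = sum(weights)
    return {
        doc_id: sum(w * n.get(doc_id, 0.0) for w, n in zip(weights, normalized))
        / total_weight
        for doc_id in doc_ids
    }


# Hypothetical raw scores for the sample corpus files.
keyword_scores = {"1.md": 12.0, "2.md": 4.0, "44.md": 8.0}
vector_scores = {"1.md": 0.80, "44.md": 0.90, "5555.md": 0.60}

# Assumption: first weight applies to keyword results, second to vector results.
fused = combine_weighted_mean([keyword_scores, vector_scores], [0.3, 0.7])
for doc_id in sorted(fused, key=fused.get, reverse=True):
    print(f"{doc_id}: {fused[doc_id]:.3f}")
```

With these made-up scores, 44.md ranks first: it is only mid-ranked by keyword score but is the top vector match, and the vector side carries the larger weight.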
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Streamlit UI   │◄──►│     FastAPI     │◄──►│   OpenSearch    │
│   (Frontend)    │    │    (Backend)    │    │   (Vector DB)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
        │                       │                       │
        │                       │                       │
        └───────────────────────┼───────────────────────┘
                                │
                      ┌─────────▼─────────┐
                      │    HuggingFace    │
                      │   Embeddings +    │
                      │    Ollama LLM     │
                      └───────────────────┘
- Docker & Docker Compose
- Python 3.10+ (via Miniconda/Anaconda)
- Git
- PyTorch 2.6+ (Required for security fix with latest transformers)
- CUDA-compatible GPU (optional, for faster embeddings)
git clone https://github.com/amornpan/Generic-RAG.git
cd Generic-RAG

For Windows:
- Download installer from: https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe
- Install and check "Add Miniconda3 to my PATH environment variable"
- Open new Command Prompt or PowerShell
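For Linux/macOS (the commands below fetch the Linux x86_64 installer; on macOS, substitute the matching macOS installer from the same site):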
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
# Accept ToS for the main Anaconda channels
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

# Create conda environment
conda create -n generic_rag_env python=3.10 -y
conda activate generic_rag_env
# If environment creation fails, try the following:
# conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/msys2
# conda init
# conda create -n generic_rag_env python=3.10 -c conda-forge -y
# Install PyTorch 2.6+ first (IMPORTANT: required for the security fix).
# Uncomment the line that matches your hardware. The quotes keep the shell
# from treating ">=" as an output redirect.
# For CPU version:
# pip install "torch>=2.6.0" torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# For GPU version (CUDA 11.8):
# pip install "torch>=2.6.0" torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install other dependencies
pip install -r requirements.txt

Make sure Docker Desktop is running on your system.
For Windows (PowerShell/Command Prompt):
docker run -d --name opensearch-node -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "bootstrap.memory_lock=true" -e "OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g" -e "DISABLE_INSTALL_DEMO_CONFIG=true" -e "DISABLE_SECURITY_PLUGIN=true" opensearchproject/opensearch:2.11.1

For Linux/macOS:
docker run -d --name opensearch-node -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "bootstrap.memory_lock=true" -e "OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g" -e "DISABLE_INSTALL_DEMO_CONFIG=true" -e "DISABLE_SECURITY_PLUGIN=true" opensearchproject/opensearch:2.11.1

Wait for OpenSearch to start (30-60 seconds), then:
For Windows (PowerShell):
Invoke-RestMethod -Uri "http://localhost:9200/_search/pipeline/hybrid-search-pipeline" `
  -Method PUT `
  -ContentType "application/json" `
  -Body '{
    "description": "Post processor for hybrid search",
    "phase_results_processors": [
      {
        "normalization-processor": {
          "normalization": {"technique": "min_max"},
          "combination": {
            "technique": "arithmetic_mean",
            "parameters": {"weights": [0.3, 0.7]}
          }
        }
      }
    ]
  }'

For Linux/macOS:
curl -X PUT "localhost:9200/_search/pipeline/hybrid-search-pipeline" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Post processor for hybrid search",
    "phase_results_processors": [
      {
        "normalization-processor": {
          "normalization": {"technique": "min_max"},
          "combination": {
            "technique": "arithmetic_mean",
            "parameters": {"weights": [0.3, 0.7]}
          }
        }
      }
    ]
  }'

- Download and install Ollama from: https://ollama.ai/download
- For macOS/Linux:
curl -fsSL https://ollama.com/install.sh | sh
- Confirm the Ollama service is running (ollama list prints the installed models if it is):
ollama list
- Pull required model:
ollama pull qwen2.5:7b
# Create vector index with HuggingFace embeddings
python embedding.py

This will:
- Load markdown documents from the md_corpus/ directory
- Create embeddings using the BAAI/bge-m3 model
- Store vectors in OpenSearch
- Save the index to md_index.pkl
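
As a rough illustration of the first step only, the sketch below reads every Markdown file in a corpus directory into memory. It is not taken from `embedding.py`, which may well use LlamaIndex loaders instead; the function name `load_markdown_corpus` is hypothetical.

```python
from pathlib import Path
from typing import Dict


def load_markdown_corpus(corpus_dir: str = "md_corpus") -> Dict[str, str]:
    """Return {file name: file text} for every .md file directly in corpus_dir.

    Files are read as UTF-8 so Thai text is decoded correctly.
    """
    root = Path(corpus_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"corpus directory not found: {root}")
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(root.glob("*.md"))
    }
```

Calling `load_markdown_corpus()` from the repository root would return one entry per file in `md_corpus/`, which is a quick way to confirm that new documents are picked up before reindexing.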
# Terminal 1: Start API server
python api.py
# Terminal 2: Start Streamlit UI
streamlit run app.py
# Alternative: bind to all interfaces so the UI is reachable from other machines
streamlit run app.py --server.address 0.0.0.0 --server.port 8501

- Web UI: http://localhost:8501
- API Docs: http://localhost:9000/docs
- OpenSearch: http://localhost:9200
The UI will show the status of all services (API, Ollama, OpenSearch) at the top.
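
The same three services can also be probed from a terminal. The sketch below is a hedged example, not code from `app.py`: the API and OpenSearch URLs come from this README, while the Ollama URL assumes Ollama's default port 11434 and its model-listing endpoint. The `probe` parameter is a hypothetical hook that makes the check easy to test without a network.

```python
import urllib.request
from typing import Callable, Dict

SERVICES = {
    "API": "http://localhost:9000/health",
    "OpenSearch": "http://localhost:9200/_cluster/health",
    # Assumption: Ollama is listening on its default port.
    "Ollama": "http://localhost:11434/api/tags",
}


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to url succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def check_services(probe: Callable[[str], bool] = http_ok) -> Dict[str, bool]:
    """Probe each service once and report whether it responded."""
    return {name: probe(url) for name, url in SERVICES.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'up' if is_up else 'down'}")
```

A service reported as down here is a good starting point for the troubleshooting steps later in this README.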
Generic-RAG/
├── README.md          # This file
├── requirements.txt   # Python dependencies
├── .env               # Environment variables (optional)
│
├── embedding.py       # Data indexing with HuggingFace embeddings
├── api.py             # FastAPI backend with HuggingFace embeddings
├── app.py             # Streamlit frontend
│
├── md_corpus/         # Knowledge base (Markdown files)
│   ├── 1.md           # German measles (หัดเยอรมัน)
│   ├── 2.md           # Cholera (อหิวาตกโรค)
│   ├── 44.md          # Cataract (ต้อกระจก)
│   └── 5555.md        # GERD (กรดไหลย้อน)
│
└── md_index.pkl       # Saved index (created after running embedding.py)
Create a .env file (optional) to override defaults:
# OpenSearch Configuration
OPENSEARCH_ENDPOINT=http://localhost:9200
OPENSEARCH_INDEX=dg_md_index
# API Configuration
API_HOST=0.0.0.0
API_PORT=9000

| Component | Model | Purpose | Notes |
|---|---|---|---|
| Embeddings | BAAI/bge-m3 | Convert text to vectors | Downloaded automatically (~2GB) |
| LLM | qwen2.5:7b | Generate answers | Must be pulled via Ollama |
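
For reference, the `.env` overrides listed earlier could be read in Python roughly as follows. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical, and the fallback values simply mirror the ones shown in this README rather than being confirmed defaults of `api.py`. Plain `os.environ` does not read a `.env` file by itself, so the variables need to be exported or loaded by a helper such as python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    opensearch_endpoint: str
    opensearch_index: str
    api_host: str
    api_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, falling back to defaults.

    The defaults are assumed to match the values shown in this README.
    """
    return Settings(
        opensearch_endpoint=env.get("OPENSEARCH_ENDPOINT", "http://localhost:9200"),
        opensearch_index=env.get("OPENSEARCH_INDEX", "dg_md_index"),
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "9000")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary, for example `load_settings({"API_PORT": "8080"})`, makes it easy to see how a single override changes the result.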
The system includes Thai medical information covering:
- หัดเยอรมัน (German Measles/Rubella) - Symptoms, causes, treatment
- อหิวาตกโรค (Cholera) - Bacterial infection causing severe diarrhea
- ต้อกระจก (Cataract) - Eye condition common in elderly
- กรดไหลย้อน (GERD) - Gastroesophageal reflux disease
- Place markdown files in the md_corpus/ directory
- Delete the old index: curl -X DELETE "localhost:9200/dg_md_index"
- Run python embedding.py to reindex
- Restart the API server
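
The delete and reindex steps above can be scripted. The following is a minimal sketch, not part of the repository: `build_delete_request` and `reindex` are hypothetical helper names, and the endpoint and index name are the ones used elsewhere in this README. Restarting the API server is still a manual step.

```python
import subprocess
import sys
import urllib.error
import urllib.request


def build_delete_request(
    endpoint: str = "http://localhost:9200", index: str = "dg_md_index"
) -> urllib.request.Request:
    """Create the HTTP DELETE request that removes the old index."""
    return urllib.request.Request(f"{endpoint.rstrip('/')}/{index}", method="DELETE")


def reindex(
    endpoint: str = "http://localhost:9200",
    index: str = "dg_md_index",
    script: str = "embedding.py",
) -> None:
    """Delete the existing index, then rerun the indexing script."""
    try:
        with urllib.request.urlopen(build_delete_request(endpoint, index), timeout=10):
            pass
    except urllib.error.HTTPError as err:
        # A 404 means the index does not exist yet, so there is nothing to delete.
        if err.code != 404:
            raise
    # Run the script with the same interpreter (and conda environment) as this one.
    subprocess.run([sys.executable, script], check=True)
```

Calling `reindex()` from the repository root, with the conda environment active and OpenSearch running, performs the second and third steps in one go.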
embedding_model_name = 'BAAI/bge-m3' # Current model
# Can change to other HuggingFace models like:
# embedding_model_name = 'intfloat/multilingual-e5-large'
# After changing the embedding model, delete the index and rerun embedding.py

llm_model = "qwen2.5:7b" # Current model
# Can change to:
# llm_model = "llama2:13b" # Alternative

Remember to pull new Ollama models:
ollama pull qwen2.5:7b

# Test API health
curl http://localhost:9000/health
# Test search endpoint (the Thai query means "symptoms of German measles")
curl -X POST "http://localhost:9000/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "อาการของโรคหัดเยอรมัน"}'

# View all indices
curl -X GET "localhost:9200/_cat/indices?v"
# View index details
curl -X GET "localhost:9200/dg_md_index?pretty"
# Count documents in index
curl -X GET "localhost:9200/dg_md_index/_count?pretty"
# View sample documents
curl -X GET "localhost:9200/dg_md_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 5,
  "query": {
    "match_all": {}
  }
}'

- "No module named 'xxx'" Error
    pip install -r requirements.txt
- PyTorch Security Error (torch.load vulnerability)
    # This error occurs with transformers 4.37+ and PyTorch < 2.6
    # Solution: Upgrade PyTorch to 2.6+
    pip install "torch>=2.6.0" torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
    # Alternative: Downgrade transformers
    # pip install transformers==4.36.0 tokenizers==0.15.0
- OpenSearch connection failed
    # Check if OpenSearch is running
    docker ps
    # Check OpenSearch health
    curl http://localhost:9200/_cluster/health
- Ollama not responding
    # Check installed models
    ollama list
- No search results
    # Check document count
    curl -X GET "localhost:9200/dg_md_index/_count"
    # If 0, rerun embedding.py
    python embedding.py
- GPU not detected
    # Check PyTorch GPU support
    python -c "import torch; print(torch.cuda.is_available())"
    # If False, install GPU version of PyTorch
    pip install "torch>=2.6.0" torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
| Component | Minimum | Recommended | Notes |
|---|---|---|---|
| RAM | 8GB | 16GB+ | HuggingFace models need memory |
| CPU | 4 cores | 8+ cores | - |
| Storage | 10GB | 50GB+ | For models and data |
| GPU | None | CUDA 11.8+ | Speeds up embeddings |
- Use GPU: Significantly faster for embeddings
- Batch Processing: embedding.py processes in batches automatically
- Adjust Chunk Size: In embedding.py, modify chunk_size=1024
- Use Larger LLM: For better answers, use qwen2.5:7b or larger
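
To give a feel for what the chunk size setting controls, here is a simplified, character-based chunker. It is a stand-in for illustration only: the `chunk_size` in `embedding.py` is most likely measured in tokens rather than characters, and the `overlap` parameter and `chunk_text` name are assumptions that do not come from this project.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1024, overlap: int = 128) -> List[str]:
    """Split text into windows of chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    sample = "x" * 2500
    print([len(chunk) for chunk in chunk_text(sample)])
```

Smaller chunks give more precise matches but less context per retrieved passage; larger chunks do the opposite, so the best value depends on how long the answers in the corpus tend to be.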
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is licensed under the MIT License.
- LlamaIndex - RAG framework
- OpenSearch - Vector database
- Ollama - Local LLM runtime
- HuggingFace - Embedding models
- Streamlit - Web interface
- FastAPI - API framework
Made with ❤️ for the AI community