
🤖 Chat with PDF locally using Ollama + LangChain

A powerful local RAG (Retrieval Augmented Generation) application that lets you chat with your PDF documents using Ollama and LangChain. This project includes both a Jupyter notebook for experimentation and a Streamlit web interface for easy interaction. Shout out to Tony Kipkemboi!

📂 Project File Structure

ollama_pdf_rag/
├── src/                      # Source code
│   ├── app/                  # Streamlit application
│   │   ├── components/       # UI components
│   │   │   ├── chat.py       # Chat interface
│   │   │   ├── pdf_viewer.py # PDF display
│   │   │   └── sidebar.py    # Sidebar controls
│   │   └── main.py           # Main app
│   └── core/                 # Core functionality
│       ├── document.py       # Document processing
│       ├── embeddings.py     # Vector embeddings
│       ├── llm.py            # LLM setup
│       └── rag.py            # RAG pipeline
├── data/                     # Data storage
│   ├── pdfs/                 # PDF storage
│   │   └── sample/           # Sample PDFs
│   └── vectors/              # Vector DB storage
├── notebooks/                # Jupyter notebooks
│   └── experiments/          # Experimental notebooks
├── tests/                    # Unit tests
├── docs/                     # Documentation
└── run.py                    # Application runner

✨ Features

  • 🔒 Fully local processing - no data leaves your machine
  • 📄 PDF processing with intelligent chunking
  • 🧠 Multi-query retrieval for better context understanding
  • 🎯 Advanced RAG implementation using LangChain
  • 🖥️ Clean Streamlit interface
  • 📓 Jupyter notebook for experimentation
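The multi-query retrieval idea can be illustrated with a toy, dependency-free sketch: several phrasings of the same question each retrieve their top matches, and the deduplicated union gives the LLM broader context. The real pipeline uses LangChain retrievers over Ollama embeddings; the bag-of-words scoring below is only a stand-in.

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words vector (a stand-in for real nomic-embed-text embeddings)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def multi_query_retrieve(query_variants, docs, k=2):
    """Union of the top-k documents for each query variant, deduplicated."""
    hits = []
    for q in query_variants:
        qv = bow(q)
        ranked = sorted(docs, key=lambda d: cosine(qv, bow(d)), reverse=True)
        hits.extend(ranked[:k])
    seen, unique = set(), []
    for d in hits:
        if d not in seen:
            seen.add(d)
            unique.append(d)
    return unique

docs = [
    "Ollama runs large language models locally.",
    "LangChain chains prompts, retrievers and models together.",
    "PDF chunking splits documents into overlapping pieces.",
]
variants = ["run models locally", "local LLM inference with Ollama"]
print(multi_query_retrieve(variants, docs, k=1))
```

Rephrasing the question ("run models locally" vs. "local LLM inference") changes which documents score highest, which is exactly why pooling results across variants recovers context a single query would miss.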

🚀 Getting Started

Prerequisites

  1. Install Ollama

    • Visit Ollama's website to download and install
    • Pull required models:
      ollama pull llama3.2  # or your preferred model
      ollama pull nomic-embed-text
  2. Clone Repository

    git clone https://github.com/aghoshpro/OllamaRAG.git
    cd ollama_pdf_rag
  3. Set Up Environment

    python -m venv myvenv

    # On Windows
    .\myvenv\Scripts\activate

    # On Linux or macOS
    source myvenv/bin/activate

    pip install -r requirements.txt
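Once the environment is ready, you can sanity-check that Ollama is serving the models you pulled in step 1. Ollama's local HTTP API lists installed models at the `/api/tags` endpoint on its default port 11434; the helper name below is illustrative, and an offline demo payload shaped like that response is included so the snippet runs without a server.

```python
import json
from urllib.request import urlopen  # for the live check, if Ollama is running

def model_names(tags_json: bytes) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Live check (uncomment while Ollama is running):
# with urlopen("http://localhost:11434/api/tags") as resp:
#     print("Available models:", model_names(resp.read()))

# Offline demo with a payload shaped like Ollama's response:
sample = json.dumps({"models": [{"name": "llama3.2:latest"},
                                {"name": "nomic-embed-text:latest"}]}).encode()
print(model_names(sample))
```

If `llama3.2` or `nomic-embed-text` is missing from the live output, rerun the `ollama pull` commands above.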
    

🎮 Run the App

Option 1: Streamlit Interface

python run.py

Then open your browser at http://localhost:8501

Screenshot: the ChatPDFx Streamlit interface

Option 2: Jupyter Notebook

jupyter notebook

Open updated_rag_notebook.ipynb to experiment with the code

💡 Usage Tips

  1. Upload PDF: Use the file uploader in the Streamlit interface or try the sample PDF
  2. Select Model: Choose from your locally available Ollama models
  3. Ask Questions: Start chatting with your PDF through the chat interface
  4. Adjust Display: Use the zoom slider to adjust PDF visibility
  5. Clean Up: Use the "Delete Collection" button when switching documents

๐Ÿ› ๏ธ Troubleshooting

  • Ensure Ollama is running in the background
  • Check that required models are downloaded
  • Verify Python environment is activated
  • For Windows users, ensure WSL2 is properly configured if using Ollama

โš ๏ธ Common Errors

๐ŸŸก ONNX DLL Error

DLL load failed while importing onnx_copy2py_export: a dynamic link Library (DLL) initialization routine failed.

Try these solutions:

  1. Install the Microsoft Visual C++ Redistributable

  2. If the error persists, try installing ONNX Runtime manually:

    pip uninstall onnxruntime onnxruntime-gpu
    pip install onnxruntime

CPU-Only Systems

If you're running on a CPU-only system:

  1. Ensure you have the CPU version of ONNX Runtime:

    pip uninstall onnxruntime-gpu  # Remove GPU version if installed
    pip install onnxruntime  # Install CPU-only version
  2. You may need to modify the chunk size in the code to prevent memory issues:

    • Reduce chunk_size to 500-1000 if you experience memory problems
    • Increase chunk_overlap for better context preservation

Note: The application will run slower on CPU-only systems, but it will still work effectively.
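As a concrete picture of the chunk_size / chunk_overlap trade-off, here is a minimal character-window splitter. The project itself presumably relies on a LangChain text splitter; only the parameter names mirror that API, and everything else here is illustrative.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Character-window splitter: each chunk starts chunk_overlap
    characters before the previous chunk ended."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = split_text(text, chunk_size=1000, chunk_overlap=200)
print(len(chunks), [len(c) for c in chunks])  # 4 [1000, 1000, 900, 100]
```

Shrinking chunk_size produces more, smaller chunks (lower peak memory per embedding call), while a larger chunk_overlap repeats more text between neighbouring chunks so that sentences near a boundary stay retrievable with their context.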

🟡 TesseractNotFoundError

TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

Try this solution from Stack Overflow:

  • Install Tesseract using the Windows installer available at: https://github.com/UB-Mannheim/tesseract/wiki

  • Note the Tesseract installation path (C:\Program Files\Tesseract-OCR) and add it to the system PATH environment variable.

    • Restart the system for the change to take effect

    • Activate myvenv and run the app again

  • pip install pytesseract [OPTIONAL]

  • Set the tesseract path in the script before calling image_to_string [OPTIONAL]

    pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'
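The manual PATH steps above can also be mirrored defensively in code: check PATH first, then fall back to the known install directory. The helper name and fallback directory below are illustrative.

```python
import shutil
from pathlib import Path

DEFAULT_DIRS = (r"C:\Program Files\Tesseract-OCR",)

def find_tesseract(extra_dirs=DEFAULT_DIRS):
    """Locate the tesseract binary via PATH first, then known install dirs.

    Returns the path as a string, or None if tesseract cannot be found.
    """
    found = shutil.which("tesseract")
    if found:
        return found
    for d in extra_dirs:
        candidate = Path(d) / "tesseract.exe"
        if candidate.is_file():
            return str(candidate)
    return None

# If found, point pytesseract at it before calling image_to_string, e.g.:
# pytesseract.pytesseract.tesseract_cmd = find_tesseract()
```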
    

🟡 Lookup Error with NLTK

Lookup error ... nltk.download('averaged_perceptron_tagger_eng') is not found
  • Add the following to main.py and run again:

    import nltk
    nltk.download('averaged_perceptron_tagger_eng')
    

🧪 Testing

Running Tests

# Run all tests
python -m unittest discover tests

# Run tests verbosely
python -m unittest discover tests -v

Pre-commit Hooks

The project uses pre-commit hooks to ensure code quality. To set up:

pip install pre-commit
pre-commit install

This will:

  • Run tests before each commit
  • Run linting checks
  • Ensure code quality standards are met

Continuous Integration

The project uses GitHub Actions for CI. On every push and pull request:

  • Tests are run on multiple Python versions (3.9, 3.10, 3.11)
  • Dependencies are installed
  • Ollama models are pulled
  • Test results are uploaded as artifacts

๐Ÿค Contributing

Feel free to:

  • Open issues for bugs or suggestions
  • Submit pull requests
  • Comment on the YouTube video for questions
  • Star the repository if you find it useful!

๐Ÿ“ License

This project is open source and available under the MIT License.
