MobiRAG (Mobile Retrieval-Augmented Generation) is a lightweight, privacy-first Android app that lets users chat with any PDF stored on their phone, entirely offline. With on-device embedding generation, vector compression, and small language model (SLM) inference, MobiRAG brings AI-powered search and summarization directly to your pocket.
No internet, no cloud servers, and no telemetry — everything runs natively on your phone, ensuring complete data privacy and zero leakage. Whether you’re reviewing research papers, legal documents, or ebooks, MobiRAG offers a seamless way to search, ask questions, and summarize content using optimized RAG for mobile devices.
Demo video: `MobiRAG_finalcut_cropped.mp4` (🔺️ YT Video)
| Feature | Description |
|---|---|
| 🔐 100% On-Device | No cloud calls. No telemetry. Your data never leaves your phone. |
| 🧠 Embeddings via ONNX | Runs the `all-MiniLM-L6-v2` model for fast, good-quality sentence embeddings on the phone. |
| 📚 PDF Discovery & Parsing | Detects and processes all PDFs on device using PDFBox. |
| 🔎 Semantic Search with FAISS | PQ-compressed embeddings enable scalable vector search on-device. |
| 💬 SLM Chat with Context | An on-device small language model such as Qwen 0.5B generates answers grounded in PDF context. |
| 🔁 Hybrid RAG | Combines FAISS vector similarity with TF-IDF keyword overlap. |
| 🖼️ Lightweight UI | Responsive and optimized for phones with minimalistic design. |
- Extracts text from each page using PDFBox
- Splits into clean sentence-based units
- Combines sentences into chunks of ~1024 characters with a 1-sentence overlap (see the sketch after this list)
- Encodes all chunks with the ONNX `all-MiniLM-L6-v2` model
- Compresses embeddings using FAISS Product Quantization (PQ)
- Stores only metadata (PDF URI, page, offset), not the chunk text itself
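To make the chunking step concrete, here is a minimal Kotlin sketch, assuming the sentences of a page have already been extracted (e.g., with PDFBox). The names, the metadata fields, and the exact overlap handling are illustrative and not MobiRAG's actual implementation.

```kotlin
// Minimal sketch: group sentences into ~1024-character chunks with a
// 1-sentence overlap. Only lightweight metadata (URI, page, sentence offset)
// is kept alongside each chunk for later on-demand re-extraction.
data class ChunkMeta(val pdfUri: String, val page: Int, val sentenceOffset: Int)

fun chunkSentences(
    sentences: List<String>,
    pdfUri: String,
    page: Int,
    maxChars: Int = 1024                      // target chunk size from the list above
): List<Pair<String, ChunkMeta>> {
    val chunks = mutableListOf<Pair<String, ChunkMeta>>()
    var start = 0
    while (start < sentences.size) {
        val sb = StringBuilder()
        var end = start
        // Grow the chunk sentence by sentence until the character budget is hit.
        while (end < sentences.size &&
            (sb.isEmpty() || sb.length + sentences[end].length <= maxChars)
        ) {
            sb.append(sentences[end]).append(' ')
            end++
        }
        chunks += sb.toString().trim() to ChunkMeta(pdfUri, page, start)
        if (end >= sentences.size) break
        // Start the next chunk one sentence back (1-sentence overlap),
        // while guaranteeing forward progress.
        start = maxOf(end - 1, start + 1)
    }
    return chunks
}
```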
- The user query is embedded using the ONNX `all-MiniLM-L6-v2` model
- FAISS returns the top-k chunk IDs based on PQ-compressed vector similarity (nearest-neighbour search)
- The matching metadata is used to extract the chunk text on the fly
- TF-IDF keyword overlap further refines relevance (see the re-ranking sketch below)
- The combined context is used to prompt a local SLM (Qwen) to generate the answer
This reduces memory footprint and ensures that deleted PDFs cannot be queried later (privacy by design).
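As an illustration of the hybrid step, the sketch below re-ranks FAISS candidates with a simple keyword-overlap score and a weighted sum. The `Candidate` type, the tokenization, and the 0.7/0.3 weighting are assumptions for the sketch, not MobiRAG's actual scoring (the roadmap notes that this combination is still naive).

```kotlin
// Illustrative re-ranking of FAISS candidates using keyword overlap with the
// query; assumes chunk text has already been re-extracted via the metadata.
data class Candidate(val chunkId: Int, val text: String, val vectorScore: Float)

fun keywordOverlap(query: String, chunk: String): Float {
    val tokenize = { s: String -> s.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }.toSet() }
    val q = tokenize(query)
    if (q.isEmpty()) return 0f
    val c = tokenize(chunk)
    return q.count { it in c }.toFloat() / q.size    // fraction of query terms found in the chunk
}

fun rerank(query: String, candidates: List<Candidate>, alpha: Float = 0.7f): List<Candidate> =
    candidates.sortedByDescending { alpha * it.vectorScore + (1 - alpha) * keywordOverlap(query, it.text) }
```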
- ✨ Efficient Compression: FAISS PQ compresses the 384-dim vectors by 32x, letting the index for 1000 PDFs fit in ~2.4 MB; up to 97x compression is possible at some cost in retrieval quality (see the storage estimate after this list).
- ✨ FAISS Tradeoff: Slight drop in search quality from PQ vs flat index — but acceptable for mobile efficiency.
- ✨ Metadata-Only Storage: Chunks are not stored — only chunkID + PDF location + offset are. Text is extracted on-demand.
- ✨ Privacy by Design: If a PDF is deleted from storage, its chunks become inaccessible.
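For a rough sense of where the ~2.4 MB figure comes from, the back-of-the-envelope arithmetic looks like this; the chunks-per-PDF count is an assumed average used only to make the numbers concrete.

```kotlin
// Rough index-size estimate for the 32x PQ setting described above.
fun main() {
    val dim = 384                                   // all-MiniLM-L6-v2 embedding dimension
    val rawBytesPerVector = dim * 4                 // float32 vector = 1536 bytes
    val pqBytesPerVector = rawBytesPerVector / 32   // 32x compression -> 48-byte PQ code
    val pdfs = 1000
    val chunksPerPdf = 50                           // assumed average; varies with document length
    val totalBytes = pdfs * chunksPerPdf * pqBytesPerVector
    println("PQ codes: %.1f MB".format(totalBytes / (1024.0 * 1024.0)))   // ≈ 2.3 MB
}
```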
- Android 8.0+
- ARM64 device (>= 4 GB RAM recommended)
- LLM file: `qwen2.5-0.5b-instruct-q4_k_m.gguf`
- `git clone https://github.com/nishchaljs/MobiRAG.git`
- `cd MobiRAG`
- `git submodule update --init --recursive`
- Open the project in Android Studio
- Add your embedding ONNX model + tokenizer to `assets/all-minilm-l6-v2/`
- Place your `.gguf` LLM file inside `Android/data/com.mobirag/files/`
- **Refactor to MVC architecture**: Reorganize the app code into a clean Model-View-Controller separation for maintainability and testing
- **Improve SLM inference speed**: `llama.cpp` currently runs on the Android CPU and tends to be very slow for long prompts; explore alternatives such as MLC LLM to use the GPU for on-device inference
- **Improve hybrid RAG scoring**: Replace the naive combination of vector similarity + keyword overlap with a unified scoring function (e.g., z-score normalization, weighted distances); see the sketch after this list
- **Optimize FAISS PQ & SLM inference**: Experiment with different PQ training sizes, nprobe settings, and SLM decoding strategies for the best quality-performance tradeoff
- **Improve the system prompt**: Design a robust, guardrailed prompt template that guides the SLM to avoid hallucinations and respect query constraints
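As a rough illustration of the unified-scoring idea from the roadmap, the sketch below z-score normalizes each score family before combining them, so neither dominates purely because of its scale. The function names and the equal weighting are hypothetical, not a committed design.

```kotlin
import kotlin.math.sqrt

// Normalize a score list to zero mean / unit variance.
fun zScores(xs: List<Float>): List<Float> {
    val mean = xs.average().toFloat()
    val sd = sqrt(xs.map { (it - mean) * (it - mean) }.average()).toFloat()
    return xs.map { if (sd > 0f) (it - mean) / sd else 0f }
}

// Combine vector-similarity and keyword scores on a common scale.
// Assumes both lists describe the same candidates in the same order.
fun unifiedScores(vectorScores: List<Float>, keywordScores: List<Float>, wVec: Float = 0.5f): List<Float> {
    val zv = zScores(vectorScores)
    val zk = zScores(keywordScores)
    return zv.indices.map { wVec * zv[it] + (1 - wVec) * zk[it] }
}
```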
This project is licensed under the MIT License.