# BookLM

A book recommendation system that combines LLMs, vector similarity search, and natural language processing to provide personalized book recommendations.
## Features

- AI-Powered Recommendations: Uses OpenAI's language models to provide intelligent, contextual book recommendations
- Semantic Search: Leverages HuggingFace embeddings and ChromaDB for similarity-based book discovery
- Book Comparison: Compare two books with AI-generated insights
- FastAPI: Built with FastAPI for high-performance API endpoints
- Vector Database: ChromaDB for efficient similarity search and retrieval
Demo: BookLM.mp4
## Architecture

The system consists of several key components:
- Data Layer: SQLite database storing the book dataset
- Vector Database: ChromaDB for semantic similarity search
- AI Layer: OpenAI LLM for intelligent recommendations
- API Layer: FastAPI serving REST endpoints
- Frontend: Static HTML/CSS/JS interface
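The retrieve-then-generate flow across these layers can be sketched in pure Python. Everything below is a toy stand-in: `embed()` imitates the HuggingFace embedding model with a character-frequency vector, `retrieve()` imitates ChromaDB's similarity search, and the prompt string is what would be sent to the OpenAI-compatible LLM; the function names are hypothetical, not the project's actual API.

```python
from __future__ import annotations

def embed(text: str) -> list[float]:
    # Crude character-frequency "embedding"; a stand-in for a sentence transformer.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity, the metric vector databases typically rank by.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, catalog: dict[str, str], k: int = 2) -> list[str]:
    # Stand-in for ChromaDB's similarity search: rank titles by description similarity.
    q = embed(query)
    return sorted(catalog, key=lambda t: cosine(q, embed(catalog[t])), reverse=True)[:k]

def build_prompt(query: str, candidates: list[str]) -> str:
    # In the real system this prompt would go to the LLM for the final recommendation.
    return f"User asked: {query!r}. Recommend one of: {', '.join(candidates)}."

catalog = {
    "The Hobbit": "a fantasy adventure through magical lands",
    "Dune": "a science fiction epic set on a desert planet",
}
top = retrieve("a fantasy book about magical worlds", catalog, k=1)
prompt = build_prompt("a fantasy book about magical worlds", top)
```

The real pipeline swaps each stand-in for its production counterpart, but the data flow — embed the query, rank stored vectors, hand the top candidates to the LLM — stays the same.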
## Tools Used
- OpenAI for providing the language models
- HuggingFace for embedding models
- ChromaDB for vector database functionality
- FastAPI for the web framework
- LangChain for AI/ML orchestration
## Prerequisites

- Python 3.8 or higher
- OpenAI API key
- Sufficient disk space for book embeddings
## Installation

```bash
git clone https://github.com/mehrdad-dev/BookLM.git
cd BookLM
pip install -r requirements.txt
```

## Configuration

Create a `.env` file in the project root with the following variables:
```env
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
# Optional: set only if you use a non-default endpoint
OPENAI_API_BASE=your_openai_base_url_here
LLM_MODEL=gemma-3-1b-it

# Embedding Model
EMBEDDING_MODEL_NAME=sentence-transformers/all-MiniLM-L6-v2

# Database Configuration
CSV_PATH=dataset/Best_books_ever[Cleaned].csv
DB_PATH=books.db
ROWS_LIMIT=100

# Vector Database
INDEX_PATH=chroma_books_index
```

## Dataset

The original dataset I used for this project: https://github.com/scostap/goodreads_bbe_dataset
You can find a cleaned version of this dataset in the dataset/ folder.
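One way these variables might be consumed at startup is sketched below using only the standard library. This is an assumption for illustration — `load_settings` is a hypothetical helper, the project's own loading code may differ, and a package such as python-dotenv is typically what reads the `.env` file into the process environment first.

```python
import os

def load_settings() -> dict:
    # Read each variable from the environment, falling back to the README defaults.
    # (Hypothetical helper; the real main.py may structure this differently.)
    return {
        "openai_api_key": os.getenv("OPENAI_API_KEY", ""),
        "openai_api_base": os.getenv("OPENAI_API_BASE", ""),
        "llm_model": os.getenv("LLM_MODEL", "gemma-3-1b-it"),
        "embedding_model": os.getenv("EMBEDDING_MODEL_NAME",
                                     "sentence-transformers/all-MiniLM-L6-v2"),
        "csv_path": os.getenv("CSV_PATH", "dataset/Best_books_ever[Cleaned].csv"),
        "db_path": os.getenv("DB_PATH", "books.db"),
        "rows_limit": int(os.getenv("ROWS_LIMIT", "100")),  # cast: env vars are strings
        "index_path": os.getenv("INDEX_PATH", "chroma_books_index"),
    }

settings = load_settings()
```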
## Data Preparation

Ensure you have the book dataset in the `dataset/` folder. The system expects a CSV file with the following columns:

- `bookId`: Unique book identifier
- `title`: Book title
- `author`: Book author
- `rating`: Book rating
- `description`: Book description
- `genres`: Book genres
- `characters`: Book characters
- `coverImg`: Book cover image URL
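A minimal sketch of the first-run CSV-to-SQLite import, using only the standard library and the column names above. The `load_books` helper and the sample row are illustrative assumptions; the project's actual loader may differ.

```python
import csv
import io
import sqlite3

def load_books(csv_text: str, conn: sqlite3.Connection, rows_limit: int = 100) -> int:
    # Create the books table with the columns the README documents.
    conn.execute("""CREATE TABLE IF NOT EXISTS books (
        bookId TEXT PRIMARY KEY, title TEXT, author TEXT, rating REAL,
        description TEXT, genres TEXT, characters TEXT, coverImg TEXT)""")
    reader = csv.DictReader(io.StringIO(csv_text))
    count = 0
    for row in reader:
        if count >= rows_limit:  # honor ROWS_LIMIT from the .env configuration
            break
        conn.execute(
            "INSERT OR REPLACE INTO books VALUES (?,?,?,?,?,?,?,?)",
            (row["bookId"], row["title"], row["author"], float(row["rating"] or 0),
             row["description"], row["genres"], row["characters"], row["coverImg"]))
        count += 1
    conn.commit()
    return count

# Tiny in-memory demonstration with one made-up row.
sample = ("bookId,title,author,rating,description,genres,characters,coverImg\n"
          "1,Dune,Frank Herbert,4.25,Desert epic,Sci-Fi,Paul,http://example.com/dune.jpg\n")
conn = sqlite3.connect(":memory:")
n = load_books(sample, conn)
```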
## Running the Application

```bash
uvicorn main:app --reload
```

On the first run, the system will:
- Load book data from CSV into SQLite database
- Create embeddings for book descriptions using HuggingFace
- Store embeddings in ChromaDB for similarity search
- Start the web server
This process may take a few minutes depending on the dataset size.
## Usage

1. Book Recommendations:
   - Navigate to the "Recommendation" tab
   - Enter your book preferences (e.g., "I want a fantasy book about magical worlds")
   - Get AI-powered recommendations with reasoning
2. Book Comparison:
   - Navigate to the "Compare" tab
   - Search book titles
   - Select two books
   - Get AI-generated comparison insights
## Project Structure

```
BookLM/
├── main.py
├── requirements.txt
├── README.md
├── books.db                  # SQLite database (auto-generated)
├── chroma_books_index/       # ChromaDB vector database (auto-generated)
├── books_1.Best_Books_Ever.csv
├── dataset/
│   ├── Best_books_ever[Cleaned].csv
│   └── dataset.ipynb
└── static/
    └── index.html
```
## Environment Variables

- `OPENAI_API_KEY`: Your OpenAI API key
- `OPENAI_API_BASE`: Your OpenAI base URL
- `LLM_MODEL`: OpenAI-compatible model to use (I used: gemma-3-1b-it)
- `EMBEDDING_MODEL_NAME`: HuggingFace embedding model
- `CSV_PATH`: Path to your book dataset CSV
- `DB_PATH`: SQLite database file path
- `ROWS_LIMIT`: Number of books to process (for testing)
- `INDEX_PATH`: ChromaDB index directory
## Customization

- `ROWS_LIMIT`: Reduce for a faster initial setup, increase for more comprehensive recommendations
- Chunk size: Modify `chunk_size` in `prepare_documents()` for different embedding granularity
- Similarity search: Adjust the `k` parameter in `query_db()` for more or fewer recommendations
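To make the `k` knob concrete, here is a toy top-k selection in pure Python. In the real system `query_db()` delegates this ranking to ChromaDB; the `top_k` helper and the score values below are illustrative only.

```python
import heapq

def top_k(scores: dict, k: int) -> list:
    # Return the k highest-scoring titles; k bounds how many recommendations
    # come back, exactly like the `k` parameter of a vector-store query.
    return heapq.nlargest(k, scores, key=scores.get)

# Made-up similarity scores between a query and stored book descriptions.
scores = {"The Hobbit": 0.91, "Dune": 0.78, "Emma": 0.42}
picks = top_k(scores, k=2)  # ["The Hobbit", "Dune"]
```

A larger `k` gives the LLM more candidates to reason over at the cost of noisier matches; a smaller `k` keeps only the closest hits.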
## Contributing

- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
## License

This project is licensed under the MIT License.
Happy Reading! 📚✨