A Streamlit-based chatbot that combines document retrieval with language generation capabilities. This chatbot can translate input, retrieve relevant documents from a corpus, and generate contextual responses.
- Multi-language Support: Automatically translates user queries using Google Translator
- Semantic Search: Retrieves relevant documents using sentence embeddings and FAISS
- Conversational Memory: Maintains conversation history for context-aware responses
- LLM-Powered Responses: Generates human-like responses using Llama 3.1 (8B parameter model)
- Interactive Web Interface: Built with Streamlit for easy interaction
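The conversational-memory feature can be illustrated with a minimal prompt-building sketch. The function and field names below are illustrative, not taken from the actual code:

```python
# Fold retrieved context and prior turns into a single prompt so the
# model can produce context-aware responses (illustrative sketch).
def build_prompt(history, query, context_docs):
    lines = [f"Context: {doc}" for doc in context_docs]
    for user_msg, bot_msg in history:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {bot_msg}")
    lines.append(f"User: {query}")
    lines.append("Assistant:")
    return "\n".join(lines)

history = [("Hi", "Hello! How can I help?")]
prompt = build_prompt(history, "What is FAISS?",
                      ["FAISS is a similarity search library."])
print(prompt)
```

The prompt ends with an open "Assistant:" turn, so the model's generation naturally continues the conversation.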
The chatbot consists of three main components:
- Retriever: Uses `SentenceTransformer` with the 'all-MiniLM-L6-v2' model to create embeddings and FAISS for efficient similarity search
- Generator: Implements Meta's Llama 3.1 (8B) model with 4-bit quantization for memory efficiency
- Translator: Incorporates Google Translator for handling non-English queries
- Python 3.8+
- PyTorch
- CUDA-compatible GPU (recommended)
- Hugging Face account for model access
# Clone the repository
git clone https://github.com/yourusername/conversational-retrieval-chatbot.git
cd conversational-retrieval-chatbot
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Log in to Hugging Face (required to access Llama 3.1)
huggingface-cli login

Create a requirements.txt file with the following dependencies:
streamlit
faiss-cpu # Use faiss-gpu if using GPU
torch
sentence-transformers
transformers
deep-translator
huggingface-hub
accelerate
bitsandbytes
- Run the Streamlit app: streamlit run main.py
- Access the app in your browser (typically at http://localhost:8501)
- Enter your corpus in the sidebar (one document per line)
- Start chatting in the main interface
To use a different sentence embedding model, modify:
self.retriever_model = SentenceTransformer('your-preferred-model')

Modify the generation settings in the generate_answer method:
outputs = self.generator_model.generate(
**inputs,
max_length=100, # Adjust maximum response length
temperature=0.5, # Control randomness (higher = more random; only used when sampling)
top_p=0.9, # Nucleus sampling parameter (only used when sampling)
num_return_sequences=1, # Number of response candidates
do_sample=False # Set to True to enable sampling for more varied responses
)

Change the default translation languages:
def translate_text(self, text, source_lang='your-source-lang', target_lang='your-target-lang'):

- Requires significant GPU memory for the Llama 3.1 model, even with quantization
- Performance may vary based on the quality and relevance of the provided corpus
- Translation quality depends on Google Translator's capabilities
- Hugging Face for transformer models
- FAISS for efficient similarity search
- Sentence Transformers for text embeddings
- Streamlit for the web interface
- Meta AI for the Llama 3.1 model