An integrated speech recognition and conversational AI system that combines OpenAI Whisper for transcription with SmolLM for intelligent response generation.
- Speech Recognition: High-quality transcription using OpenAI Whisper
- Conversational AI: Natural response generation using SmolLM-360M
- Context Awareness: Maintains conversation history and context
- Analysis: Sentiment analysis and topic detection
- Multi-language: Support for multiple languages
- Flexible Interface: Both web UI and CLI options
- Clone the repository:
git clone <repository-url>
cd Sauron-ASR
- Install dependencies:
pip install -r requirements.txt
- Check installation:
python run_integrated_system.py --check-deps

Launch the interactive web interface:
python run_integrated_system.py
Then open your browser to http://localhost:7860
Process a single audio file:
python run_integrated_system.py --cli --audio examples/sample.wav
With translation:
python run_integrated_system.py --cli --audio examples/sample.wav --task translate --language spanish
Run system tests:
python run_integrated_system.py --test

Audio Input → Whisper ASR → Transcript → SmolLM → Response
                  ↓                         ↓
              Analysis ←→ Context Memory ←→ History
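
The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: `fake_transcribe` and `fake_generate` are hypothetical stand-ins for the Whisper and SmolLM calls, and the `Pipeline` class and its attribute names are assumptions made for this example.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class Pipeline:
    """Toy version of: audio -> ASR -> transcript -> LLM -> response."""

    def __init__(
        self,
        transcribe: Callable[[str], str],
        generate: Callable[[str, str], str],
        max_history: int = 5,
    ) -> None:
        self.transcribe = transcribe  # stand-in for the Whisper step
        self.generate = generate      # stand-in for the SmolLM step
        # Context memory: only the most recent exchanges are kept.
        self.history: Deque[Tuple[str, str]] = deque(maxlen=max_history)

    def run(self, audio_path: str) -> Dict[str, str]:
        transcript = self.transcribe(audio_path)
        # Flatten earlier exchanges into a context string for the LLM step.
        context = "\n".join(
            f"User: {user}\nAssistant: {reply}" for user, reply in self.history
        )
        response = self.generate(transcript, context)
        self.history.append((transcript, response))
        return {"transcript": transcript, "response": response}


# Stub models so the sketch runs without downloading anything.
def fake_transcribe(audio_path: str) -> str:
    return f"transcript of {audio_path}"


def fake_generate(transcript: str, context: str) -> str:
    turns = context.count("User:")
    return f"reply to '{transcript}' with {turns} earlier turn(s)"


pipeline = Pipeline(fake_transcribe, fake_generate, max_history=2)
first = pipeline.run("a.wav")
second = pipeline.run("b.wav")
print(first["response"])
print(second["response"])
```

Passing the two model steps in as callables keeps the data flow visible and lets the history bound be exercised without loading either model.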
- ASR Pipeline (nofile.py, bambara_utils.py)
  - OpenAI Whisper for speech recognition
  - Support for multiple languages including Bambara
  - Audio preprocessing and resampling
- Conversation Agent (conversation_agent.py)
  - SmolLM-360M for response generation
  - Context analysis and sentiment detection
  - Conversation memory management
- Integration Layer (integrated_asr_chat.py)
  - Combines ASR and conversation components
  - Gradio web interface
  - Real-time processing pipeline
- CLI Interface (run_integrated_system.py)
  - Command-line access
  - Batch processing capabilities
  - System testing and validation
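
The CLI examples earlier in this README process one file per invocation, so one hedged way to batch-process a folder is to call the script once per file. The helper names below (`build_command`, `run_batch`) are hypothetical; only the `--cli`, `--audio`, `--task`, and `--language` flags come from this README, and the script may offer a more direct batch mode that is not documented here.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_command(
    audio: Path,
    task: Optional[str] = None,
    language: Optional[str] = None,
) -> List[str]:
    """Build the CLI invocation for one audio file."""
    cmd = [sys.executable, "run_integrated_system.py", "--cli", "--audio", str(audio)]
    if task is not None:
        cmd += ["--task", task]
    if language is not None:
        cmd += ["--language", language]
    return cmd


def run_batch(folder: Path, pattern: str = "*.wav") -> List[int]:
    """Run the CLI once per matching file and collect the exit codes."""
    codes = []
    for audio in sorted(folder.glob(pattern)):
        result = subprocess.run(build_command(audio), check=False)
        codes.append(result.returncode)
    return codes


# Show the command that would be run for the README's translation example.
cmd = build_command(Path("examples/sample.wav"), task="translate", language="spanish")
print(" ".join(cmd[1:]))
```

Calling `run_batch(Path("examples"))` from the repository root would then return one exit code per `.wav` file found.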
- ASR Model: openai/whisper-small (configurable)
- LLM Model: HuggingFaceTB/SmolLM-360M-Instruct
- Device: Auto-detected (CUDA/CPU)
- History Length: 5 exchanges (configurable)
- Response Length: Max 200 characters
- Temperature: 0.7 for balanced creativity
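
The defaults above can be collected into one configuration object. This is a sketch only: the `SystemConfig` class, its field names, and the `clip` helper are hypothetical, and the device check assumes PyTorch may or may not be installed.

```python
from dataclasses import dataclass, field


def detect_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency for this sketch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


@dataclass
class SystemConfig:
    asr_model: str = "openai/whisper-small"
    llm_model: str = "HuggingFaceTB/SmolLM-360M-Instruct"
    device: str = field(default_factory=detect_device)
    history_length: int = 5        # exchanges kept in memory
    max_response_chars: int = 200  # upper bound on reply length
    temperature: float = 0.7

    def clip(self, response: str) -> str:
        """Trim a reply to the configured maximum length."""
        return response[: self.max_response_chars]


# Override one default and leave the rest as listed above.
config = SystemConfig(history_length=3)
print(config.device, config.history_length, len(config.clip("x" * 500)))
```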
- English
- Spanish
- French
- Italian
- Portuguese
- Russian
- Bambara (custom support)
from conversation_agent import ConversationAgent
agent = ConversationAgent()
response = agent.generate_response("Hello, how are you?")
analysis = agent.analyze_transcript("I'm feeling great today!")
summary = agent.get_conversation_summary()

- generate_response(transcript, context=None): Generate conversational response
- analyze_transcript(transcript): Analyze text for sentiment and topics
- get_conversation_summary(): Get conversation statistics
- clear_history(): Reset conversation memory
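
This README does not say how `analyze_transcript` works internally. As a rough illustration of what sentiment and topic detection can look like, here is a keyword-based sketch; the word lists, the function name `simple_analysis`, and the returned keys are all assumptions for this example, not the agent's actual behavior.

```python
import re
from typing import Dict, List, Union

# Illustrative vocabularies only; a real system would use something richer.
POSITIVE = {"great", "good", "happy", "love", "excellent"}
NEGATIVE = {"bad", "sad", "terrible", "hate", "awful"}
TOPICS = {
    "weather": {"weather", "rain", "sunny", "temperature"},
    "greeting": {"hello", "hi", "hey"},
}


def simple_analysis(text: str) -> Dict[str, Union[str, List[str]]]:
    """Label sentiment and topics by counting keyword matches."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    topics = sorted(name for name, keys in TOPICS.items() if words & keys)
    return {"sentiment": sentiment, "topics": topics}


print(simple_analysis("I'm feeling great today!"))
print(simple_analysis("Hi, the weather is terrible."))
```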
agent = ConversationAgent()
# First interaction
response1 = agent.generate_response("Hi there!")
print(response1)  # e.g. "Hello! How can I help you today?" (generated text varies between runs)
# Follow-up with context
response2 = agent.generate_response("What's the weather like?")
print(response2)  # e.g. "I don't have access to current weather data, but I'd be happy to help you find a weather service!"

from integrated_asr_chat import process_audio_and_respond
result = process_audio_and_respond(
audio="path/to/audio.wav",
task_type="transcribe",
language="english"
)
print(f"Transcript: {result['transcript']}")
print(f"Response: {result['response']}")

- Model Loading Errors
  - Ensure sufficient RAM (2 GB+ recommended)
  - Check internet connection for model downloads
  - Try CPU mode if GPU issues occur
- Audio Processing Issues
  - Verify audio file format (WAV recommended)
  - Check sample rate (16 kHz preferred)
  - Ensure ffmpeg is installed
- Dependencies
  - Run: python run_integrated_system.py --check-deps
  - Reinstall with: pip install -r requirements.txt --force-reinstall
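
For the audio checks listed above, the sample rate and channel count of a WAV file can be read with Python's standard `wave` module, which handles uncompressed PCM files. A minimal sketch; the helper name `describe_wav` and the file name `check.wav` are placeholders, and the example writes a short silent file so it can run on its own.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
            "is_16khz": rate == 16000,
        }


# Write one second of 16-bit mono silence so there is something to inspect.
with wave.open("check.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

print(describe_wav("check.wav"))
```

If `is_16khz` comes back false for a real recording, resampling it with ffmpeg before processing is one option.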
- Use GPU for faster processing
- Limit conversation history for memory efficiency
- Use smaller audio files for real-time processing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
GNU General Public License v3.0 - see LICENSE file for details.
- OpenAI Whisper for speech recognition
- Hugging Face for SmolLM and transformers
- Gradio for the web interface
- The open-source community for various tools and libraries