
Sauron-ASR

An integrated speech recognition and conversational AI system that combines OpenAI Whisper for transcription with SmolLM for intelligent response generation.

Features

🎤 Speech Recognition: High-quality transcription using OpenAI Whisper
🤖 Conversational AI: Natural response generation using SmolLM-360M
💬 Context Awareness: Maintains conversation history and context
📊 Analysis: Sentiment analysis and topic detection
🌍 Multi-language: Support for multiple languages
🎯 Flexible Interface: Both web UI and CLI options

Quick Start

Installation

  1. Clone the repository:
git clone <repository-url>
cd Sauron-ASR
  2. Install dependencies:
pip install -r requirements.txt
  3. Check installation:
python run_integrated_system.py --check-deps

Usage

Web Interface (Recommended)

Launch the interactive web interface:

python run_integrated_system.py

Then open your browser to http://localhost:7860

CLI Mode

Process a single audio file:

python run_integrated_system.py --cli --audio examples/sample.wav

With translation:

python run_integrated_system.py --cli --audio examples/sample.wav --task translate --language spanish

Testing

Run system tests:

python run_integrated_system.py --test

System Architecture

Audio Input → Whisper ASR → Transcript → SmolLM → Response
                ↓                          ↑
            Analysis ←→ Context Memory ←→ History

Components

  1. ASR Pipeline (nofile.py, bambara_utils.py)

    • OpenAI Whisper for speech recognition
    • Support for multiple languages including Bambara
    • Audio preprocessing and resampling
  2. Conversation Agent (conversation_agent.py)

    • SmolLM-360M for response generation
    • Context analysis and sentiment detection
    • Conversation memory management
  3. Integration Layer (integrated_asr_chat.py)

    • Combines ASR and conversation components
    • Gradio web interface
    • Real-time processing pipeline
  4. CLI Interface (run_integrated_system.py)

    • Command-line access
    • Batch processing capabilities
    • System testing and validation
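
The sketch below shows how these components fit together conceptually. The ConversationAgent methods are the ones documented in the API reference further down; the Whisper side uses a standard Hugging Face pipeline, and the respond_to_audio helper is illustrative rather than the project's actual entry point (that role belongs to process_audio_and_respond in integrated_asr_chat.py).

# Minimal sketch of the pipeline wiring; respond_to_audio is illustrative,
# not the project's actual entry point.
from transformers import pipeline
from conversation_agent import ConversationAgent

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
agent = ConversationAgent()

def respond_to_audio(audio_path):
    transcript = asr(audio_path)["text"]               # Whisper ASR -> transcript
    analysis = agent.analyze_transcript(transcript)    # sentiment / topics
    response = agent.generate_response(transcript)     # SmolLM reply with context
    return {"transcript": transcript, "analysis": analysis, "response": response}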

Configuration

Model Settings

  • ASR Model: openai/whisper-small (configurable)
  • LLM Model: HuggingFaceTB/SmolLM-360M-Instruct
  • Device: Auto-detected (CUDA/CPU)
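
A minimal sketch of these settings in use, assuming both models are loaded through Hugging Face transformers pipelines; the loading code itself is illustrative, but the model IDs and the device auto-detection match the list above.

import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1     # CUDA if available, else CPU

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",                    # ASR model (configurable)
    device=device,
)
llm = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM-360M-Instruct",      # LLM model
    device=device,
)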

Conversation Settings

  • History Length: 5 exchanges (configurable)
  • Response Length: Max 200 characters
  • Temperature: 0.7 for balanced creativity
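
Illustrative use of these settings with the llm pipeline from the Model Settings sketch above. The temperature and the 200-character cap come from the list here; the prompt format and the max_new_tokens value are assumptions.

prompt = "User: Hello, how are you?\nAssistant:"
outputs = llm(prompt, max_new_tokens=64, do_sample=True, temperature=0.7)
reply = outputs[0]["generated_text"][len(prompt):].strip()[:200]   # 200-char cap
print(reply)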

Supported Languages

  • English
  • Spanish
  • French
  • Italian
  • Portuguese
  • Russian
  • Bambara (custom support)

API Reference

ConversationAgent

from conversation_agent import ConversationAgent

agent = ConversationAgent()
response = agent.generate_response("Hello, how are you?")
analysis = agent.analyze_transcript("I'm feeling great today!")
summary = agent.get_conversation_summary()

Key Methods

  • generate_response(transcript, context=None): Generate conversational response
  • analyze_transcript(transcript): Analyze text for sentiment and topics
  • get_conversation_summary(): Get conversation statistics
  • clear_history(): Reset conversation memory
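
A short example covering the remaining methods; the exact return shapes are not documented here, so treat the printed structures as illustrative.

from conversation_agent import ConversationAgent

agent = ConversationAgent()
agent.generate_response("I just finished a great workout!")

analysis = agent.analyze_transcript("I'm feeling great today!")
print(analysis)                            # e.g. sentiment label and detected topics

print(agent.get_conversation_summary())   # conversation statistics

agent.clear_history()                      # reset conversation memory before a new session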

Examples

Basic Conversation

from conversation_agent import ConversationAgent

agent = ConversationAgent()

# First interaction
response1 = agent.generate_response("Hi there!")
print(response1)  # "Hello! How can I help you today?"

# Follow-up with context
response2 = agent.generate_response("What's the weather like?")
print(response2)  # "I don't have access to current weather data, but I'd be happy to help you find a weather service!"

Audio Processing

from integrated_asr_chat import process_audio_and_respond

result = process_audio_and_respond(
    audio="path/to/audio.wav",
    task_type="transcribe",
    language="english"
)

print(f"Transcript: {result['transcript']}")
print(f"Response: {result['response']}")

Troubleshooting

Common Issues

  1. Model Loading Errors

    • Ensure sufficient RAM (2GB+ recommended)
    • Check internet connection for model downloads
    • Try CPU mode if GPU issues occur
  2. Audio Processing Issues

    • Verify audio file format (WAV recommended)
    • Check sample rate (16kHz preferred; see the resampling sketch after this list)
    • Ensure ffmpeg is installed
  3. Dependencies

    • Run python run_integrated_system.py --check-deps
    • Reinstall with pip install -r requirements.txt --force-reinstall
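
If an audio file is rejected or transcribed poorly, converting it to 16 kHz mono WAV before processing usually helps. A minimal sketch using torchaudio (any resampler, including ffmpeg, works just as well):

import torchaudio

waveform, sample_rate = torchaudio.load("input.mp3")       # any supported format
waveform = waveform.mean(dim=0, keepdim=True)              # downmix to mono
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
torchaudio.save("input_16k.wav", waveform, 16000)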

Performance Tips

  • Use GPU for faster processing
  • Limit conversation history for memory efficiency
  • Use smaller audio files for real-time processing

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

License

GNU General Public License v3.0 - see LICENSE file for details.

Acknowledgments

  • OpenAI Whisper for speech recognition
  • Hugging Face for SmolLM and transformers
  • Gradio for the web interface
  • The open-source community for various tools and libraries
