@ShadowSparke

I was experimenting with Chatterbox and thought it would be really nice to have it built into AllTalk, so I went ahead and implemented it. It works pretty well; I've only tested it a little, but the results are promising. I figured I'd open a pull request early while I work through the kinks.

BlakeBC and others added 8 commits July 7, 2025 21:16
… support

MAJOR FEATURE: Complete Chatterbox TTS Integration

## Core Implementation
- Full Chatterbox TTS engine implementation in system/tts_engines/chatterbox/
- Auto-model downloading via ChatterboxTTS.from_pretrained()
- Cross-platform device support (CUDA/MPS/CPU) with automatic detection
- Voice cloning via audio prompts from WAV files (no reference text required)
- Low VRAM support with CPU/GPU memory swapping
- Robust installation with dependency conflict resolution
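
The automatic CUDA/MPS/CPU detection could look roughly like the sketch below. The function name `pick_device` is hypothetical and not taken from this PR; the PyTorch import is made optional only so the snippet also runs where PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring a GPU backend when usable."""
    try:
        import torch  # optional here so the sketch runs without PyTorch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

The returned string would presumably be what the engine hands to the model loader; how this PR actually wires it up is not shown here.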

## Advanced Parameter System (5 New Controls)
- **Seed**: Reproducible generation with cross-library seed setting
  - Supports torch, CUDA, random, and numpy seed synchronization
  - Smart random generation when seed=0 for varied outputs
- **Temperature**: Neural network sampling randomness control
- **Repetition Penalty**: Reduce repetitive speech patterns
- **Exaggeration** (0.0-2.0): Speech expressiveness and emotion control
- **CFG Weight** (0.0-1.0): Classifier-free guidance strength
- **Min P** (0.0-1.0): Minimum probability threshold for sampling
- **Top P** (0.0-1.0): Nucleus sampling parameter control
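
As an illustration of the cross-library seeding described for the Seed control, here is a minimal sketch. The helper name `set_seed` is an assumption, as is the choice to treat numpy and torch as optional imports; the PR's own implementation may differ.

```python
import random


def set_seed(seed: int) -> int:
    """Seed every available RNG and return the seed actually used.

    seed=0 is treated as "pick a fresh random seed", matching the
    behavior described in the commit message.
    """
    if seed == 0:
        seed = random.SystemRandom().randint(1, 2**32 - 1)

    random.seed(seed)

    try:
        import numpy as np  # optional in this sketch
        np.random.seed(seed)  # accepts values in [0, 2**32 - 1]
    except ImportError:
        pass

    try:
        import torch  # optional in this sketch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

    return seed
```

Returning the effective seed lets a caller log it, so a run that started with seed=0 can still be reproduced later.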

## Complete API Integration
- FastAPI endpoint updates in tts_server.py with new Form parameters
- Dependency injection functions for default value handling
- Parameter flow: Gradio → API → Model Engine → Chatterbox
- Backward compatibility with existing AllTalk API structure
- OpenAI API compatibility maintained

## Enhanced Gradio Interface
- 5 new interactive sliders in Advanced TTS Settings
- Smart capability-based visibility (only shown for Chatterbox)
- Proper form submission and button handler updates
- Real-time parameter validation and user feedback

## Voice File System Integration
- WAV file scanning in voices/ directory and subdirectories
- Folder-based voice organization support
- Audio prompt path resolution for voice cloning
- Graceful fallback when voice files missing
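
The voice scanning and fallback behavior listed above might be sketched as follows. Both function names are hypothetical, and returning `None` for a missing file is just one way to let the caller fall back (for example, to the model's default voice).

```python
from pathlib import Path
from typing import Optional


def list_voices(voices_dir: Path) -> list[str]:
    """Return WAV files under voices_dir as sorted, POSIX-style relative paths."""
    if not voices_dir.is_dir():
        return []
    return sorted(
        p.relative_to(voices_dir).as_posix()
        for p in voices_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ".wav"
    )


def resolve_voice(voices_dir: Path, name: str) -> Optional[Path]:
    """Return the audio prompt path, or None so the caller can fall back."""
    candidate = voices_dir / name
    return candidate if candidate.is_file() else None
```

Keeping the subfolder in the returned name (`folder/voice.wav`) is what makes folder-based organization visible to the UI in this sketch.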

## Model Capabilities & Configuration
- Updated model_settings.json with new parameter capabilities
- Engine registration in tts_engines.json
- Proper default values and validation ranges
- Cross-platform compatibility flags
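
To make the validation ranges concrete, the sketch below clamps incoming values to the ranges quoted in this commit message. The key names and the decision to clamp rather than reject out-of-range values are assumptions for illustration, not the PR's confirmed behavior.

```python
# Ranges as quoted in the commit message; the key names are hypothetical.
RANGES = {
    "exaggeration": (0.0, 2.0),
    "cfg_weight": (0.0, 1.0),
    "min_p": (0.0, 1.0),
    "top_p": (0.0, 1.0),
}


def clamp_params(params: dict) -> dict:
    """Return a copy of params with ranged values clamped into their bounds."""
    out = dict(params)
    for name, (low, high) in RANGES.items():
        if name in out:
            out[name] = min(max(float(out[name]), low), high)
    return out


print(clamp_params({"exaggeration": 3.5, "top_p": -0.2, "seed": 7}))
```

Parameters without a listed range (such as the seed) pass through unchanged.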

## Technical Excellence
- Comprehensive error handling with HTTP status codes
- Debug logging with color-coded output
- Memory management for CUDA environments
- Generation locks to prevent concurrent conflicts
- Proper async/await patterns throughout
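
A generation lock of the kind mentioned above can be sketched with `asyncio.Lock`. The `Engine` class and its counters are hypothetical and exist only to show that concurrent requests are serialized; the sleep stands in for model inference.

```python
import asyncio


class Engine:
    """Toy engine that allows only one generation at a time."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.active = 0       # generations currently running
        self.max_active = 0   # highest concurrency ever observed

    async def generate(self, text: str) -> str:
        async with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            await asyncio.sleep(0.01)  # stand-in for model inference
            self.active -= 1
            return f"audio for {text!r}"


async def main():
    engine = Engine()  # created inside the running event loop
    results = await asyncio.gather(*(engine.generate(t) for t in ["a", "b", "c"]))
    return engine.max_active, results
```

Even though three requests are submitted at once, `max_active` never exceeds 1 because each waits for the lock.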

## Impact
- **Files Modified**: 7 core files across the AllTalk architecture
- **New Parameters**: 5 advanced generation controls
- **API Endpoints**: Updated with full parameter support
- **UI Components**: Enhanced with capability-aware controls
- **Compatibility**: Maintains full backward compatibility

This implementation transforms AllTalk into a more powerful TTS platform while maintaining its clean architecture and user-friendly interface. Chatterbox users now have professional-grade control over voice generation with industry-standard parameters.
… long inputs

- Added `chunk_text` method to split input text into manageable chunks at sentence boundaries, ensuring no overlaps.
- Updated TTS generation process to utilize text chunking, allowing for better handling of long texts (over 500 characters).
- Enhanced debug logging to verify chunking process and character counts for original and generated audio.
- Adjusted audio generation to concatenate chunks before saving, improving output quality and consistency.
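
For illustration, sentence-boundary chunking along these lines could be written as below. This is a sketch rather than the PR's actual `chunk_text`: the regex-based sentence split and the `max_chars` parameter are assumptions, with the 500-character default taken from the description above.

```python
import re


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into non-overlapping chunks that end on sentence boundaries.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after ., ! or ? followed by whitespace; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two! Three? Four.", max_chars=10))
```

Because each sentence lands in exactly one chunk, joining the chunks with single spaces reproduces the original text (for single-spaced input), which is a quick way to check that nothing overlaps or goes missing.
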
@ShadowSparke ShadowSparke deleted the alltalkbeta-chatterbox branch July 21, 2025 20:00