Alltalkbeta - Chatterbox Implementation #619
Closed
ShadowSparke wants to merge 8 commits into erew123:alltalkbeta from ShadowSparke:alltalkbeta-chatterbox
… support

MAJOR FEATURE: Complete Chatterbox TTS Integration

## Core Implementation
- Full Chatterbox TTS engine implementation in system/tts_engines/chatterbox/
- Auto-model downloading via ChatterboxTTS.from_pretrained()
- Cross-platform device support (CUDA/MPS/CPU) with automatic detection
- Voice cloning via audio prompts from WAV files (no reference text required)
- Low VRAM support with CPU/GPU memory swapping
- Robust installation with dependency conflict resolution

## Advanced Parameter System (5 New Controls)
- **Seed**: Reproducible generation with cross-library seed setting
  - Supports torch, CUDA, random, and numpy seed synchronization
  - Smart random generation when seed=0 for varied outputs
- **Temperature**: Neural network sampling randomness control
- **Repetition Penalty**: Reduce repetitive speech patterns
- **Exaggeration** (0.0-2.0): Speech expressiveness and emotion control
- **CFG Weight** (0.0-1.0): Classifier-free guidance strength
- **Min P** (0.0-1.0): Minimum probability threshold for sampling
- **Top P** (0.0-1.0): Nucleus sampling parameter control

## Complete API Integration
- FastAPI endpoint updates in tts_server.py with new Form parameters
- Dependency injection functions for default value handling
- Parameter flow: Gradio → API → Model Engine → Chatterbox
- Backward compatibility with existing AllTalk API structure
- OpenAI API compatibility maintained

## Enhanced Gradio Interface
- 5 new interactive sliders in Advanced TTS Settings
- Smart capability-based visibility (only shown for Chatterbox)
- Proper form submission and button handler updates
- Real-time parameter validation and user feedback

## Voice File System Integration
- WAV file scanning in voices/ directory and subdirectories
- Folder-based voice organization support
- Audio prompt path resolution for voice cloning
- Graceful fallback when voice files are missing

## Model Capabilities & Configuration
- Updated model_settings.json with new parameter capabilities
- Engine registration in tts_engines.json
- Proper default values and validation ranges
- Cross-platform compatibility flags

## Technical Excellence
- Comprehensive error handling with HTTP status codes
- Debug logging with color-coded output
- Memory management for CUDA environments
- Generation locks to prevent concurrent conflicts
- Proper async/await patterns throughout

## Impact
- **Files Modified**: 7 core files across the AllTalk architecture
- **New Parameters**: 5 advanced generation controls
- **API Endpoints**: Updated with full parameter support
- **UI Components**: Enhanced with capability-aware controls
- **Compatibility**: Maintains full backward compatibility

This implementation transforms AllTalk into a more powerful TTS platform while maintaining its clean architecture and user-friendly interface. Chatterbox users now have professional-grade control over voice generation with industry-standard parameters.
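The seed behaviour described in the commit message (one seed applied to `random`, numpy, torch, and CUDA, with `seed=0` meaning "pick a fresh random seed") could look roughly like the sketch below. This is not the code from the PR: the function names `resolve_seed` and `set_all_seeds` are hypothetical, and the numpy and torch imports are made optional here only so the sketch runs without them installed.

```python
import random


def resolve_seed(seed: int) -> int:
    """Return the seed to use. Assumption: 0 means 'choose a random seed'."""
    if seed == 0:
        # Stay inside the 32-bit range that numpy's legacy seeding accepts.
        return random.randint(1, 2**32 - 1)
    return seed


def set_all_seeds(seed: int) -> int:
    """Seed every available RNG with the same value and return that value."""
    actual = resolve_seed(seed)
    random.seed(actual)

    try:
        import numpy as np  # optional in this sketch
        np.random.seed(actual)
    except ImportError:
        pass

    try:
        import torch  # optional in this sketch
        torch.manual_seed(actual)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(actual)
    except ImportError:
        pass

    return actual
```

Returning the seed that was actually applied lets the caller log it, so a generation started with `seed=0` can still be reproduced later by passing the logged value.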
… long inputs

- Added `chunk_text` method to split input text into manageable chunks at sentence boundaries, ensuring no overlaps.
- Updated TTS generation process to utilize text chunking, allowing for better handling of long texts (over 500 characters).
- Enhanced debug logging to verify chunking process and character counts for original and generated audio.
- Adjusted audio generation to concatenate chunks before saving, improving output quality and consistency.
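For readers who want a picture of sentence-boundary chunking, here is a minimal sketch. It is an illustration under assumptions, not the `chunk_text` method from this PR: the signature, the `max_chars` parameter (defaulted to 500 to match the threshold mentioned above), and the regex used to find sentence ends are all hypothetical.

```python
import re


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into non-overlapping chunks that end on sentence boundaries."""
    # Split after ., ! or ? followed by whitespace; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_chars is kept whole
            # as its own oversized chunk rather than cut mid-sentence.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each sentence lands in exactly one chunk, so joining the chunks with single spaces gives back the original text with whitespace normalized. The audio generated per chunk would then be concatenated before saving, as the last bullet above describes.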
…k_tts into alltalkbeta
Was messing around with Chatterbox and thought it would be really nice to have it built into AllTalk, so I went ahead and implemented it. It works pretty well. I've only tested it a little, but the results are promising. Figured I'd open a pull request early while I work through the kinks.