English (default) | 简体中文
This repo started as a single Bark voice cloning project and has evolved into a collection of cutting-edge TTS / voice cloning / voice conversion training & inference scripts (UI + Colab notebooks).
It is a practical toolbox focused on:
- A ready-to-run Gradio Web UI for Bark voice cloning + TTS + voice conversion.
- A separate Sambert UI workflow for Chinese (and bilingual) personal voice cloning with data labeling → training → inference.
- A curated set of Colab/Jupyter notebooks covering multiple cutting-edge TTS / VC pipelines (GPT-SoVITS, XTTS, VALL-E X, F5‑TTS, CosyVoice, OpenAI TTS + VC, etc.).
- Bark Web UI: `app.py`
  - Tabs: Clone Voice (create `.npz` prompt), TTS, Voice Conversion
  - Uses: `cloning/clonevoice.py`, `swap_voice.py`, `bark/`, `util/`, `training/`
- Sambert Web UI: `sambert-ui/app.py` (local), `sambert-ui/app_colab.py` (Colab-friendly)
- Bark training utilities (experimental): `training/training_prepare.py`, `training/train.py`, `training/data.py`
- Python 3.10+ recommended
- GPU recommended (CPU works but is slow)
```shell
pip install -r requirements.txt
python app.py
```

- On first run, Bark checkpoints are downloaded into `./models/` (see `bark/generation.py`).
- HuBERT + tokenizer models for voice cloning are downloaded into `./models/hubert/` (see `bark/hubert/hubert_manager.py`).
- Generated audio files are written to `outputs/` by default (configurable via `config.yaml` → `output_folder_path`).
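Since the output folder is driven by `output_folder_path` in `config.yaml`, resolving it at startup can be sketched as below. This is a hypothetical stdlib-only helper assuming a flat `key: value` layout; the actual UI may load the file differently (e.g. with PyYAML):

```python
from pathlib import Path

def read_flat_config(path):
    """Minimal parser for simple 'key: value' lines (no nesting).

    A stand-in for a full YAML load; assumes the config is flat,
    which is enough to pick up output_folder_path.
    """
    cfg = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" in line:
            key, _, value = line.partition(":")
            cfg[key.strip()] = value.strip().strip("'\"")
    return cfg

# Demo: resolve the output folder, falling back to the default outputs/
sample = Path("config_demo.yaml")
sample.write_text("output_folder_path: outputs  # where generated audio goes\n")
out_dir = read_flat_config(sample).get("output_folder_path", "outputs")
print(out_dir)  # prints: outputs
```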
The Bark UI's "Create Voice" feature writes a `.npz` prompt file. The default path in `app.py` is set for Colab (`/content/...`).
If you run locally, you may need to update that destination path to a valid path on your machine (e.g. inside `bark/assets/prompts/`).
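For reference, a `.npz` prompt is a standard NumPy zip archive of `.npy` arrays, so you can sanity-check a saved prompt with the stdlib alone. The helper names below are hypothetical, and the expected key set (`semantic_prompt`, `coarse_prompt`, `fine_prompt`) is the usual Bark speaker-prompt layout:

```python
import zipfile

EXPECTED = {"semantic_prompt", "coarse_prompt", "fine_prompt"}

def prompt_keys(npz_path):
    """Return the array names stored in a .npz voice prompt.

    .npz files are plain zip archives of .npy members, so the
    stdlib zipfile module is enough to list them.
    """
    with zipfile.ZipFile(npz_path) as zf:
        return {name.removesuffix(".npy") for name in zf.namelist()}

def looks_like_bark_prompt(npz_path):
    # Bark history prompts bundle semantic, coarse and fine codebook arrays
    return EXPECTED <= prompt_keys(npz_path)

# Demo with a dummy archive (real prompts come from the Clone Voice tab)
with zipfile.ZipFile("voice_demo.npz", "w") as zf:
    for key in sorted(EXPECTED):
        zf.writestr(key + ".npy", b"placeholder")  # not real array data
print(looks_like_bark_prompt("voice_demo.npz"))  # prints: True
```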
Sambert UI provides a full pipeline: auto labeling → training → inference.
```shell
cd sambert-ui
pip install -r requirements.txt
python app.py
```

More details: `sambert-ui/README.md`
- TTS (text → audio):
  - Core API: `bark/api.py` (`generate_with_settings`, `semantic_to_waveform`)
  - UI wrapper: `app.py` (`generate_text_to_speech`)
- Voice cloning (audio → `.npz` prompt): `cloning/clonevoice.py` (HuBERT + tokenizer + EnCodec → save `.npz`)
- Voice conversion (audio → new voice): `swap_voice.py` (HuBERT tokens + Bark `semantic_to_waveform` with `history_prompt`)
- Training utilities (experimental):
  - `training/training_prepare.py`: generate semantic tokens from text, then synthesize wav pairs
  - `training/train.py`: prepare HuBERT-ready features and trigger tokenizer training (calls `bark/hubert/customtokenizer.py`)
  - `training/data.py`: text sourcing / filtering helpers
Voice-related notebooks are grouped under:
- `notebooks/tts/` (TTS / voice cloning)
- `notebooks/vc/` (voice conversion; any notebook with `VC` in its filename)
- Bark: `Bark_Voice_Cloning.ipynb`, `Bark_Coqui.ipynb`
- Sambert / Chinese voice cloning: `Voice_Cloning_for_Chinese_Speech_v2.ipynb`, `SambertHifigan.ipynb`, `Sambert_Voice_Cloning_in_One_Click.ipynb`, `Sambert_UI.ipynb`
- GPT-SoVITS: `GPT_SoVITS.ipynb`, `GPT_SoVITS_2.ipynb`, `GPT_SoVITS_emo.ipynb`, `GPT_SoVITS_v2_0808.ipynb`, `GPT_SoVITS_v3.ipynb`, `GPT_SoVITS_v3_03_30.ipynb`, `GPT_SoVITS_v4.ipynb`
- XTTS: `XTTS_Colab.ipynb`
- VALL‑E X: `VALL_E_X.ipynb`
- F5‑TTS: `F5_TTS.ipynb`, `F5_TTS_Training.ipynb`
- CosyVoice: `CosyVoice.ipynb`, `CosyVoice2.ipynb`
- Other: `OpenVoice.ipynb`, `Seamless_Meta.ipynb`
- KNN‑VC: `KNN_VC.ipynb`
- NeuCoSVC: `NeuCoSVC.ipynb`, `NeuCoSVC_v2_先享版.ipynb`
- OpenAI TTS + VC: `OpenAI_TTS_KNN_VC.ipynb`, `OpenAI_TTS_KNN_VC_en.ipynb`, `OpenAI_TTS_RVC.ipynb`
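The `VC`-in-filename grouping rule can be made explicit with a short sketch; `bucket_notebooks` is a hypothetical helper shown only to illustrate the convention, not a function in the repo:

```python
def bucket_notebooks(filenames):
    """Split notebook filenames into tts/ vs vc/ using the repo's rule:
    any notebook with 'VC' in its name belongs under notebooks/vc/."""
    buckets = {"tts": [], "vc": []}
    for name in filenames:
        buckets["vc" if "VC" in name else "tts"].append(name)
    return buckets

demo = ["KNN_VC.ipynb", "XTTS_Colab.ipynb", "OpenAI_TTS_RVC.ipynb", "F5_TTS.ipynb"]
print(bucket_notebooks(demo)["vc"])  # prints: ['KNN_VC.ipynb', 'OpenAI_TTS_RVC.ipynb']
```

Note that the substring match also catches names like `RVC`, which is why `OpenAI_TTS_RVC.ipynb` lives under `notebooks/vc/`.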
```
.
├── app.py            # Bark Gradio UI (voice cloning / TTS / voice conversion)
├── bark/             # Bark core + HuBERT utilities
├── cloning/          # Voice cloning (audio -> .npz prompt)
├── training/         # Experimental training utilities
├── swap_voice.py     # Voice conversion helper
├── util/             # Settings + SSML/text helpers
├── config.yaml       # UI + output configuration
├── sambert-ui/       # Sambert UI (label/train/infer)
└── notebooks/
    ├── tts/          # TTS / voice cloning notebooks
    ├── vc/           # Voice conversion notebooks (filenames contain "VC")
    └── ...           # Other notebooks (LLM/agent/video/etc.)
```
This repository is intended for research and learning. Please comply with local laws and obtain proper consent before cloning or converting any voice.
10/19/2023: Fixed `ERROR: Exception in ASGI application` by specifying `gradio==3.33.0` and `gradio_client==0.2.7` in requirements.txt.
11/08/2023: Integrated KNN-VC into OpenAI TTS and created an easy-to-use Gradio interface. Try it here.
02/27/2024: We are thrilled to launch our most powerful AI song cover generator ever with the Shanghai Artificial Intelligence Laboratory! Just provide the name of a song, and our application running on an A100 GPU will handle everything else. Check it out on our website (please click "EN" in the first tab of our website to see the English version)! 💕
Based on bark-gui and bark. Thanks to C0untFloyd.
Quick start: Colab Notebook ⚡
HuggingFace Demo: Bark Voice Cloning 🤗 (Need a GPU)
Demo Video: YouTube Video
If you would like to run the code locally, remember to replace the original path `/content/Bark-Voice-Cloning/bark/assets/prompts/file.npz` with the path of `file.npz` on your own computer.
(1) First upload audio for voice cloning and click Create Voice.
(2) Choose the option called "file" in Voice if you'd like to use voice cloning.
(3) Click Generate. Done!
10/26/2023: Integrated labeling, training and inference into an easy-to-use user interface of SambertHifigan. Thanks to wujohns.
We want to point out that Bark is very good at generating English speech but relatively weak at generating Chinese speech. So we adopt another approach, called SambertHifigan, to realize voice cloning for Chinese speech. Please check out our Colab Notebook for the implementation.
Quick start: Colab Notebook ⚡
HuggingFace demo: Voice Cloning for Chinese Speech 🤗