English (default) | 简体中文
This repo started as a single Bark voice cloning project and has evolved into a collection of cutting-edge TTS / voice cloning / voice conversion training & inference scripts (UI + Colab notebooks).
It is a practical toolbox focused on:
- A ready-to-run Gradio Web UI for Bark voice cloning + TTS + voice conversion.
- A separate Sambert UI workflow for Chinese (and bilingual) personal voice cloning with data labeling → training → inference.
- A curated set of Colab/Jupyter notebooks covering multiple cutting-edge TTS / VC pipelines (GPT-SoVITS, XTTS, VALL-E X, F5‑TTS, CosyVoice, OpenAI TTS + VC, etc.).
- Bark Web UI: `app.py`
  - Tabs: Clone Voice (create `.npz` prompt), TTS, Voice Conversion
  - Uses: `cloning/clonevoice.py`, `swap_voice.py`, `bark/`, `util/`, `training/`
- Sambert Web UI: `sambert-ui/app.py` (local), `sambert-ui/app_colab.py` (Colab-friendly)
- Bark training utilities (experimental): `training/training_prepare.py`, `training/train.py`, `training/data.py`
- Python 3.10+ recommended
- GPU recommended (CPU works but is slow)
```shell
pip install -r requirements.txt
python app.py
```

- On first run, Bark checkpoints are downloaded into `./models/` (see `bark/generation.py`).
- HuBERT + tokenizer models for voice cloning are downloaded into `./models/hubert/` (see `bark/hubert/hubert_manager.py`).
- Generated audio files are written to `outputs/` by default (configurable via `config.yaml` → `output_folder_path`).
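Since the output folder is driven by `output_folder_path` in `config.yaml`, resolving it at startup can be sketched as below. This is a hypothetical stdlib-only helper assuming a flat `key: value` layout; the actual UI may load the file differently (e.g. with PyYAML):

```python
from pathlib import Path

def read_flat_config(path):
    """Minimal parser for simple 'key: value' lines (no nesting).

    A stand-in for a full YAML load; assumes the config is flat,
    which is enough to pick up output_folder_path.
    """
    cfg = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" in line:
            key, _, value = line.partition(":")
            cfg[key.strip()] = value.strip().strip("'\"")
    return cfg

# Demo: resolve the output folder, falling back to the default outputs/
sample = Path("config_demo.yaml")
sample.write_text("output_folder_path: outputs  # where generated audio goes\n")
out_dir = read_flat_config(sample).get("output_folder_path", "outputs")
print(out_dir)  # prints: outputs
```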
The Bark UI's "Create Voice" feature writes a `.npz` prompt file. The default path in `app.py` is set for Colab (`/content/...`).
If you run locally, you may need to update that destination path to a valid path on your machine (e.g. inside `bark/assets/prompts/`).
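For reference, a `.npz` prompt is a standard NumPy zip archive of `.npy` arrays, so you can sanity-check a saved prompt with the stdlib alone. The helper names below are hypothetical, and the expected key set (`semantic_prompt`, `coarse_prompt`, `fine_prompt`) is the usual Bark speaker-prompt layout:

```python
import zipfile

EXPECTED = {"semantic_prompt", "coarse_prompt", "fine_prompt"}

def prompt_keys(npz_path):
    """Return the array names stored in a .npz voice prompt.

    .npz files are plain zip archives of .npy members, so the
    stdlib zipfile module is enough to list them.
    """
    with zipfile.ZipFile(npz_path) as zf:
        return {name.removesuffix(".npy") for name in zf.namelist()}

def looks_like_bark_prompt(npz_path):
    # Bark history prompts bundle semantic, coarse and fine codebook arrays
    return EXPECTED <= prompt_keys(npz_path)

# Demo with a dummy archive (real prompts come from the Clone Voice tab)
with zipfile.ZipFile("voice_demo.npz", "w") as zf:
    for key in sorted(EXPECTED):
        zf.writestr(key + ".npy", b"placeholder")  # not real array data
print(looks_like_bark_prompt("voice_demo.npz"))  # prints: True
```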
Sambert UI provides a full pipeline: auto labeling → training → inference.
```shell
cd sambert-ui
pip install -r requirements.txt
python app.py
```

More details: `sambert-ui/README.md`
- TTS (text → audio):
  - Core API: `bark/api.py` (`generate_with_settings`, `semantic_to_waveform`)
  - UI wrapper: `app.py` (`generate_text_to_speech`)
- Voice cloning (audio → `.npz` prompt): `cloning/clonevoice.py` (HuBERT + tokenizer + EnCodec → save `.npz`)
- Voice conversion (audio → new voice): `swap_voice.py` (HuBERT tokens + Bark `semantic_to_waveform` with `history_prompt`)
- Training utilities (experimental):
  - `training/training_prepare.py`: generate semantic tokens from text, then synthesize wav pairs
  - `training/train.py`: prepare HuBERT-ready features and trigger tokenizer training (calls `bark/hubert/customtokenizer.py`)
  - `training/data.py`: text sourcing / filtering helpers
Voice-related notebooks are grouped under:
- `notebooks/tts/` (TTS / voice cloning)
- `notebooks/vc/` (voice conversion; any notebook with `VC` in its filename)
- Bark: `Bark_Voice_Cloning.ipynb`, `Bark_Coqui.ipynb`
- Sambert / Chinese voice cloning: `Voice_Cloning_for_Chinese_Speech_v2.ipynb`, `SambertHifigan.ipynb`, `Sambert_Voice_Cloning_in_One_Click.ipynb`, `Sambert_UI.ipynb`
- GPT-SoVITS: `GPT_SoVITS.ipynb`, `GPT_SoVITS_2.ipynb`, `GPT_SoVITS_emo.ipynb`, `GPT_SoVITS_v2_0808.ipynb`, `GPT_SoVITS_v3.ipynb`, `GPT_SoVITS_v3_03_30.ipynb`, `GPT_SoVITS_v4.ipynb`
- XTTS: `XTTS_Colab.ipynb`
- VALL‑E X: `VALL_E_X.ipynb`
- F5‑TTS: `F5_TTS.ipynb`, `F5_TTS_Training.ipynb`
- CosyVoice: `CosyVoice.ipynb`, `CosyVoice2.ipynb`
- Other: `OpenVoice.ipynb`, `Seamless_Meta.ipynb`
- KNN‑VC: `KNN_VC.ipynb`
- NeuCoSVC: `NeuCoSVC.ipynb`, `NeuCoSVC_v2_先享版.ipynb`
- OpenAI TTS + VC: `OpenAI_TTS_KNN_VC.ipynb`, `OpenAI_TTS_KNN_VC_en.ipynb`, `OpenAI_TTS_RVC.ipynb`
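The `VC`-in-filename grouping rule can be made explicit with a short sketch; `bucket_notebooks` is a hypothetical helper shown only to illustrate the convention, not a function in the repo:

```python
def bucket_notebooks(filenames):
    """Split notebook filenames into tts/ vs vc/ using the repo's rule:
    any notebook with 'VC' in its name belongs under notebooks/vc/."""
    buckets = {"tts": [], "vc": []}
    for name in filenames:
        buckets["vc" if "VC" in name else "tts"].append(name)
    return buckets

demo = ["KNN_VC.ipynb", "XTTS_Colab.ipynb", "OpenAI_TTS_RVC.ipynb", "F5_TTS.ipynb"]
print(bucket_notebooks(demo)["vc"])  # prints: ['KNN_VC.ipynb', 'OpenAI_TTS_RVC.ipynb']
```

Note that the substring match also catches names like `RVC`, which is why `OpenAI_TTS_RVC.ipynb` lives under `notebooks/vc/`.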
```
.
├── app.py            # Bark Gradio UI (voice cloning / TTS / voice conversion)
├── bark/             # Bark core + HuBERT utilities
├── cloning/          # Voice cloning (audio -> .npz prompt)
├── training/         # Experimental training utilities
├── swap_voice.py     # Voice conversion helper
├── util/             # Settings + SSML/text helpers
├── config.yaml       # UI + output configuration
├── sambert-ui/       # Sambert UI (label/train/infer)
└── notebooks/
    ├── tts/          # TTS / voice cloning notebooks
    ├── vc/           # Voice conversion notebooks (filenames contain "VC")
    └── ...           # Other notebooks (LLM/agent/video/etc.)
```
This repository is intended for research and learning. Please comply with local laws and obtain proper consent before cloning or converting any voice.
10/19/2023: Fixed `ERROR: Exception in ASGI application` by specifying `gradio==3.33.0` and `gradio_client==0.2.7` in requirements.txt.
11/08/2023: Integrated KNN-VC into OpenAI TTS and created an easy-to-use Gradio interface. Try it here.
02/27/2024: We are thrilled to launch our most powerful AI song cover generator ever with the Shanghai Artificial Intelligence Laboratory! Just provide the name of a song, and our application running on an A100 GPU will handle everything else. Check it out on our website (please click "EN" in the first tab of our website to see the English version)! 💕
Based on bark-gui and bark. Thanks to C0untFloyd.
Quick start: Colab Notebook ⚡
HuggingFace Demo: Bark Voice Cloning 🤗 (Need a GPU)
Demo Video: YouTube Video
If you would like to run the code locally, remember to replace the original path `/content/Bark-Voice-Cloning/bark/assets/prompts/file.npz` with the path of `file.npz` on your own computer.
(1) First upload audio for voice cloning and click Create Voice.
(2) Choose the option called "file" in Voice if you'd like to use voice cloning.
(3) Click Generate. Done!
10/26/2023: Integrated labeling, training and inference into an easy-to-use user interface of SambertHifigan. Thanks to wujohns.
We want to point out that Bark is very good at generating English speech but relatively weak at generating Chinese speech. So we adopt another approach, called SambertHifigan, to realize voice cloning for Chinese speech. Please check out our Colab Notebook for the implementation.
Quick start: Colab Notebook ⚡
HuggingFace demo: Voice Cloning for Chinese Speech 🤗