KevinWang676/Bark-Voice-Cloning

Bark Voice Cloning + Multi‑Model TTS / Voice Cloning / Voice Conversion (UI + Notebooks)

Introduction

This repo started as a single Bark voice cloning project and has evolved into a collection of cutting-edge TTS / voice cloning / voice conversion training & inference scripts (UI + Colab notebooks).

It is a practical toolbox focused on:

  • A ready-to-run Gradio Web UI for Bark voice cloning + TTS + voice conversion.
  • A separate Sambert UI workflow for Chinese (and bilingual) personal voice cloning with data labeling → training → inference.
  • A curated set of Colab/Jupyter notebooks covering multiple cutting-edge TTS / VC pipelines (GPT-SoVITS, XTTS, VALL-E X, F5‑TTS, CosyVoice, OpenAI TTS + VC, etc.).

What's inside (Key entrypoints)

  • Bark Web UI: app.py
    • Tabs: Clone Voice (create .npz prompt), TTS, Voice Conversion
    • Uses: cloning/clonevoice.py, swap_voice.py, bark/, util/, training/
  • Sambert Web UI: sambert-ui/app.py (local), sambert-ui/app_colab.py (Colab-friendly)
  • Bark training utilities (experimental): training/training_prepare.py, training/train.py, training/data.py

Quick Start (Bark UI)

Requirements

  • Python 3.10+ recommended
  • GPU recommended (CPU works but is slow)

Install

pip install -r requirements.txt

Run

python app.py

Downloads & outputs

  • On first run, Bark checkpoints are downloaded into ./models/ (see bark/generation.py).
  • HuBERT + tokenizer for voice cloning are downloaded into ./models/hubert/ (see bark/hubert/hubert_manager.py).
  • Generated audio files are written to outputs/ by default (configurable via output_folder_path in config.yaml).
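
As a rough illustration of how the output folder setting could be resolved, the sketch below takes an already-parsed settings mapping (for instance, the result of loading config.yaml with a YAML parser) and falls back to outputs/ when output_folder_path is missing. The helper name resolve_output_dir and its base argument are hypothetical and not part of this repo; only the output_folder_path key and the outputs/ default come from the notes above.

```python
from pathlib import Path


def resolve_output_dir(settings: dict, base: Path, default: str = "outputs") -> Path:
    """Return the folder for generated audio, creating it if needed.

    Hypothetical helper: `settings` stands in for the parsed config.yaml.
    """
    # Fall back to the documented default when the key is missing or empty.
    folder = settings.get("output_folder_path") or default
    out_dir = Path(folder)
    # Relative paths are interpreted relative to `base` (e.g. the repo root).
    if not out_dir.is_absolute():
        out_dir = base / out_dir
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling resolve_output_dir({}, Path(".")) would create and return ./outputs, while passing {"output_folder_path": "my_audio"} would use ./my_audio instead.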

Important note for local runs

The Bark UI’s “Create Voice” feature writes a .npz prompt file. The default path in app.py is set for Colab (/content/...).
If you run locally, you may need to update that destination path to a valid path on your machine (e.g. inside bark/assets/prompts/).
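
One way to handle this without hard-coding a machine-specific path is to map the Colab-style path onto your local checkout. The sketch below is only an illustration: local_prompt_path is a hypothetical helper (it does not exist in app.py), and the /content/Bark-Voice-Cloning prefix is taken from the Colab path quoted in the original README further down.

```python
from pathlib import Path, PurePosixPath

# Colab checkout location, as quoted in the original README.
COLAB_PREFIX = "/content/Bark-Voice-Cloning"


def local_prompt_path(colab_path: str, repo_root: Path) -> Path:
    """Map a Colab-style path onto a local checkout (hypothetical helper)."""
    try:
        relative = PurePosixPath(colab_path).relative_to(COLAB_PREFIX)
    except ValueError:
        # Not under the Colab prefix: leave the path untouched.
        return Path(colab_path)
    # Re-root the remaining components under the local repository.
    return repo_root.joinpath(*relative.parts)
```

With repo_root set to the folder where you cloned this repository, the Colab path ending in bark/assets/prompts/file.npz resolves to the same relative location inside your checkout.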

Quick Start (Sambert UI)

Sambert UI provides a full pipeline: auto labeling → training → inference.

cd sambert-ui
pip install -r requirements.txt
python app.py

More details: sambert-ui/README.md

Training & inference scripts (Bark path)

Inference

  • TTS (text → audio):
    • Core API: bark/api.py (generate_with_settings, semantic_to_waveform)
    • UI wrapper: app.py (generate_text_to_speech)
  • Voice cloning (audio → .npz prompt):
    • cloning/clonevoice.py (HuBERT + tokenizer + EnCodec → save .npz)
  • Voice conversion (audio → new voice):
    • swap_voice.py (HuBERT tokens + Bark semantic_to_waveform with history_prompt)
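
When a cloned voice does not load, it can help to check what the .npz prompt actually contains. The sketch below lists the array names in an .npz file using only the standard library (an .npz file is a zip archive with one .npy entry per array). The expected names semantic_prompt, coarse_prompt and fine_prompt follow the upstream Bark history prompt convention; that this fork's cloning/clonevoice.py writes exactly these names is an assumption, and both helper functions are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# Array names used by upstream Bark history prompts (assumed to apply here too).
EXPECTED_KEYS = ("semantic_prompt", "coarse_prompt", "fine_prompt")


def npz_keys(path: str) -> list[str]:
    """List the array names stored in an .npz file without loading the data."""
    with zipfile.ZipFile(path) as archive:
        # NumPy stores each array as "<name>.npy" inside the zip archive.
        return sorted(
            PurePosixPath(member).stem
            for member in archive.namelist()
            if member.endswith(".npy")
        )


def missing_prompt_keys(path: str) -> list[str]:
    """Return the expected prompt arrays that are absent from the file."""
    present = set(npz_keys(path))
    return [key for key in EXPECTED_KEYS if key not in present]
```

An empty list from missing_prompt_keys suggests the file has the three arrays a Bark history prompt normally carries; it does not validate their shapes or contents.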

Training (experimental)

  • training/training_prepare.py: generate semantic tokens from text, then synthesize wav pairs
  • training/train.py: prepare HuBERT-ready features and trigger tokenizer training (calls bark/hubert/customtokenizer.py)
  • training/data.py: text sourcing / filtering helpers

Notebooks (Colab/Jupyter)

Notebook organization

Voice-related notebooks are grouped under:

  • notebooks/tts/ (TTS / voice cloning)
  • notebooks/vc/ (voice conversion; any notebook with VC in its filename)
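
The grouping rule above can be sketched as a small filename check. This is an illustration only: the helper names are hypothetical, the case-sensitive match on "VC" is an assumption, and the code presumes every input is a voice-related notebook (other notebooks in the repo are not covered by this rule).

```python
from pathlib import Path


def notebook_group(filename: str) -> str:
    """Return "vc" or "tts" for a voice-related notebook filename."""
    # Case-sensitive match on "VC" is an assumption based on the stated rule.
    return "vc" if "VC" in Path(filename).stem else "tts"


def group_notebooks(filenames: list[str]) -> dict[str, list[str]]:
    """Split .ipynb filenames into the two voice-related folders."""
    groups: dict[str, list[str]] = {"tts": [], "vc": []}
    for name in sorted(filenames):
        # Ignore anything that is not a notebook.
        if name.endswith(".ipynb"):
            groups[notebook_group(name)].append(name)
    return groups
```

For example, with the made-up names "Some_VC_demo.ipynb" and "Some_TTS_demo.ipynb", the first would land in the vc group and the second in tts.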

TTS / Voice cloning notebooks

Voice conversion (VC) notebooks

Repo layout

.
├── app.py                      # Bark Gradio UI (voice cloning / TTS / voice conversion)
├── bark/                        # Bark core + HuBERT utilities
├── cloning/                     # Voice cloning (audio -> .npz prompt)
├── training/                    # Experimental training utilities
├── swap_voice.py                # Voice conversion helper
├── util/                        # Settings + SSML/text helpers
├── config.yaml                  # UI + output configuration
├── sambert-ui/                  # Sambert UI (label/train/infer)
└── notebooks/
    ├── tts/                     # TTS / voice cloning notebooks
    ├── vc/                      # Voice conversion notebooks (filenames contain "VC")
    └── ...                      # Other notebooks (LLM/agent/video/etc.)

Disclaimer

This repository is intended for research and learning. Please comply with local laws and obtain proper consent before cloning or converting any voice.

Original README

Bark Voice Cloning 🐶 & Voice Cloning for Chinese Speech 🎶

1️⃣ Bark Voice Cloning

10/19/2023: Fixed ERROR: Exception in ASGI application by specifying gradio==3.33.0 and gradio_client==0.2.7 in requirements.txt.
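
If the ASGI error reappears, a quick check is whether the installed packages still match these pins. The sketch below compares installed versions against the two pins named above using importlib.metadata from the standard library; the helper pin_mismatches and the PINS mapping are hypothetical and are not shipped with this repo.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the changelog entry above.
PINS = {"gradio": "3.33.0", "gradio_client": "0.2.7"}


def pin_mismatches(pins: dict[str, str], get_version=version) -> dict[str, str]:
    """Map each package that differs from its pin to the version found."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            problems[name] = found
    return problems
```

Running pin_mismatches(PINS) in the environment used for app.py returns an empty dict when both pins are satisfied.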

11/08/2023: Integrated KNN-VC into OpenAI TTS and created an easy-to-use Gradio interface. Try it here.

02/27/2024: We are thrilled to launch our most powerful AI song cover generator yet, built with Shanghai Artificial Intelligence Laboratory! Just provide the name of a song, and our application, running on an A100 GPU, will handle everything else. Check it out on our website (click "EN" in the first tab to see the English version)! 💕

Based on bark-gui and bark. Thanks to C0untFloyd.

Quick start: Colab Notebook

HuggingFace Demo: Bark Voice Cloning 🤗 (requires a GPU)

Demo Video: YouTube Video

If you would like to run the code locally, remember to replace the original path /content/Bark-Voice-Cloning/bark/assets/prompts/file.npz with the path to file.npz on your own computer.

If you like the quick start, please star this repository. ⭐⭐⭐

Easy to use:

(1) First upload audio for voice cloning and click Create Voice.

(2) Choose the option called "file" in Voice if you'd like to use voice cloning.

(3) Click Generate. Done!

2️⃣ Voice Cloning for Chinese Speech

10/26/2023: Integrated labeling, training, and inference into an easy-to-use interface for SambertHifigan. Thanks to wujohns.

Bark is very good at generating English speech but relatively poor at generating Chinese speech. For Chinese voice cloning we therefore use a different approach, SambertHifigan. Please check out our Colab Notebook for the implementation.

Quick start: Colab Notebook

HuggingFace demo: Voice Cloning for Chinese Speech 🤗
