Skip to content

An OpenVoice-based voice cloning tool, single executable file (~14M), supporting multiple formats without dependencies on ffmpeg, Python, PyTorch, ONNX. 基于OpenVoice的声音克隆工具,免安装的单个可执行文件(~14M),支持多种格式,不依赖ffmpeg、python、pytorch、onnx

License

Notifications You must be signed in to change notification settings

jingangdidi/voice_clone

Repository files navigation

voice_clone

License

中文文档

An OpenVoice-based voice cloning tool, single executable file (~14M), supporting multiple formats without dependencies on ffmpeg, Python, PyTorch, ONNX.

基于OpenVoice的声音克隆工具,免安装的单个可执行文件(~14M),支持多种格式,不依赖ffmpeg、python、pytorch、onnx

👑 Features

  • ​💪​ Single-file executable - no installation required
  • 🎈 Independent of FFmpeg, Python, PyTorch, and ONNX
  • 🎨​ Support multiple formats (e.g. mp4, mp3, wav)
  • 👄 Offer multiple built-in base speakers: en-au, en-br, en-default, en-india, en-newest, en-us, es, fr, jp, kr, zh
  • 💻​ Support CPU & GPU

🚀 Quick-Start

structure

some dir
├─ voice_clone    # single executable file
└─ checkpoints_v2 # OpenVoice model
     └─ converter # use -m specify this dir, default: ./checkpoints_v2/converter
          ├─ config.json
          └─ checkpoint.pth

1. download a pre-built binary

latest release

2. download OpenVoice modle

checkpoints_v2

3. convert some voice to your voice

./voice_clone -s raw_voice.wav -t your_voice.wav

😁 Usage Example

1. convert tone color of "test.mp4" to default built-in base speaker "en-default"

voice_clone -s test.mp4

output 1 file:

test--en-default.wav # test.mp4 --> en-default

2. convert tone color of "test1.mp4" and "test2.wav" to built-in base speaker "zh"

voice_clone -s test1.mp4:test2.wav -t zh

output 2 files:

test1--zh.wav # test1.mp4 --> zh
test2--zh.wav # test2.wav --> zh

3. convert tone color of "test.mp4" to built-in base speaker "zh" and "my_voice.wav"

voice_clone -s test.mp4 -t zh:my_voice.wav

output 2 files:

test--zh.wav       # test.mp4 --> zh
test--my_voice.wav # test.mp4 --> my_voice.wav

4. convert tone color of "test1.mp4" and "test2.wav" to built-in base speaker "zh" and "my_voice.wav", save extracted tone color

voice_clone -s test1.mp4:test2.wav -t zh:my_voice.wav -S -n result1.wav:result2.wav -o ./result

output 5 files:

test1.tone           # test1.mp4 tone color, next time use test1.mp4, will skip extract tone color from test1.mp4, use test1.tone directly
test2.tone           # test2.wav tone color, next time use test2.wav, will skip extract tone color from test2.wav, use test2.tone directly
my_voice.tone        # my_voice.wav tone color, next time use my_voice.wav, will skip extract tone color from my_voice.wav, use my_voice.tone directly
./result/result1.wav # test1.mp4 --> zh
./result/result2.wav # test2.wav --> my_voice.wav

⚡️ Performance

os: ubuntu 22.04, CPU: i7-13700K, GPU: NVIDIA GeForce RTX 4090, cuda: 12.2

CPU/GPU thread elapsed time command
CPU 1 ~40s voice_clone -s test_data/test.wav -T 1
CPU 10 ~16s voice_clone -s test_data/test.wav -T 10
CPU 20 ~15s voice_clone -s test_data/test.wav -T 20
CPU all ~14s voice_clone -s test_data/test.wav -T 0
GPU ~1.6s voice_clone -s test_data/test.wav

🛠 Building from source

  • default use cpu and simple vad (not require onnx)
git clone https://github.com/jingangdidi/voice_clone.git
cd voice_clone
cargo build --release
  • use silero vad (require onnx)
- default = ["simple_vad"]
+ default = ["silero_vad"]
  • use GPU
- candle-core = { git = "https://github.com/jingangdidi/candle", package = "candle-core", branch = "main" }
+ candle-core = { git = "https://github.com/jingangdidi/candle", package = "candle-core", branch = "main", features = ["cuda"] }

- candle-nn = { git = "https://github.com/jingangdidi/candle", package = "candle-nn", branch = "main" }
+ candle-nn = { git = "https://github.com/jingangdidi/candle", package = "candle-nn", branch = "main", features = ["cuda"] }

🚥 Arguments

Usage: voice_clone -s <source> [-t <target>] [-n <name>] [-m <model>] [-S] [-T <thread>] [-o <outpath>]

voice clone

Options:
  -s, --source      source files, colon separated
  -t, --target      target files, colon separated. -t also support base speakers: en-au, en-br, en-default, en-india, en-newest, en-us, es, fr, jp, kr, zh. default: en-default
  -n, --name        result voice file names, colon separated, default: source--target.wav
  -m, --model       openvoice model path, default: ./checkpoints_v2/converter
  -S, --save        save source and target tone color to to the same directory as the specified -s and -t files, maintaining identical nomenclature while altering the format extension to ".tone"
  -T, --thread      cpu threads, 0 means all threads, default: 4
  -o, --outpath     output path, default: ./
  -h, --help        display usage information

📚 Acknowledgements

⏰ changelog

  • [2025.08.25] release v0.1.1
    • 🛠Fix: Replace the fixed value 44100 in the code with the actual sample rate of the source and target voice.
    • ⭐️Add: Support print source and target tone cosine similarity.
    • 💪🏻Optimize: Support voice less than 10 seconds.
  • [2025.08.08] release v0.1.0

About

An OpenVoice-based voice cloning tool, single executable file (~14M), supporting multiple formats without dependencies on ffmpeg, Python, PyTorch, ONNX. 基于OpenVoice的声音克隆工具,免安装的单个可执行文件(~14M),支持多种格式,不依赖ffmpeg、python、pytorch、onnx

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published