Speech recognition | Speech synthesis |
---|---|
✔️ | ✔️ |
Speaker identification | Speaker diarization | Speaker verification |
---|---|---|
✔️ | ✔️ | ✔️ |
Spoken Language identification | Audio tagging | Voice activity detection |
---|---|---|
✔️ | ✔️ | ✔️ |
Keyword spotting | Add punctuation |
---|---|
✔️ | ✔️ |
Architecture | Android | iOS | Windows | macOS | Linux |
---|---|---|---|---|---|
x64 | ✔️ | | ✔️ | ✔️ | ✔️ |
x86 | ✔️ | | ✔️ | | |
arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
arm32 | ✔️ | | | | ✔️ |
riscv64 | | | | | ✔️ |
1. C++ | 2. C | 3. Python | 4. JavaScript |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
5. Java | 6. C# | 7. Kotlin | 8. Swift |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
9. Go | 10. Dart | 11. Rust | 12. Pascal |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
For Rust support, please see sherpa-rs.
It also supports WebAssembly.
This repository supports running the following functions locally:
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker diarization
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
- Keyword spotting
on the following platforms and operating systems:
- x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- NodeJS
- WebAssembly
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- 旭日X3派 (Horizon Sunrise X3 Pi)
- 爱芯派 (AXera-Pi)
- etc.
with the following APIs:
- C++, C, Python, Go, C#
- Java, Kotlin, JavaScript
- Swift, Rust
- Dart, Object Pascal
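Streaming recognition differs from non-streaming recognition in that it consumes audio incrementally instead of waiting for the complete utterance. The sketch below does not use sherpa-onnx itself; it only shows one way a client might cut a waveform into fixed-duration chunks before feeding them to a streaming recognizer. The helper name `iter_chunks` and the 100 ms default are illustrative assumptions. In sherpa-onnx's Python API each chunk would typically be handed to an online stream via `accept_waveform`, but please check the documentation for the exact calls.

```python
from typing import Iterator, List


def iter_chunks(samples: List[float], sample_rate: int,
                chunk_ms: int = 100) -> Iterator[List[float]]:
    """Yield consecutive chunks of roughly `chunk_ms` milliseconds.

    Hypothetical helper for illustration; the last chunk may be shorter.
    """
    if sample_rate <= 0 or chunk_ms <= 0:
        raise ValueError("sample_rate and chunk_ms must be positive")
    # Samples per chunk, e.g. 16000 Hz * 100 ms -> 1600 samples.
    chunk_len = max(1, sample_rate * chunk_ms // 1000)
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


if __name__ == "__main__":
    one_second = [0.0] * 16000  # 1 s of silence at 16 kHz
    for chunk in iter_chunks(one_second, sample_rate=16000):
        # A streaming recognizer would receive each chunk here and could
        # emit partial results before the whole utterance has arrived.
        print(len(chunk))
```

A non-streaming recognizer, by contrast, would receive `one_second` in a single call after recording has finished.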
You can visit the following Hugging Face spaces to try sherpa-onnx without installing anything. All you need is a browser.
Description | URL |
---|---|
Speaker diarization | Click me |
Speech recognition | Click me |
Speech recognition with Whisper | Click me |
Speech synthesis | Click me |
Generate subtitles | Click me |
Audio tagging | Click me |
Spoken language identification with Whisper | Click me |
We also have spaces built using WebAssembly. They are listed below:
Description | Hugging Face space | ModelScope space |
---|---|---|
Voice activity detection with silero-vad | Click me | Address |
Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address |
Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address |
Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address |
Real-time speech recognition (English) | Click me | Address |
VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address |
VAD + speech recognition (English) with Whisper tiny.en | Click me | Address |
VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address |
VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address |
VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address |
VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address |
VAD + speech recognition (Chinese, multiple dialects) with a TeleSpeech-ASR CTC model | Click me | Address |
VAD + speech recognition (English + Chinese + multiple Chinese dialects) with Paraformer-large | Click me | Address |
VAD + speech recognition (English + Chinese + multiple Chinese dialects) with Paraformer-small | Click me | Address |
Speech synthesis (English) | Click me | Address |
Speech synthesis (German) | Click me | Address |
Speaker diarization | Click me | Address |
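Several of the demos above follow a VAD + speech recognition pipeline: voice activity detection first splits the audio into speech segments, and a non-streaming recognizer then transcribes each segment. The demos use silero-vad, a neural model; the sketch below swaps in a much simpler frame-energy threshold purely to illustrate the segmentation step. The function names `energy_vad` and `frame_rms`, the 30 ms frame size, the 0.02 RMS threshold, and the 10-frame silence hangover are all illustrative assumptions, not sherpa-onnx parameters.

```python
import math
from typing import List, Tuple


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square amplitude of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def energy_vad(samples: List[float], sample_rate: int = 16000,
               frame_ms: int = 30, threshold: float = 0.02,
               min_silence_frames: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of detected speech segments.

    NOT silero-vad: a frame counts as speech if its RMS exceeds `threshold`,
    and a segment is closed after `min_silence_frames` consecutive
    non-speech frames.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[int, int]] = []
    start = None        # first sample of the segment being built
    speech_end = 0      # one past the last speech sample seen so far
    silent_frames = 0   # consecutive non-speech frames inside a segment
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        if frame_rms(frame) > threshold:
            if start is None:
                start = i
            speech_end = i + len(frame)
            silent_frames = 0
        elif start is not None:
            silent_frames += 1
            if silent_frames >= min_silence_frames:
                segments.append((start, speech_end))
                start, silent_frames = None, 0
    if start is not None:  # audio ended while still in speech
        segments.append((start, speech_end))
    return segments


if __name__ == "__main__":
    sr = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(4800)]
    audio = [0.0] * 4800 + tone + [0.0] * 9600 + tone + [0.0] * 960
    for begin, end in energy_vad(audio, sr):
        # In a VAD + ASR pipeline, audio[begin:end] would be passed to a
        # non-streaming recognizer here.
        print(begin / sr, end / sr)
```

On the synthetic input, which contains two 0.3 s tone bursts separated by silence, the sketch reports the segments (4800, 9600) and (19200, 24000) in samples. A neural VAD such as silero-vad is far more robust to background noise than a fixed energy threshold, which is why the demos use it.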
You can find pre-built Android APKs for this repository in the following table:
Description | URL | Users in China |
---|---|---|
Speaker diarization | Address | Click here |
Streaming speech recognition | Address | Click here |
Text-to-speech | Address | Click here |
Voice activity detection (VAD) | Address | Click here |
VAD + non-streaming speech recognition | Address | Click here |
Two-pass speech recognition | Address | Click here |
Audio tagging | Address | Click here |
Audio tagging (WearOS) | Address | Click here |
Speaker identification | Address | Click here |
Spoken language identification | Address | Click here |
Keyword spotting | Address | Click here |
Description | URL | Users in China |
---|---|---|
Streaming speech recognition | Address | Click here |
Description | URL | Users in China |
---|---|---|
Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
Linux (x64) | Address | Click here |
macOS (x64) | Address | Click here |
macOS (arm64) | Address | Click here |
Windows (x64) | Address | Click here |
Note: You need to build from source for iOS.
Description | URL |
---|---|
Speech recognition (speech to text, ASR) | Address |
Text-to-speech (TTS) | Address |
VAD | Address |
Keyword spotting | Address |
Audio tagging | Address |
Speaker identification (Speaker ID) | Address |
Spoken language identification (Language ID) | See the multilingual Whisper models under Speech recognition |
Punctuation | Address |
Speaker segmentation | Address |
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Bilibili demo videos (in Chinese): https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
Please see https://k2-fsa.github.io/sherpa/social-groups.html for the 新一代 Kaldi (Next-gen Kaldi) WeChat and QQ discussion groups.
Streaming ASR and TTS based on FastAPI
It shows how to use the ASR and TTS Python APIs with FastAPI.
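It uses streaming ASR in C# with a graphical user interface.
Video demo in Chinese: 【开源】Windows实时字幕软件(网课/开会必备) ("[Open source] Real-time subtitle software for Windows, a must-have for online classes and meetings")
It uses the JavaScript API of sherpa-onnx along with Electron.
Video demo in Chinese: 爆了!炫神教你开打字挂!真正影响胜率的英雄联盟工具!英雄联盟的最后一块拼图!和游戏中的每个人无障碍沟通! (roughly: "It blew up! 炫神 shows you how to turn on a typing cheat! The League of Legends tool that really affects your win rate! The last piece of the League of Legends puzzle! Communicate freely with everyone in the game!")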