Speech recognition | Speech synthesis | Speaker verification | Speaker identification |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
Spoken language identification | Audio tagging | Voice activity detection |
---|---|---|
✔️ | ✔️ | ✔️ |
Keyword spotting | Add punctuation |
---|---|
✔️ | ✔️ |
Architecture | Android | iOS | Windows | macOS | Linux |
---|---|---|---|---|---|
x64 | ✔️ | | ✔️ | ✔️ | ✔️ |
x86 | ✔️ | | ✔️ | | |
arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
arm32 | ✔️ | | | | ✔️ |
riscv64 | | | | | ✔️ |
1. C++ | 2. C | 3. Python | 4. JavaScript |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
5. Java | 6. C# | 7. Kotlin | 8. Swift |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
9. Go | 10. Dart | 11. Rust | 12. Pascal |
---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ |
For Rust support, please see sherpa-rs.
sherpa-onnx also supports WebAssembly.
This repository supports running the following functions locally:
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
- Keyword spotting
- Adding punctuation
on the following platforms and operating systems:
- x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- NodeJS
- WebAssembly
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- 旭日X3派 (Horizon Sunrise X3 Pi)
- 爱芯派 (AXera-Pi)
- etc.
with the following APIs:
- C++, C, Python, Go, C#
- Java, Kotlin, JavaScript
- Swift, Rust
- Dart, Object Pascal
You can visit the following Hugging Face spaces to try sherpa-onnx without installing anything. All you need is a browser.
Description | URL |
---|---|
Speech recognition | Click me |
Speech recognition with Whisper | Click me |
Speech synthesis | Click me |
Generate subtitles | Click me |
Audio tagging | Click me |
Spoken language identification with Whisper | Click me |
We also have spaces built using WebAssembly. They are listed below:
Description | Hugging Face space | ModelScope space |
---|---|---|
Voice activity detection with silero-vad | Click me | Address |
Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address |
Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address |
Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address |
Real-time speech recognition (English) | Click me | Address |
VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address |
VAD + speech recognition (English) with Whisper tiny.en | Click me | Address |
VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address |
VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address |
VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address |
VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address |
VAD + speech recognition (multiple Chinese dialects) with a TeleSpeech-ASR CTC model | Click me | Address |
VAD + speech recognition (English + Chinese, plus multiple Chinese dialects) with Paraformer-large | Click me | Address |
VAD + speech recognition (English + Chinese, plus multiple Chinese dialects) with Paraformer-small | Click me | Address |
Speech synthesis (English) | Click me | Address |
Speech synthesis (German) | Click me | Address |
Description | URL | Users in China |
---|---|---|
Streaming speech recognition | Address | Click here |
Text-to-speech | Address | Click here |
Voice activity detection (VAD) | Address | Click here |
VAD + non-streaming speech recognition | Address | Click here |
Two-pass speech recognition | Address | Click here |
Audio tagging | Address | Click here |
Audio tagging (WearOS) | Address | Click here |
Speaker identification | Address | Click here |
Spoken language identification | Address | Click here |
Keyword spotting | Address | Click here |
Description | URL | Users in China |
---|---|---|
Streaming speech recognition | Address | Click here |
Description | URL | Users in China |
---|---|---|
Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
Linux (x64) | Address | Click here |
macOS (x64) | Address | Click here |
macOS (arm64) | Address | Click here |
Windows (x64) | Address | Click here |
Note: You need to build from source for iOS.
Description | URL | Users in China |
---|---|---|
Generate subtitles | Address | Click here |
Description | URL |
---|---|
Speech recognition (speech to text, ASR) | Address |
Text-to-speech (TTS) | Address |
VAD | Address |
Keyword spotting | Address |
Audio tagging | Address |
Speaker identification (Speaker ID) | Address |
Spoken language identification (Language ID) | See the multilingual Whisper ASR models under Speech recognition |
Punctuation | Address |
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
Please see https://k2-fsa.github.io/sherpa/social-groups.html for the Next-gen Kaldi WeChat and QQ discussion groups.
Streaming ASR and TTS based on FastAPI
It shows how to use the ASR and TTS Python APIs with FastAPI.
Uses streaming ASR in C# with a graphical user interface.