Stars
A lightweight speech processing toolkit based on PyTorch and (Py)Kaldi
An example starter repo for Python projects
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
Your faithful, impartial partner for audio evaluation: honest assessment, know yourself, know your rivals.
Official implementation of compound word transformer (AAAI'21)
A pure python module for reading and writing kaldi ark files
A simple library for Fréchet Audio Distance (FAD) calculation
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
Reference-aware automatic speech evaluation toolkit
State-of-the-art pretrained music models for training, evaluation, inference
Speech Human Evaluation Estimation Toolkit (SHEET)
SimulEval: A General Evaluation Toolkit for Simultaneous Translation
The repository provides links to collections of influential and interesting research papers from top AI conferences, with open-source code to promote reproducibility and provide detailed implementa…
INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"
A standardized toolkit of Kernel Audio Distance (KAD)—a distribution-free, unbiased, and computationally efficient metric for evaluating generative audio.
A system for singing voice synthesis
Extracting character conversations in Genshin Project
logWMSE, an audio quality metric & loss function with support for digital silence target. Useful for training and evaluating audio source separation systems.
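Genshin Impact multilingual text search tool: search all text and voice lines by keyword; useful for foreign-language learning, story research, model training, and similar purposes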
Compute distribution-based quality metrics for audio data using embeddings, with a focus on music.
logWMSE, an audio quality metric with support for digital silence target. Useful for evaluating audio source separation systems, even when there are many audio tracks or stems.
jerryuhoo / VISinger
Forked from PlayVoice/VI-SVS
Use VITS and Opencpop to develop singing voice synthesis; different from VISinger.