cnlinxi

🎯

Focusing

252 followers · 128 following

Lists (26)

SpeechEditing

SpeechSeperation

1 repository

Tools

32 repositories

Universal Method

6 repositories

Vocoder

20 repositories

VoiceConversion

6 repositories

Starred repositories

analyticsinmotion / werpy

🐍📦 Ultra-fast Python package for calculating and analyzing the Word Error Rate (WER). Built for the scalable evaluation of speech and transcription accuracy.

Python 19 5 Updated Dec 19, 2025

inclusionAI / Ming-UniAudio

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 406 28 Updated Nov 27, 2025

corticph / error-align

Text-to-text alignment algorithm for speech recognition error analysis.

Python 22 1 Updated Dec 15, 2025

hans0809 / MiniMind-in-Depth

A source-code walkthrough of the lightweight large language model MiniMind, covering the complete pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more.

520 45 Updated Jun 16, 2025

mangiucugna / json_repair

A python module to repair invalid JSON from LLMs

Python 4,184 161 Updated Dec 17, 2025

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python 18,688 2,060 Updated Dec 17, 2025

ufal / SimulStreaming

Python 394 52 Updated Oct 22, 2025

QuentinFuxa / WhisperLiveKit

Simultaneous speech-to-text model

Python 9,270 912 Updated Dec 19, 2025

mingyin0312 / RLFromScratch

Python 465 37 Updated Aug 28, 2025

OpenMOSS / MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…

Python 1,061 95 Updated Dec 8, 2025

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python 7,753 577 Updated Sep 15, 2025

fighting41love / zhvoice

Chinese voice corpus with clearer, more natural speech, comprising 8 open-source datasets, 3,200 speakers, 900 hours of speech, and 13 million characters.

710 125 Updated Jun 12, 2020

sarulab-speech / UTMOSv2

UTokyo-SaruLab MOS Prediction System

Python 273 28 Updated Dec 18, 2025

ML-GSAI / Diffusion-LLM-Papers

A Collection of Papers on Diffusion Language Models

149 6 Updated Sep 15, 2025

Audio-Foundation-Models / ConversationTTS

Python 81 5 Updated Jul 9, 2025

SparkAudio / Spark-TTS

Spark-TTS Inference Code

Python 10,824 1,156 Updated Apr 9, 2025

TEN-framework / ten-vad

Voice Activity Detector (VAD) : low-latency, high-performance and lightweight

C 1,760 140 Updated Dec 18, 2025

zejunwang1 / CTCDataset

A collection of Chinese text correction datasets.

Python 28 10 Updated Dec 17, 2025

MYZY-AI / Muyan-TTS

Python 472 43 Updated May 19, 2025

DanielLin94144 / Full-Duplex-Bench

A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models

Python 110 4 Updated Sep 21, 2025

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,391 319 Updated Jun 21, 2025

shuaijiang / Whisper-Finetune

Forked from yeyupiaoling/Whisper-Finetune

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deplo…

C 312 24 Updated Nov 28, 2025

Lists (26)

AcousticFrontend

AcousticModel

ASR

ASR-pretrain

ASV

AudioQuality

AwesomeList

BandwidthExtension

Classification

Codec

Data

Develop

Evaluation

FrontEnd

How-to

LLM

Music

Performance

Quant

SingingVoiceSynthesis