Starred repositories

🐍📦 Ultra-fast Python package for calculating and analyzing the Word Error Rate (WER). Built for the scalable evaluation of speech and transcription accuracy.

Python 19 5 Updated Dec 19, 2025

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 406 28 Updated Nov 27, 2025

Text-to-text alignment algorithm for speech recognition error analysis.

Python 22 1 Updated Dec 15, 2025

Source-code walkthrough of the lightweight large language model MiniMind, covering the full pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more.

520 45 Updated Jun 16, 2025

A Python module to repair invalid JSON from LLMs

Python 4,184 161 Updated Dec 17, 2025

Open-Source Frontier Voice AI

Python 18,688 2,060 Updated Dec 17, 2025
Python 394 52 Updated Oct 22, 2025

Simultaneous speech-to-text model

Python 9,270 912 Updated Dec 19, 2025
Python 465 37 Updated Aug 28, 2025

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…

Python 1,061 95 Updated Dec 8, 2025

Text-audio foundation model from Boson AI

Python 7,753 577 Updated Sep 15, 2025

Chinese voice corpus with clearer, more natural speech, comprising 8 open-source datasets, 3,200 speakers, 900 hours of speech, and 13 million characters.

710 125 Updated Jun 12, 2020

UTokyo-SaruLab MOS Prediction System

Python 273 28 Updated Dec 18, 2025

A Collection of Papers on Diffusion Language Models

149 6 Updated Sep 15, 2025

Spark-TTS Inference Code

Python 10,824 1,156 Updated Apr 9, 2025

Voice Activity Detector (VAD): low-latency, high-performance, and lightweight

C 1,760 140 Updated Dec 18, 2025

A collection of Chinese text correction datasets

Python 28 10 Updated Dec 17, 2025
Python 472 43 Updated May 19, 2025

A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models

Python 110 4 Updated Sep 21, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,391 319 Updated Jun 21, 2025

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deplo…

C 312 24 Updated Nov 28, 2025

📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥

1,850 78 Updated Dec 6, 2025

Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.

579 8 Updated Jun 7, 2024

[EMNLP 2025] MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation

Python 133 7 Updated Nov 9, 2025

A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.

Python 100 5 Updated May 5, 2025

Data processing for code LLMs (pretraining, fine-tuning, and DPO): state-of-the-art industry processing pipelines

Python 48 11 Updated Jul 25, 2024

Heuristic filtering framework for RefineCode

Python 82 10 Updated Mar 13, 2025

Towards Human-Sounding Speech

Python 5,818 501 Updated Dec 5, 2025

A quick guide (especially) for trending instruction finetuning datasets

3,328 226 Updated Nov 28, 2023