xinkez

10 followers · 49 following

Stars

astral-sh / ty-vscode

A Visual Studio Code extension for ty.

TypeScript 335 12 Updated Feb 16, 2026

webmachinelearning / webmcp

🤖 WebMCP

Bikeshed 1,344 77 Updated Feb 12, 2026

antirez / voxtral.c

Pure C inference of Mistral Voxtral Realtime 4B speech to text model

C 1,336 77 Updated Feb 15, 2026

liyunlongaaa / MiMo-Tokenizer-Trainer

Unofficial implementation of the mimo-tokenizer training pipeline from "MiMo-Audio: Audio Language Models are Few-Shot Learners"

Python 2 Updated Nov 9, 2025

z-lab / dflash

DFlash: Block Diffusion for Flash Speculative Decoding

Python 550 34 Updated Feb 6, 2026

JIA-Lab-research / MGM-Omni

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech

Python 276 17 Updated Nov 17, 2025

jjery2243542 / flow-slm

Python 13 1 Updated Nov 28, 2025

locustio / locust

Write scalable load tests in plain Python 🚗💨

Python 27,513 3,175 Updated Feb 17, 2026

stepfun-ai / Step-Audio-EditX

A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Python 869 57 Updated Feb 13, 2026

narcotic-sh / senko

Very fast, accurate speaker diarization

Python 234 18 Updated Feb 7, 2026

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 12,678 1,203 Updated Feb 18, 2026

Soul-AILab / SAC

Training, inference, and testing of the SAC speech codec model.

Python 98 6 Updated Nov 1, 2025

vibevoice-community / VibeVoice

VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)

Python 972 365 Updated Jan 23, 2026

XiaomiMiMo / MiMo-Audio-Training

Python 97 11 Updated Oct 16, 2025

meituan-longcat / LongCat-Audio-Codec

LongCat Audio Tokenizer and Detokenizer

Python 284 21 Updated Feb 10, 2026

inclusionAI / MingTok-Audio

Python 79 8 Updated Nov 12, 2025

OpenMOSS / MOSS-Speech

MOSS-Speech is a true speech-to-speech large language model without text guidance.

Python 123 6 Updated Feb 13, 2026

XiaomiMiMo / MiMo-Audio

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 968 94 Updated Sep 20, 2025

QwenLM / Qwen3-Omni

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,431 214 Updated Jan 8, 2026

Hannieliao / Emilia-NV

Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"

84 2 Updated Sep 18, 2025

LAION-AI / emotion-annotations

Python 107 10 Updated Oct 1, 2025

ali-vilab / alitok

[ICLR2026] AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model

Python 53 2 Updated Oct 12, 2025

playht / PlayDiffusion

Python 536 56 Updated Oct 1, 2025

liutaocode / TTS-arxiv-daily

Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 hours)

Python 611 39 Updated Feb 17, 2026

jasonppy / VoiceStar

VoiceStar: Robust, Duration-controllable TTS that can Extrapolate

Python 308 27 Updated May 31, 2025

X-LANCE / KWStreamingSearch

Python 81 5 Updated Jun 25, 2025

YuqingWang1029 / TokenBridge

[ICCV2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge

Python 151 4 Updated Jul 24, 2025

nene1212 / MaskGCT-Training

Training code for MaskGCT-T2S model.

Python 24 8 Updated Dec 14, 2024

FunAudioLLM / FunMusic

A fundamental toolkit designed for music, song, and audio generation

Python 1,305 132 Updated May 20, 2025

jishengpeng / WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,269 110 Updated Mar 2, 2025