A Visual Studio Code extension for ty.

TypeScript 335 12 Updated Feb 16, 2026

🤖 WebMCP

Bikeshed 1,344 77 Updated Feb 12, 2026

Pure C inference of Mistral Voxtral Realtime 4B speech to text model

C 1,336 77 Updated Feb 15, 2026

Unofficial implementation of the mimo-tokenizer training pipeline from "MiMo-Audio: Audio Language Models are Few-Shot Learners"

Python 2 Updated Nov 9, 2025

DFlash: Block Diffusion for Flash Speculative Decoding

Python 550 34 Updated Feb 6, 2026

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech

Python 276 17 Updated Nov 17, 2025
Python 13 1 Updated Nov 28, 2025

Write scalable load tests in plain Python 🚗💨

Python 27,513 3,175 Updated Feb 17, 2026

A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Python 869 57 Updated Feb 13, 2026

Very fast, accurate speaker diarization

Python 234 18 Updated Feb 7, 2026

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 12,678 1,203 Updated Feb 18, 2026

Training, inference, and testing of the SAC speech codec model.

Python 98 6 Updated Nov 1, 2025

VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)

Python 972 365 Updated Jan 23, 2026

LongCat Audio Tokenizer and Detokenizer

Python 284 21 Updated Feb 10, 2026
Python 79 8 Updated Nov 12, 2025

MOSS-Speech is a true speech-to-speech large language model without text guidance.

Python 123 6 Updated Feb 13, 2026

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 968 94 Updated Sep 20, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,431 214 Updated Jan 8, 2026

Official repository of the paper "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"

84 2 Updated Sep 18, 2025

[ICLR2026] AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model

Python 53 2 Updated Oct 12, 2025
Python 536 56 Updated Oct 1, 2025

Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)

Python 611 39 Updated Feb 17, 2026

VoiceStar: Robust, Duration-controllable TTS that can Extrapolate

Python 308 27 Updated May 31, 2025
Python 81 5 Updated Jun 25, 2025

[ICCV2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge

Python 151 4 Updated Jul 24, 2025

Training code for MaskGCT-T2S model.

Python 24 8 Updated Dec 14, 2024

A fundamental toolkit designed for music, song, and audio generation

Python 1,305 132 Updated May 20, 2025

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,269 110 Updated Mar 2, 2025