Zhengrui Ma Paulmzr

Student at University of Chinese Academy of Sciences, @ictnlp

24 followers · 16 following

Stars

ulinwang / kimusic

AI music creation Skill: a simple workflow from concept to MP3

Python 2 1 Updated Mar 28, 2026

ultraworkers / claw-code

The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.

Rust 186,140 108,748 Updated Apr 17, 2026

tanweai / pua

You are a P8-level engineer who was once held in high regard. When Anthropic first assigned your level, expectations for you were high. A high-agency skill for agents. Your AI has been placed on a PIP. 30 days to show improvement.

TypeScript 16,392 942 Updated Apr 18, 2026

xingchensong / FlashCosyVoice

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python 247 25 Updated Feb 25, 2026

OpenMOSS / SpeechGPT-2.0-preview

GPT-4o-level, real-time spoken dialogue system.

Python 371 32 Updated Jan 27, 2025

XiaomiMiMo / lmms-eval

Forked from EvolvingLMMs-Lab/lmms-eval

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.

Python 72 5 Updated Aug 8, 2025

xingchensong / TouchNet

A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.

Python 230 30 Updated Apr 8, 2026

KexinHUANG19 / InstructTTSEval

Python 42 1 Updated Jun 25, 2025

ictnlp / FastLongSpeech

FastLongSpeech is a novel framework designed to extend the capabilities of Large Speech-Language Models for efficient long-speech processing without necessitating dedicated long-speech training data.

Python 15 1 Updated Jul 22, 2025

ictnlp / StreamUni

StreamUni is a framework that efficiently enables unified Large Speech-Language Models to accomplish streaming speech translation in a cohesive manner.

Python 19 2 Updated Jul 14, 2025

kyutai-labs / hibiki

Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, H…

Rust 1,449 116 Updated Apr 15, 2025

dreamtheater123 / Awesome-SpeechLM-Survey

GitHub repository for ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.

189 6 Updated Jun 17, 2025

ictnlp / Stream-Omni

Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Python 385 44 Updated Jun 17, 2025

XiaomiMiMo / MiMo-VL

MiMo-VL

634 31 Updated Aug 21, 2025

XiaomiMiMo / MiMo

MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

Python 2,031 84 Updated Jun 5, 2025

shaochenze / EAR

Python 43 3 Updated May 15, 2025

ga642381 / speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

1,220 74 Updated Apr 4, 2026

baichuan-inc / Baichuan-Audio

Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction

Python 220 13 Updated Feb 28, 2025

dynamic-superb / dynamic-superb

The official repository of Dynamic-SUPERB.

Python 200 90 Updated Jun 24, 2025

bytedance / MegaTTS3

Python 6,086 471 Updated Aug 29, 2025

nari-labs / dia

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 19,259 1,681 Updated Nov 19, 2025

ictnlp / SLED-TTS

Streamable Text-to-Speech model using a language modeling approach, without vector quantization

Python 110 8 Updated May 20, 2025

jishengpeng / WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,286 111 Updated Mar 2, 2025

ictnlp / LLaMA-Omni2

Python 268 27 Updated May 19, 2025

keonlee9420 / evaluate-zero-shot-tts

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 93 11 Updated Mar 12, 2025

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 20,646 2,367 Updated Mar 16, 2026

Kyubyong / g2p

g2p: English Grapheme To Phoneme Conversion

Python 917 136 Updated Jan 5, 2023

ictnlp / MonoAttn-Transducer

Code for ICML25 Paper "Overcoming Non-monotonicity in Transducer-based Streaming Generation"

Python 12 2 Updated May 19, 2025

ictnlp / PCFG-NAT

Code for NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar".

Cuda 12 1 Updated Jan 4, 2024

descriptinc / descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,767 178 Updated Jan 26, 2026