  • University of Chinese Academy of Sciences


AI music creation Skill: a simple workflow from concept to MP3

Python 2 1 Updated Mar 28, 2026

The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.

Rust 186,140 108,748 Updated Apr 17, 2026

You are a P8-level engineer who was once expected to do great things. When Anthropic set your level, expectations for you were high. A high-agency skill for agents to use. Your AI has been placed on a PIP. 30 days to show improvement.

TypeScript 16,392 942 Updated Apr 18, 2026

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python 247 25 Updated Feb 25, 2026

GPT-4o-level, real-time spoken dialogue system.

Python 371 32 Updated Jan 27, 2025

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.

Python 72 5 Updated Aug 8, 2025

A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.

Python 230 30 Updated Apr 8, 2026
Python 42 1 Updated Jun 25, 2025

FastLongSpeech is a novel framework designed to extend the capabilities of Large Speech-Language Models for efficient long-speech processing without necessitating dedicated long-speech training data.

Python 15 1 Updated Jul 22, 2025

StreamUni is a framework that efficiently enables unified Large Speech-Language Models to accomplish streaming speech translation in a cohesive manner.

Python 19 2 Updated Jul 14, 2025

Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, H…

Rust 1,449 116 Updated Apr 15, 2025

GitHub repository for the ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.

189 6 Updated Jun 17, 2025

Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Python 385 44 Updated Jun 17, 2025

MiMo-VL

634 31 Updated Aug 21, 2025

MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining

Python 2,031 84 Updated Jun 5, 2025
Python 43 3 Updated May 15, 2025

Awesome speech/audio LLMs, representation learning, and codec models

1,220 74 Updated Apr 4, 2026

Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction

Python 220 13 Updated Feb 28, 2025

The official repository of Dynamic-SUPERB.

Python 200 90 Updated Jun 24, 2025
Python 6,086 471 Updated Aug 29, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 19,259 1,681 Updated Nov 19, 2025

Streamable Text-to-Speech model using a language modeling approach, without vector quantization

Python 110 8 Updated May 20, 2025

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,286 111 Updated Mar 2, 2025
Python 268 27 Updated May 19, 2025

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 93 11 Updated Mar 12, 2025

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 20,646 2,367 Updated Mar 16, 2026

g2p: English Grapheme To Phoneme Conversion

Python 917 136 Updated Jan 5, 2023

Code for the ICML 2025 paper "Overcoming Non-monotonicity in Transducer-based Streaming Generation"

Python 12 2 Updated May 19, 2025

Code for the NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar".

Cuda 12 1 Updated Jan 4, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,767 178 Updated Jan 26, 2026