
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

Python 741 88 Updated Dec 17, 2025

AllenAI's post-training codebase

Python 3,456 476 Updated Dec 19, 2025

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 2,047 144 Updated Dec 19, 2025

XARES-LLM

Python 29 1 Updated Dec 19, 2025

5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs

Python 55 9 Updated Nov 19, 2025

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 1,816 107 Updated Dec 8, 2025

Xiaomi Miloco

Python 1,923 122 Updated Dec 17, 2025

Xiaomi MiMo-VL-Miloco

179 4 Updated Nov 14, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,487 213 Updated Dec 16, 2025

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 605 51 Updated Oct 29, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 443 24 Updated Dec 15, 2025

Contexts Optical Compression

Python 21,498 1,922 Updated Oct 25, 2025

Proxmox VE Helper-Scripts (Community Edition)

Shell 24,032 2,162 Updated Dec 19, 2025

Data Pipeline, Models, and Benchmark for Omni-Captioner.

Python 105 Updated Oct 17, 2025

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,641 53 Updated Nov 15, 2025
Python 19 2 Updated Oct 9, 2025

Exploration into Discrete Distribution Network, by Lei Yang out of Beijing

Python 36 Updated Oct 19, 2025
Python 70 7 Updated Nov 12, 2025

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 406 28 Updated Nov 27, 2025

Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"

Python 102 4 Updated Oct 26, 2025
Python 65 4 Updated Dec 5, 2025

Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"

81 2 Updated Sep 18, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,136 191 Updated Oct 9, 2025

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python 2,989 319 Updated Dec 15, 2025

Make text LLMs listen and speak

Python 1,036 180 Updated Dec 11, 2025

A transformer-based LLM, written completely in Rust

Rust 2,999 254 Updated Oct 10, 2025

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 903 87 Updated Sep 20, 2025

Flash Attention Triton kernel with support for second-order derivatives

Python 122 11 Updated Dec 17, 2025

[ICCV 2025] Implementation of the paper "Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs"

Python 61 2 Updated Oct 25, 2025