Ataraxy
  • Computer of Science and Technology Beijing

164 results for source starred repositories

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

39 3 Updated Dec 30, 2025

The official repository VeRL for Autoregressive TTS.

Python 5 Updated Nov 18, 2025
Python 78 7 Updated Nov 12, 2025

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 13,833 1,712 Updated Feb 29, 2024

dLLM: Simple Diffusion Language Modeling

Python 1,713 171 Updated Feb 6, 2026

verl: Volcano Engine Reinforcement Learning for LLMs

Python 19,041 3,203 Updated Feb 6, 2026

[NeurIPS' 25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.

Python 189 14 Updated Dec 9, 2025

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,753 65 Updated Jan 20, 2026

Fast and memory-efficient exact kmeans

Python 138 11 Updated Feb 5, 2026

Compute WER and SER for speech recognition evaluation

Python 25 2 Updated Dec 15, 2025

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 965 94 Updated Sep 20, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,395 210 Updated Jan 8, 2026

Building large models from scratch: a complete walkthrough from pre-training to RLHF

Python 2,365 173 Updated Jan 30, 2026

[TMLR 2025🔥] A survey for the autoregressive models in vision.

787 22 Updated Nov 8, 2025

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python 241 26 Updated Nov 11, 2025

Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"

Python 147 16 Updated Jun 3, 2025
Python 297 39 Updated Jul 22, 2025

Text-audio foundation model from Boson AI

Python 7,906 601 Updated Jan 18, 2026

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,333 96 Updated Sep 22, 2025

[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".

Python 204 10 Updated Jun 18, 2025

Updates ASR papers every day

Python 450 22 Updated Feb 6, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,233 1,662 Updated Feb 4, 2026

A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.

Python 224 32 Updated Aug 6, 2025
Python 36 4 Updated Sep 6, 2025

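The Bert-VITS2 project has many bugs and its tutorials are unfriendly. This project fixes as many Bert-VITS2 bugs as possible and can launch training with one click. With only 50 utterances from the target speaker, you get a stable, fast TTS model.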

Python 67 9 Updated Aug 19, 2025

Your faithful, impartial partner for audio evaluation: know yourself, know your rivals. Honest evaluation.

Python 270 17 Updated Feb 3, 2026

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 4,099 300 Updated Jan 5, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 66,993 8,142 Updated Feb 4, 2026

(WIP) Long-form speech generation

Python 31 4 Updated Apr 2, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 23,403 4,349 Updated Feb 7, 2026