
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 1 Updated Dec 22, 2025

Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

Python 503 31 Updated Dec 23, 2025

🤗 R1-AQA Model: mispeech/r1-aqa

Python 1 Updated Mar 28, 2025

SALMONN family: A suite of advanced multi-modal LLMs

1 Updated Sep 28, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 1 Updated Oct 9, 2025

Official code release for "Efficient Perspective-Correct 3D Gaussian Splatting Using Hybrid Transparency"

Cuda 1 Updated Oct 16, 2025

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

Python 1 Updated Oct 17, 2025

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 1 Updated Oct 21, 2025

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 1 Updated Oct 24, 2025

[ECCV`24&ICLR`25] CityGaussian Series for High-quality Large-Scale Scene Reconstruction with Gaussians

Jupyter Notebook 1 Updated Oct 26, 2025

The best ChatGPT that $100 can buy.

Python 1 Updated Oct 28, 2025

We Speech Toolkit, an LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction

Python 1 Updated Oct 31, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…

Python 1 Updated Nov 7, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 1 Updated Nov 10, 2025

TorchCFM: a Conditional Flow Matching library

Python 1 Updated Nov 11, 2025

Depth Anything 3

Jupyter Notebook 1 Updated Nov 18, 2025

CUDA accelerated rasterization of gaussian splatting

Cuda 1 Updated Nov 18, 2025

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 1 Updated Dec 7, 2025

Train transformer language models with reinforcement learning.

Python 1 Updated Dec 12, 2025

🤗 R1-AQA Model: mispeech/r1-aqa

Python 311 27 Updated Mar 28, 2025

SALMONN family: A suite of advanced multi-modal LLMs

1,372 112 Updated Sep 28, 2025

Updates ASR papers every day

Python 412 20 Updated Dec 24, 2025

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 8,881 985 Updated Dec 13, 2025

A Python package to build AI-powered real-time audio applications

Python 1,901 154 Updated Feb 12, 2025

Some comprehensive papers about speaker diarization

324 10 Updated May 22, 2025

Tools for handling multimodal data in machine learning projects.

Python 1 Updated Nov 24, 2025

Tools for handling multimodal data in machine learning projects.

Python 1,095 257 Updated Dec 15, 2025

TorchCFM: a Conditional Flow Matching library

Python 2,188 177 Updated Nov 11, 2025

High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.

Python 12,729 1,266 Updated Oct 28, 2025

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

Python 607 51 Updated Oct 29, 2025