Stars
zll961020 / ROLL
Forked from alibaba/ROLL
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
zll961020 / r1-aqa
Forked from xiaomi-research/r1-aqa
🤗 R1-AQA Model: mispeech/r1-aqa
zll961020 / SALMONN
Forked from bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
zll961020 / Qwen3-Omni
Forked from QwenLM/Qwen3-Omni
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
zll961020 / HTGS
Forked from nerficg-project/HTGS
Official code release for "Efficient Perspective-Correct 3D Gaussian Splatting Using Hybrid Transparency"
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
zll961020 / whisperX
Forked from m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
zll961020 / SLAM-LLM
Forked from X-LANCE/SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
zll961020 / CityGaussian
Forked from Linketic/CityGaussian
[ECCV'24 & ICLR'25] CityGaussian Series for High-quality Large-Scale Scene Reconstruction with Gaussians
zll961020 / nanochat
Forked from karpathy/nanochat
The best ChatGPT that $100 can buy.
zll961020 / west
Forked from wenet-e2e/west
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
zll961020 / ms-swift
Forked from modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
TorchCFM: a Conditional Flow Matching library
Depth Anything 3
zll961020 / gsplat
Forked from nerfstudio-project/gsplat
CUDA accelerated rasterization of gaussian splatting
zll961020 / pyannote-audio
Forked from pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
zll961020 / trl
Forked from huggingface/trl
Train transformer language models with reinforcement learning.
SALMONN family: A suite of advanced multi-modal LLMs
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
A python package to build AI-powered real-time audio applications
Some comprehensive papers about speaker diarization
zll961020 / lhotse
Forked from lhotse-speech/lhotse
Tools for handling multimodal data in machine learning projects.
Tools for handling multimodal data in machine learning projects.
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.