Stars
Murmur: An Efficient Inference System for Long-Form ASR
Official repository for the WenetSpeech-Chuan dataset.
zll961020 / multinerf
Forked from google-research/multinerf
A Code Release for Mip-NeRF 360, Ref-NeRF, and RawNeRF
zll961020 / CLAP
Forked from LAION-AI/CLAP
Contrastive Language-Audio Pretraining
zll961020 / hello-agents
Forked from datawhalechina/hello-agents
📚 "Building Agents from Scratch": a from-scratch tutorial on agent principles and practice
zll961020 / VibeVoice
Forked from microsoft/VibeVoice
Open-Source Frontier Voice AI
zll961020 / Qwen3-ASR
Forked from QwenLM/Qwen3-ASR
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
zll961020 / DiariZen
Forked from BUTSpeechFIT/DiariZen
A toolkit for speaker diarization.
Claude Code v2.1.88 Source Code
zll961020 / claude-howto
Forked from luongnv89/claude-howto
A visual, example-driven guide to Claude Code, from basic concepts to advanced agents, with copy-paste templates that bring immediate value.
zll961020 / deepagents
Forked from langchain-ai/deepagents
Agent harness built with LangChain and LangGraph. Equipped with a planning tool, a filesystem backend, and the ability to spawn subagents - well-equipped to handle complex agentic tasks.
zll961020 / deer-flow
Forked from bytedance/deer-flow
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of…
From the paper "Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition"
Variational Bayes HMM over x-vectors diarization
zll961020 / ROLL
Forked from alibaba/ROLL
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.
zll961020 / r1-aqa
Forked from xiaomi-research/r1-aqa
🤗 R1-AQA Model: mispeech/r1-aqa
zll961020 / SALMONN
Forked from bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
zll961020 / Qwen3-Omni
Forked from QwenLM/Qwen3-Omni
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
zll961020 / HTGS
Forked from nerficg-project/HTGS
Official code release for "Efficient Perspective-Correct 3D Gaussian Splatting Using Hybrid Transparency"
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
zll961020 / whisperX
Forked from m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
zll961020 / SLAM-LLM
Forked from X-LANCE/SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
zll961020 / CityGaussian
Forked from Linketic/CityGaussian
[ECCV`24&ICLR`25] CityGaussian Series for High-quality Large-Scale Scene Reconstruction with Gaussians
zll961020 / nanochat
Forked from karpathy/nanochat
The best ChatGPT that $100 can buy.
zll961020 / west
Forked from wenet-e2e/west
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
zll961020 / ms-swift
Forked from modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
TorchCFM: a Conditional Flow Matching library