-
SJTU X-LANCE & BIGAI NLCo
- China
- https://danjuan-77.github.io/
-
sam-audio Public
Forked from facebookresearch/sam-audio
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Python Other Updated Dec 19, 2025 -
danjuan-77.github.io Public
Forked from RayeRen/acad-homepage.github.io
AcadHomepage: A Modern and Responsive Academic Personal Homepage
JavaScript MIT License Updated Dec 8, 2025 -
Qwen3-VL Public
Forked from QwenLM/Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Jupyter Notebook Apache License 2.0 Updated Nov 28, 2025 -
SLAM-LLM-lora-exp Public
Forked from cwx-worst-one/SLAM-LLM
Beta version for SLAM-LLM
Python MIT License Updated Oct 27, 2025 -
UltraVoice100K Public
This is the official repository for the UltraVoice100K dataset, providing code and dataset samples.
-
Qwen3-Omni Public
Forked from QwenLM/Qwen3-Omni
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Jupyter Notebook Apache License 2.0 Updated Oct 9, 2025 -
URO-Bench Public
Forked from Ruiqi-Yan/URO-Bench
Towards Comprehensive Benchmark for End-to-End Spoken Dialogue Models
Shell MIT License Updated Aug 31, 2025 -
OpenS2S Public
Forked from CASIA-LM/OpenS2S
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
Python Updated Aug 27, 2025 -
GLM-4-Voice Public
Forked from zai-org/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
Python Apache License 2.0 Updated Aug 27, 2025 -
Kimi-Audio Public
Forked from MoonshotAI/Kimi-Audio
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Python Updated Aug 12, 2025 -
MIO Public
Forked from MIO-Team/MIO
MIO: A Foundation Model on Multimodal Tokens
Python Updated Jul 31, 2025 -
Qwen2.5-Omni Public
Forked from QwenLM/Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Jupyter Notebook Apache License 2.0 Updated Jul 22, 2025 -
F5-TTS Public
Forked from SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Python MIT License Updated Jun 18, 2025 -
EmoVoice Public
Forked from yanghaha0908/EmoVoice
Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"
Python Updated May 27, 2025 -
CosyVoice Public
Forked from FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Python Apache License 2.0 Updated May 20, 2025 -
InternLM-XComposer Public
Forked from InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Python Apache License 2.0 Updated May 14, 2025 -
SALMONN Public
Forked from bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
Python Apache License 2.0 Updated May 14, 2025 -
MiniCPM-o Public
Forked from OpenBMB/MiniCPM-V
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Python Apache License 2.0 Updated May 14, 2025 -
NExT-GPT Public
Forked from NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
Python BSD 3-Clause "New" or "Revised" License Updated May 14, 2025 -
VITA Public
Forked from VITA-MLLM/VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Python Other Updated May 14, 2025 -
Ola Public
Forked from Ola-Omni/Ola
Ola: Pushing the Frontiers of Omni-Modal Language Model
Python Apache License 2.0 Updated May 14, 2025 -
Awesome-Colorful-LLM Public
Forked from patrick-tssn/Awesome-Colorful-LLM
Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics, Fundamental Sciences such as Mathematics, and Ominous.
MIT License Updated Apr 28, 2025 -
async_cosyvoice Public
Forked from qi-hua/async_cosyvoice
Accelerating CosyVoice2 inference with vLLM
Jupyter Notebook Apache License 2.0 Updated Apr 26, 2025 -
mini-omni2 Public
Forked from gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Python MIT License Updated Apr 23, 2025 -
SpeechCraft Public
Forked from thuhcsi/SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
Python Updated Apr 14, 2025 -
nn-zero-to-hero Public
Forked from karpathy/nn-zero-to-hero
Neural Networks: Zero to Hero - [My learning notes]
Jupyter Notebook MIT License Updated Nov 2, 2024 -
KV-Reuse-Not-KV-Evict Public
This repository contains the code for my experiments on inference acceleration using different methods based on the Phi3_mini model.
Python Updated Jul 19, 2024