danjuan-77

🌴

On vacation

Wenming Tu danjuan-77

Ph.D. Student @X-LANCE | Research Intern @bigai-nlco

38 followers · 87 following

SJTU X-LANCE & BIGAI NLCo
China
https://danjuan-77.github.io/

sam-audio Public
Forked from facebookresearch/sam-audio

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python Other Updated Dec 19, 2025
danjuan-77.github.io Public
Forked from RayeRen/acad-homepage.github.io

AcadHomepage: A Modern and Responsive Academic Personal Homepage

JavaScript MIT License Updated Dec 8, 2025
Qwen3-VL Public
Forked from QwenLM/Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook Apache License 2.0 Updated Nov 28, 2025
SLAM-LLM-lora-exp Public
Forked from cwx-worst-one/SLAM-LLM

Beta version for SLAM-LLM

Python MIT License Updated Oct 27, 2025
UltraVoice100K Public

This is the official repository for the UltraVoice100K dataset, providing code and dataset samples.

JavaScript 12 1 Updated Oct 26, 2025
Qwen3-Omni Public
Forked from QwenLM/Qwen3-Omni

Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook Apache License 2.0 Updated Oct 9, 2025
URO-Bench Public
Forked from Ruiqi-Yan/URO-Bench

Towards Comprehensive Benchmark for End-to-End Spoken Dialogue Models

Shell MIT License Updated Aug 31, 2025
VocalNet Public
Forked from SJTU-OmniAgent/VocalNet

Python Apache License 2.0 Updated Aug 31, 2025
OpenS2S Public
Forked from CASIA-LM/OpenS2S

OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model

Python Updated Aug 27, 2025
GLM-4-Voice Public
Forked from zai-org/GLM-4-Voice

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python Apache License 2.0 Updated Aug 27, 2025
Kimi-Audio Public
Forked from MoonshotAI/Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python Updated Aug 12, 2025
MIO Public
Forked from MIO-Team/MIO

MIO: A Foundation Model on Multimodal Tokens

Python Updated Jul 31, 2025
Qwen2.5-Omni Public
Forked from QwenLM/Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook Apache License 2.0 Updated Jul 22, 2025
F5-TTS Public
Forked from SWivid/F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python MIT License Updated Jun 18, 2025
EmoVoice Public
Forked from yanghaha0908/EmoVoice

Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"

Python Updated May 27, 2025
CosyVoice Public
Forked from FunAudioLLM/CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python Apache License 2.0 Updated May 20, 2025
InternLM-XComposer Public
Forked from InternLM/InternLM-XComposer

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python Apache License 2.0 Updated May 14, 2025
SALMONN Public
Forked from bytedance/SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

Python Apache License 2.0 Updated May 14, 2025
MiniCPM-o Public
Forked from OpenBMB/MiniCPM-V

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python Apache License 2.0 Updated May 14, 2025
NExT-GPT Public
Forked from NExT-GPT/NExT-GPT

Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

Python BSD 3-Clause "New" or "Revised" License Updated May 14, 2025
VITA Public
Forked from VITA-MLLM/VITA

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python Other Updated May 14, 2025
HumanOmni Public
Forked from HumanMLLM/HumanOmni

HumanOmni

Python Updated May 14, 2025
Ola Public
Forked from Ola-Omni/Ola

Ola: Pushing the Frontiers of Omni-Modal Language Model

Python Apache License 2.0 Updated May 14, 2025
R1-Omni Public
Forked from HumanMLLM/R1-Omni

Python Updated May 12, 2025
Awesome-Colorful-LLM Public
Forked from patrick-tssn/Awesome-Colorful-LLM

Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics, Fundamental Sciences such as Mathematics, and Ominous.

MIT License Updated Apr 28, 2025
async_cosyvoice Public
Forked from qi-hua/async_cosyvoice

Accelerating CosyVoice2 inference with vLLM

Jupyter Notebook Apache License 2.0 Updated Apr 26, 2025
mini-omni2 Public
Forked from gpt-omni/mini-omni2

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python MIT License Updated Apr 23, 2025
SpeechCraft Public
Forked from thuhcsi/SpeechCraft

The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.

Python Updated Apr 14, 2025
nn-zero-to-hero Public
Forked from karpathy/nn-zero-to-hero

Neural Networks: Zero to Hero-[My learning notes]

Jupyter Notebook MIT License Updated Nov 2, 2024
KV-Reuse-Not-KV-Evict Public

This repository contains the code for my experiments on inference acceleration using different methods based on the Phi3_mini model.

Python Updated Jul 19, 2024
