Tongyi Lab
Hangzhou, China
https://scholar.google.com/citations?user=hreTTqwAAAAJ&hl=en
Stars
OmX - Oh My codeX: Your codex is not alone. Add hooks, agent teams, HUDs, and so much more.
Lightweight coding agent that runs in your terminal
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
Mobile-Agent: The Powerful GUI Agent Family
Voice Activity Projection Models: Self-supervised learning of Turn-taking Events
Turn detection for full-duplex dialogue communication
Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.
[ICML 2025 Tokenization Workshop] HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling
LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
GPT-4o-level, real-time spoken dialogue system.
Janus-Series: Unified Multimodal Understanding and Generation Models
[ACM CCS'24] SafeEar: Content Privacy-Preserving Audio Deepfake Detection
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
A generative speech model for daily dialogue.
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
[ICLR 2024] Official code for the paper 'Elucidating the Exposure Bias in Diffusion Models'
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation