Stars
Official repository for the WenetSpeech-Chuan dataset.
The awesome collection of OpenClaw skills. 5,400+ skills filtered and categorized from the official OpenClaw Skills Registry. 🦞
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
MiniCPM5-1B: A SOTA 1B on-device LLM, small yet powerful.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo
OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not …
Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.
Python bindings for FFmpeg - with complex filtering support
A high-throughput and memory-efficient inference and serving engine for LLMs
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
ChatGLM2-6B: An Open Bilingual Chat LLM | Open-source bilingual dialogue language model