zwglory
  • University of Chinese Academy of Sciences
  • Beijing, China

Starred repositories

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python · 2,246 stars · 162 forks · Updated Dec 19, 2025

An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone

Python · 18,391 stars · 2,882 forks · Updated Dec 19, 2025

GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

Python · 754 stars · 92 forks · Updated Dec 17, 2025

GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters

Python · 586 stars · 52 forks · Updated Dec 12, 2025

Versatile audio super resolution (any -> 48kHz) with AudioSR.

Python · 1,682 stars · 179 forks · Updated Aug 27, 2025

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Python · 2,790 stars · 252 forks · Updated Jun 25, 2025

🤯 LobeHub - an open-source, modern-design AI Agent Workspace. Supports multiple AI providers, Knowledge Base (file upload / RAG), one-click install MCP Marketplace and Artifacts / Thinking. One-cl…

TypeScript · 69,293 stars · 14,274 forks · Updated Dec 21, 2025

GELab: GUI Exploration Lab. One of the best GUI agent solutions in the galaxy, built by the StepFun-GELab team and powered by Step’s research capabilities.

Python · 1,649 stars · 136 forks · Updated Dec 19, 2025

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python · 66,628 stars · 9,530 forks · Updated Dec 16, 2025

Python · 7,548 stars · 445 forks · Updated Dec 14, 2025

Python · 426 stars · 28 forks · Updated Nov 27, 2025

Lightning-Fast, On-Device TTS — running natively via ONNX.

JavaScript · 1,873 stars · 171 forks · Updated Dec 15, 2025

Official implementation of YingMusic-SVC.

Python · 91 stars · 7 forks · Updated Dec 15, 2025

τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment

Python · 555 stars · 123 forks · Updated Dec 18, 2025

We Speech Toolkit: an LLM-based speech toolkit for speech understanding, generation, and interaction

Python · 167 stars · 11 forks · Updated Dec 16, 2025

FLM-Audio is an audio-language variant of RoboEgo/FLM-Ego, an omnimodal model with native full duplexity.

Python · 53 stars · 8 forks · Updated Dec 9, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook · 3,143 stars · 193 forks · Updated Oct 9, 2025

The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Jupyter Notebook · 122 stars · 10 forks · Updated Aug 16, 2025

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

Python · 70,173 stars · 7,614 forks · Updated Dec 19, 2025

DeepResearchAgent is a hierarchical multi-agent system designed not only for deep research tasks but also for general-purpose task solving. The framework leverages a top-level planning agent to coo…

JavaScript · 3,001 stars · 403 forks · Updated Sep 29, 2025

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python · 905 stars · 87 forks · Updated Sep 20, 2025

A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images

Python · 41 stars · 5 forks · Updated Nov 18, 2023

Python · 111 stars · 5 forks · Updated Oct 21, 2025

A lightweight, powerful framework for multi-agent workflows

Python · 17,903 stars · 3,001 forks · Updated Dec 20, 2025

Official Repository of "OmniTry: Virtual Try-On Anything without Masks"

Python · 234 stars · 29 forks · Updated Aug 29, 2025

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python · 232 stars · 25 forks · Updated Nov 11, 2025

Accepted as a [NeurIPS 2024] Spotlight presentation paper

Jupyter Notebook · 6,374 stars · 652 forks · Updated Sep 26, 2024

LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.

Go · 9,797 stars · 1,052 forks · Updated Dec 19, 2025

Python · 21 stars · 1 fork · Updated Aug 12, 2025