UESTC
- Chengdu, China
Stars
DDGS | Dux Distributed Global Search. A metasearch library that aggregates results from diverse web search services
The most powerful local music generation model that outperforms most commercial alternatives
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
A deep research framework based on progressive search and cross-evaluation.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
A TTS that fits in your CPU (and pocket)
DeepTutor: AI-Powered Personalized Learning Assistant
AI Manus is a general-purpose AI Agent system that supports running various tools and operations in a sandbox environment.
Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems by Antonio Gulli
A free, open source, and extensible speech-to-text application that works completely offline.
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
[arXiv 2025] TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System
The official implementation of PeriodWave and PeriodWave-Turbo
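AI-driven SVG presentation generation system supporting multiple formats such as PPT, Xiaohongshu, and WeChat Moments | 15 examples | 229 pages | Generates editable PPT output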
A lightweight, open-source, and intelligent wake word detection engine. Train custom, high-accuracy models with minimal effort.
Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Su…
Light Image Video Generation Inference Framework
Multilingual TTS model with voice cloning and duration control, based on T5Gemma encoder-decoder LLM
Open-source framework for conversational voice AI agents
[INTERSPEECH 2025] The official implementation of DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech