UESTC
- Chengdu, China
Stars
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
🔥 OneThinker: All-in-one Reasoning Model for Image and Video
📚 "Building Agents from Scratch": a from-scratch tutorial on the principles and practice of agents
[ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
Optimized Whisper models for streaming and on-device use
[NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents. RAG + Knowledge Graphs + Personali…
This is the official implementation for Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1.
Official implementation of YingMusic-SVC.
Extracting time features from text using a Finite State Transducer (FST) in Python
TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control
Lightning-Fast, On-Device TTS — running natively via ONNX.
EverMemOS is an open-source, enterprise-grade intelligent memory system. Our mission is to build AI memory that never forgets, making every conversation built on previous understanding.
The repository provides code for running inference with the SAM 3D Body Model (3DB), links for downloading the trained model checkpoints and datasets, and example notebooks that show how to use the…
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
Momentum Human Rig is an anatomically-inspired parametric full-body digital human model developed at Meta. It includes: A parametric body skeletal model; A realistic 3D mesh skinned to the skeleton…
A minimal yet professional single agent demo project that showcases the core execution pipeline and production-grade features of agents.
MiroMind Research Agent: Fully Open-Source Deep Research Agent with Reproducible State-of-the-Art Performance on FutureX, GAIA, HLE, BrowserComp and xBench.
MiroThinker is a series of open-source agentic models trained for deep research and complex tool use scenarios.
The official repo of BridgeVoC, which explores using the Schrödinger Bridge framework for neural vocoding.
Neural Accent Conversion via Disentangled Speech Representations
A Dify plugin to convert Markdown text into a docx file
Official Repository of Paper: "Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios" (AAAI 2026)
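🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis - multi-platform trending-topic aggregation plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, etc.), with smart filtering, automatic push, and conversational AI analysis (dig into the news in natural language with 13 tools, including trend tracking, sentiment analysis, and similarity search). Supports push via WeCom/personal WeChat/Feishu/DingTalk/Telegram/email/ntfy/bark/slack, phone notifications within 1 minute, no need for…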
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages