Stars
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
[NeurIPS 2025] OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication
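Fay is an MCP framework that helps digital humans (2.5D, 3D, mobile, PC, web) or large language models (OpenAI-compatible, DeepSeek) connect to business systems.
A Manus-like desktop client built with TypeScript + Electron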
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
No fortress, purely open ground. OpenManus is Coming.
Your AI Operator for Web, Android, Automation & Testing.
Unitree robot SDK version 2. https://support.unitree.com/home/zh/developer
Use GitHub Actions to automatically fetch the Microsoft Edge offline installation package
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
[CVPR 2025] Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
This is a speech interaction system built on open-source models, integrating ASR, LLM, and TTS in sequence. The ASR model is SenseVoice, the LLM models are Qwen2.5-0.5B/1.5B, and there are three …
Real-time voice interactive digital human, supporting both end-to-end voice and cascaded solutions. Appearance and voice are customizable with no training required, voice cloning is supported, and first-packet latency is as low as 3s. Supports end-to-end voice solutions (GLM-4-Voice - THG) and cascaded solutions (ASR-LLM-TTS-THG). …
The customization marketplace for Windows programs: https://windhawk.net/
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
A framework that helps you quickly build AI-native IDE products. Includes an MCP client that supports Model Context Protocol (MCP) tools via an MCP server.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Janus-Series: Unified Multimodal Understanding and Generation Models
Admin Web Interface for juanfont/headscale
An open source, self-hosted implementation of the Tailscale control server
The easiest, most secure way to use WireGuard and 2FA.