Starred repositories
A collection of high-quality models for the MuJoCo physics engine, curated by Google DeepMind.
Multi-Joint dynamics with Contact. A general purpose physics simulator.
Open-source framework for conversational voice AI agents
In-depth tutorials on LLMs, RAGs and real-world AI agent applications.
Get your documents ready for gen AI
Emotional Speech Conversion using Style Transfer and MUNIT
UniFace: A Comprehensive Library for Face Detection, Recognition, Landmark Analysis, Face Parsing, Gaze Estimation, Age, and Gender Detection
AI-driven predictive maintenance for vehicles using GBM models on real-time sensor data. Proactive fleet management, cost reduction, and efficient transportation enabled by forecasting maintenance …
This repository releases the dataset and models for multimodal puzzle reasoning.
MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation
Make a phone call from an AI agent in one API call, or call the bot directly at the configured phone number!
Official repository for CVPR 2024 highlight paper 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations.
[CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official Repository.
A Paper List for Humanoid Robot Learning.
A humanoid bipedal walking control repo using NMPC and WBC, with simulation in MuJoCo. Contact: xuejl2001@mail.ustc.edu.cn
🤖 The Full Process Python Package for Robot Learning from Demonstration and Robot Manipulation
Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS / MPEG-TS / RTP media server and media proxy for reading, publishing, proxying, recording, and playing back video and audio streams.
[IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Learning resources for Universal Scene Description (OpenUSD) including tutorials, examples, and reference materials to help developers understand and work with OpenUSD effectively
🎒 Token-Oriented Object Notation (TOON) – Compact, human-readable, schema-aware JSON for LLM prompts. Spec, benchmarks, TypeScript SDK.
"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai Tech Report Link: https://arxiv.org/abs/2512.10971
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
Awesome World Model for Robotics Papers
Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
HunyuanVideo: A Systematic Framework For Large Video Generation Model