Stars
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
A flexible framework for orchestrating deep learning models with Ray. It dynamically schedules and serves multiple models, from NLP (e.g., FastText) to CV (e.g., YOLO, SAM), enabling scalable, d…
colleague.skill, boss.skill, ex.skill, self.skill, immortality.skill, Nüwa.skill…
ByteDance's All-in-One Video Generation Model for Human-Object Interaction Video Generation
Awesome Multimodal Modeling [Covers MLLM, UMM, and NMM]
Official codebase for Fast-WAM: Do World Action Models Need Test-time Future Imagination?
Enjoy the magic of Diffusion models!
A Curated List of Awesome Video World Models with AR Diffusion: Covering Algorithms, Applications, and Infrastructure, Aimed at Serving as a Comprehensive Resource for Researchers, Practitioners, a…
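With only basic Python, build your own embodied-AI robot from scratch; build VLA/OpenVLA/SmolVLA/Pi0 step by step from scratch and gain a deep understanding of embodied AI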
Vero: An Open RL Recipe for General Visual Reasoning
Light Image Video Generation Inference Framework
ZGI is an open-source platform for building AI applications. Its intuitive interface combines workflow design, agent orchestration, dataset management, and model integration—allowing you to quickly…
Make Any Website & Tool Your CLI. A universal CLI Hub and AI-native runtime. Transform any website, Electron app, or local binary into a standardized command-line interface. Built for AI Agents to …
Official implementation of "OmniForcing: Unleashing Real-time Joint Audio-Visual Generation" [arXiv:2603.11647]. OmniForcing is the first framework to distill bidirectional audio-visual diffusion mo…
Lightweight coding agent that runs in your terminal
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
[ICLR 2026] LongLive: Real-time Interactive Long Video Generation
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
An agentic skills framework & software development methodology that works.
A diffusion-based framework for document OCR that replaces autoregressive decoding with block-level parallel diffusion decoding.
Try X-Dub to sync any character in a video with any audio you like | Official repository for "From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping"
Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory
Automated system for LLM evaluation via agents.
AI agent assistant that integrates many IM platforms, LLMs, plugins, and AI features, and can serve as your openclaw alternative. ✨
mjlab-native port of InstinctLab for humanoid RL and Project-Instinct workflows.