- Hong Kong (UTC +08:00)
- http://fuxiao0719.github.io/
- @lemonaddie0909

Stars
Cosmos-Transfer2.5, built on top of Cosmos-Predict2.5, produces high-quality world simulations conditioned on multiple spatial control inputs.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
A unified inference and post-training framework for accelerated video generation.
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning.
A generative world for general-purpose robotics & embodied AI learning.
Making large AI models cheaper, faster and more accessible
📹 A more flexible framework that can generate videos at any resolution and create videos from images.
verl: Volcano Engine Reinforcement Learning for LLMs
A Paper List for Humanoid Robot Learning.
UnrealZoo / unrealzoo-gym
Forked from zfw1226/gym-unrealcv. [ICCV 2025 Highlights] Large-scale photo-realistic virtual worlds for embodied AI
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
rCM: SOTA Diffusion Distillation & Few-Step Video Generation
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
LongLive: Real-time Interactive Long Video Generation
[NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers
A growing curation of Text-to-3D, Diffusion-to-3D works.
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources
[IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
[ICLR 2024 Spotlight] SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
[NeurIPS 2025] WorldMem: Long-term Consistent World Simulation with Memory
[CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.