- Computer Vision Lab, POSTECH
- kdwonn.github.io
Stars
Implementation of Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
A Minimal and Elegant Framework & Tutorial for Real-Time Interactive World Models
Use LeRobot to collect Piper robot arm data, then run training and inference
[CVPR 2026] Affostruction: 3D Affordance Grounding with Generative Reconstruction
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
A Minimalist, Batteries-included Repository for Advancing World Model Science.
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025.
A curated list of papers and selected technical blogs on Loop Models.
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
[RSS 2026] LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion
An offline-first scientific writing workspace powered by Claude. LaTeX + Python + 100+ scientific skills all running locally.
[CVPR 2024] Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform
[CVPR 2026 Highlight] A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens
AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a querya…
A platform for reproducible world model research and evaluation
Code, data and weights for the paper **What drives success in physical planning with Joint-Embedding Predictive World Models?**
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
InstantDrag: Improving Interactivity in Drag-based Image Editing
Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals
PyTorch code and models for VJEPA2 self-supervised learning from video.
[NeurIPS 2025] CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification
[CoRL 2025] UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
AutoGaze automatically removes redundant patches in a video, reducing #tokens in ViT/MLLM by 4x-100x.
Single-stage End-to-End Training for Tokenization and Generation
[RSS 2026] Causal video-action world model for generalist robot control