Peking University
https://purshow.github.io/
Stars
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
Welcome to GR00T Whole-Body Control (WBC)! This is a unified platform for developing and deploying advanced humanoid controllers. This includes: Decoupled WBC models used in NVIDIA Isaac-Gr00t, Gr0…
Terrarium: Multi-turn data engine for evaluating and optimizing LLM agents in living environments.
Official code of Motus: A Unified Latent Action World Model
A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.
🦞 ClawMark: A Living-World Benchmark for Multi-Day, Multimodal Coworker Agents
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
HY-Embodied: Embodied Foundation Models for Real-World Agents
Elevate your AI research writing, no more tedious polishing ✨
Research Writing Assistant
Gen-Searcher: Reinforcing Agentic Search for Image Generation
(ICCV 2025) "Principal Components" Enable A New Language of Images
[CVPR 2026 Highlight] A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens
Vero: An Open RL Recipe for General Visual Reasoning
📚 A curated collection of papers and open-source code repositories dedicated to the application of Vision-Language Models (VLMs) for streaming video.
A simple video streaming baseline that outperforms SOTAs.
Your behavior is the signal. Not your words. — Behavioral intelligence for AI agents, built into your MacBook notch.
FileGram: Grounding Agent Personalization in File-System Behavioral Traces
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.
Paper collection: alignment of diffusion models
A benchmark for evaluating contextual agents on realistic multimodal personal-computer environments with profiling and factual-retention tasks.
🐧 Unify-Agent: An end-to-end unified multimodal agent for faithful, knowledge-grounded image generation.
Transforming cold farewells into warm skills? It's giving rebirth era. Welcome to Digital Life 1.0. 🫶
Codebase for InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression