- Tsinghua University
- https://scholar.google.com/citations?hl=zh-CN&user=kMui170AAAAJ
Stars
From Vision-Language-Action Models to a Real-World Robot Learning Stack
UniRL is a Framework for Unified Multimodal Model Reinforcement Learning
Kimi Code CLI — The Starting Point for Next-Gen Agents
A collection of skills for AI financial analysis.
My learning notes for ML SYS.
[ECCV 2026] Official code of GEM: Generative Supervision Helps Embodied Intelligence
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects
Skill package for ML/CV/NLP paper writing, curated and adapted from Prof. Peng Sida's open notes for Codex, Claude Code, and Gemini.
Can Language Models Rebuild Programs From Scratch?
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
A benchmark for evaluating LLMs on Chinese traditional fortune telling — Bazi (八字) and Ziwei Doushu (紫微斗数).
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
Extracted system prompts from Anthropic - Claude Fable 5, Opus 4.8, Claude Code, Claude Design. OpenAI - ChatGPT 5.5 Thinking, GPT 5.5 Instant, Codex. Google - Gemini 3.5 Flash, 3.1 Pro, Antigravit…
SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles
Reference code for the Meta-Harness paper.
Terrarium: Multi-turn data engine for evaluating and optimizing LLM agents in living environments.
🦞 ClawMark: A Living-World Benchmark for Multi-Day, Multimodal Coworker Agents
A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.
The agent that grows with you
HY-Embodied: Embodied Foundation Models for Real-World Agents
The best-benchmarked open-source AI memory system. And it's free.
FileGram: Grounding Agent Personalization in File-System Behavioral Traces
Your behavior is the signal. Not your words. — Behavioral intelligence for AI agents, built into your MacBook notch.
Production-grade engineering skills for AI coding agents.
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
A benchmark for evaluating contextual agents on realistic multimodal personal-computer environments with profiling and factual-retention tasks.
SkillsBench evaluates how well skills work and how effective agents are at using them.