- Professor, CSU
- China
- https://fingerrec.github.io
Stars
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
Official repository of FlowInOne: Unifying Multimodal Generation as Image-In Image-Out Flow Matching
Bash is all you need: a nano Claude Code-like "agent harness", built from 0 to 1
The first open-domain closed-loop revisited benchmark for evaluating memory consistency and action control in world models.
Glance: Accelerating Diffusion Models with 1 Sample
[CVPR 2026] An official implementation of Adv-GRPO. The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation.
[ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
Native Multimodal Models are World Learners
[ACL-main-2026] We introduce Chart2Code, the first user-driven, hierarchical benchmark that systematically evaluates Large Multimodal Models on chart-to-code tasks of increasing difficulty.
PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024.
Automatic Video Generation from Scientific Papers
Code for Data Collection & Training in Sim+Real Envs: [RSS 2024] Natural Language Can Help Bridge the Sim2Real Gap
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
📖 A repository for organizing papers, code, and other resources related to Visual Reinforcement Learning.
[ICCV 2025] Balanced Image Stylization with Style Matching Score
Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT
[ECCV 2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Mu…
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
[ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"
Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling
Janus-Series: Unified Multimodal Understanding and Generation Models
Unified layout planning and image generation, ICCV 2025