HKUST(GZ)
- Guangdong, China
- KHao123.github.io
- @KaneChen9707
Starred repositories
Official Implementation of Paper [DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation]
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
FIBO is a SOTA, first open-source, JSON-native text-to-image model built for controllable, predictable, and legally safe image generation.
Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
Official implementation of "Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention"
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Automatic Video Generation from Scientific Papers
📖 This is a repository for organizing papers, code, and other resources related to Visual Reinforcement Learning.
This repository provides a valuable reference for researchers in the field of multimodality. Start exploring RL-based reasoning MLLMs here!
A Survey of Reinforcement Learning for Large Reasoning Models
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Code and website for "GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation"
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
Intelligent automation and multi-agent orchestration for Claude Code
CLI tool for configuring and monitoring Claude Code
A curated list of awesome commands, files, and workflows for Claude Code
Best Claude Code framework that actually saves time. Built by a dev tired of typing "please act like a senior engineer" in every conversation.
Code search MCP for Claude Code. Make entire codebase the context for any coding agent.
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.