KHao123

Kanghao Chen KHao123

Ph.D. student in the AI Thrust, HKUST(GZ)

17 followers · 9 following

HKUST(GZ)
Guangdong, China
KHao123.github.io
@KaneChen9707

Starred repositories

EnVision-Research / DualCamCtrl

Official Implementation of Paper [DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation]

Python 66 1 Updated Dec 12, 2025

EnVision-Research / TiViBench

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

Python 62 1 Updated Nov 27, 2025

Bria-AI / FIBO

FIBO is a SOTA, first open-source, JSON-native text-to-image model built for controllable, predictable, and legally safe image generation.

Python 287 12 Updated Dec 4, 2025

FYYDCC / IVT-LR

Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”

Python 14 1 Updated Oct 17, 2025

zhangquanchen / 3DThinker

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

Python 110 4 Updated Dec 9, 2025

bytedance / mammothmoda

Python 36 2 Updated Dec 11, 2025

EnVision-Research / MTI

Official implementation of "Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention"

Python 32 Updated Oct 21, 2025

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,760 1,071 Updated Dec 21, 2025

EnVision-Research / PhysToolBench

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Python 27 3 Updated Oct 20, 2025

showlab / Paper2Video

Automatic Video Generation from Scientific Papers

Python 2,009 297 Updated Oct 20, 2025

weijiawu / Awesome-Visual-Reinforcement-Learning

📖 This is a repository for organizing papers, code, and other resources related to Visual Reinforcement Learning.

369 19 Updated Nov 29, 2025

Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs

This repository provides a valuable reference for researchers in the field of multimodality. Start exploring RL-based Reasoning MLLMs here!

1,309 58 Updated Dec 7, 2025

TsinghuaC3I / Awesome-RL-for-LRMs

A Survey of Reinforcement Learning for Large Reasoning Models

TeX 2,182 120 Updated Nov 9, 2025

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,299 1,445 Updated Nov 28, 2025

EvolvingLMMs-Lab / lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,397 461 Updated Dec 18, 2025

hq-King / Affordance-R1

Code for Affordance-R1

Python 48 1 Updated Dec 21, 2025

zai-org / GLM-V

GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 2,062 140 Updated Dec 18, 2025

allenai / molmoact

Official Repository for MolmoAct

Python 275 29 Updated Dec 11, 2025

abhaybd / GraspMolmo

Code and website for "GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation"

Python 33 2 Updated Oct 9, 2025

FlagOpen / RoboBrain2.0

RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉

Python 731 61 Updated Dec 16, 2025

UMass-Embodied-AGI / 3D-VLA

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

Python 609 22 Updated Oct 29, 2024

wshobson / agents

Intelligent automation and multi-agent orchestration for Claude Code

Python 23,170 2,566 Updated Dec 21, 2025

davila7 / claude-code-templates

CLI tool for configuring and monitoring Claude Code

Python 12,851 1,138 Updated Dec 21, 2025

hesreallyhim / awesome-claude-code

A curated list of awesome commands, files, and workflows for Claude Code

Python 18,377 1,037 Updated Dec 21, 2025

AgibotTech / Genie-Envisioner

Python 341 17 Updated Dec 19, 2025

brennercruvinel / CCPlugins

Best Claude Code framework that actually saves time. Built by a dev tired of typing "please act like a senior engineer" in every conversation.

Python 2,591 159 Updated Oct 7, 2025

baaivision / UniVLA

Unified Vision-Language-Action Model

Python 256 18 Updated Oct 15, 2025

ByteDance-Seed / Bagel

Open-source unified multimodal model

Python 5,490 481 Updated Oct 27, 2025

zilliztech / claude-context

Code search MCP for Claude Code. Make entire codebase the context for any coding agent.

TypeScript 4,815 440 Updated Sep 16, 2025

jonyzhang2023 / awesome-embodied-vla-va-vln

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

2,195 94 Updated Dec 17, 2025

Lists (3)

🔮 Future ideas

✨ Inspiration

🚀 My stack
