Starred repositories

LatentMorph: Morphing Latent Reasoning into Image Generation

Python 25 Updated Feb 3, 2026

The source code for "Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach" (CVPR24 Oral & TPAMI25)

Python 98 6 Updated Feb 2, 2026

Official Implementation of Paper [DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation]

Python 74 1 Updated Dec 29, 2025

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

Python 64 1 Updated Nov 27, 2025

FIBO is a SOTA, first open-source, JSON-native text-to-image model built for controllable, predictable, and legally safe image generation.

Python 303 14 Updated Jan 7, 2026

Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”

Python 18 1 Updated Jan 27, 2026

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

Python 182 5 Updated Dec 9, 2025
Python 57 4 Updated Jan 30, 2026

Official implementation of "Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention"

Python 35 Updated Jan 8, 2026

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 12,555 1,188 Updated Feb 5, 2026

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Python 27 3 Updated Oct 20, 2025

Automatic Video Generation from Scientific Papers

Python 2,108 304 Updated Oct 20, 2025

📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning.

410 20 Updated Feb 5, 2026

This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-based Reasoning MLLMs!

1,348 60 Updated Dec 7, 2025

A Survey of Reinforcement Learning for Large Reasoning Models

TeX 2,316 129 Updated Nov 9, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 18,165 1,577 Updated Jan 30, 2026

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,624 512 Updated Feb 4, 2026

code for affordance-r1

Python 54 1 Updated Dec 21, 2025

GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 2,159 145 Updated Jan 27, 2026

Official Repository for MolmoAct

Python 295 33 Updated Jan 13, 2026

Code and website for "GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation"

Python 35 3 Updated Oct 9, 2025

RoboBrain 2.5: Advanced version of RoboBrain. Depth in Sight, Time in Mind. 🎉🎉🎉

Python 815 65 Updated Jan 27, 2026

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

Python 617 22 Updated Oct 29, 2024

Intelligent automation and multi-agent orchestration for Claude Code

C# 27,863 3,072 Updated Feb 2, 2026

CLI tool for configuring and monitoring Claude Code

Python 19,510 1,828 Updated Feb 5, 2026

A curated list of awesome skills, hooks, slash-commands, agent orchestrators, applications, and plugins for Claude Code by Anthropic

Python 22,910 1,319 Updated Feb 5, 2026

Best Claude Code framework that actually saves time. Built by a dev tired of typing "please act like a senior engineer" in every conversation.

Python 2,662 160 Updated Oct 7, 2025

[ICLR 2026] Unified Vision-Language-Action Model

Python 273 20 Updated Oct 15, 2025

Open-source unified multimodal model

Python 5,638 499 Updated Oct 27, 2025