KHao123
Starred repositories

Official implementation of the paper "DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation"

Python 66 1 Updated Dec 12, 2025

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

Python 62 1 Updated Nov 27, 2025

FIBO is the first open-source, JSON-native, state-of-the-art text-to-image model, built for controllable, predictable, and legally safe image generation.

Python 287 12 Updated Dec 4, 2025

Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”

Python 14 1 Updated Oct 17, 2025

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

Python 110 4 Updated Dec 9, 2025
Python 36 2 Updated Dec 11, 2025

Official implementation of "Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention"

Python 32 Updated Oct 21, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,760 1,071 Updated Dec 21, 2025

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Python 27 3 Updated Oct 20, 2025

Automatic Video Generation from Scientific Papers

Python 2,009 297 Updated Oct 20, 2025

📖 A repository for organizing papers, code, and other resources related to Visual Reinforcement Learning.

369 19 Updated Nov 29, 2025

This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based reasoning MLLMs here!

1,309 58 Updated Dec 7, 2025

A Survey of Reinforcement Learning for Large Reasoning Models

TeX 2,182 120 Updated Nov 9, 2025

Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.

Jupyter Notebook 17,299 1,445 Updated Nov 28, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,397 461 Updated Dec 18, 2025

Code for affordance-r1

Python 48 1 Updated Dec 21, 2025

GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 2,062 140 Updated Dec 18, 2025

Official Repository for MolmoAct

Python 275 29 Updated Dec 11, 2025

Code and website for "GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation"

Python 33 2 Updated Oct 9, 2025

RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉

Python 731 61 Updated Dec 16, 2025

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

Python 609 22 Updated Oct 29, 2024

Intelligent automation and multi-agent orchestration for Claude Code

Python 23,170 2,566 Updated Dec 21, 2025

CLI tool for configuring and monitoring Claude Code

Python 12,851 1,138 Updated Dec 21, 2025

A curated list of awesome commands, files, and workflows for Claude Code

Python 18,377 1,037 Updated Dec 21, 2025

A Claude Code framework that actually saves time. Built by a dev tired of typing "please act like a senior engineer" in every conversation.

Python 2,591 159 Updated Oct 7, 2025

Unified Vision-Language-Action Model

Python 256 18 Updated Oct 15, 2025

Open-source unified multimodal model

Python 5,490 481 Updated Oct 27, 2025

Code search MCP for Claude Code. Makes the entire codebase the context for any coding agent.

TypeScript 4,815 440 Updated Sep 16, 2025

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

2,195 94 Updated Dec 17, 2025