xiaojieli0903

A comprehensive list of papers on the definition of World Models and on using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, code, and related webs…

527 12 Updated Oct 9, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 23,685 2,640 Updated Aug 12, 2024

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 6,703 657 Updated Jan 22, 2025

[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Python 1,452 118 Updated Oct 7, 2025

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

Python 22,040 1,648 Updated Sep 24, 2025

InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation

Python 43 Updated Sep 18, 2025

Fully Open Framework for Democratized Multimodal Training

Python 457 28 Updated Sep 30, 2025

InternVLA-M1: A Spatially Grounded Foundation Model for Generalist Robot Policy

Python 125 3 Updated Oct 3, 2025

InternRobotics' open platform for building generalized navigation foundation models.

Python 317 25 Updated Oct 9, 2025

[ICRA'24 Best UAV Paper Award Finalist] An Efficient Global Planner for Aerial Coverage

C++ 299 24 Updated Jul 13, 2025

Python 4,293 408 Updated Sep 14, 2025

Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"

Python 245 13 Updated Sep 28, 2025

Nav-R1: Reasoning and Navigation in Embodied Scenes

Python 55 Updated Sep 30, 2025

A paper list of some recent Mamba-based CV works.

409 20 Updated Oct 6, 2025

The new spin-off of Visual Language Navigation.

29 Updated Jul 7, 2025

Jupyter Notebook 249 25 Updated Jan 14, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 7,563 476 Updated Oct 3, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,006 36 Updated Oct 4, 2025

Python 88 7 Updated Feb 11, 2025

Latest Papers, Code, and Datasets on VTG-LLMs.

37 Updated Sep 25, 2025

Awesome collection of resources and papers on Diffusion Models for Robotic Manipulation.

714 33 Updated Aug 31, 2025

[TMLR 2025] Efficient Reasoning Models: A Survey

Python 265 16 Updated Sep 30, 2025

Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]

Python 706 38 Updated Sep 19, 2025

📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.

705 36 Updated Sep 27, 2025

Awesome things toward foundation agents: papers, repos, blogs, ...

1,756 171 Updated Jul 28, 2025

This repository provides a valuable reference for researchers in the field of multimodality. Please start your exploratory journey into RL-based Reasoning MLLMs!

1,202 56 Updated Oct 1, 2025

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

1,712 72 Updated Oct 8, 2025