Starred repositories

The official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"

Python 240 12 Updated Oct 31, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 64,340 7,796 Updated Dec 21, 2025

[AAAI 2026] OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model

Python 536 43 Updated Nov 30, 2025

[CVPR 2025] Official Repository for Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments

Python 218 15 Updated Dec 17, 2025

[ICCV 2025] Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation.

Python 49 Updated Aug 27, 2025

All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.

Python 1,187 49 Updated Dec 22, 2025

Code for the CVPR DriveX 2025 paper V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving

Python 3 Updated May 2, 2025

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

2,210 95 Updated Dec 17, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,770 375 Updated Oct 21, 2025

Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"

Python 285 14 Updated Apr 23, 2025

This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reas…

Python 744 20 Updated Sep 10, 2025

[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling

Python 4,141 322 Updated Sep 26, 2025

An open-source implementation for fine-tuning Qwen-VL series by Alibaba Cloud.

Python 1,496 189 Updated Dec 19, 2025

[TCSVT] DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction

Python 93 5 Updated Oct 28, 2025

[CVPR 2024 Highlight] Visual Point Cloud Forecasting

Python 344 22 Updated Jul 2, 2025

[ICCV 2023 Oral]: Scaling Data Generation in Vision-and-Language Navigation

Python 206 5 Updated Jul 2, 2025

GPD-1: Generative Pre-training for Driving

Python 81 1 Updated Dec 12, 2024
Python 130 8 Updated Dec 4, 2025

[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Python 228 13 Updated Sep 20, 2024
Python 185 11 Updated Mar 29, 2025

This is the official repo of CVPR 2024 paper "Multimodal Sense-Informed Prediction of 3D Human Motions"

Python 24 2 Updated May 31, 2024

[ICCV 2025] Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding

Python 65 4 Updated Jan 10, 2025

[ICCV 2025] Official implementation of SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Python 30 2 Updated Dec 17, 2025

[NeurIPS 2024] SCube: Instant Large-Scale Scene Reconstruction using VoxSplats

Python 511 23 Updated Oct 14, 2025

[CVPR 2025 Highlight] Material Anything: Generating Materials for Any 3D Object via Diffusion

Python 331 15 Updated Aug 19, 2025

Introduce Multiscope Conception to Sequential Decision Learning

Python 73 11 Updated Dec 16, 2025

[CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Python 198 4 Updated Apr 21, 2025
Python 31 Updated Sep 20, 2024

[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning

Python 42 2 Updated Dec 9, 2024

[ICML'25] Official Implementation of "PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting"

Python 232 8 Updated Jul 22, 2025