Starred repositories
The official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[AAAI 2026] OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
[CVPR 2025] Official Repository for Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
[ICCV 2025] Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation.
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
Code for the CVPR DriveX 2025 paper V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
Solve Visual Understanding with Reinforced VLMs
Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reas…
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
[TCSVT] DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction
[CVPR 2024 Highlight] Visual Point Cloud Forecasting
[ICCV 2023 Oral]: Scaling Data Generation in Vision-and-Language Navigation
[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
This is the official repo of CVPR 2024 paper "Multimodal Sense-Informed Prediction of 3D Human Motions"
[ICCV 2025] Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding
[ICCV 2025] Official implementation of SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
[NeurIPS 2024] SCube: Instant Large-Scale Scene Reconstruction using VoxSplats
[CVPR 2025 Highlight] Material Anything: Generating Materials for Any 3D Object via Diffusion
Introduces the Multiscope Conception for Sequential Decision Learning
[CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
[ICML'25] Official Implementation of "PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting"