Shanghai Jiao Tong University
- Shanghai, China
Starred repositories
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[NeurIPS 2025] AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
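LLM knowledge sharing that everyone can understand; a must-read before spring/autumn recruitment LLM interviews, so you can talk confidently with your interviewer
"Dive into LLMs" (动手学大模型): a series of hands-on programming tutorials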
Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
SAILOR is an inverse RL algorithm that learns world and reward models to search at test-time and recover from mistakes.
Awesome Papers about World Models in Autonomous Driving
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources
Collect some World Models for Autonomous Driving (and Robotic, etc.) papers.
This repository summarizes recent advances in the VLA + RL paradigm and provides a taxonomic classification of relevant works.
Official code for the CVPR 2025 paper "Navigation World Models".
📄 Configuration files that enhance Cursor AI editor experience with custom rules and behaviors
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
No fortress, purely open ground. OpenManus is Coming.
ICRA 2019 "Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera"
[CVPR 2025] GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
[AAAI 2025] GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction
The repo for "Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator"
[CVPR 2025] Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting
[CVPR 2025] MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
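Welcome to LLM-Dojo, an open-source place for learning about large models. It uses concise, readable code to build a model training framework (supporting mainstream models such as Qwen, Llama, and GLM), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩🎓👨🎓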
[CVPR 2025] CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
C++/CUDA library for Multi-View Stereo
[3DV 2025] Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
Stereo-LiDAR Depth Estimation with Deformable Propagation and Learned Disparity-Depth Conversion (ICRA2024)
MoBA: Mixture of Block Attention for Long-Context LLMs