Stars
Implementation of "SimVLA: A Simple VLA Baseline for Robotic Manipulation"
[ICLR 2026 🔥] Official implementation of "UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing"
[NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers
The new spin-off of Visual Language Navigation.
Awesome Unified Multimodal Models
🌐 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
A paper list for spatial reasoning
😎 Up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.
A curated list of academic papers and resources on Physical AI, focusing on Vision-Language-Action (VLA) models, world models, embodied AI, and robotic foundation models.
A collection of token reduction (token pruning, merging, clustering, etc.) techniques for ML/AI
A paper list of recent work on token compression for ViT and VLMs
👀 "Large Models": Train a 65M-parameter vision multimodal VLM from scratch in just 2h!
🧠 "Large Models": Train a 64M-parameter small LLM completely from scratch in just 2h!
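"Build a Large Language Model (From Scratch)" is an e-book that explores the principles and implementation of large language models in depth. It suits learners who want a thorough understanding of the architecture, training process, and application development of large models such as GPT. To make this highly valuable textbook accessible to more Chinese readers, I decided to translate it into Chinese and share it as open source on GitHub.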
Isaac Gym Environments for Legged Robots
Deep RL for MPC control of Quadruped Robot Locomotion
Builds 2D signed distance fields from images, 3D signed distance fields from pointclouds, 3D signed distance fields from Octomap, provides a lightweight signed distance field library, message types…
Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.
Vision-and-Language Navigation in Continuous Environments using Habitat
Official repository for the paper "Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance" by Bhattacharya, et al. (2024) from GRASP, Penn.