Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
🧠 "Large Models": Train a small 64M-parameter LLM entirely from scratch in just 2 hours!
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
👀 "Large Models": Train a 65M-parameter visual multimodal VLM from scratch in just 2 hours!
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
The simplest, fastest repository for training/finetuning small-sized VLMs.
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8…
A Code Release for Mip-NeRF 360, Ref-NeRF, and RawNeRF
A PyTorch Library for Multi-Task Learning
[ICLR 2026] SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
This repository contains the code for the paper "Occupancy Networks - Learning 3D Reconstruction in Function Space"
Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV 2020)
State-of-the-art, simple, fast unbounded / large-scale NeRFs.
[ICCV 2023 & ICLR 2026] VAD: Vectorized Scene Representation for Efficient Autonomous Driving
A 3DGS framework for omni urban scene reconstruction and simulation.
A list of recent papers, libraries and datasets about 3D shape/scene analysis (by topics, updating).
Code for "Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans" CVPR 2021 best paper candidate
Improves over NeRF in 360° capture of unbounded scenes.
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
Intrinsic and extrinsic calibration for fisheye or normal cameras. Surround-camera bird's-eye-view generator.