- Cornell Tech
- Shanghai ↔️ New York
- www.yimingdou.com
- @_YimingDou
Stars
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Realtime & high-frequency control interfaces for various robot arms including bi-manual I2RT YAM, Franka Panda, with manual tele-operation control or autonomous policy control
Dexbotic: Open-Source Vision-Language-Action Toolbox
Modern, minimal, and modular LaTeX CV template ✨ 📄
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
moojink / openvla-oft
Forked from openvla/openvla. Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Official implementation of Diffusion Policy Policy Optimization, arXiv 2024
PyTorch code and models for VJEPA2 self-supervised learning from video.
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
openvla / openvla
Forked from TRI-ML/prismatic-vlms. OpenVLA: An open-source vision-language-action model for robotic manipulation.
Reference PyTorch implementation and models for DINOv3
Distributed Robot Interaction Dataset.
An efficient video loader for deep learning with smart shuffling that's super easy to digest
Official code for the CVPR 2025 paper "Navigation World Models".
robomimic: A Modular Framework for Robot Learning from Demonstration
DROID Policy Learning and Evaluation
DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation
Official Implementation of Paper Transfer between Modalities with MetaQueries
Official PyTorch implementation of One-Minute Video Generation with Test-Time Training