- FAIR @ Meta
- Paris
- angelvillarcorrales.com
- @angelvillar96
Stars
Official codebase for DMT-JEPA (Discriminative Masked Targets for Joint-Embedding Predictive Architecture)
Implementation of the Large Behavioral Model architecture for dexterous manipulation from Toyota Research Institute
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training"
Codebase for the NeurIPS 2023 spotlight paper "Curriculum Learning with Infant Egocentric Videos"
Menagerie of video models trained on various video datasets
Official implementation of `Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning`, CVPR 2025
Evaluation benchmarks for BabyVLM, adapted from lmms-eval
Official implementation of EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation
Reference PyTorch implementation and models for DINOv3
Code repository for IntPhys 2, a video benchmark for evaluating the intuitive physics understanding of deep learning models
A Python library for efficient depth compression to increase throughput and minimize memory footprint
[ICML2025] Canonical Rank Adaptation: An Efficient Fine-Tuning Strategy for Vision Transformers
Nvidia GEAR Lab's initiative to solve the robotics data problem using world models
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
🚀 Efficient implementations of state-of-the-art linear attention models
Official PyTorch implementation for "FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Synthesis"
A simple testbed for robotics manipulation policies
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
Simple application to serve various vision models via HTTP
Reimplementation of GR-1, a generalized policy for robotics manipulation.
Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"
🔥 SpatialVLA: a spatially enhanced vision-language-action model trained on 1.1 million real robot episodes. Accepted at RSS 2025.
The simplest, fastest repository for training/finetuning small-sized VLMs.