Stars
An open source library designed to provide community examples of Joint Embedding Predictive Architectures (JEPAs). It contains code and examples for learning representations from images, video, and…
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
solo-learn: a library of self-supervised methods for visual representation learning powered by PyTorch Lightning
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
A PyTorch native platform for training generative AI models
Reference PyTorch implementation and models for DINOv3
🔥 A minimal training framework for scaling FLA models
PyTorch code and models for VJEPA2 self-supervised learning from video.
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Official code for the CVPR 2025 paper "Navigation World Models".
Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"
Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
GenEval: An object-focused framework for evaluating text-to-image alignment
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Load TensorBoard event logs as pandas DataFrames for scientific plotting; supports both PyTorch and TensorFlow
FFmpeg libav tutorial - learn how media works from basic to transmuxing, transcoding and more. Translations: 🇺🇸 🇨🇳 🇰🇷 🇪🇸 🇻🇳 🇧🇷 🇷🇺
This is an official implementation of TubeR: Tubelet Transformer for Video Action Detection
Unofficial implementation of: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics
An end-to-end PyTorch framework for image and video classification
Inflate DenseNet and ResNet as per I3D with ImageNet weight transfer
Out of time: automated lip sync in the wild
Code to reproduce the results in the FAIR research papers "Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples" https://arxiv.org/abs/…
PyTorch code for training Vision Transformers with the self-supervised learning method DINO