Stars
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
StyleGAN - Official TensorFlow Implementation
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
Easy-to-use image segmentation library with an awesome pre-trained model zoo, supporting a wide range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image …
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Google AI 2018 BERT PyTorch implementation
High-resolution models for human tasks.
A PyTorch native platform for training generative AI models
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
PyTorch code and models for V-JEPA self-supervised learning from video.
PyTorch code and models for VJEPA2 self-supervised learning from video.
Efficient vision foundation models for high-resolution generation and perception.
[IROS 2025 Best Paper Award Finalist & IEEE TRO 2026] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
Muon is an optimizer for hidden layers in neural networks
CVNets: A library for training computer vision networks
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI