Starred repositories
Efficient vision foundation models for high-resolution generation and perception.
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive arch…
CUDA Python: Performance meets Productivity
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
Code and dataset for photorealistic Codec Avatars driven from audio
Integrate ChatGPT into your own Discord bot
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
A curated list for Efficient Large Language Models
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI moderation, custom indexes/knowledge base, YouTube summarizer, and more!
To eventually become an unofficial PyTorch implementation / replication of AlphaFold2, as details of the architecture get released
Implementation of AlphaFold 3 from Google DeepMind in PyTorch
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
An open-source impl. of Large Reconstruction Models
[CoRL 2024] Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
Open source implementation of AlphaFold3
Mamba-Chat: A chat LLM based on the state-space model architecture 🐍
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Fast inference from large language models via speculative decoding
A distilled Segment Anything (SAM) model capable of running real-time with NVIDIA TensorRT
Implementation of Alpha Fold 3 from the paper: "Accurate structure prediction of biomolecular interactions with AlphaFold3" in PyTorch
Hosts the Multiface dataset, which is a multi-view dataset of multiple identities performing a sequence of facial expressions.
An open-source reimplementation of Google's Tensor Processing Unit (TPU).
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"