Stars
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
High-performance In-browser LLM Inference Engine
Machine Learning Engineering Open Book
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
Official inference library for Mistral models
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
Implementation of the dilated self attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) fo…
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Speed up Stable Diffusion with this one simple trick!
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
A playbook for systematically maximizing the performance of deep learning models.
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Transformer-related optimization, including BERT, GPT
Summaries and resources for Designing Machine Learning Systems book (Chip Huyen, O'Reilly 2022)
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
Generic Neural Architecture Search via Regression (NeurIPS'21 Spotlight)
Hackable and optimized Transformers building blocks, supporting a composable construction.
🚀 PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models" (v-diffusion)
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Towards Unified Keyframe Propagation Models
Flexible and powerful tensor operations for readable and reliable code (for PyTorch, JAX, TensorFlow and others)
Fast and memory-efficient exact attention
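Several of the starred repositories above (MoE-LLaVA, LLaMA-MoE, SwitchHead) build on top-k mixture-of-experts routing. A minimal NumPy sketch of the core gating idea — shapes, names, and the `topk_moe` helper are illustrative, not any repo's actual API:

```python
import numpy as np

def topk_moe(x, W_gate, experts, k=2):
    """Sketch of top-k mixture-of-experts routing for a single token.

    x:       (d,) input vector
    W_gate:  (d, n_experts) gating weights
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ W_gate
    top = np.argsort(logits)[-k:]        # indices of the k highest-scoring experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                 # softmax over only the selected experts
    # Weighted combination of the chosen experts' outputs
    return sum(p * experts[i](x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n = 8, 4
experts = [lambda v, M=rng.normal(size=(d, d)): M @ v for _ in range(n)]
y = topk_moe(rng.normal(size=d), rng.normal(size=(d, n)), experts, k=2)
print(y.shape)  # (8,)
```

Because only `k` of the `n` experts run per token, compute stays roughly constant as the expert count grows — the property the MoE repos above exploit at scale.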