Stars
A highly efficient implementation of Gaussian Processes in PyTorch
PyTorch extensions for high performance and large scale training.
Training and serving large-scale neural networks with auto parallelization.
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
Minimalistic large language model 3D-parallelism training
deepspeedai / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer language models at scale, including: BERT & GPT-2
DDGS | Dux Distributed Global Search. A metasearch library that aggregates results from diverse web search services
Reference implementations of MLPerf® training benchmarks
Mesh TensorFlow: Model Parallelism Made Easier
Reference implementations of MLPerf® inference benchmarks
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
Recipes to scale inference-time compute of open models
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
[ICLR 2020; IPDPS 2019] Fast and accurate minibatch training for deep GNNs and large graphs (GraphSAINT: Graph Sampling Based Inductive Learning Method).
A low-latency & high-throughput serving engine for LLMs
PyTorch Library for Low-Latency, High-Throughput Graph Learning on GPUs.
Graph Diffusion Convolution, as proposed in "Diffusion Improves Graph Learning" (NeurIPS 2019)
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
Major CS conference publication stats (including accepted and submitted) by year.
An interference-aware scheduler for fine-grained GPU sharing
PyTorch implementation for "Parallel Sampling of Diffusion Models", NeurIPS 2023 Spotlight
Training neural networks in TensorFlow 2.0 with 5x less memory