68 stars written in Python

A highly efficient implementation of Gaussian Processes in PyTorch

Python 3,786 575 Updated Oct 14, 2025

PyTorch extensions for high performance and large scale training.

Python 3,385 294 Updated Apr 26, 2025

Training and serving large-scale neural networks with auto parallelization.

Python 3,162 353 Updated Dec 9, 2023

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

Python 3,141 613 Updated Jul 19, 2024

Lingvo

Python 2,853 452 Updated Oct 29, 2025

Minimalistic large language model 3D-parallelism training

Python 2,299 253 Updated Sep 3, 2025

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 2,186 365 Updated Aug 14, 2025

DDGS | Dux Distributed Global Search. A metasearch library that aggregates results from diverse web search services

Python 1,917 184 Updated Nov 7, 2025

Reference implementations of MLPerf® training benchmarks

Python 1,722 584 Updated Nov 5, 2025

Mesh TensorFlow: Model Parallelism Made Easier

Python 1,620 258 Updated Nov 17, 2023

Reference implementations of MLPerf® inference benchmarks

Python 1,480 588 Updated Nov 6, 2025

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

Python 1,232 169 Updated Nov 6, 2025

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows

Python 1,139 76 Updated Oct 31, 2025

Recipes to scale inference-time compute of open models

Python 1,115 125 Updated May 22, 2025

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

Python 1,066 129 Updated Apr 17, 2024

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

Python 939 47 Updated Jun 27, 2024

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 617 54 Updated Nov 5, 2025

[ICLR 2020; IPDPS 2019] Fast and accurate minibatch training for deep GNNs and large graphs (GraphSAINT: Graph Sampling Based Inductive Learning Method).

Python 499 89 Updated Aug 12, 2022

Parallel programming with Python

Python 459 62 Updated Jul 11, 2025

A low-latency & high-throughput serving engine for LLMs

Python 439 58 Updated Oct 16, 2025
Python 392 117 Updated Nov 4, 2022

PyTorch Library for Low-Latency, High-Throughput Graph Learning on GPUs.

Python 301 37 Updated Aug 17, 2023

Graph Diffusion Convolution, as proposed in "Diffusion Improves Graph Learning" (NeurIPS 2019)

Python 273 43 Updated Apr 26, 2023
Python 252 28 Updated Jul 25, 2024

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Python 190 9 Updated Sep 21, 2024

Major CS conference publication stats (including accepted and submitted) by year.

Python 155 11 Updated Sep 4, 2025

An interference-aware scheduler for fine-grained GPU sharing

Python 150 26 Updated Jan 26, 2025

PyTorch implementation for "Parallel Sampling of Diffusion Models", NeurIPS 2023 Spotlight

Python 148 11 Updated Oct 13, 2023

Training neural networks in TensorFlow 2.0 with 5x less memory

Python 136 17 Updated Feb 21, 2022