Starred repositories
Explore training for quantized models
A lightweight library for portable low-level GPU computation using WebGPU.
A high-throughput and memory-efficient inference and serving engine for LLMs
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
PyTorch code and models for V-JEPA self-supervised learning from video.
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for…
A micro Vulkan compute pipeline and a collection of benchmarking compute shaders
Machine Learning Engineering Open Book
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
A repository for log-time feedforward networks
Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling"
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) l…
A holistic way of understanding how WebRTC and its protocols run in practice, with code and detailed documentation.
An Ethereum-compatible smart contract parachain on Polkadot
Substrate: The platform for blockchain innovators
Ethereum JSON-RPC multi-transport client. Rust implementation of web3 library. ENS address: rust-web3.eth
NuCypher's reference implementation of Umbral (threshold proxy re-encryption) using OpenSSL and Cryptography.io
The reference implementation of the ERC-721 non-fungible token standard.