Stars
depyf is a tool to help you understand and adapt to the PyTorch compiler torch.compile.
Fast and memory-efficient exact attention
A basic introduction to coding in modern C++.
Trio – a friendly Python library for async concurrency and I/O
Lightweight and extensible LLM Inference serving benchmark tool written in Rust.
Empowering everyone to build reliable and efficient software.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Projects for an undergraduate OS course
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
A listing of compiler, language and runtime teams for people looking for jobs in this area
A high-throughput and memory-efficient inference and serving engine for LLMs
An easy-to-understand TensorOp Matmul tutorial
Curated coding interview preparation materials for busy software engineers
Reading list on instruction tuning, a trend that starts from Natural-Instruction (ACL 2022), FLAN (ICLR 2022) and T0 (ICLR 2022).
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Memory footprint reduction for transformer models
An open-source, efficient deep learning framework/compiler, written in Python.
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Course Page for Computer Graphics course