Stars
CXL Memory Resource Kit top-level repository
DFlash: Block Diffusion for Flash Speculative Decoding
Ongoing research on training transformer models at scale
Official inference framework for 1-bit LLMs
Analyze the inference of large language models (LLMs), covering computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
sherrywong1220 / NPB-CPP
Forked from GMAP/NPB-CPP. The NAS Parallel Benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures
A benchmarking suite to evaluate the performance of persistent memory access (PerMA-Bench @ VLDB '22)
Prefetching and efficient data path for memory disaggregation
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
LLM papers I'm reading, mostly on inference and model compression
Latency and Memory Analysis of Transformer Models for Training and Inference
📺 Discover the latest machine learning / AI courses on YouTube.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Anomaly detection related books, papers, videos, and toolboxes. Last updated in late 2025 with LLM and VLM works!
Running large language models on a single GPU for throughput-oriented scenarios.
A list of papers about distributed consensus.
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Pytorch domain library for recommendation systems
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2