Stars
All parts of Claude Code's system prompt, 18 built-in tool descriptions, subagent prompts (Plan/Explore/Task), utility prompts (CLAUDE.md, compact, statusline, magic docs, WebFetch, Bash cmd, secur…
ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions. In ECCV 2020.
PyTorch building blocks for the OLMo ecosystem
An unnecessarily tiny implementation of GPT-2 in NumPy.
Residual Context Diffusion (RCD): Repurposing discarded signals as structured priors for high-performance reasoning in dLLMs.
Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling”
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
[ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
[ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
A framework to compare low-bit integer and floating-point formats
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
This repository collects low-bit quantization papers from top conferences, 2020–2025.
A selective knowledge distillation algorithm for efficient speculative decoders