Stars
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Supports Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
A Library for Differentiable Logic Gate Networks
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Model Compression Toolbox for Large Language Models and Diffusion Models
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA (+ more DSLs)
✨ Elevate your GitHub Profile ReadMe with Minimalistic Retro Terminal GIFs 🚀
Train high-quality text-to-image diffusion models in a data & compute efficient manner
SymbiYosys (sby) -- Front-end for Yosys-based formal verification flows
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
A Text-Based Environment for Interactive Debugging
Machine-Learning Accelerator System Exploration Tools
[ACL 2024] A novel QAT framework with Self-Distillation to enhance ultra-low-bit LLMs.
Perun is a Python package that measures the energy consumption of your applications.
MAGE: A Multi-Agent Engine for Automated RTL Code Generation
jepeake / tinygrad
Forked from tinygrad/tinygrad. You like pytorch? You like micrograd? You love tinygrad! ❤️