Stars
Machine Learning Engineering Open Book
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
DeepEP: an efficient expert-parallel communication library
An Open Source Machine Learning Framework for Everyone
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
oneAPI Deep Neural Network Library (oneDNN)
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
FlashMLA: Efficient Multi-head Latent Attention Kernels
Performance-portable, length-agnostic SIMD with runtime dispatch
Tiptop is a performance monitoring tool for Linux. It provides a dynamic real-time view of the tasks running in the system. tiptop is very similar to the top utility, but most of the information di…
High-performance server-side application framework
A JIT assembler for x86/x64 architectures supporting FPU, MMX, SSE (1-4), AVX (1-2, 512), APX, and AVX10.2
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
A composable and fully extensible C++ execution engine library for data management systems.
A High-Performance JIT-Based C++ Expression/Script Execution Engine with SIMD Vectorization Support
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
DeepSeek Coder: Let the Code Write Itself
Library providing helpers for the Linux kernel io_uring support