Stars
The best-benchmarked open-source AI memory system. And it's free.
AI agents running research on single-GPU nanochat training automatically
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
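A personal Chinese translation of the book "C++17 the complete guide", for study and exchange only; it will be removed on request if it infringes copyright
C++ learning resources, newly compiled in 2021, covering new features of C++ 11 / 14 / 17 / 20 / 23, introductory tutorials, recommended books, quality articles, study notes, instructional videos, and more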
SGLang is a high-performance serving framework for large language models and multimodal models.
A modern GUI client based on Tauri, designed to run on Windows, macOS and Linux for a tailored proxy experience
This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".
Hackable and optimized Transformers building blocks, supporting a composable construction.
PyTorch emulation library for Microscaling (MX)-compatible data formats
A minimal GPU design in Verilog to learn how GPUs work from the ground up
The official repository for the gem5 computer-system architecture simulator.
NumPy and SciPy on Multi-Node Multi-GPU systems
Approaching (Almost) Any Machine Learning Problem
Virtual whiteboard for sketching hand-drawn like diagrams
GNU toolchain for RISC-V, including GCC
FlashMLA: Efficient Multi-head Latent Attention Kernels
Verilator open-source SystemVerilog simulator and lint system
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
shallowrabbit / LexYaccTest
Forked from deepfox/LexYaccTest
Lex yacc tutorial
CUDA Templates and Python DSLs for High-Performance Linear Algebra