Stars
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200
PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
Benchmarking code for running quantized kernels from vLLM and other libraries
A series on GPU optimization topics, covering how to optimize CUDA kernels in detail. It introduces several basic kernel optimizations, including: elementwise, reduce, s…
Build and publish crates with pyo3, cffi and uniffi bindings, as well as Rust binaries, as Python packages
A library to analyze PyTorch traces.
Strongly-typed LLM Function Calling examples, run on OpenAI, Ollama, Mistral and others.
[ACL 2025] Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis
Artificial Neural Engine Machine Learning Library
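A collection of quite good books, such as The Beauty of Mathematics, How To Ask Questions The Smart Way, Software Engineering Reliability, A Brief History of Time, Selected Works of Mao Zedong (all four volumes), On Top of Tides, The Pyramid Principle, TCP/IP Illustrated (Volumes 1, 2 and 3), and [Recommended] Head First Design Patterns; some large files that exceed the upload limit, such as Illustrated TCP/IP (5th edition), are listed in the README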
Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models.
The Rust primer for beginners. We need native English speakers to help us revise the translation.
QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.
Pluggable in-process caching engine to build and scale high performance services
Drogon: A C++14/17/20 based HTTP web application framework running on Linux/macOS/Unix/Windows
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Learning materials for Stanford CS149: Parallel Computing
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉