kzjeef



An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.

Rust 194,016 109,950 Updated Jun 8, 2026

Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200

Python 115 9 Updated Feb 28, 2026

PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.

Python 208 37 Updated Dec 24, 2025
Python 2 Updated May 30, 2026

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python 964 88 Updated Jun 8, 2026

Materials for learning SGLang

845 64 Updated Jan 5, 2026

Benchmarking code for running quantized kernels from vLLM and other libraries

Python 13 2 Updated Dec 3, 2024

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,317 181 Updated Jul 29, 2023

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1 Updated Jun 14, 2024

Model acceleration / model compression (all labs completed)

Jupyter Notebook 11 2 Updated Dec 24, 2023

Build and publish crates with pyo3, cffi and uniffi bindings as well as rust binaries as python packages

Rust 5,655 416 Updated Jun 17, 2026

A library to analyze PyTorch traces.

Python 529 94 Updated May 29, 2026

Strongly-typed LLM Function Calling examples, run on OpenAI, Ollama, Mistral and others.

TypeScript 24 4 Updated Jun 18, 2026

[ACL 2025] Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis

Python 11 Updated Oct 5, 2025

Artificial Neural Engine Machine Learning Library

Python 1,619 76 Updated Mar 10, 2026

SoTA open-source TTS

Python 25,113 3,330 Updated Jun 10, 2026
Python 72 11 Updated Dec 21, 2025

AI Tensor Engine for ROCm

Python 465 359 Updated Jun 18, 2026

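A collection of quite good books, such as "The Beauty of Mathematics", "How To Ask Questions The Smart Way", "Software Engineering Reliability", "A Brief History of Time", "Selected Works of Mao Zedong [Complete, 4 Volumes]", "On Top of Tides", "The Pyramid Principle", "TCP/IP Volumes 1/2/3", "[Recommended] Head First Design Patterns", etc.; some large files that exceed the upload limit, such as "Illustrated TCP/IP, 5th Edition", are in the README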

3,066 1,004 Updated Apr 7, 2026

Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input

Python 1,281 87 Updated Jun 8, 2025

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python 9,319 1,345 Updated Jun 18, 2026

This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models.

Python 206 9 Updated Jul 7, 2025

The Rust primer for beginners. We need native English speakers to help us revise the translation.

Rust 1,789 226 Updated Mar 8, 2024

QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.

C++ 38 8 Updated Aug 29, 2025

Pluggable in-process caching engine to build and scale high performance services

C++ 1,560 318 Updated Jun 17, 2026

Drogon: A C++14/17/20 based HTTP web application framework running on Linux/macOS/Unix/Windows

C++ 14,000 1,344 Updated Jun 5, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,328 104 Updated Aug 28, 2025

Learning materials for Stanford CS149: Parallel Computing

C 295 49 Updated Jul 31, 2021

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 11,282 1,160 Updated Jun 18, 2026