kzjeef

Jiejing kzjeef

27 followers · 32 following

San Jose

Achievements

x3 x2

Highlights

Lists (1)

HPC-Tools

High performance compute tools

3 repositories

Stars

callanjfox / kv-cache-tester

Python 35 13 Updated Jun 3, 2026

ultraworkers / claw-code

An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.

Rust 194,016 109,950 Updated Jun 8, 2026

RightNow-AI / qwen3.5-triton

Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200

Python 115 9 Updated Feb 28, 2026

technillogue / ptx-isa-markdown

PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.

Python 208 37 Updated Dec 24, 2025

segnosys / light-trace-benchmark

Python 2 Updated May 30, 2026

MoonshotAI / checkpoint-engine

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python 964 88 Updated Jun 8, 2026

sgl-project / sgl-learning-materials

Materials for learning SGLang

845 64 Updated Jan 5, 2026

neuralmagic / quant_kernel_benchmarks

Benchmarking code for running quantized kernels from vLLM and other libraries

Python 13 2 Updated Dec 3, 2024

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,317 181 Updated Jul 29, 2023

dcaox / How_to_optimize_in_GPU

Forked from Liu-xiandong/How_to_optimize_in_GPU

Cuda 1 Updated Jun 14, 2024

dcaox / MIT6.5940

Model acceleration / model compression (all labs completed)

Jupyter Notebook 11 2 Updated Dec 24, 2023

PyO3 / maturin

Build and publish crates with pyo3, cffi and uniffi bindings as well as rust binaries as python packages

Rust 5,655 416 Updated Jun 17, 2026

facebookresearch / HolisticTraceAnalysis

A library to analyze PyTorch traces.

Python 529 94 Updated May 29, 2026

yomorun / llm-function-calling-examples

Strongly-typed LLM Function Calling examples, run on OpenAI, Ollama, Mistral and others.

TypeScript 24 4 Updated Jun 18, 2026

Little0o0 / Quaff

[ACL 2025] Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis

Python 11 Updated Oct 5, 2025

Anemll / Anemll

Artificial Neural Engine Machine Learning Library

Python 1,619 76 Updated Mar 10, 2026

resemble-ai / chatterbox

SoTA open-source TTS

Python 25,113 3,330 Updated Jun 10, 2026

blahgeek / emacs-appimage

Python 72 11 Updated Dec 21, 2025

ROCm / aiter

AI Tensor Engine for ROCm

Python 465 359 Updated Jun 18, 2026

singgel / Study-Floder

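A collection of quite good books, for example "The Beauty of Mathematics", "How To Ask Questions The Smart Way", "Software Engineering Reliability", "A Brief History of Time", "Selected Works of Mao Zedong (all four volumes)", "On Top of Tides", "The Pyramid Principle", "TCP/IP Illustrated, Volumes 1/2/3", "[Recommended] Head First Design Patterns", and others; some large files that exceed upload limits, such as "Illustrated TCP/IP, 5th Edition", are listed in the README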

3,066 1,004 Updated Apr 7, 2026

SakanaAI / text-to-lora

Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input

Python 1,281 87 Updated Jun 8, 2025

LMCache / LMCache

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python 9,319 1,345 Updated Jun 18, 2026

iie-ycx / DEER

This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models.

Python 206 9 Updated Jul 7, 2025

rustcc / RustPrimer

The Rust primer for beginners. We need native English speaker help us modify the translation.

Rust 1,789 226 Updated Mar 8, 2024

mk1-project / quickreduce

QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.

C++ 38 8 Updated Aug 29, 2025

facebook / CacheLib

Pluggable in-process caching engine to build and scale high performance services

C++ 1,560 318 Updated Jun 17, 2026

drogonframework / drogon

Drogon: A C++14/17/20 based HTTP web application framework running on Linux/macOS/Unix/Windows

C++ 14,000 1,344 Updated Jun 5, 2026

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,328 104 Updated Aug 28, 2025

PKUFlyingPig / CS149-parallel-computing

Learning materials for Stanford CS149 : Parallel Computing

C 295 49 Updated Jul 31, 2021

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 11,282 1,160 Updated Jun 18, 2026