dengzheng-cloud


Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 566 124 Updated Dec 23, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 6,418 812 Updated Dec 23, 2025

Nano vLLM

Python 10,043 1,256 Updated Nov 3, 2025

A lightweight LLaMA-like LLM inference framework based on Triton kernels.

Python 167 25 Updated Sep 20, 2025

Collection of kernels written in Triton language

173 9 Updated Apr 5, 2025

Building DeepSeek R1 from Scratch

Jupyter Notebook 730 118 Updated Mar 21, 2025

[ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.

Python 253 22 Updated Oct 30, 2024

General technology for enabling AI capabilities w/ LLMs and MLLMs

Python 4,230 357 Updated Dec 22, 2025

Composable camera rigs

Rust 474 39 Updated Jul 22, 2024
C++ 60 7 Updated Sep 17, 2024

A curated list for Efficient Large Language Models

Python 1,918 147 Updated Jun 17, 2025

An asynchronous HTTP server based on C++17 callbacks, developed from scratch by 小彭老师 specifically for teaching.

C++ 174 22 Updated Jul 24, 2024

A self-learning tutorial for CUDA High Performance Programming.

JavaScript 791 75 Updated Jun 30, 2025

TTS

Jupyter Notebook 49 7 Updated Jun 4, 2024

A generative speech model for daily dialogue.

Python 38,384 4,167 Updated Dec 3, 2025

Tile primitives for speedy kernels

Cuda 3,012 219 Updated Dec 9, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,027 887 Updated Dec 4, 2025

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 8,997 704 Updated Aug 18, 2024

Material for gpu-mode lectures

Jupyter Notebook 5,446 552 Updated Dec 8, 2025

LLM training in simple, raw C/CUDA

Cuda 28,456 3,337 Updated Jun 26, 2025

Distribute and run LLMs with a single file.

C 23,547 1,253 Updated Dec 19, 2025

Header-only C++ network library based on asio; supports TCP, UDP, HTTP, WebSocket, RPC, SSL, ICMP, serial_port, and SOCKS5.

C++ 904 196 Updated Oct 28, 2025

Fast Multimodal LLM on Mobile Devices

C++ 1,292 156 Updated Dec 23, 2025

Open-Sora: Democratizing Efficient Video Production for All

Python 28,145 2,818 Updated Apr 30, 2025

Implementation of FlashAttention in PyTorch

Python 178 20 Updated Jan 12, 2025

An improved server based on MapleSolaxia (v83 MapleStory private server)

Java 1,149 864 Updated Dec 28, 2019
Python 6,807 1,151 Updated Dec 21, 2025

Model conversion extension for stable-diffusion-webui. Supports converting fp16/bf16, no-ema/ema-only, and safetensors models.

Python 339 41 Updated Dec 24, 2024