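Repositories listed on GitHub user aizyler's profile. Each entry gives the repository description, followed by its primary language, star count, fork count, and last-updated date.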

Flash Attention from Scratch on CUDA Ampere

Assembly 158 22 Updated Sep 1, 2025

This is an implementation of flash attention from scratch, without importing any external libraries.

Cuda 22 2 Updated Mar 15, 2026

Perplexity open source garden for inference technology

Rust 385 36 Updated Dec 25, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.

Cuda 3,256 383 Updated Jan 17, 2026

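"Build a Large Language Model (From Scratch)" is an e-book that explores the principles and implementation of large language models in depth. It suits learners who want a thorough understanding of the architecture, training process, and application development of large models such as GPT. To make this highly valuable textbook accessible to more Chinese-speaking readers, I decided to translate it into Chinese and share it as open source on GitHub.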

HTML 3,457 584 Updated Sep 7, 2025

The best ChatGPT that $100 can buy.

Python 50,516 6,623 Updated Mar 27, 2026

Ongoing research training transformer models at scale

Python 15,829 3,765 Updated Mar 28, 2026

Implement a Pytorch-like DL library in C++ from scratch, step by step

C++ 227 31 Updated Mar 26, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,541 1,006 Updated Feb 6, 2026

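😱 Dissects, at the source-code level, the underlying implementation principles of mainstream internet-industry technologies, helping developers "deepen their technical expertise". Currently covers the Spring family, the Mybatis, Netty, and Dubbo frameworks, and middleware such as Redis and Tomcat.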

Java 23,137 4,256 Updated Jan 9, 2026

Fast and memory-efficient exact attention

Python 23,022 2,559 Updated Mar 28, 2026

🚀🚀 LLM: Train a 64M-parameter GPT from scratch in just 2h! 🌏

Python 44,254 5,316 Updated Mar 27, 2026

A tutorial for CUDA&PyTorch

Cuda 371 51 Updated Mar 23, 2026

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

C++ 2,666 248 Updated Jan 22, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 338,922 66,658 Updated Mar 28, 2026

My learning notes for ML SYS.

Python 5,788 375 Updated Mar 19, 2026

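AIInfra (AI infrastructure) covers the full AI system stack, from underlying hardware such as chips up to the software stack that supports training and inference of large AI models.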

Jupyter Notebook 6,540 861 Updated Dec 22, 2025

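Sharing AI Infra knowledge and coding exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, LLM fundamentals 🧠, AI hardware and software 🔧, and more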

Jupyter Notebook 1,397 109 Updated Mar 28, 2026

Machine Learning Engineering Open Book

Python 17,561 1,117 Updated Mar 16, 2026

Nano vLLM

Python 12,472 1,797 Updated Nov 3, 2025

Cuda 3 Updated Jan 23, 2026

Source code for the book Real-Time C++, by Christopher Kormanyos

C++ 776 189 Updated Feb 27, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,228 832 Updated Mar 28, 2026

High Performance LLM Inference Operator Library

C++ 800 74 Updated Feb 5, 2026

A non-professional, personal translation of "Template Metaprogramming with C++"

TeX 96 17 Updated May 25, 2023

Chinese translation of "Designing Data-Intensive Applications" (DDIA), first and second editions

Python 22,827 4,524 Updated Feb 24, 2026

The seL4 microkernel

C 5,385 754 Updated Mar 24, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 74,533 14,877 Updated Mar 28, 2026

Tile primitives for speedy kernels

Cuda 3,279 267 Updated Mar 28, 2026

A self-learning tutorial for high-performance CUDA programming.

JavaScript 930 92 Updated Jan 14, 2026