Stars
📖 A Chinese translation of *C++ Concurrency in Action, Second Edition*.
Learning Deep Representations of Data Distributions
Flash Attention from Scratch on CUDA Ampere
This is an implementation of flash attention from scratch, without importing any external libraries.
Perplexity open source garden for inference technology
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
*Build a Large Language Model (From Scratch)* is an e-book that explores the principles and implementation of large language models in depth, suited to learners who want to understand the architecture, training process, and application development of models such as GPT. To make this valuable textbook accessible to more Chinese readers, I decided to translate it into Chinese and share it open source on GitHub.
Ongoing research training transformer models at scale
Implement a Pytorch-like DL library in C++ from scratch, step by step
FlashMLA: Efficient Multi-head Latent Attention Kernels
😱 Source-level analysis of the underlying implementation of mainstream Internet-industry technologies, helping developers deepen their technical expertise. Currently covers the Spring ecosystem, the MyBatis, Netty, and Dubbo frameworks, and middleware such as Redis and Tomcat.
Fast and memory-efficient exact attention
🚀🚀 Train a 64M-parameter GPT completely from scratch in just 2 hours! 🌏
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
My learning notes for ML SYS.
AIInfra (AI infrastructure) refers to the full AI system stack, from underlying hardware such as chips up to the software layers that support training and inference of large AI models.
Sharing AI Infra knowledge & coding exercises: introductions to the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, LLM fundamentals 🧠, AI hardware and software 🔧, and more
Machine Learning Engineering Open Book
Source code for the book Real-Time C++, by Christopher Kormanyos
FlashInfer: Kernel Library for LLM Serving
A non-professional personal translation of *Template Metaprogramming with C++*.
Chinese translation of *Designing Data-Intensive Applications* (DDIA), first and second editions.