AI Inference
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
dInfer: An Efficient Inference Framework for Diffusion Language Models
DFlash: Block Diffusion for Flash Speculative Decoding
It is said that Ilya Sutskever gave John Carmack this reading list of about 30 research papers on deep learning.
Sutskever 30 implementations inspired by https://papercode.vercel.app/ | For Agents, use https://github.com/pageman/Sutskever-Agent | Polyglot / Multi-Backend version at https://github.com/pageman/s…
A lightweight inference engine supporting speculative speculative decoding (SSD).
AI fundamentals: GPU architecture, CUDA programming, large-model basics, and AI Agent topics.
Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language