Stars
TokenSpeed is a speed-of-light LLM inference engine.
A Triton JIT runtime and FFI provider in C++
Efficient Triton Kernels for LLM Training
Towards Holistic evaluation of Generative Diffusion Transformers!
An LLM post-training framework with vLLM for RL Scaling
Skills for writing tilelang and debugging with CUDA toolkits.
Large Language Model (LLM) Systems Paper List
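Master OpenClaw from scratch: the most comprehensive Chinese-language tutorial, covering installation, configuration, hands-on case studies, and a guide to avoiding pitfalls (GitHub edition)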
TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)
Let coding agents use ncu skills to analyze CUDA programs automatically!
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Fast and memory-efficient C++ flat hash table/map/set
Claude Code agent skill for autonomous AI/scientific research workflows
Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your Claude Code/Codex/Gemini agent will be an AI research agent with full horsepowe…
KsanaDiT: High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation
from vibe coding to agentic engineering - practice makes claude perfect
How to optimize some algorithms in CUDA.
A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs dir…
LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model
Create beautiful slides on the web using a coding agent's frontend skills