Stars
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
Supercharge Your LLM with the Fastest KV Cache Layer
A lightweight LLaMA-like LLM inference framework based on Triton kernels.
Collection of kernels written in Triton language
Building DeepSeek R1 from Scratch
[ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.
General technology for enabling AI capabilities w/ LLMs and MLLMs
A curated list for Efficient Large Language Models
A self-learning tutorial for CUDA High Performance Programming.
A generative speech model for daily dialogue.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
A minimal GPU design in Verilog to learn how GPUs work from the ground up
Distribute and run LLMs with a single file.
Header-only C++ network library based on asio. Supports tcp, udp, http, websocket, rpc, ssl, icmp, serial_port, socks5.
Open-Sora: Democratizing Efficient Video Production for All
Implementation of FlashAttention in PyTorch
An improved server based on MapleSolaxia (v83 MapleStory private server)
Model conversion extension for stable-diffusion-webui. Supports converting fp16/bf16 and no-ema/ema-only safetensors.