Stars
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
A framework for building native applications using React
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Flash Attention in ~100 lines of CUDA (forward pass only)
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
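Screenshots of the tables in the "Energy and General Nutrients of Foods" section of the China Food Composition Tables, Standard Edition (6th Edition), plus JSON files converted to a specific format.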
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
🐳 A curated list of Docker resources and projects
MathJax source code for version 3 and beyond
The official Python library for the OpenAI API
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
A Datacenter Scale Distributed Inference Serving Framework
FlashMLA: Efficient Multi-head Latent Attention Kernels
Optimized primitives for collective multi-GPU communication
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Tensors and Dynamic neural networks in Python with strong GPU acceleration
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).
</> htmx - high power tools for HTML
Ascend PyTorch adapter (torch_npu). Mirror of https://gitcode.com/Ascend/pytorch
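"A Guide to Using Open-Source Large Models": a tutorial tailored for beginners in China on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment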