Stars
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
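A comprehensive Chinese-language getting-started guide to Claude Code. A localized rewrite of luongnv89/claude-howto for beginner users in China that keeps commands and configuration compatible and adds learning paths and localization validation guardrails.
Sharing AI Infra knowledge and coding exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, large-model fundamentals 🧠, AI software and hardware 🔧, and more.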
Ascend PyTorch adapter (torch_npu). Mirror of https://gitcode.com/Ascend/pytorch
Fast and memory-efficient exact attention
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Repo for Qwen Image Finetune
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …
Train transformer language models with reinforcement learning.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Enjoy the magic of Diffusion models!
Repo for Qwen Image Finetune
[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
Official code for PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
🌟 100+ original LLM/RL principle diagrams 📚, presented by the author of 《大模型算法》 ("Large Model Algorithms")! 💥 (100+ LLM/RL Algorithm Maps)
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
Fast and memory-efficient exact attention
LLM Finetuning with peft
Awesome list for LLM quantization
Awesome LLM compression research papers and tools.
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
A summary of existing representative LLM text datasets.
A curated list of awesome open-source libraries for production LLM
A curated list for Efficient Large Language Models
Chinese-localized version of MAS (Microsoft-Activation-Scripts), an activation tool for Windows and Office