A full-stack algorithm tutorial from NLP to LLM. Read online: https://datawhalechina.github.io/base-llm/

Jupyter Notebook 711 73 Updated Apr 22, 2026

Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞

Python 11,724 1,353 Updated Apr 23, 2026

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 9,118 2,321 Updated Mar 30, 2026

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 1,061 87 Updated Sep 4, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 10,793 1,090 Updated Apr 20, 2026

NVIDIA curated collection of educational resources related to general purpose GPU programming.

Jupyter Notebook 1,506 264 Updated Apr 14, 2026

GPU programming related news and material links

2,114 126 Updated Mar 8, 2026

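AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper-layer software stack, that supports training and inference of large AI models.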

Jupyter Notebook 6,873 895 Updated Dec 22, 2025

Summary of some awesome work for optimizing LLM inference

241 9 Updated Feb 14, 2026

Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]

Python 56 6 Updated Mar 5, 2025

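Sharing AI Infra knowledge and coding exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, large model fundamentals 🧠, AI software and hardware 🔧, and more.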

Jupyter Notebook 1,935 155 Updated Apr 26, 2026

Nano vLLM

Python 13,152 2,005 Updated Apr 26, 2026
Python 7 5 Updated May 28, 2025

A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models

Python 49 3 Updated Dec 24, 2025

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 12,151 942 Updated Mar 11, 2025

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. ACM Computing Surveys, 2026.

721 43 Updated Apr 27, 2026

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 6,941 773 Updated Apr 20, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 4,071 599 Updated Mar 13, 2026
Python 311 34 Updated Jul 10, 2025

InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference

C 17 1 Updated Mar 30, 2025
Python 41 5 Updated Oct 16, 2025

Persist and reuse KV Cache to speed up your LLM.

Python 274 73 Updated Apr 27, 2026
Python 179 31 Updated Jul 15, 2025

Source code of the paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing"

Python 31 2 Updated Oct 24, 2024

Official code repo for the O'Reilly Book - "Hands-On Large Language Models"

Jupyter Notebook 25,437 5,900 Updated Apr 24, 2026

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 57,262 9,812 Updated Nov 12, 2025

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 24,248 3,219 Updated Aug 15, 2024

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

695 24 Updated Apr 15, 2026

Supercharge Your LLM with the Fastest KV Cache Layer

Python 8,136 1,133 Updated Apr 27, 2026