A heap memory profiler for Linux

C++ 4,030 234 Updated Mar 21, 2026

A CUDA learning path based on the book "CUDA Programming: Basics and Practice" (《CUDA编程:基础与实践》) by Zheyong Fan (樊哲勇).

Cuda 413 83 Updated Jan 15, 2024

Fast and customizable text tokenization library with BPE and SentencePiece support

C++ 331 80 Updated Jan 10, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,781 517 Updated Mar 13, 2026

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…

Python 13,326 1,295 Updated Mar 24, 2026

📚 Build a large language model from scratch

Jupyter Notebook 27,870 2,573 Updated Mar 16, 2026

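"Open-Source LLM Usage Guide" (《开源大模型食用指南》): tutorials tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment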

Jupyter Notebook 29,240 2,876 Updated Mar 22, 2026

Nano vLLM

Python 12,405 1,775 Updated Nov 3, 2025

UniFace: A Unified Face Analysis Library in Python built on ONNX Runtime

Python 634 86 Updated Mar 24, 2026

MobileGaze: Real-Time Gaze Estimation models using ResNet 18/34/50, MobileNet v2 and MobileOne s0-s4 | In PyTorch >> ONNX Runtime Inference

Python 178 36 Updated Feb 14, 2026

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 89,164 13,597 Updated Mar 21, 2026

A large language model inference framework implemented from scratch in C++

C++ 10 1 Updated Nov 18, 2024

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 630 83 Updated Sep 11, 2024

A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.

JavaScript 1,618 198 Updated Nov 19, 2025

🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.

Cuda 255 14 Updated Feb 13, 2026

Fast C++ logging library.

C++ 28,537 5,093 Updated Mar 14, 2026

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it

C++ 699 146 Updated Mar 16, 2026

Implement custom operators in PyTorch with CUDA/C++

Python 77 11 Updated Jan 1, 2023

Fast, Flexible and Portable Structured Generation

C++ 1,595 133 Updated Mar 24, 2026

Community maintained hardware plugin for vLLM on Ascend

Python 1,824 972 Updated Mar 24, 2026

fast log and exp functions for AVX2/AVX-512

Python 243 37 Updated Mar 12, 2025

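A strong project for campus recruiting (autumn and spring hiring rounds) and internships: build, hands-on and from scratch, a large language model inference framework that supports LLama2/3 and Qwen2.5.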

C++ 519 134 Updated Oct 28, 2025

The Hugging Face course on Transformers

MDX 3,793 1,285 Updated Mar 17, 2026

🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime

Python 128 41 Updated Mar 12, 2026

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

Python 3,341 629 Updated Mar 13, 2026

"Dive into LLMs" (《动手学大模型》): a series of hands-on programming tutorials

Jupyter Notebook 23,663 2,792 Updated Oct 10, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,966 312 Updated Mar 24, 2026

Abseil Common Libraries (C++)

C++ 17,139 2,986 Updated Mar 24, 2026

Lightweight C++ command line option parser

C++ 4,720 640 Updated Mar 18, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,280 840 Updated Mar 22, 2026