wu-kan
  • Sun Yat-sen University
  • Guangzhou, Guangdong, China

Highlights

  • Pro

Organizations

@SYSU-SCC


An annotated nano_vllm repository that also adds MiniCPM4 support and the ability to register new models

Python 80 17 Updated Aug 11, 2025

Nano vLLM

Python 7,007 891 Updated Aug 31, 2025

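AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks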

Jupyter Notebook 15,268 2,196 Updated Sep 3, 2025

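AIInfra (AI infrastructure) refers to the AI system stack, from low-level hardware such as chips up to the upper-layer software stack, that supports training and inference of large AI models.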

Jupyter Notebook 4,671 657 Updated Oct 9, 2025

example code for using DC QP for providing RDMA READ and WRITE operations to remote GPU memory

C 145 36 Updated Jul 30, 2024

UCX Demo Application

C++ 1 1 Updated Jan 4, 2023

SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.

C++ 1,756 72 Updated Jun 16, 2025

open-source coding LLM for software engineering tasks

Python 967 117 Updated Sep 30, 2025

Nvidia Instruction Set Specification Generator

Python 295 16 Updated Jul 9, 2024

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

64,655 7,190 Updated Jun 4, 2025

A printable low-profile 60% mechanical keyboard kit with 7mm front height and foldable footstand.

94 1 Updated May 17, 2025

Tile primitives for speedy kernels

Cuda 2,797 182 Updated Sep 21, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 11,799 1,787 Updated Oct 9, 2025

gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling

Python 41 3 Updated Sep 29, 2025

UNR: Unified Notifiable RMA Library for HPC

C 8 Updated Aug 28, 2024
C++ 106 13 Updated May 16, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,154 102 Updated Oct 2, 2025

Material for gpu-mode lectures

Jupyter Notebook 5,144 513 Updated Sep 23, 2025

A tool for examining GPU scheduling behavior.

Cuda 88 20 Updated Aug 17, 2024

A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)

C++ 423 70 Updated Jan 13, 2025

HTML/JS port of CUDA Occupancy Calculator

CoffeeScript 17 8 Updated Nov 23, 2021

Parboil benchmark

C 4 8 Updated Nov 7, 2016

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 691 53 Updated Aug 6, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 896 43 Updated Sep 17, 2025

collection of benchmarks to measure basic GPU capabilities

C++ 427 63 Updated Feb 11, 2025

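Automated quest farming for FGO, automated Craft Essence fodder synthesis, and planned features (Friend Point summoning, sorting EXP fodder in the present box)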

20 1 Updated Sep 7, 2021