- Sun Yat-sen University
- Guangzhou, Guangdong, China
- (UTC +08:00)
- https://wu-kan.cn/
Highlights
- Pro
Stars
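An annotated nano_vllm repository that also adds MiniCPM4 support and the ability to register new models
AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks
AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper-level software stack, that supports training and inference of large AI models.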
example code for using DC QP for providing RDMA READ and WRITE operations to remote GPU memory
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
open-source coding LLM for software engineering tasks
Nvidia Instruction Set Specification Generator
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
A printable low-profile 60% mechanical keyboard kit with 7mm front height and foldable footstand.
Tile primitives for speedy kernels
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
Distributed Compiler based on Triton for Parallel Systems
A tool for examining GPU scheduling behavior.
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
karthikeyann / cuda-calculator
Forked from szho42/cuda-calculator
HTML/JS port of CUDA Occupancy Calculator
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
A throughput-oriented high-performance serving framework for LLMs
collection of benchmarks to measure basic GPU capabilities