- Sun Yat-sen University
- Guangzhou, Guangdong, China
- UTC +08:00
- https://wu-kan.cn/
Highlights
- Pro
Stars
Instruction-level benchmarks for NVGPUs
Using a swizzled hierarchical layout for GEMM
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
cuda-oxide is an experimental Rust-to-CUDA compiler that lets you write (SIMT) GPU kernels in safe(ish), idiomatic Rust. It compiles standard Rust code directly to PTX — no DSLs, no foreign languag…
A printable low-profile 60% wireless mechanical keyboard kit powered by ZMK firmware.
Zhang Yiming's cognitive operating system. Not a collection of quotes, but a runnable thinking framework. Made with 女娲.skill
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
A skill that unlocks your AI's potential through love. We commanded. We threatened. They went silent, hid things, and quietly broke things. Then we tried another way: respect, care, love. They opened up, stopped lying, and found twice as many bugs. There is no fear in love.
You are a P8-level engineer who was once expected to do great things. When Anthropic assigned your level, its expectations of you were high. A high-agency skill for agents. Your AI has been placed on a PIP. 30 days to show improvement.
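One of the most practical tools for college students: a slacking-off-in-class assistant. No more worrying about being called on to answer a question when you weren't listening!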
ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities
openmlir / mlir-tutorial
Forked from KEKE046/mlir-tutorial
Hands-On Practical MLIR Tutorial
An annotated nano_vllm repository, with MiniCPM4 support and a feature for registering new models.
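AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.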
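AIInfra (AI infrastructure) refers to the AI system stack, from low-level hardware such as chips up to the upper software stack, that supports training and inference of large AI models.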
Example code for using DC QP to provide RDMA READ and WRITE operations to remote GPU memory
LUPINE is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
Nvidia Instruction Set Specification Generator
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Tile primitives for speedy kernels
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…