
Region-level profiling for CUDA kernels with trace, NVBit, CUPTI, NSys, and an interactive Explorer.

Python 118 11 Updated Apr 17, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python 6,520 606 Updated Jun 18, 2026

Examples of CUDA implementations by Cutlass CuTe

Makefile 278 34 Updated Jul 1, 2025

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 427 46 Updated Aug 13, 2024

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Jupyter Notebook 839 147 Updated Oct 11, 2022

TensorDict is a tensor container dedicated to PyTorch.

Python 1,030 115 Updated Jun 19, 2026

Shared Middle-Layer for Triton Compilation

MLIR 337 103 Updated Dec 5, 2025

Lightweight Armoury Crate alternative for Asus laptops with nearly the same functionality. Works with ROG Zephyrus, Flow, TUF, Strix, Scar, ProArt, Vivobook, Zenbook, Expertbook, ROG Ally, and many…

C# 13,757 512 Updated Jun 18, 2026

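AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks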

Jupyter Notebook 16,965 2,400 Updated Sep 3, 2025

This is an unofficial Palworld server binary distribution project that fixes some problems with the original server.

Batchfile 865 28 Updated Jan 28, 2024

A Cross-Platform, Multi-Cloud High-Performance Computing Platform

C 272 113 Updated Jan 27, 2025
Cuda 28 3 Updated Aug 14, 2024
C++ 3 Updated Mar 8, 2018

RC4ML GNN System Projects

C++ 10 1 Updated Mar 12, 2024

NoteBook FanControl

C# 3,189 492 Updated Jul 8, 2024

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.

C++ 963 253 Updated Jun 17, 2026

Build OpenWrt using GitHub Actions | Thanks to P3TERX for the project source code | Thanks to KFERMercer for the project source code

Shell 1,822 4,311 Updated Jul 1, 2024
Cuda 12 7 Updated Dec 17, 2023

Graphiler is a compiler stack built on top of DGL and TorchScript which compiles GNNs defined using user-defined functions (UDFs) into efficient execution plans.

Cuda 59 6 Updated Oct 3, 2022
Python 10 2 Updated Oct 17, 2021

BGHT: High-performance static GPU hash tables.

C++ 73 9 Updated Jul 2, 2025

A warp-oriented dynamic hash table for GPUs

Cuda 76 18 Updated Jan 19, 2024
Cuda 13 1 Updated Nov 4, 2020

Graph Sampling using GPU

Cuda 52 8 Updated Mar 17, 2022

Artifact for OSDI'21 GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs.

Cuda 71 20 Updated Mar 2, 2023

System for AI Education Resource.

Python 4,299 530 Updated Oct 25, 2024

A debugging and profiling tool that can trace and visualize Python code execution

Python 7,670 468 Updated Jun 8, 2026

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Python 14,686 2,244 Updated Dec 1, 2025

Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, i…

Python 44,803 7,109 Updated Apr 22, 2026