- Liger-Kernel Public
  Forked from linkedin/Liger-Kernel
  Efficient Triton Kernels for LLM Training
  Python · BSD 2-Clause "Simplified" License · Updated Nov 13, 2025
- DeepEP Public
  Forked from deepseek-ai/DeepEP
  DeepEP: an efficient expert-parallel communication library
  Cuda · MIT License · Updated Oct 17, 2025
- 2025 Public
  Forked from asplos-contest/2025
  The ASPLOS 2025 / EuroSys 2025 Contest Track
- pytorch Public
  Forked from pytorch/pytorch
  Tensors and Dynamic neural networks in Python with strong GPU acceleration
  C++ · Other · Updated Sep 12, 2024
- gpt-fast Public
  Forked from meta-pytorch/gpt-fast
  Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
  Python · BSD 3-Clause "New" or "Revised" License · Updated Sep 7, 2024
- hatchet-1 Public
  Forked from llnl/hatchet
  Graph-indexed Pandas DataFrames for analyzing hierarchical performance data
  JavaScript · MIT License · Updated May 10, 2024
- ao Public
  Forked from pytorch/ao
  Native PyTorch library for quantization and sparsity
  Python · BSD 3-Clause "New" or "Revised" License · Updated May 6, 2024
- pytorch_geometric Public
  Forked from pyg-team/pytorch_geometric
  Graph Neural Network Library for PyTorch
  Python · MIT License · Updated Apr 17, 2024
- Triton-Puzzles Public
  Forked from srush/Triton-Puzzles
- triton Public
  Forked from triton-lang/triton
  Development repository for the Triton language and compiler
  C++ · MIT License · Updated Jan 11, 2024
- pyg-lib Public
  Forked from pyg-team/pyg-lib
  Low-Level Graph Neural Network Operators for PyG
- llvm-project Public
  Forked from llvm/llvm-project
  The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
  Other · Updated Dec 14, 2023
- hpctoolkit Public
  Forked from HPCToolkit/hpctoolkit
  HPCToolkit performance tools: measurement and analysis components
- DrGPUM Public
  Forked from Lin-Mao/DrGPUM
  A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
- Awesome-GPU Public
  Awesome resources for GPUs
- RzLinear Public
  Forked from apd10/RzLinear
  A compressed alternative to matrix multiplication using state-of-the-art compression ROBE-Z
  Jupyter Notebook · MIT License · Updated May 20, 2023
- Laghos Public
  Forked from CEED/Laghos
  High-order Lagrangian Hydrodynamics Miniapp
- GPA Public
  GPU Performance Advisor
- tabulate Public
  Forked from p-ranav/tabulate
  Table Maker for Modern C++
  C++ · MIT License · Updated Jul 22, 2022
- inference Public
  Forked from mlcommons/inference
  Reference implementations of MLPerf™ inference benchmarks
  Python · Other · Updated May 24, 2022
- training Public
  Forked from mlcommons/training
  Reference implementations of MLPerf™ training benchmarks
  Python · Other · Updated May 21, 2022
- cupti_test Public
  Forked from jmellorcrummey/cupti-test
  Test overhead of CUPTI PC sampling for CUDA 10
  C++ · Updated May 10, 2022
- wowchemy-hugo-themes Public
  Forked from HugoBlox/hugo-blox-builder
  🔥 Hugo website builder, Hugo themes & Hugo CMS. No code, build with widgets! Create online courses, academic résumés, or startup websites.
  SCSS · MIT License · Updated Apr 27, 2022
- Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks.
  Python · Updated Aug 17, 2021