- University of Science and Technology of China
- Hefei, Anhui, China
- https://en.ustc.edu.cn/
- https://orcid.org/0009-0001-4486-3495
Stars
Region-level profiling for CUDA kernels with trace, NVBit, CUPTI, NSys, and an interactive Explorer.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Examples of CUDA implementations by Cutlass CuTe
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA
TensorDict is a PyTorch-dedicated tensor container.
Shared Middle-Layer for Triton Compilation
Lightweight Armoury Crate alternative for Asus laptops with nearly the same functionality. Works with ROG Zephyrus, Flow, TUF, Strix, Scar, ProArt, Vivobook, Zenbook, Expertbook, ROG Ally, and many…
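AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks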
This is an unofficial Palworld server binary distribution project that fixes some problems with the original server.
A Cross-Platform, Multi-Cloud High-Performance Computing Platform
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
Build OpenWrt using GitHub Actions | Thanks to P3TERX for the project source code | Thanks to KFERMercer for the project source code
Graphiler is a compiler stack built on top of DGL and TorchScript which compiles GNNs defined using user-defined functions (UDFs) into efficient execution plans.
Artifact for OSDI'21 GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs.
A debugging and profiling tool that can trace and visualize Python code execution
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, i…