Stars
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
FlashMLA: Efficient Multi-head Latent Attention Kernels
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Fast and memory-efficient exact attention
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
Development repository for the Triton language and compiler
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), High-Bit(>2b)(DoReFa/Quantization and Training of Neural Networks for Efficient Integer-…
An easy to use PyTorch to TensorRT converter
Supports Yolov5(4.0)/Yolov5(5.0)/YoloR/YoloX/Yolov4/Yolov3/CenterNet/CenterFace/RetinaFace/Classify/Unet. Converts darknet/libtorch/pytorch/mxnet models to ONNX and then to TensorRT.
🔥 (yolov3 yolov4 yolov5 unet ...) A mini PyTorch inference framework inspired by darknet.
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
AlexeyAB / darknet
Forked from pjreddie/darknet
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)