micropuma.github.io
Stars
A domain-specific language (DSL) based on Triton but providing higher-level abstractions.
Mirror of https://gitcode.com/Ascend/AscendNPU-IR
Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend
Perplexity open source garden for inference technology
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
A framework for few-shot evaluation of language models.
A benchmark of real-world DL kernel problems
High-performance linear attention kernel library built on TileLang
Building General-Purpose Robots Based on Embodied Foundation Model
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
Efficient implementations for emerging model architectures
Community maintained hardware plugin for vLLM on Ascend
Community maintained hardware plugin for vLLM on MetaX GPU
FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3x↑ vs SDPA, up to 430T on H200.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
A framework for efficient model inference with omni-modality models
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
[NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning
Building the Virtuous Cycle for AI-driven LLM Systems
A collection of memory efficient attention operators implemented in the Triton language.
Ring attention implementation with flash attention
depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.