
A domain-specific language (DSL) based on Triton but providing higher-level abstractions.

Python 99 18 Updated May 16, 2026

Mirror of https://gitcode.com/Ascend/AscendNPU-IR

C++ 24 9 Updated May 18, 2026

Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend

MLIR 123 18 Updated May 15, 2026

Perplexity open source garden for inference technology

Rust 415 42 Updated Dec 25, 2025

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 13,120 1,460 Updated May 16, 2026

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Python 2,820 264 Updated Mar 25, 2025

A framework for few-shot evaluation of language models.

Python 12,599 3,277 Updated May 11, 2026

Speed of Light Analysis for ML Model Runtime

Python 65 11 Updated Apr 13, 2026

A benchmark of real-world DL kernel problems

Python 201 22 Updated Apr 15, 2026

High-performance linear attention kernel library built on TileLang

Python 489 37 Updated May 7, 2026

Building General-Purpose Robots Based on Embodied Foundation Model

Python 862 73 Updated Apr 7, 2026
Python 12 13 Updated May 18, 2026

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 355 82 Updated May 17, 2026

🚀 Efficient implementations for emerging model architectures

Python 5,109 530 Updated May 17, 2026
Python 21 4 Updated May 16, 2025

Learning TileLang with 10 puzzles!

Python 273 32 Updated Apr 28, 2026

Community maintained hardware plugin for vLLM on Ascend

Python 2,096 1,224 Updated May 18, 2026

Community maintained hardware plugin for vLLM on MetaX GPU

Python 132 58 Updated May 15, 2026
Python 19 1 Updated Mar 17, 2026

🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3×↑🎉 vs SDPA, up to 430T🎉 on H200.

Python 294 17 Updated May 18, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 3,370 418 Updated Jan 17, 2026

A framework for efficient model inference with omni-modality models

Python 4,788 936 Updated May 18, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,306 101 Updated Aug 28, 2025

[NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning

Python 98 Updated Apr 20, 2026

Building the Virtuous Cycle for AI-driven LLM Systems

Python 227 40 Updated May 1, 2026

A collection of memory efficient attention operators implemented in the Triton language.

Python 291 20 Updated Jun 5, 2024

Ring attention implementation with flash attention

Python 1,020 98 Updated Sep 10, 2025
Scala 33 3 Updated May 17, 2026

depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.

Python 805 28 Updated Oct 13, 2025