Leon Dou micropuma

Postgraduate student at Beijing University of Posts and Telecommunications

15 followers · 84 following

12:46 (UTC +08:00)
micropuma.github.io

Achievements

Highlights

Pro

Lists (1)

🚀 My stack

Stars

InfiniTensor / ninetoothed

A domain-specific language (DSL) based on Triton but providing higher-level abstractions.

Python 99 18 Updated May 16, 2026

Ascend / AscendNPU-IR

Mirror of https://gitcode.com/Ascend/AscendNPU-IR

C++ 24 9 Updated May 18, 2026

Ascend / triton-ascend

Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend

MLIR 123 18 Updated May 15, 2026

perplexityai / pplx-garden

Perplexity open source garden for inference technology

Rust 415 42 Updated Dec 25, 2025

facebookresearch / vggt

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 13,120 1,460 Updated May 16, 2026

OpenGVLab / InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Python 2,820 264 Updated Mar 25, 2025

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python 12,599 3,277 Updated May 11, 2026

NVlabs / SOLAR

Speed of Light Analysis for ML Model Runtime

Python 65 11 Updated Apr 13, 2026

NVIDIA / SOL-ExecBench

A benchmark of real-world DL kernel problems

Python 201 22 Updated Apr 15, 2026

QwenLM / FlashQLA

High-performance linear attention kernel library built on TileLang

Python 489 37 Updated May 7, 2026

X-Square-Robot / wall-x

Building General-Purpose Robots Based on Embodied Foundation Model

Python 862 73 Updated Apr 7, 2026

tile-ai / tilelang-metax

Python 12 13 Updated May 18, 2026

meta-pytorch / tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 355 82 Updated May 17, 2026

fla-org / flash-linear-attention

🚀 Efficient implementations for emerging model architectures

Python 5,109 530 Updated May 17, 2026

tile-ai / tilelang-benchmark

Python 21 4 Updated May 16, 2025

tile-ai / tilelang-puzzles

Learning TileLang with 10 puzzles!

Python 273 32 Updated Apr 28, 2026

vllm-project / vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

Python 2,096 1,224 Updated May 18, 2026

MetaX-MACA / vLLM-metax

Community maintained hardware plugin for vLLM on MetaX GPU

Python 132 58 Updated May 15, 2026

ColfaxResearch / cutlass-kernels

Cuda 266 38 Updated Jul 11, 2024

alibaba / redfuser

Python 19 1 Updated Mar 17, 2026

xlite-dev / ffpa-attn

🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3×↑🎉 vs SDPA, up to 430T🎉 on H200.

Python 294 17 Updated May 18, 2026

thu-ml / SageAttention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 3,370 418 Updated Jan 17, 2026

vllm-project / vllm-omni

A framework for efficient model inference with omni-modality models

Python 4,788 936 Updated May 18, 2026

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,306 101 Updated Aug 28, 2025

tsinghua-ideal / Twilight

[NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning

Python 98 Updated Apr 20, 2026

flashinfer-ai / flashinfer-bench

Building the Virtuous Cycle for AI-driven LLM Systems

Python 227 40 Updated May 1, 2026

flagos-ai / FlagAttention

A collection of memory efficient attention operators implemented in the Triton language.

Python 291 20 Updated Jun 5, 2024

zhuzilin / ring-flash-attention

Ring attention implementation with flash attention

Python 1,020 98 Updated Sep 10, 2025

OpenCUTE / CUTE

Scala 33 3 Updated May 17, 2026

thuml / depyf

depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.

Python 805 28 Updated Oct 13, 2025