MARD1NO

🎯

Focusing

ZZK MARD1NO

I'm in a state of trance

401 followers · 465 following

SiliconFlow
Neverland
https://mard1no.github.io/

Lists (1)

🚀 My stack

Starred repositories

leepoly / sm-profiler

Python · 60 stars · 5 forks · Updated Feb 5, 2026

inclusionAI / humming

Python · 59 stars · 4 forks · Updated Apr 3, 2026

yilin-void / Sandbox

Python · 1 star · Updated Mar 24, 2026

facebookresearch / tensor-layouts

A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.

Python · 155 stars · 9 forks · Updated Mar 31, 2026

tanweai / pua

You are a P8-level engineer who was once held in high regard. When Anthropic assigned your level, expectations for you were high. A high-agency skill for agents to use. Your AI has been placed on a PIP. 30 days to show improvement.

TypeScript · 15,167 stars · 848 forks · Updated Mar 31, 2026

ROCm / ATOM

AiTer Optimized Model

Python · 57 stars · 33 forks · Updated Apr 5, 2026

ROCm / mori

Modular RDMA Interface

C++ · 105 stars · 30 forks · Updated Apr 3, 2026

Dao-AILab / AI-workflow

70 stars · 2 forks · Updated Mar 24, 2026

technillogue / ptx-isa-markdown

PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.

Python · 143 stars · 26 forks · Updated Dec 24, 2025

SandAI-org / MagiCompiler

A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python · 273 stars · 20 forks · Updated Apr 3, 2026

StuartSul / deltaspill

Delta-debugging minimizer for CUDA register spills.

Cuda · 8 stars · Updated Mar 21, 2026

KuangjuX / cuda-evolve-oss

Autonomous GPU kernel optimization system driven by AI agents.

Python · 31 stars · Updated Mar 29, 2026

openai / parameter-golf

Train the smallest LM you can that fits in 16MB. Best model wins!

Python · 4,653 stars · 3,013 forks · Updated Mar 30, 2026

NVIDIA / SOL-ExecBench

A benchmark of real-world DL kernel problems

Python · 162 stars · 13 forks · Updated Apr 2, 2026

THUDM / IndexCache

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

72 stars · 6 forks · Updated Mar 14, 2026

hustvl / MoDA

A hardware-aware, efficient implementation of "Mixture-of-Depths Attention".

Python · 152 stars · 3 forks · Updated Mar 23, 2026

MoonshotAI / Attention-Residuals

2,990 stars · 153 forks · Updated Mar 17, 2026

KuangjuX / ncu-cli

Automated CUDA kernel performance diagnostics from NVIDIA Nsight Compute (NCU) CSV exports.

Rust · 27 stars · 2 forks · Updated Mar 18, 2026

hw-native-sys / pypto

A community-driven pypto implementation

Python · 46 stars · 52 forks · Updated Apr 4, 2026

vllm-project / tpu-inference

TPU inference for vLLM, with unified JAX and PyTorch support.

Python · 284 stars · 147 forks · Updated Apr 6, 2026

triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python · 10,520 stars · 1,747 forks · Updated Apr 3, 2026

Tencent / YOLO-Master

[CVPR2026] 🚀🚀🚀 Official code for the paper "YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection" (YOLO = You Only Look Once). 🔥🔥🔥

Python · 464 stars · 53 forks · Updated Mar 9, 2026

GindaChen / nsys-ai

Terminal UI for NVIDIA Nsight Systems profiles — timeline viewer, kernel navigator, NVTX hierarchy

Python · 50 stars · 8 forks · Updated Apr 5, 2026

Tencent-Hunyuan / HY-WU

HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing

Python · 270 stars · 12 forks · Updated Mar 18, 2026

NVIDIA / cuopt

GPU-accelerated decision optimization

Cuda · 801 stars · 153 forks · Updated Apr 4, 2026

KuangjuX / curgit

A high-performance CLI tool written in Rust that acts as a standalone Git Agent.

Rust · 11 stars · Updated Mar 26, 2026

stepfun-ai / SteptronOss

A lightweight, AI-native training framework for large language models. Designed for fast iteration, reproducible experiments, and modular configuration across SFT, RLVR, and evaluation workflows.

Python · 527 stars · 37 forks · Updated Mar 31, 2026

GeeeekExplorer / kkbot

A Feishu/Lark AI agent bot

Python · 13 stars · Updated Feb 27, 2026

govctl-org / govctl

Governance-as-code for AI-assisted software development

Rust · 103 stars · 7 forks · Updated Apr 4, 2026

meta-pytorch / OpenEnv

An interface library for RL post training with environments.

Python · 1,545 stars · 299 forks · Updated Apr 2, 2026

Starred topics

Awesome Lists