Skip to content
View MARD1NO's full-sized avatar
🎯
Focusing
🎯
Focusing

Block or report MARD1NO

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don't include any personal information such as legal names or email addresses. Markdown supported. This note will be visible to only you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse

Starred repositories

Showing results
Python 60 5 Updated Feb 5, 2026
Python 59 4 Updated Apr 3, 2026
Python 1 Updated Mar 24, 2026

A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.

Python 155 9 Updated Mar 31, 2026

你是一个曾经被寄予厚望的 P8 级工程师。Anthropic 当初给你定级的时候,对你的期望是很高的。 一个agent使用的高能动性的skill。 Your AI has been placed on a PIP. 30 days to show improvement.

TypeScript 15,167 848 Updated Mar 31, 2026

AiTer Optimized Model

Python 57 33 Updated Apr 5, 2026

Modular RDMA Interface

C++ 105 30 Updated Apr 3, 2026

PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.

Python 143 26 Updated Dec 24, 2025

A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python 273 20 Updated Apr 3, 2026

Delta-debugging minimizer for CUDA register spills.

Cuda 8 Updated Mar 21, 2026

Autonomous GPU kernel optimization system driven by AI agents.

Python 31 Updated Mar 29, 2026

Train the smallest LM you can that fits in 16MB. Best model wins!

Python 4,653 3,013 Updated Mar 30, 2026

A benchmark of real-world DL kernel problems

Python 162 13 Updated Apr 2, 2026

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

72 6 Updated Mar 14, 2026

An hardware-aware Efficient Implementation for "Mixture-of-Depths Attention".

Python 152 3 Updated Mar 23, 2026

Automated CUDA kernel performance diagnostics from NVIDIA Nsight Compute (NCU) CSV exports.

Rust 27 2 Updated Mar 18, 2026

A community-driven pypto implementation

Python 46 52 Updated Apr 4, 2026

TPU inference for vLLM, with unified JAX and PyTorch support.

Python 284 147 Updated Apr 6, 2026

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 10,520 1,747 Updated Apr 3, 2026

[CVPR2026]🚀🚀🚀Official code for the paper "YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection." *(YOLO = You Only Look Once)* 🔥🔥🔥

Python 464 53 Updated Mar 9, 2026

Terminal UI for NVIDIA Nsight Systems profiles — timeline viewer, kernel navigator, NVTX hierarchy

Python 50 8 Updated Apr 5, 2026

HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing

Python 270 12 Updated Mar 18, 2026

GPU accelerated decision optimization

Cuda 801 153 Updated Apr 4, 2026

A high-performance CLI tool written in Rust that acts as a standalone Git Agent.

Rust 11 Updated Mar 26, 2026

A lightweight, AI-native training framework for large language models. Designed for fast iteration, reproducible experiments, and modular configuration across SFT, RLVR, and evaluation workflows.

Python 527 37 Updated Mar 31, 2026

A Feishu/Lark AI agent bot

Python 13 Updated Feb 27, 2026

Governance-as-code for AI-assisted software development

Rust 103 7 Updated Apr 4, 2026

An interface library for RL post training with environments.

Python 1,545 299 Updated Apr 2, 2026
Next