ZZK MARD1NO

🎯

Focusing

Paddle very good

389 followers · 444 following

SiliconFlow
Neverland
https://mard1no.github.io/

Achievements

x2 x3

sonic-moe Public
Forked from Dao-AILab/sonic-moe

Python Apache License 2.0 Updated Dec 18, 2025
cuJSON Public
Forked from AutomataLab/cuJSON

cuJSON: A Highly Parallel JSON Parser for GPUs

C++ MIT License Updated Dec 12, 2025
iris Public
Forked from ROCm/iris

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

Python MIT License Updated Dec 10, 2025
TileGym Public
Forked from NVIDIA/TileGym

Helpful kernel tutorials and examples for tile-based GPU programming

Python Other Updated Dec 5, 2025
cutile-python Public
Forked from NVIDIA/cutile-python

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python Other Updated Dec 4, 2025
fouroversix Public
Forked from mit-han-lab/fouroversix

Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling”

Python MIT License Updated Dec 2, 2025
nsight-python Public
Forked from NVIDIA/nsight-python

Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools

Python Apache License 2.0 Updated Nov 27, 2025
asystem-amem Public
Forked from inclusionAI/asystem-amem

A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.

C++ Apache License 2.0 Updated Nov 27, 2025
flex-block-attn Public
Forked from Tencent-Hunyuan/flex-block-attn

flex-block-attn: an efficient block sparse attention computation library

Jupyter Notebook Other Updated Nov 20, 2025
tilelang-dsa Public
Forked from lemyx/tilelang-dsa

DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang

Python Other Updated Nov 19, 2025
NexVenusCL Public
Forked from nex-agi/NexVenusCL

Nex Venus Communication Library

C++ Apache License 2.0 Updated Nov 17, 2025
flash-moba Public
Forked from mit-han-lab/flash-moba

C++ 1 BSD 3-Clause "New" or "Revised" License Updated Nov 17, 2025
nanotrace Public
Forked from aikitoria/nanotrace

Low overhead tracing library and trace visualizer for pipelined CUDA kernels

C MIT License Updated Nov 9, 2025
pplx-garden Public
Forked from perplexityai/pplx-garden

Perplexity open source garden for inference technology

Rust MIT License Updated Nov 5, 2025
Kimi-Linear Public
Forked from MoonshotAI/Kimi-Linear

MIT License Updated Oct 30, 2025
FP-Quant Public
Forked from IST-DASLab/FP-Quant

Python Updated Oct 28, 2025
FlexKV Public

Python Other Updated Oct 28, 2025
FlexKV-official Public
Forked from taco-project/FlexKV

Python Other Updated Oct 28, 2025
flashpack Public
Forked from fal-ai/flashpack

High-throughput tensor loading for PyTorch

Python MIT License Updated Oct 27, 2025
Penny Public
Forked from SzymonOzog/Penny

Hand-Rolled GPU communications library

Cuda Updated Oct 23, 2025
hoti-2025-gpu-comms-tutorial Public
Forked from NVIDIA/hoti-2025-gpu-comms-tutorial

Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025

C++ Other Updated Oct 22, 2025
ai-performance-engineering Public
Forked from cfregly/ai-performance-engineering

Python Apache License 2.0 Updated Oct 22, 2025
DeepSeek-OCR Public
Forked from deepseek-ai/DeepSeek-OCR

Contexts Optical Compression

Python MIT License Updated Oct 20, 2025
HamiltonAttention Public
Forked from infinigence/HamiltonAttention

Python Updated Oct 15, 2025
reasoning-from-scratch Public
Forked from rasbt/reasoning-from-scratch

Implement a reasoning LLM in PyTorch from scratch, step by step

Jupyter Notebook Apache License 2.0 Updated Oct 10, 2025
i_am_dsp Public
Forked from IAMMRGODIE/i_am_dsp

A simple DSP crate

Rust Mozilla Public License 2.0 Updated Oct 6, 2025
gpu-experiments Public
Forked from StuartSul/gpu-experiments

A collection of GPU tests and benchmarks for my own research.

Cuda Updated Oct 5, 2025
gpunetio Public
Forked from NVIDIA-DOCA/gpunetio

Open source version of DOCA GPUNetIO and DOCA Verbs libraries (limited features) to enable GDAKI technology on RDMA (IB and RoCE)

C++ Other Updated Oct 2, 2025
FlashMoE Public
Forked from osayamenja/FlashMoE

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 2 Other Updated Sep 30, 2025
DLSlime Public
Forked from DeepLink-org/DLSlime

DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit

C++ BSD 3-Clause "New" or "Revised" License Updated Sep 18, 2025
