ZZK MARD1NO

🎯

Focusing

I'm in a state of trance

401 followers · 463 following

SiliconFlow
Neverland
https://mard1no.github.io/

Achievements

x2 x3

flex-block-attn Public
Forked from Tencent-Hunyuan/flex-block-attn

flex-block-attn: an efficient block sparse attention computation library

Jupyter Notebook Other Updated Nov 20, 2025
tilelang-dsa Public
Forked from lemyx/tilelang-dsa

DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang

Python Other Updated Nov 19, 2025
NexVenusCL Public
Forked from nex-agi/NexVenusCL

Nex Venus Communication Library

C++ Apache License 2.0 Updated Nov 17, 2025
flash-moba Public
Forked from mit-han-lab/flash-moba

C++ 1 BSD 3-Clause "New" or "Revised" License Updated Nov 17, 2025
nanotrace Public
Forked from aikitoria/nanotrace

Low overhead tracing library and trace visualizer for pipelined CUDA kernels

C MIT License Updated Nov 9, 2025
pplx-garden Public
Forked from perplexityai/pplx-garden

Perplexity open source garden for inference technology

Rust MIT License Updated Nov 5, 2025
Kimi-Linear Public
Forked from MoonshotAI/Kimi-Linear

MIT License Updated Oct 30, 2025
FP-Quant Public
Forked from IST-DASLab/FP-Quant

Python Updated Oct 28, 2025
FlexKV Public

Python Other Updated Oct 28, 2025
FlexKV-official Public
Forked from taco-project/FlexKV

Python Other Updated Oct 28, 2025
flashpack Public
Forked from fal-ai/flashpack

High-throughput tensor loading for PyTorch

Python MIT License Updated Oct 27, 2025
Penny Public
Forked from SzymonOzog/Penny

Hand-Rolled GPU communications library

Cuda Updated Oct 23, 2025
hoti-2025-gpu-comms-tutorial Public
Forked from NVIDIA/hoti-2025-gpu-comms-tutorial

Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025

C++ Other Updated Oct 22, 2025
ai-performance-engineering Public
Forked from cfregly/ai-performance-engineering

Python Apache License 2.0 Updated Oct 22, 2025
DeepSeek-OCR Public
Forked from deepseek-ai/DeepSeek-OCR

Contexts Optical Compression

Python MIT License Updated Oct 20, 2025
HamiltonAttention Public
Forked from infinigence/HamiltonAttention

Python Updated Oct 15, 2025
reasoning-from-scratch Public
Forked from rasbt/reasoning-from-scratch

Implement a reasoning LLM in PyTorch from scratch, step by step

Jupyter Notebook Apache License 2.0 Updated Oct 10, 2025
i_am_dsp Public
Forked from IAMMRGODIE/i_am_dsp

A simple DSP crate

Rust Mozilla Public License 2.0 Updated Oct 6, 2025
gpu-experiments Public
Forked from StuartSul/gpu-experiments

A collection of GPU tests and benchmarks for my own research.

Cuda Updated Oct 5, 2025
gpunetio Public
Forked from NVIDIA-DOCA/gpunetio

Open source version of DOCA GPUNetIO and DOCA Verbs libraries (limited features) to enable GDAKI technology on RDMA (IB and RoCE)

C++ Other Updated Oct 2, 2025
FlashMoE Public
Forked from osayamenja/FlashMoE

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 2 Other Updated Sep 30, 2025
DLSlime Public
Forked from DeepLink-org/DLSlime

DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit

C++ BSD 3-Clause "New" or "Revised" License Updated Sep 18, 2025
NVSHMEM-Tutorial Public
Forked from KuangjuX/NVSHMEM-Tutorial

Cuda Updated Sep 16, 2025
tvm-ffi Public
Forked from apache/tvm-ffi

TVM FFI

C++ Apache License 2.0 Updated Sep 15, 2025
checkpoint-engine Public
Forked from MoonshotAI/checkpoint-engine

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python MIT License Updated Sep 10, 2025
batch_invariant_ops Public
Forked from thinking-machines-lab/batch_invariant_ops

Python MIT License Updated Sep 10, 2025
NVSHMEM Public
Forked from NVIDIA/nvshmem

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ Other Updated Sep 6, 2025
flash_attention_from_scratch Public
Forked from sonnyli/flash_attention_from_scratch

Flash Attention from Scratch on CUDA Ampere

Assembly 1 Updated Sep 1, 2025
uccl Public
Forked from uccl-project/uccl

Ultra and Unified CCL

C++ Apache License 2.0 Updated Aug 15, 2025
VeOmni Public
Forked from ByteDance-Seed/VeOmni

VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework

Python Apache License 2.0 Updated Aug 12, 2025
