dfyz
Python 2 2 Updated Jan 29, 2026

NVIDIA SASS disassembler/inline patcher

C++ 63 13 Updated Apr 2, 2026
Cuda 54 11 Updated Dec 10, 2025

Pwning Santa before the bad guys do 🎅

Python 3 1 Updated Dec 10, 2025

A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.

Python 38 6 Updated Mar 31, 2026

MoE training for Me and You and maybe other people

Python 380 32 Updated Mar 15, 2026

The MATLAB Tensor Core: a set of models of tensor cores written in MATLAB

MATLAB 16 3 Updated Apr 3, 2026

Fast arithmetic modulo `2^k`, `2^k - 1`, and `2^k - d`.

Rust 19 2 Updated Dec 22, 2025

Unofficial description of the CUDA assembly (SASS) instruction sets.

Python 208 19 Updated Jul 18, 2025

Low-overhead tracing of all Linux kernel-user transitions, for serious performance analysis. Includes kernel patches, loadable module, and post-processing software. Output is HTML/SVG per-CPU-core …

HTML 686 70 Updated Sep 1, 2024

Awesome Object Capabilities and Capability Security

JavaScript 394 27 Updated Apr 1, 2026

A UNIX-like kernel for the i386 architecture

C 633 45 Updated Apr 4, 2026

A fast, small C/C++ function call tracer for x86-64/Linux, supports clang & gcc, ftrace, threads, exceptions & shared libraries

C++ 196 3 Updated Mar 25, 2025
Rust 4 Updated Jan 25, 2024

TexLive programs bundled into a single static binary for x86_64-linux / WASM

Makefile 66 8 Updated Mar 24, 2025

Inspect and dissect an ELF file with pretty formatting.

Rust 118 10 Updated Feb 25, 2024

My solutions for CTF challenges

C 74 14 Updated Dec 16, 2025

LLM inference in C/C++

C++ 101,441 16,361 Updated Apr 5, 2026

High-efficiency floating-point neural network inference operators for mobile, server, and Web

C 2,295 478 Updated Apr 4, 2026
C++ 323 93 Updated Feb 17, 2026

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,549 730 Updated Apr 4, 2026

Transformer related optimization, including BERT, GPT

C++ 6,410 934 Updated Mar 27, 2024

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,530 1,769 Updated Apr 2, 2026

Open standard for machine learning interoperability

Python 20,586 3,914 Updated Apr 4, 2026

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 19,757 3,806 Updated Apr 5, 2026

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 98,801 27,402 Updated Apr 5, 2026

BLAS-like Library Instantiation Software Framework

C 2,621 416 Updated Nov 11, 2025

oneAPI Deep Neural Network Library (oneDNN)

C++ 3,974 1,117 Updated Apr 4, 2026

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

C 7,362 1,659 Updated Apr 2, 2026