Python 2 2 Updated Jan 29, 2026

NVIDIA SASS disassembler/inline patcher

C++ 66 13 Updated Apr 12, 2026
Cuda 55 11 Updated Dec 10, 2025

Pwning Santa before the bad guys do 🎅

Python 3 1 Updated Dec 10, 2025

A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.

Python 58 6 Updated Apr 10, 2026

MoE training for Me and You and maybe other people

Python 380 32 Updated Mar 15, 2026

The MATLAB Tensor Core: a set of models of tensor cores written in MATLAB

MATLAB 17 2 Updated Apr 7, 2026

Fast arithmetic modulo `2^k`, `2^k - 1`, and `2^k - d`.

Rust 19 2 Updated Dec 22, 2025

Unofficial description of the CUDA assembly (SASS) instruction sets.

Python 208 19 Updated Jul 18, 2025

Low-overhead tracing of all Linux kernel-user transitions, for serious performance analysis. Includes kernel patches, loadable module, and post-processing software. Output is HTML/SVG per-CPU-core …

HTML 686 70 Updated Sep 1, 2024

Awesome Object Capabilities and Capability Security

JavaScript 396 27 Updated Apr 1, 2026

A UNIX-like kernel for the i386 architecture

C 639 45 Updated Apr 8, 2026

A fast, small C/C++ function call tracer for x86-64/Linux, supports clang & gcc, ftrace, threads, exceptions & shared libraries

C++ 196 3 Updated Mar 25, 2025
Rust 4 Updated Jan 25, 2024

TexLive programs bundled into a single static binary for x86_64-linux / WASM

Makefile 66 8 Updated Mar 24, 2025

Inspect and dissect an ELF file with pretty formatting.

Rust 119 10 Updated Feb 25, 2024

My solutions for CTF challenges

C 74 14 Updated Dec 16, 2025

LLM inference in C/C++

C++ 103,371 16,768 Updated Apr 13, 2026

High-efficiency floating-point neural network inference operators for mobile, server, and Web

C 2,305 481 Updated Apr 13, 2026
C++ 323 93 Updated Feb 17, 2026

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,552 731 Updated Apr 12, 2026

Transformer related optimization, including BERT, GPT

C++ 6,412 935 Updated Mar 27, 2024

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,562 1,785 Updated Apr 9, 2026

Open standard for machine learning interoperability

Python 20,648 3,916 Updated Apr 10, 2026

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 19,849 3,822 Updated Apr 13, 2026

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 99,078 27,473 Updated Apr 13, 2026

BLAS-like Library Instantiation Software Framework

C 2,625 417 Updated Nov 11, 2025

oneAPI Deep Neural Network Library (oneDNN)

C++ 3,979 1,119 Updated Apr 13, 2026

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

C 7,372 1,661 Updated Apr 12, 2026