- Belgrade
- http://dfyz.info/
- @i_komarov
Stars
Pwning Santa before the bad guys do 🎅
A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.
MoE training for Me and You and maybe other people
The MATLAB Tensor Core: a set of models of tensor cores written in MATLAB
Fast arithmetic modulo `2^k`, `2^k - 1`, and `2^k - d`.
Unofficial description of the CUDA assembly (SASS) instruction sets.
Low-overhead tracing of all Linux kernel-user transitions, for serious performance analysis. Includes kernel patches, loadable module, and post-processing software. Output is HTML/SVG per-CPU-core …
Awesome Object Capabilities and Capability Security
A fast, small C/C++ function call tracer for x86-64/Linux, supports clang & gcc, ftrace, threads, exceptions & shared libraries
TexLive programs bundled into a single static binary for x86_64-linux / WASM
Inspect and dissect an ELF file with pretty formatting.
High-efficiency floating-point neural network inference operators for mobile, server, and Web
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Transformer-related optimization, including BERT and GPT
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Open standard for machine learning interoperability
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Tensors and Dynamic neural networks in Python with strong GPU acceleration
oneAPI Deep Neural Network Library (oneDNN)
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.