View syed-ahmed's full-sized avatar

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 100,853 28,045 Updated Jun 18, 2026

Cataloging released Triton kernels.

308 16 Updated Sep 9, 2025

Efficient Triton Kernels for LLM Training

Python 6,444 541 Updated Jun 17, 2026

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 15,229 1,284 Updated May 23, 2024

PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.

Python 1,462 114 Updated Jun 15, 2026

A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)

10,189 786 Updated Apr 8, 2026

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

C++ 392 81 Updated May 31, 2026
Jupyter Notebook 506 45 Updated Oct 18, 2024

Run compilers interactively from your web browser and interact with the assembly

TypeScript 18,841 2,058 Updated Jun 18, 2026

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 419 52 Updated Jan 2, 2025

A tool based on Excalidraw to create stop motion animations and slides.

TypeScript 565 45 Updated Jun 8, 2026

Making DAG construction easier

Python 285 13 Updated Jun 4, 2026
VHDL 3 Updated Apr 20, 2021

Helper scripts for building Vivado and Vivado HLS projects with CMake.

CMake 1 Updated Jan 27, 2021

Examples shown as part of the tutorial "Productive parallel programming on FPGA with high-level synthesis".

C++ 206 46 Updated Nov 14, 2021

a cheat-sheet for mathematical notation in code form

15,479 1,090 Updated Mar 8, 2022

FPGA+SoC+Linux+Device Tree Overlay+FPGA Manager U-Boot&Linux Kernel&Debian11 Images (for Xilinx:Zynq Ultrascale+ MPSoC)

134 38 Updated Aug 14, 2025

Example for ZynqMP-FPGA-XRT(Xilinx RunTime for ZynqMP-FPGA-Linux)

Ruby 6 1 Updated Jul 6, 2020

XRT(Xilinx Runtime) for ZynqMP-FPGA-Linux

Makefile 6 1 Updated May 19, 2023

Tool for updating the contents of BlockRAMs found in Xilinx 7 series bitstreams.

LLVM 19 6 Updated Feb 9, 2022

Scalable systolic array-based matrix-matrix multiplication implemented in Vivado HLS for Xilinx FPGAs.

C++ 386 61 Updated Jan 20, 2025

Soba frontend

1 Updated Feb 23, 2020
Python 1 Updated Feb 23, 2020

A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems", which is `dmls-book`

HTML 10,431 1,616 Updated Apr 15, 2023

A collection of out-of-tree LLVM passes for teaching and learning

C++ 3,402 448 Updated May 17, 2026

Intro to Creative Coding workshop with p5.js and Tone.js

770 52 Updated Nov 22, 2022

A high-level performance analysis tool for FPGA-based accelerators

C++ 19 7 Updated Jun 2, 2017