Syed Tousif Ahmed syed-ahmed

I work on PyTorch

73 followers · 19 following

Stars

pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 100,853 28,045 Updated Jun 18, 2026

gpu-mode / triton-index

Cataloging released Triton kernels.

308 16 Updated Sep 9, 2025

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 6,444 541 Updated Jun 17, 2026

naklecha / llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 15,229 1,284 Updated May 23, 2024

Lightning-AI / lightning-thunder

PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.

Python 1,462 114 Updated Jun 15, 2026

Mooler0410 / LLMsPracticalGuide

A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)

10,189 786 Updated Apr 8, 2026

NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

C++ 392 81 Updated May 31, 2026

srush / Autodiff-Puzzles

Jupyter Notebook 506 45 Updated Oct 18, 2024

compiler-explorer / compiler-explorer

Run compilers interactively from your web browser and interact with the assembly

TypeScript 18,841 2,058 Updated Jun 18, 2026

yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 419 52 Updated Jan 2, 2025

dai-shi / excalidraw-claymate

A tool based on Excalidraw to create stop motion animations and slides.

TypeScript 565 45 Updated Jun 8, 2026

pipeline-tools / gusty

Making DAG construction easier

Python 285 13 Updated Jun 4, 2026

icgrp / estream4fccm2021

VHDL 3 Updated Apr 20, 2021

akira-nishiyama / vitis_library_compilation_sample

C++ 1 Updated Jan 29, 2021

akira-nishiyama / vivado_cmake_helper

helper scripts for vivado and vivado_hls build with cmake.

CMake 1 Updated Jan 27, 2021

spcl / hls_tutorial_examples

Examples shown as part of the tutorial "Productive parallel programming on FPGA with high-level synthesis".

C++ 206 46 Updated Nov 14, 2021

Experience-Monks / math-as-code

a cheat-sheet for mathematical notation in code form

15,479 1,090 Updated Mar 8, 2022

ikwzm / ZynqMP-FPGA-Linux

FPGA+SoC+Linux+Device Tree Overlay+FPGA Manager U-Boot&Linux Kernel&Debian11 Images (for Xilinx:Zynq Ultrascale+ MPSoC)

134 38 Updated Aug 14, 2025

ikwzm / ZynqMP-FPGA-XRT-Example-1-Ultra96

Example for ZynqMP-FPGA-XRT(Xilinx RunTime for ZynqMP-FPGA-Linux)

Ruby 6 1 Updated Jul 6, 2020

ikwzm / ZynqMP-FPGA-XRT

XRT(Xilinx Runtime) for ZynqMP-FPGA-Linux

Makefile 6 1 Updated May 19, 2023

chipsalliance / f4pga-xc7-bram-patch

Tool for updating the contents of BlockRAMs found in Xilinx 7 series bitstreams.

LLVM 19 6 Updated Feb 9, 2022

spcl / gemm_hls

Scalable systolic array-based matrix-matrix multiplication implemented in Vivado HLS for Xilinx FPGAs.

C++ 386 61 Updated Jan 20, 2025

socraticbananas / soba_web

Soba frontend

1 Updated Feb 23, 2020

socraticbananas / soba_api

Python 1 Updated Feb 23, 2020

socraticbananas / orchestration

1 Updated Feb 23, 2020

chiphuyen / machine-learning-systems-design

A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems", which is `dmls-book`

HTML 10,431 1,616 Updated Apr 15, 2023

banach-space / llvm-tutor

A collection of out-of-tree LLVM passes for teaching and learning

C++ 3,402 448 Updated May 17, 2026

mattdesl / workshop-p5-intro

Intro to Creative Coding workshop with p5.js and Tone.js

770 52 Updated Nov 22, 2022

ThibautMarty / conv-hls-overclocking

C++ 8 3 Updated Jan 24, 2019

zhguanw / lin-analyzer

A high-level performance analysis tool for FPGA-based accelerators

C++ 19 7 Updated Jun 2, 2017
