akothen's starred repositories: 56 results for source repositories written in Python

Fast and memory-efficient exact attention

Python · 23,247 stars · 2,602 forks · Updated Apr 8, 2026

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python · 17,069 stars · 3,402 forks · Updated Apr 9, 2026

Efficient Triton Kernels for LLM Training

Python · 6,265 stars · 510 forks · Updated Apr 8, 2026

Efficient vision foundation models for high-resolution generation and perception.

Python · 3,279 stars · 240 forks · Updated Sep 5, 2025

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python · 1,004 stars · 61 forks · Updated Mar 3, 2026

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python · 825 stars · 139 forks · Updated Apr 9, 2026

AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs.

Python · 802 stars · 120 forks · Updated Mar 27, 2026

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python · 470 stars · 21 forks · Updated Apr 9, 2026

Repository to host and maintain SCALE-Sim code

Python · 446 stars · 147 forks · Updated Feb 2, 2026

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Python · 392 stars · 13 forks · Updated Jul 9, 2024

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python · 342 stars · 77 forks · Updated Apr 9, 2026

[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring

Python · 276 stars · 21 forks · Updated Jul 6, 2025

A Simulation Framework for Memristive Deep Learning Systems

Python · 184 stars · 61 forks · Updated May 13, 2024

SparseTIR: Sparse Tensor Compiler for Deep Learning

Python · 144 stars · 14 forks · Updated Mar 31, 2023

Python · 143 stars · 29 forks · Updated Jun 24, 2024

Training neural networks in TensorFlow 2.0 with 5x less memory

Python · 137 stars · 17 forks · Updated Feb 21, 2022

Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025

Python · 129 stars · 26 forks · Updated May 3, 2025

Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators

Python · 123 stars · 13 forks · Updated Oct 26, 2022

Benchmark harness and baseline results for the NeuroBench algorithm track.

Python · 112 stars · 27 forks · Updated Jan 3, 2026

H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference

Python · 96 stars · 10 forks · Updated Apr 26, 2025

PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers"

Python · 94 stars · 6 forks · Updated Mar 22, 2026

A scheduler for spatial DNN accelerators that generates high-performance schedules in one shot using mixed integer programming (MIP)

Python · 86 stars · 21 forks · Updated Aug 28, 2023

[ASPLOS 2019] PUMA-simulator provides a detailed simulation model of a dataflow architecture built with NVM (non-volatile memory) and runs ML models compiled with the PUMA compiler.

Python · 67 stars · 46 forks · Updated Apr 17, 2023

Artifact material for [HPCA 2025] #2108 "UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures"

Python · 54 stars · 14 forks · Updated Sep 1, 2025

[ASPLOS 2024] CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

Python · 45 stars · 10 forks · Updated May 25, 2024

UPMEM LLM Framework allows profiling PyTorch layers and functions and simulating those layers/functions with a given hardware profile.

Python · 41 stars · 14 forks · Updated Apr 8, 2026

Next-Generation AI-Assisted Kernel Engineering for Multi-Chip Systems

Python · 38 stars · 8 forks · Updated Apr 8, 2026

AccelOpt: Self-improving Agents for AI Accelerator Kernel Optimization

Python · 32 stars · 5 forks · Updated Apr 6, 2026

Python · 30 stars · 5 forks · Updated Apr 8, 2026

Simulator for LLM inference on an abstract 3D AIMC-based accelerator

Python · 27 stars · 6 forks · Updated Sep 18, 2025