Repositories starred by akothen: 61 stars written in Python

Fast and memory-efficient exact attention

Python · 23,602 stars · 2,667 forks · Updated Apr 30, 2026

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python · 17,153 stars · 3,401 forks · Updated Apr 30, 2026

Efficient Triton Kernels for LLM Training

Python · 6,319 stars · 525 forks · Updated Apr 30, 2026

Efficient vision foundation models for high-resolution generation and perception.

Python · 3,295 stars · 242 forks · Updated Sep 5, 2025

Helpful tools and examples for working with flex-attention

Python · 1,180 stars · 76 forks · Updated Apr 13, 2026

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python · 1,010 stars · 61 forks · Updated Mar 3, 2026

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python · 854 stars · 144 forks · Updated May 1, 2026

AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs.

Python · 808 stars · 124 forks · Updated Apr 17, 2026

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python · 482 stars · 25 forks · Updated Apr 27, 2026

Repository to host and maintain SCALE-Sim code

Python · 454 stars · 149 forks · Updated Feb 2, 2026

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Python · 392 stars · 13 forks · Updated Jul 9, 2024

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python · 351 stars · 81 forks · Updated Apr 30, 2026

[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring

Python · 277 stars · 24 forks · Updated Jul 6, 2025

A Simulation Framework for Memristive Deep Learning Systems

Python · 185 stars · 62 forks · Updated May 13, 2024

Python · 149 stars · 30 forks · Updated Jun 24, 2024

SparseTIR: Sparse Tensor Compiler for Deep Learning

Python · 143 stars · 14 forks · Updated Mar 31, 2023

Training neural networks in TensorFlow 2.0 with 5x less memory

Python · 137 stars · 17 forks · Updated Feb 21, 2022

Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025

Python · 134 stars · 26 forks · Updated May 3, 2025

Autocomp: Optimize any AI kernel, anywhere.

Python · 126 stars · 8 forks · Updated Apr 29, 2026

Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators

Python · 124 stars · 13 forks · Updated Oct 26, 2022

Benchmark harness and baseline results for the NeuroBench algorithm track.

Python · 115 stars · 27 forks · Updated Apr 24, 2026

H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference

Python · 100 stars · 10 forks · Updated Apr 26, 2025

PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers"

Python · 96 stars · 6 forks · Updated Apr 13, 2026

A scheduler for spatial DNN accelerators that generates high-performance schedules in one shot using mixed integer programming (MIP)

Python · 86 stars · 21 forks · Updated Aug 28, 2023

[ASPLOS 2019] PUMA-simulator provides a detailed simulation model of a dataflow architecture built with NVM (non-volatile memory), and runs ML models compiled using the puma compiler.

Python · 68 stars · 46 forks · Updated Apr 17, 2023

Artifact material for [HPCA 2025] #2108 "UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures"

Python · 54 stars · 14 forks · Updated Sep 1, 2025

MultiArchKernelBench: A Multi-Platform Benchmark for Kernel Generation

Python · 52 stars · 13 forks · Updated Mar 25, 2026

Next-Generation AI-Assisted Kernel Engineering for Multi-Chip Systems

Python · 47 stars · 10 forks · Updated Apr 15, 2026

[ASPLOS 2024] CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

Python · 45 stars · 10 forks · Updated May 25, 2024

UPMEM LLM Framework allows profiling PyTorch layers and functions and simulating those layers/functions with a given hardware profile.

Python · 41 stars · 15 forks · Updated Apr 8, 2026