AccelOpt: Self-improving Agents for AI Accelerator Kernel Optimization

Python 30 3 Updated Feb 18, 2026
Python 17 Updated Mar 17, 2026
SystemVerilog 7 Updated Feb 20, 2026

A Scala equality saturation library

Scala 8 Updated Dec 16, 2025
Python 23 4 Updated Mar 21, 2026

A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching

C++ 79 12 Updated Mar 20, 2026

SymEngine is a fast symbolic manipulation library, written in C++

C++ 1,345 311 Updated Feb 13, 2026

Reference Code Implementation of paper "Evolution of Kernels: Automated RISC-V Kernel Optimization with Large Language Models"

Python 5 3 Updated Dec 2, 2025

Simulator for LLM inference on an abstract 3D AIMC-based accelerator

Python 27 6 Updated Sep 18, 2025

[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

Scala 128 11 Updated Aug 27, 2024

A Simulation Framework for Memristive Deep Learning Systems

Python 183 61 Updated May 13, 2024

Memory Array Simulation Testbed for Organization, Data, Operations, and Networks

C++ 3 Updated Sep 12, 2024

Verilog used to evaluate the FASED dot product hardware unit [IEEE CAL 2026]

SystemVerilog 8 2 Updated Jan 9, 2026

🤘 TT-NN operator library, and TT-Metalium low level kernel programming model.

C++ 1,385 387 Updated Mar 23, 2026

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 456 18 Updated Mar 23, 2026
Python 2 Updated Jun 22, 2025
Python 12 2 Updated Dec 9, 2025

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

C++ 1,004 165 Updated Sep 19, 2024

Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators

Python 122 13 Updated Oct 26, 2022

[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.

Cuda 966 88 Updated Feb 25, 2026

Find shape errors before you run your code!

Swift 152 9 Updated Dec 27, 2020

Artifact of MICRO'25 paper Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device

C 6 1 Updated Aug 3, 2025

A DL compiler fuzzer

Python 14 Updated Nov 1, 2024

An end-to-end Transformer fusion integrating DAG-based pipeline scheduling and whole encoder and decoder fusion.

Python 5 1 Updated Jul 17, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,969 3,384 Updated Mar 23, 2026

Official repo for the paper "An Effective Training Framework for Light-Weight Automatic Speech Recognition Models" accepted at InterSpeech 2025.

Lex 8 Updated Aug 15, 2025

Efficient vision foundation models for high-resolution generation and perception.

Python 3,270 235 Updated Sep 5, 2025

[PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and mini-batch training. Provides unification of full-/mini-batch t…

Python 10 2 Updated Aug 13, 2024

Pfeife: Automatic Pipeline Parallelism for PyTorch

Python 5 Updated Oct 27, 2025