
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 433 14 Updated Dec 16, 2025
Python 1 Updated Jun 22, 2025
Python 7 2 Updated Dec 9, 2025

A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.

C++ 1,004 165 Updated Sep 19, 2024

Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators

Python 119 13 Updated Oct 26, 2022

[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.

Cuda 850 72 Updated Dec 17, 2025

Find shape errors before you run your code!

Swift 151 9 Updated Dec 27, 2020

Artifact of MICRO'25 paper Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device

C 7 1 Updated Aug 3, 2025

A DL compiler fuzzer

Python 13 Updated Nov 1, 2024

An end-to-end Transformer fusion integrating DAG-based pipeline scheduling and whole encoder and decoder fusion.

Python 5 Updated Jul 17, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,338 3,242 Updated Dec 20, 2025

Official repo for the paper "An Effective Training Framework for Light-Weight Automatic Speech Recognition Models" accepted at InterSpeech 2025.

Lex 4 Updated Aug 15, 2025

Efficient vision foundation models for high-resolution generation and perception.

Python 3,181 229 Updated Sep 5, 2025

[PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and mini-batch training. Provides unification of full-/mini-batch t…

Python 10 2 Updated Aug 13, 2024

Pfeife: Automatic Pipeline Parallelism for PyTorch

Python 5 Updated Oct 27, 2025

A verification tool for ensuring parallelization equivalence in distributed model training.

Python 14 1 Updated Sep 1, 2025

A verification tool for ensuring parallelization equivalence in distributed model training.

Python 11 Updated Sep 17, 2025

Lists of company-wise questions available on LeetCode Premium. Every CSV file in the companies directory corresponds to a list of questions on LeetCode for a specific company based on the leetcode …

10,633 2,194 Updated Jun 20, 2025

UniSparse: An Intermediate Language for General Sparse Format Customization (OOPSLA'24)

MLIR 33 Updated Nov 12, 2024

SparseTIR: Sparse Tensor Compiler for Deep Learning

Python 141 14 Updated Mar 31, 2023

This project includes a prototype implementation of BOLT—a bandwidth-optimized, lightning-fast Oblivious Map—along with benchmarking code for performance comparisons.

C++ 1 Updated Aug 9, 2025
Jupyter Notebook 2 Updated Nov 23, 2025
Python 12 4 Updated Jun 2, 2025

This was originally a collection of papers on neural network accelerators. Now it's more like my selection of research on deep learning and computer architecture.

2,044 388 Updated Nov 8, 2025

A compiler for homomorphic encryption

C++ 627 109 Updated Dec 22, 2025
Python 115 21 Updated Jun 24, 2024
C++ 15 5 Updated Sep 10, 2025

Official code repository for "Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving [MICRO'25]"

Python 22 2 Updated Oct 23, 2025