akothen

Akash K. akothen

18 followers · 4 following

Stars

61 stars written in Python

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python · 23,602 stars · 2,667 forks · Updated Apr 30, 2026

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python · 17,153 stars · 3,401 forks · Updated Apr 30, 2026

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python · 6,319 stars · 525 forks · Updated Apr 30, 2026

mit-han-lab / efficientvit

Efficient vision foundation models for high-resolution generation and perception.

Python · 3,295 stars · 242 forks · Updated Sep 5, 2025

meta-pytorch / attention-gym

Helpful tools and examples for working with flex-attention

Python · 1,180 stars · 76 forks · Updated Apr 13, 2026

volcengine / veScale

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python · 1,010 stars · 61 forks · Updated Mar 3, 2026

pytorch / helion

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python · 854 stars · 144 forks · Updated May 1, 2026

amd / RyzenAI-SW

AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs.

Python · 808 stars · 124 forks · Updated Apr 17, 2026

NVIDIA / tilus

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python · 482 stars · 25 forks · Updated Apr 27, 2026

scalesim-project / SCALE-Sim

Repository to host and maintain SCALE-Sim code

Python · 454 stars · 149 forks · Updated Feb 2, 2026

Meituan-AutoML / VisionLLaMA

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Python · 392 stars · 13 forks · Updated Jul 9, 2024

meta-pytorch / tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python · 351 stars · 81 forks · Updated Apr 30, 2026

mit-han-lab / x-attention

[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring

Python · 277 stars · 24 forks · Updated Jul 6, 2025

coreylammie / MemTorch

A Simulation Framework for Memristive Deep Learning Systems

Python · 185 stars · 62 forks · Updated May 13, 2024

scale-snu / attacc_simulator

Python · 149 stars · 30 forks · Updated Jun 24, 2024

uwsampl / SparseTIR

SparseTIR: Sparse Tensor Compiler for Deep Learning

Python · 143 stars · 14 forks · Updated Mar 31, 2023

parasj / checkmate

Training neural networks in TensorFlow 2.0 with 5x less memory

Python · 137 stars · 17 forks · Updated Feb 21, 2022

Yufeng98 / CENT

Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025

Python · 134 stars · 26 forks · Updated May 3, 2025

ucb-bar / autocomp

Autocomp: Optimize any AI kernel, anywhere.

Python · 126 stars · 8 forks · Updated Apr 29, 2026

pku-liang / AMOS

Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators

Python · 124 stars · 13 forks · Updated Oct 26, 2022

NeuroBench / neurobench

Benchmark harness and baseline results for the NeuroBench algorithm track.

Python · 115 stars · 27 forks · Updated Apr 24, 2026

leesou / H2-LLM-ISCA-2025

H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference

Python · 100 stars · 10 forks · Updated Apr 26, 2025

kyegomez / SparseAttention

PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers"

Python · 96 stars · 6 forks · Updated Apr 13, 2026

ucb-bar / cosa

A scheduler for spatial DNN accelerators that generates high-performance schedules in one shot using mixed integer programming (MIP)

Python · 86 stars · 21 forks · Updated Aug 28, 2023

Aayush-Ankit / puma-simulator

[ASPLOS 2019] PUMA-simulator provides a detailed simulation model of a dataflow architecture built with NVM (non-volatile memory), and runs ML models compiled using the puma compiler.

Python · 68 stars · 46 forks · Updated Apr 17, 2023

godfather991 / UniNDP

Artifact material for [HPCA 2025] #2108 "UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures"

Python · 54 stars · 14 forks · Updated Sep 1, 2025

wzzll123 / MultiKernelBench

MultiArchKernelBench: A Multi-Platform Benchmark for Kernel Generation

Python · 52 stars · 13 forks · Updated Mar 25, 2026

flagos-ai / KernelGen

Next-Generation AI-Assisted Kernel Engineering for Multi-Chip Systems

Python · 47 stars · 10 forks · Updated Apr 15, 2026

Zhaoshixin-sky / CIM-MLC

[ASPLOS 2024] CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

Python · 45 stars · 10 forks · Updated May 25, 2024

upmem / upmem_llm_framework

UPMEM LLM Framework allows profiling PyTorch layers and functions and simulating those layers and functions with a given hardware profile.

Python · 41 stars · 15 forks · Updated Apr 8, 2026
