Stars
Fast and memory-efficient exact attention
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Efficient Triton Kernels for LLM Training
Efficient vision foundation models for high-resolution generation and perception.
ByteDance's PyTorch Distributed framework for hyperscale training of LLMs and RL workloads
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI-powered PCs.
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
Repository hosting and maintaining the code for SCALE-Sim, a systolic-array accelerator simulator
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
A Simulation Framework for Memristive Deep Learning Systems
SparseTIR: Sparse Tensor Compiler for Deep Learning
Training neural networks in TensorFlow 2.0 with 5x less memory
Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
Benchmark harness and baseline results for the NeuroBench algorithm track.
H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers" (a minimal mask sketch follows this list)
A scheduler for spatial DNN accelerators that generates high-performance schedules in one shot using mixed integer programming (MIP); see the toy MIP sketch after this list
[ASPLOS 2019] PUMA-simulator provides a detailed simulation model of a dataflow architecture built with NVM (non-volatile memory) and runs ML models compiled using the PUMA compiler.
Artifact material for [HPCA 2025] #2108 "UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures"
[ASPLOS 2024] CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators
The UPMEM LLM Framework allows profiling PyTorch layers and functions and simulating them with a given hardware profile.
Next-Generation AI-Assisted Kernel Engineering for Multi-Chip Systems
AccelOpt: Self-improving Agents for AI Accelerator Kernel Optimization
Simulator for LLM inference on an abstract 3D AIMC-based accelerator
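For the sparse-attention entry above, here is a minimal, self-contained PyTorch sketch of the "strided" pattern from Child et al., "Generating Long Sequences with Sparse Transformers". It only illustrates the mask semantics, not the repository's code; the stride value and tensor shapes are assumptions, and a dense mask like this gives none of the paper's memory savings, which come from block-sparse kernels that skip the masked entries entirely.

```python
import torch
import torch.nn.functional as F

def strided_sparse_mask(seq_len: int, stride: int) -> torch.Tensor:
    # Position i may attend to the previous `stride` positions (local band)
    # and to any earlier position j with (i - j) % stride == 0 (strided band).
    i = torch.arange(seq_len).unsqueeze(1)  # query indices, column vector
    j = torch.arange(seq_len).unsqueeze(0)  # key indices, row vector
    causal = j <= i
    local = (i - j) < stride
    strided = (i - j) % stride == 0
    return causal & (local | strided)

def sparse_attention(q, k, v, stride=8):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    mask = strided_sparse_mask(q.shape[-2], stride).to(q.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy usage with assumed shapes: batch 1, 4 heads, 64 tokens, head_dim 32.
q, k, v = (torch.randn(1, 4, 64, 32) for _ in range(3))
out = sparse_attention(q, k, v, stride=8)  # -> shape (1, 4, 64, 32)
```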
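And for the MIP scheduler entry, a toy one-shot formulation in the same spirit, written with PuLP: binary variables assign each layer to one processing element, and a single solve minimizes the makespan. The layer names, PE count, and cycle costs are all hypothetical; the actual repository explores a far richer mapping space than this assignment problem.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

layers = ["conv1", "conv2", "conv3", "fc"]  # hypothetical workload
pes = ["pe0", "pe1"]                        # hypothetical spatial PEs
cost = {                                    # assumed cycle cost per (layer, PE)
    ("conv1", "pe0"): 40, ("conv1", "pe1"): 55,
    ("conv2", "pe0"): 60, ("conv2", "pe1"): 45,
    ("conv3", "pe0"): 30, ("conv3", "pe1"): 35,
    ("fc", "pe0"): 20,    ("fc", "pe1"): 15,
}

prob = LpProblem("one_shot_schedule", LpMinimize)
x = {(l, p): LpVariable(f"x_{l}_{p}", cat=LpBinary) for l in layers for p in pes}
makespan = LpVariable("makespan", lowBound=0)
prob += makespan                            # objective: minimize the makespan
for l in layers:                            # each layer runs on exactly one PE
    prob += lpSum(x[l, p] for p in pes) == 1
for p in pes:                               # each PE's total load bounds the makespan
    prob += lpSum(cost[l, p] * x[l, p] for l in layers) <= makespan

prob.solve(PULP_CBC_CMD(msg=False))
schedule = {l: p for (l, p), var in x.items() if var.value() == 1}
print(schedule, makespan.value())           # e.g. one PE per layer, minimal peak load
```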