Stars
Disaggregated serving system for Large Language Models (LLMs).
Deep Learning papers reading roadmap for anyone eager to learn this amazing tech!
A machine learning accelerator core designed for energy-efficient AI at the edge.
A heterogeneous architecture timing model simulator.
A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep
A Datacenter Scale Distributed Inference Serving Framework
LLM serving cluster simulator
Official code repository for "Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving [MICRO'25]"
Code for the paper "Chiplet-Gym: An RL-based Optimization Framework for Chiplet-based AI Accelerator"
A toolchain for rapid design space exploration of chiplet architectures
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
A high-throughput and memory-efficient inference and serving engine for LLMs
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
A collection of AWESOME things about mixture-of-experts
Training Sparse Autoencoders on Language Models
ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference
[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark
Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
A collection of diffusion model papers, categorized by subarea