Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 754 82 Updated Apr 6, 2025

Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!

Python 39,404 7,335 Updated Nov 27, 2022

A machine learning accelerator core designed for energy-efficient AI at the edge.

Emacs Lisp 1,964 217 Updated Dec 19, 2025
C++ 2 Updated Jun 5, 2024

MMSpGEMM

Cuda 1 Updated Aug 11, 2025

A heterogeneous architecture timing model simulator.

C++ 173 61 Updated Sep 11, 2025
Verilog 1,830 421 Updated Dec 22, 2025

A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep

Jupyter Notebook 4,350 1,176 Updated Aug 31, 2024

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,683 753 Updated Dec 25, 2025

LLM serving cluster simulator

Jupyter Notebook 128 13 Updated Apr 25, 2024

Official code repository for "Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving [MICRO'25]"

Python 22 2 Updated Oct 23, 2025

This repository contains the code for this paper: Chiplet-Gym: An RL-based Optimization Framework for Chiplet-based AI Accelerator

Python 21 3 Updated Sep 28, 2024

A toolchain for rapid design space exploration of chiplet architectures

C++ 71 14 Updated Jul 25, 2025
C++ 24 2 Updated Oct 14, 2025
Python 717 47 Updated Nov 30, 2025

Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"

Python 1,602 191 Updated Aug 12, 2020

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 66,155 12,181 Updated Dec 25, 2025

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 36,085 15,584 Updated Dec 25, 2025

A collection of AWESOME things about mixture-of-experts

1,245 82 Updated Dec 8, 2024

Training Sparse Autoencoders on Language Models

Python 1,129 208 Updated Dec 24, 2025

ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference

C++ 178 32 Updated Dec 9, 2025

[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark

Python 391 13 Updated Jul 9, 2024
Jupyter Notebook 29 7 Updated Oct 4, 2025

Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718

Python 364 30 Updated Sep 25, 2024

Repository of LV-Eval Benchmark

Python 73 10 Updated Aug 31, 2024
Jupyter Notebook 126 12 Updated Nov 11, 2024

LongBench v2 and LongBench (ACL 25'&24')

Python 1,050 112 Updated Jan 15, 2025

Collection of diffusion model papers categorized by their subareas

2,093 95 Updated Dec 25, 2025

My collection of machine learning papers

295 21 Updated Aug 10, 2023