Stars
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Topics in Machine Learning Accelerator Design
VeriSilicon / triton-shared
Forked from microsoft/triton-shared. Shared Middle-Layer for Triton Compilation
FlagGems is an operator library for large language models implemented in the Triton Language.
A PyTorch native platform for training generative AI models
PrIM (Processing-In-Memory benchmarks) is the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM is developed to evaluate, analyze, and characterize the first publ…
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM stan…
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
The PJRT plugin implementation for VeriSilicon NPU IP, for the TensorFlow, PyTorch, and other ecosystems.
A high-throughput and memory-efficient inference and serving engine for LLMs
Empower VeriSilicon's NPU on the Android platform via NNAPI
zjd1988 / TIM-VX-python
Forked from VeriSilicon/TIM-VX. Verisilicon Tensor Interface Module
Shared Middle-Layer for Triton Compilation
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.
Acuitylite is an end-to-end neural network deployment tool
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Visually explore, understand, and present your data.
Lean Algorithmic Trading Engine by QuantConnect (Python, C#)
A GUI client for Windows, Linux, and macOS, supporting Xray, sing-box, and others