sunshinemyson
  • verisilicon.com
  • CD


SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs

Python 168 14 Updated Sep 23, 2025

Topics in Machine Learning Accelerator Design

89 22 Updated Feb 16, 2023

Shared Middle-Layer for Triton Compilation

MLIR 1 Updated Jan 7, 2025

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 747 147 Updated Nov 5, 2025

A PyTorch native platform for training generative AI models

Python 4,653 595 Updated Nov 5, 2025
Jupyter Notebook 126 39 Updated Oct 31, 2025

PrIM (Processing-In-Memory benchmarks) is the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM is developed to evaluate, analyze, and characterize the first publ…

C 162 58 Updated Apr 29, 2024

Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM stan…

C++ 422 111 Updated Oct 20, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 571 70 Updated Sep 11, 2024

LLM training in simple, raw C/CUDA

Cuda 28,073 3,264 Updated Jun 26, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,660 319 Updated Aug 19, 2025

The pjrt-plugin implementation for VeriSilicon NPU IP, for the TensorFlow, PyTorch, and other ecosystems.

C++ 7 1 Updated Apr 28, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 62,101 11,042 Updated Nov 5, 2025

Enable VeriSilicon's NPU on the Android platform through NNAPI

C++ 5 5 Updated Jan 21, 2025

My self-study notes, updated for life

Python 3,919 483 Updated Nov 5, 2025

VeriSilicon Tensor Interface Module

C 5 Updated Oct 13, 2023

Shared Middle-Layer for Triton Compilation

MLIR 302 79 Updated Oct 27, 2025
Shell 77 8 Updated Sep 9, 2022

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 152,117 31,045 Updated Nov 5, 2025

Code release for ConvNeXt model

Python 6,178 725 Updated Jan 8, 2023

GitHub Action for clang-format checking

C 125 40 Updated Nov 1, 2025

VeriSilicon Tensor Interface Module

C 239 88 Updated Oct 13, 2025

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ 1,669 610 Updated Nov 4, 2025

Acuitylite is an end-to-end neural network deployment tool

Roff 18 5 Updated Nov 4, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,728 1,514 Updated Nov 5, 2025
Python 242 60 Updated Mar 31, 2023

Visually explore, understand, and present your data.

TypeScript 6,875 557 Updated Oct 17, 2025

Lean Algorithmic Trading Engine by QuantConnect (Python, C#)

C# 12,686 3,788 Updated Nov 4, 2025

A GUI client for Windows, Linux, and macOS that supports Xray, sing-box, and others

C# 89,443 13,577 Updated Nov 5, 2025