
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"

Python · 405 stars · 90 forks · Updated Feb 14, 2025

[EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models

Python · 68 stars · 6 forks · Updated Sep 22, 2024

Qualcomm® AI Hub Models is our collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.

Python · 1,133 stars · 202 forks · Updated Jun 23, 2026

On-device AI across mobile, embedded and edge for PyTorch

Python · 4,747 stars · 1,041 forks · Updated Jun 23, 2026

LLM inference in C/C++

C++ · 117,748 stars · 19,835 forks · Updated Jun 23, 2026

Large Language Model Text Generation Inference

Python · 10,864 stars · 1,270 forks · Updated Mar 21, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 83,608 stars · 18,346 forks · Updated Jun 23, 2026