Building a full-fledged code editor for iPad

Swift 34 Updated Oct 29, 2025

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 718 80 Updated Apr 6, 2025

From a nobody to a large language model (LLM) hero~ Stay tuned for more!!!

Jupyter Notebook 1,794 124 Updated Oct 19, 2025

A low-latency & high-throughput serving engine for LLMs

Python 436 58 Updated Oct 16, 2025

Implements several methods for LLM KV cache sparsity

Python 40 2 Updated Jun 6, 2024

A machine learning accelerator core designed for energy-efficient AI at the edge.

Emacs Lisp 1,718 169 Updated Nov 3, 2025

[DATE'2025, TCAD'2025] Terafly: A Multi-Node FPGA Based Accelerator Design for Efficient Cooperative Inference in LLMs

C++ 16 1 Updated Oct 24, 2025

[DATE'25, ICCAD'25] An embedded FPGA-based LLM accelerator capable of supporting Llama2-7B

Verilog 43 3 Updated Nov 4, 2025

LeetGPU Challenges

Python 415 31 Updated Nov 2, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 1,970 220 Updated Nov 5, 2025

A modern GUI client based on Tauri, designed to run in Windows, macOS and Linux for tailored proxy experience

TypeScript 80,527 5,963 Updated Nov 5, 2025

Awesome Pruning. ✅ Curated Resources for Neural Network Pruning.

170 14 Updated Aug 30, 2024

Reinforcement learning tutorial in Chinese (the "Mushroom Book" 🍄); read online at https://datawhalechina.github.io/easy-rl/

Jupyter Notebook 12,917 2,146 Updated Sep 6, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,622 256 Updated Oct 28, 2025

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

Python 857 124 Updated Nov 5, 2025

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python 4,983 525 Updated Apr 11, 2025

GEMV implementation with CUTLASS

C++ 15 1 Updated Aug 21, 2025

Course Project for High Level Chip Design (高层次芯片设计)

C++ 16 6 Updated Jan 2, 2025

Contexts Optical Compression

Python 19,557 1,370 Updated Oct 25, 2025

🚀🚀 [Large models] Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 32,499 3,758 Updated Nov 2, 2025

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

397 39 Updated Aug 2, 2025

high-performance RTL simulator

Scala 181 15 Updated Jun 19, 2024

Let's Learn AI SYStem

Python 11 46 Updated Sep 4, 2025

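Hand-written interview questions (not LeetCode) on LLMs (the main focus) and on AI algorithms for search, advertising, and recommendation, such as Self-Attention and AUC. These generally test a candidate's overall ability more than LeetCode does, and sit closer to business practice and fundamentals.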

Jupyter Notebook 429 22 Updated Dec 29, 2024

Bringing Language Models to the Most Resource Constrained Devices

Python 42 7 Updated Dec 23, 2024
Python 40 4 Updated Oct 21, 2025

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

C++ 9,159 2,795 Updated Nov 5, 2025

Fast inference from large language models via speculative decoding

Python 846 87 Updated Aug 22, 2024