
Optimized FP16/BF16 x FP4 GPU kernels for AMD GPUs

C++ 35 6 Updated Oct 9, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,920 918 Updated Dec 15, 2025

🚀🚀 Efficient implementations of Native Sparse Attention

Python 1,039 12 Updated Sep 29, 2025

The best ChatGPT that $100 can buy.

Python 38,858 4,902 Updated Dec 9, 2025

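Beginner's guide to the digital circuits lab at the University of Science and Technology of China (USTC), created in 2022 by teaching assistant Ma Zirui (马子睿). This repository is intended to let later teaching assistants keep iterating on it.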

7 1 Updated Oct 10, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,982 1,586 Updated Dec 18, 2025
C++ 14 7 Updated Nov 9, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,965 877 Updated Dec 4, 2025

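🥢Cook like Laoxiangji (老乡鸡)🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the "Laoxiangji Dish Traceability Report" (《老乡鸡菜品溯源报告》) and has been summarized, edited, and organized. CookLikeHOC.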

JavaScript 22,543 2,282 Updated Oct 17, 2025

A distributed storage design that adds IPFS/Filecoin after integrating ArkFS+vivo50

Python 2 Updated Jul 3, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,435 1,991 Updated Nov 1, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,723 12,047 Updated Dec 18, 2025

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

11,371 1,349 Updated Jul 9, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,253 350 Updated Dec 18, 2025

Development repository for the Triton language and compiler

MLIR 17,876 2,455 Updated Dec 18, 2025

Exam papers from some courses at the USTC School of Computer Science

89 6 Updated Jul 25, 2025

a SymDrive copy

C 2 1 Updated Jan 26, 2018

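Updated December 2025: a roundup of Docker registry mirrors currently usable in mainland China, a list of DockerHub mirror accelerators for China, 🚀DockerHub mirror accelerator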

6,891 326 Updated Dec 16, 2025

Fast and memory-efficient exact attention

Python 21,178 2,230 Updated Dec 18, 2025

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 1,023 100 Updated Dec 30, 2024
Assembly 1 Updated Jun 2, 2024

Stanford computer networking lab, an elegant TCP/IP implementation

C++ 132 24 Updated May 27, 2023

Selected resources from the Spring 2023 course "Digital Image Processing and Analysis" at USTC

MATLAB 8 Updated Sep 17, 2023

Linux kernel source tree

C 211,128 59,543 Updated Dec 18, 2025

A wrapper script to build whole-program LLVM bitcode files

Python 724 132 Updated Dec 11, 2024

[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration

Python 249 30 Updated Nov 18, 2024

Stanford CS149 -- Assignment 1

C++ 139 176 Updated Oct 15, 2025

Collection of repositories for USTC entries in the Loongson Cup (龙芯杯) competition

15 Updated Oct 2, 2024

Fully open reproduction of DeepSeek-R1

Python 25,737 2,405 Updated Nov 24, 2025