A Quirky Assortment of CuTe Kernels

Python 1,027 136 Updated Jun 20, 2026

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 855 58 Updated Jun 22, 2026
Python 764 190 Updated Apr 17, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,331 104 Updated Aug 28, 2025

LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design

C++ 6 2 Updated Mar 31, 2026

Fastest kernels written from scratch

Cuda 583 76 Updated Sep 18, 2025

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

Python 29,034 5,074 Updated Aug 18, 2024

The true story of the heartache and darkness behind the development of the Noah's Ark Pangu large model.

11,538 1,312 Updated Jul 9, 2025

Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4

C 994 109 Updated Jun 21, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,381 8,855 Updated Jun 22, 2026
Python 1 Updated Oct 6, 2025

Machine learning course by Professor Hung-yi Lee of National Taiwan University

Jupyter Notebook 1,178 383 Updated Jul 15, 2019

Awesome LLM compression research papers and tools.

1,848 128 Updated Feb 23, 2026

Understanding Deep Learning - Simon J.D. Prince

Jupyter Notebook 9,579 2,260 Updated Feb 24, 2026

Fast low-bit matmul kernels in Triton

Python 474 35 Updated May 15, 2026

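This project shares the technical principles behind large models along with hands-on experience (LLM engineering and deploying LLM applications in production)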

HTML 24,564 2,809 Updated May 25, 2026

Code repo for the paper "SpinQuant: LLM quantization with learned rotations"

Python 405 90 Updated Feb 14, 2025

My learning notes for ML SYS.

Python 6,565 448 Updated Jun 18, 2026

Code for the paper "Evaluating Large Language Models Trained on Code"

Python 3,270 444 Updated Jan 17, 2025

A framework for the evaluation of autoregressive code generation language models.

Python 1,048 264 Updated Jul 22, 2025

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

Python 2,663 309 Updated Jun 22, 2026

A low-latency & high-throughput serving engine for LLMs

Python 506 64 Updated Jan 8, 2026

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 963 50 Updated Mar 29, 2026

Fast and memory-efficient exact attention

Python 233 78 Updated Jun 22, 2026

8-bit CUDA functions for PyTorch

Python 72 13 Updated Jun 16, 2026

[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation

Python 254 20 Updated Dec 16, 2024

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,363 591 Updated Oct 28, 2024

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

718 26 Updated Apr 15, 2026

An acceleration library that supports arbitrary bit-width combinatorial quantization operations

C++ 245 21 Updated Sep 30, 2024