Beijing · 18:16 (UTC +08:00) · https://scholar.google.com/citations?hl=zh-CN&user=MBR97ZIAAAAJ
Stars
Accelerating MoE with IO and Tile-aware Optimizations
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Helpful kernel tutorials and examples for tile-based GPU programming
Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling”
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Distributed MoE in a Single Kernel [NeurIPS '25]
Advanced quantization toolkit for LLMs and VLMs. Supports WOQ, MXFP4, NVFP4, GGUF, and adaptive schemes, with seamless integration into Transformers, vLLM, SGLang, and llm-compressor
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
GPU programming related news and material links
Triton-based Symmetric Memory operators and examples
A framework to compare low-bit integer and floating-point formats (a minimal illustration follows this list)
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
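The low-bit format comparison above can be illustrated with a short, self-contained sketch. This is not the framework's actual API; it simply simulates symmetric INT4 quantization and an E2M1-style FP4 grid with per-tensor scaling (both assumptions for illustration) and compares their reconstruction error on random data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)

# INT4: symmetric per-tensor scaling onto the integer grid [-7, 7].
def quant_int4(v):
    scale = np.abs(v).max() / 7.0
    q = np.clip(np.round(v / scale), -7, 7)
    return q * scale

# FP4 (E2M1-style): snap each magnitude to the nearest representable value
# after scaling so the largest input maps to the largest code (6.0).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
def quant_fp4(v):
    scale = np.abs(v).max() / 6.0
    mag = np.abs(v) / scale
    idx = np.argmin(np.abs(mag[:, None] - FP4_GRID[None, :]), axis=1)
    return np.sign(v) * FP4_GRID[idx] * scale

# Compare mean-squared reconstruction error of the two 4-bit formats.
for name, fn in [("int4", quant_int4), ("fp4-e2m1", quant_fp4)]:
    mse = np.mean((x - fn(x)) ** 2)
    print(f"{name}: mse={mse:.6f}")
```

Real comparison frameworks additionally sweep block sizes, scale formats, and rounding modes; this sketch only shows the core idea of mapping values onto each format's representable grid and measuring the resulting error.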