gau-nernst

Python 7 Updated Dec 19, 2025
Cuda 43 10 Updated Dec 10, 2025
C++ 1 Updated Dec 16, 2025
Python 7,800 458 Updated Dec 24, 2025

Official inference repo for FLUX.2 models

Python 1,267 64 Updated Dec 1, 2025

Super basic implementation (gist-like) of RLMs with REPL environments.

Python 289 43 Updated Oct 17, 2025

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,649 55 Updated Nov 15, 2025

PyTorch-native post-training at scale

Python 574 71 Updated Dec 24, 2025

A framework for few-shot evaluation of language models.

Python 11,013 2,919 Updated Dec 23, 2025

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

Python 140 27 Updated Dec 22, 2025

Ship correct and fast LLM kernels to PyTorch

Python 127 15 Updated Dec 18, 2025

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 845 145 Updated Sep 26, 2025

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 6,502 367 Updated Dec 24, 2025

Python tool for converting files and office documents to Markdown.

Python 84,553 4,869 Updated Dec 1, 2025
Python 4,247 461 Updated Jul 31, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,919 291 Updated Dec 22, 2025

Super fast FP32 matrix multiplication on RDNA3

Assembly 81 3 Updated Mar 30, 2025

AI Tensor Engine for ROCm

Python 327 164 Updated Dec 24, 2025

Simple high-throughput inference library

Python 153 10 Updated May 14, 2025

Development repository for the Triton language and compiler

Python 138 37 Updated Dec 23, 2025
Rust 82 18 Updated Dec 6, 2025

Official implementation of Half-Quadratic Quantization (HQQ)

Python 902 88 Updated Dec 18, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 18,983 1,653 Updated Nov 19, 2025

Framework to reduce autotune overhead to zero for well known deployments.

Python 91 16 Updated Sep 19, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,289 114 Updated Dec 16, 2025

Gzip Decompression and Random Access for Modern Multi-Core Machines

Python 441 13 Updated Nov 30, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 6 4 Updated Dec 24, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,934 922 Updated Dec 15, 2025

Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!

Python 179 87 Updated Dec 23, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 4,481 503 Updated Dec 23, 2025