Stars
Official inference repo for FLUX.2 models
Super basic implementation (gist-like) of RLMs with REPL environments.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
A framework for few-shot evaluation of language models.
AMD RAD's Triton-based framework for seamless multi-GPU programming
Ship correct and fast LLM kernels to PyTorch
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Python tool for converting files and office documents to Markdown.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
Super fast FP32 matrix multiplication on RDNA3
Simple high-throughput inference library
ROCm / triton
Forked from triton-lang/triton
Development repository for the Triton language and compiler
Official implementation of Half-Quadratic Quantization (HQQ)
A TTS model capable of generating ultra-realistic dialogue in one pass.
Framework to reduce autotune overhead to zero for well-known deployments.
Distributed Compiler based on Triton for Parallel Systems
Gzip Decompression and Random Access for Modern Multi-Core Machines
mingfeima / sglang
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
FlashMLA: Efficient Multi-head Latent Attention Kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
Cost-efficient and pluggable infrastructure components for GenAI inference