Stars
DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference
A library of GPU kernels for sparse matrix operations.
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
Helpful kernel tutorials and examples for tile-based GPU programming
FlashRAG: A Python Toolkit for Efficient RAG Research (WWW 2025 Resource)
Accelerating MoE with IO and Tile-aware Optimizations
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Trainable fast and memory-efficient sparse attention
Efficient implementations of Native Sparse Attention
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemm. PyTorch bindings for CUTLASS grouped GEMM.
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
StreamingVLM: Real-Time Understanding for Infinite Video Streams
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality
Modeling, training, eval, and inference code for OLMo
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
A lightweight Inference Engine built for block diffusion models
VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
Implementation of FP8/INT8 rollout for RL training without performance drop.
Paper reading and discussion notes, covering AI frameworks, distributed systems, cluster management, etc.
Research prototype of PRISM, a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.