This is the official PyTorch implementation of "Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation."

Python 1 Updated Nov 6, 2025

Project homepage of Pyramid sparse attention

TeX 4 Updated Dec 14, 2025

This is the official PyTorch implementation of "BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation."

Python 41 5 Updated Oct 9, 2025

A project implementing various agentic RL methods based on the Slime post-training framework

Python 336 18 Updated Apr 11, 2026

TriAttention — Efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.

Python 580 44 Updated Apr 15, 2026

[CVPRW 2026 Oral] Less Detail, Better Answers: Degradation-Driven Prompting for VQA

Python 19 1 Updated Mar 31, 2026

A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-performance systems.

Python 115 6 Updated Apr 15, 2026

PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.

Python 176 30 Updated Dec 24, 2025

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 474 23 Updated Apr 18, 2026

🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.

Jupyter Notebook 3,523 290 Updated Mar 27, 2026

LM engine is a library for pretraining/finetuning LLMs

Python 164 29 Updated Apr 17, 2026

A Model Context Protocol (MCP) server for creating, reading, and manipulating Microsoft Word documents. This server enables AI assistants to work with Word documents through a standardized interfac…

Python 1,863 249 Updated Dec 31, 2025

Official repository of paper [FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic, NeurIPS 2025]

Python 21 Updated Dec 2, 2025
Python 127 13 Updated Feb 17, 2026

Artifact for PPoPP'26 "RoMeo: Mitigating Dual-dimensional Outliers with Rotated Mixed Precision Quantization"

Python 9 Updated Jan 9, 2026

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda 503 53 Updated Jan 20, 2026

DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference

Python 618 38 Updated Nov 24, 2025

A library of GPU kernels for sparse matrix operations.

C++ 286 53 Updated Nov 24, 2020

SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs

Cuda 64 16 Updated Mar 25, 2025

Helpful kernel tutorials and examples for tile-based GPU programming

Python 703 66 Updated Apr 17, 2026

⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)

Python 3,460 297 Updated Apr 10, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 635 73 Updated Apr 17, 2026

[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

Python 162 17 Updated Feb 27, 2026

The official repository for PTQTP implementation

12 Updated Sep 24, 2025

Trainable fast and memory-efficient sparse attention

Python 623 55 Updated Apr 14, 2026

🚀🚀 Efficient implementations of Native Sparse Attention

Python 747 15 Updated Sep 29, 2025
Python 167 12 Updated Jul 22, 2024

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 186 50 Updated Apr 8, 2026