This is the official PyTorch implementation of "Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation."

Python 1 Updated Nov 6, 2025

Project homepage of Pyramid sparse attention

TeX 4 Updated Dec 14, 2025

This is the official PyTorch implementation of "BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation."

Python 41 5 Updated Oct 9, 2025

A project implementing various agentic RL methods based on the Slime post-training framework

Python 333 18 Updated Apr 11, 2026

TriAttention — Efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.

Python 559 42 Updated Apr 15, 2026

[CVPRW 2026 Oral] Less Detail, Better Answers: Degradation-Driven Prompting for VQA

Python 19 1 Updated Mar 31, 2026

A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-performance systems.

Python 114 6 Updated Apr 15, 2026

PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.

Python 176 30 Updated Dec 24, 2025

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 474 23 Updated Apr 17, 2026

🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.

Jupyter Notebook 3,515 289 Updated Mar 27, 2026

LM engine is a library for pretraining/finetuning LLMs

Python 164 29 Updated Apr 17, 2026

A Model Context Protocol (MCP) server for creating, reading, and manipulating Microsoft Word documents. This server enables AI assistants to work with Word documents through a standardized interfac…

Python 1,863 249 Updated Dec 31, 2025

Official repository of paper [FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic, NeurIPS 2025]

Python 21 Updated Dec 2, 2025
Python 127 13 Updated Feb 17, 2026

Artifact for PPoPP'26 "RoMeo: Mitigating Dual-dimensional Outliers with Rotated Mixed Precision Quantization"

Python 9 Updated Jan 9, 2026

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda 502 53 Updated Jan 20, 2026

DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference

Python 618 38 Updated Nov 24, 2025

A library of GPU kernels for sparse matrix operations.

C++ 286 53 Updated Nov 24, 2020

SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs

Cuda 64 16 Updated Mar 25, 2025

Helpful kernel tutorials and examples for tile-based GPU programming

Python 702 65 Updated Apr 17, 2026

⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)

Python 3,458 297 Updated Apr 10, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 635 73 Updated Apr 15, 2026

[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

Python 162 17 Updated Feb 27, 2026

The official repository for PTQTP implementation

12 Updated Sep 24, 2025

Trainable fast and memory-efficient sparse attention

Python 617 55 Updated Apr 14, 2026

🚀🚀 Efficient implementations of Native Sparse Attention

Python 748 15 Updated Sep 29, 2025
Python 167 12 Updated Jul 22, 2024

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 186 50 Updated Apr 8, 2026