Fighter141 · 🎏 Focusing

One-click proxy script for the Antigravity Agent, with support for WSL and SSH remote sessions

Shell · 86 stars · 6 forks · Updated Dec 21, 2025

Sparser Block-Sparse Attention via Token Permutation

Python · 28 stars · 1 fork · Updated Oct 27, 2025

The evaluation framework for training-free sparse attention in LLMs

Python · 108 stars · 8 forks · Updated Oct 13, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 96,121 stars · 26,352 forks · Updated Dec 24, 2025

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs

Python · 180 stars · 15 forks · Updated Sep 23, 2025

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python · 12,267 stars · 1,234 forks · Updated Nov 4, 2025

KaHIP -- Karlsruhe HIGH Quality Partitioning.

C++ · 469 stars · 105 forks · Updated Nov 4, 2025

A Python wrapper around Metis, a graph partitioning package

C · 192 stars · 36 forks · Updated Dec 15, 2025

ParMETIS - Parallel Graph Partitioning and Fill-reducing Matrix Ordering

C · 167 stars · 62 forks · Updated Dec 8, 2023

METIS - Serial Graph Partitioning and Fill-reducing Matrix Ordering

C · 958 stars · 199 forks · Updated Jul 4, 2025

Intercept Google Antigravity IDE API calls and use your own Gemini API token

Python · 34 stars · 10 forks · Updated Dec 15, 2025

These are custom recipes for NVIDIA Nsight Systems post-collection analysis.

Python · 15 stars · 1 fork · Updated Nov 7, 2025

Official Implementation of APB (ACL 2025 main Oral)

C++ · 32 stars · 4 forks · Updated Feb 22, 2025

Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.

Python · 13 stars · 2 forks · Updated Nov 28, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 21,943 stars · 3,856 forks · Updated Dec 24, 2025

Fast and memory-efficient exact kmeans

Python · 131 stars · 8 forks · Updated Nov 11, 2025

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention

Python · 606 stars · 32 forks · Updated Dec 9, 2025

Large Context Attention

Python · 755 stars · 52 forks · Updated Oct 13, 2025

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ · 796 stars · 56 forks · Updated Mar 6, 2025

Implementation of plug-and-play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"

Python · 716 stars · 63 forks · Updated Jan 7, 2024

A sparse attention kernel supporting mixed sparse patterns

C++ · 411 stars · 39 forks · Updated Dec 16, 2025

A game accelerator based on sing-box, built with the Wails framework

Go · 53 stars · 8 forks · Updated Oct 8, 2025

[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring

Python · 261 stars · 19 forks · Updated Jul 6, 2025

Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Python · 159 stars · 9 forks · Updated Oct 13, 2025

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, computes attention with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filli…

Python · 1,169 stars · 73 forks · Updated Sep 30, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python · 726 stars · 73 forks · Updated Nov 30, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda · 4,347 stars · 614 forks · Updated Dec 24, 2025

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch

Python · 549 stars · 33 forks · Updated May 16, 2025