
A Triton-only attention backend for vLLM

Python 26 7 Updated Mar 17, 2026

VUA stands for 'VAST Undivided Attention'. It's a global KVCache storage solution optimizing LLM time to first token (TTFT) and GPU utilization.

Python 38 9 Updated Mar 12, 2026

Agent-friendly GPU profile-query CLI

Rust 76 2 Updated Jun 12, 2026

Efficient reliable UDP unicast, UDP multicast, and IPC message transport

Java 1 Updated Jun 1, 2026

Efficient reliable UDP unicast, UDP multicast, and IPC message transport

Java 8,684 1,051 Updated Jun 11, 2026
Python 250 27 Updated Jun 9, 2026

A comprehensive knowledge base for Huawei Ascend NPU development, structured as distributed Agent Skills. https://ascend-ai-coding.github.io/awesome-ascend-skills/

Python 99 46 Updated Jun 14, 2026

An Online Deep Learning Interface for HPC programs on NVIDIA GPUs

C++ 196 35 Updated Jun 11, 2026

Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average speedup of 34.93x

Python 120 10 Updated Jun 10, 2026

mKernel: fast multi-node, multi-GPU fused kernels

Cuda 231 22 Updated Jun 8, 2026

A library of HTML slide templates designed so any coding agent can pick the right one and produce a beautiful deck on the user's behalf, automatically.

HTML 2,832 253 Updated Jun 9, 2026

Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown

TypeScript 88,645 9,040 Updated Jun 14, 2026

CLI for X/Twitter API v2 -- post, search, like, bookmark from your terminal

Python 409 24 Updated Jun 7, 2026

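A local-life Agent Bridge connected to WeChat that gives Codex / Claude Code a sense of time and whereabouts, plus random and autonomous wake-ups. It replaces Pomodoro timers and productivity tools with proactive companionship, automatically keeps a diary, maintains a life timeline, sends files and stickers, and calls MCP / local tools.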

JavaScript 865 79 Updated Jun 8, 2026

A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.

Python 2,936 404 Updated Jun 12, 2026

Implementation of "UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification"

Python 39 2 Updated May 8, 2026

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,427 156 Updated Jun 14, 2026

A PyTorch native platform for training generative AI models

Python 5,436 860 Updated Jun 14, 2026

Official repository for "SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space"

Python 26 1 Updated May 7, 2026

Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini C…

TypeScript 59,123 4,913 Updated Jun 11, 2026

An empirical study of benchmarking LLM inference with KV cache offloading using vLLM and LMCache on NVIDIA GB200 with high-bandwidth NVLink-C2C.

Python 4 Updated Dec 20, 2025

A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch

Python 311 26 Updated May 8, 2026

Download links and backup download links for every Clash version from the official Clash site

5,146 1,246 Updated Jun 2, 2026

Utility to convert between various subscription formats

C++ 16,736 3,800 Updated Feb 27, 2026

Wrap Gemini CLI, Antigravity, ChatGPT Codex, Claude Code, Grok Build as an OpenAI/Gemini/Claude/Codex compatible API service, allowing you to enjoy the free Gemini 3.1 Pro, GPT 5.5, Grok 4.3, Claud…

Go 37,496 6,179 Updated Jun 14, 2026

Clash Mihomo for iOS/MacOS/Android/Windows/Linux

Dart 7,690 468 Updated Jun 11, 2026

A rule-based tunnel for Android.

Kotlin 41,309 2,658 Updated Jun 13, 2026

Warp is an agentic development environment, born out of the terminal.

Rust 61,716 5,006 Updated Jun 14, 2026

A Codex-powered Chrome side-panel assistant for page context, tabs, voice, and image workflows.

TypeScript 1,135 116 Updated May 10, 2026