Drop-in TaylorSeer/HiCache basis upgrade — training-free diffusion acceleration via a Dynamic Mode Decomposition (Prony) exponential feature-forecast basis. Not the SGLang KV-cache HiCache.

Python 8 1 Updated Jun 15, 2026

NX-AI / xlstm

Official repository of the xLSTM.

Python 2,175 184 Updated May 28, 2026

TongmingLAIC / AKO4X

Agentic Kernel Optimization — advanced & eXtensible: a closed-loop, campaign-based multi-agent system for optimizing GPU kernels (benchmark-swappable; default flashinfer-bench).

Python 55 10 Updated May 31, 2026

TongmingLAIC / AKO4ALL

Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language

Python 299 21 Updated May 31, 2026

ISEEKYAN / mlite

Forked from NVIDIA/Megatron-LM

Ongoing research training transformer models at scale

Python 10 Updated Jun 22, 2026

Zishan-Shao / FlashSVD

[AAAI 2026] Official implementation of "FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models". If you find this repository helpful, please consider starring 🌟 it to support the p…

Python 17 2 Updated May 1, 2026

fenglang918 / HiCache

HiCache: Hermite Polynomial-based Feature Cache for diffusion inference

Python 14 1 Updated Jan 27, 2026

omnigent-ai / omnigent

Omnigent is an open-source AI agent framework and meta-harness: orchestrate Claude Code, Codex, Cursor, Pi, and custom agents — swap harnesses without rewriting, enforce policies and sandboxing, an…

Python 4,395 499 Updated Jun 22, 2026

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

1,168 148 Updated Mar 21, 2025

RightNow-AI / AutoMegaKernel

An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode, paper: https://arxiv.org/abs/2606.09682

Python 70 9 Updated Jun 18, 2026

kunitoki / sonic-skills

Modular Markdown-based audio skills for AI agents and developers, covering signal processing, synthesis, effects, analysis, and spatial audio.

Shell 14 1 Updated May 21, 2026

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda 2,327 223 Updated Jun 19, 2026

marin-community / marin

Open-source framework for the research and development of foundation models.

Python 1,128 133 Updated Jun 22, 2026

vllm-project / vime

An LLM post-training framework with vLLM for RL Scaling

Python 290 30 Updated Jun 22, 2026

littsk / flexible-flash-attention

Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python 8 2 Updated Jun 22, 2026

Jiang Jiwen (JiwenJ)
