Python 283 24 Updated Jun 15, 2026

Understand and test language model architectures on synthetic tasks.

Python 274 52 Updated Mar 22, 2026

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,231 399 Updated Jul 11, 2024

SkyRL: A Modular Full-stack RL Library for LLMs

Python 1,997 351 Updated Jun 15, 2026

Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718

Python 386 33 Updated Sep 25, 2024

DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm

C 13,826 1,219 Updated Jun 11, 2026

Fast CUDA matrix multiplication from scratch

Cuda 1,216 196 Updated Sep 2, 2025

LongLive 2.0: Infra - Long Video Gen

Python 2,336 210 Updated Jun 13, 2026

Implementation for IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs (ICLR 2026).

Python 18 2 Updated Jun 9, 2026

🚀🚀 Efficient implementations of Native Sparse Attention

Python 619 15 Updated Sep 29, 2025

Segmented Code Adjustment Quantization (SAQ)

C++ 25 6 Updated Sep 22, 2025

Query-Adaptive Vector Search

C++ 74 21 Updated Mar 19, 2026

Triton kernels and PyTorch ops for Block Attention Residuals (AttnRes)

Python 82 6 Updated May 29, 2026

A clean, modular SDK for building AI agents with OpenHands V1.

Python 1 Updated Apr 28, 2026

High-performance LLM operator library built on TileLang.

Python 145 41 Updated Jun 14, 2026

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python 9,071 1,321 Updated Jun 15, 2026

This is an AAAI-2026 conference paper repo.

Jupyter Notebook 11 2 Updated Mar 1, 2026

High-performance linear attention kernel library built on TileLang

Python 540 47 Updated May 7, 2026

The next employee you distill need not be a coworker. Distill anyone's way of thinking: mental models, decision heuristics, expressive DNA. Distill how anyone thinks.

Python 24,358 3,569 Updated Jun 14, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 6,498 604 Updated Jun 15, 2026

A kernel library written in tilelang

Python 1,587 138 Updated Apr 23, 2026

FlashKDA: high-performance Kimi Delta Attention kernels

Cuda 449 38 Updated May 26, 2026

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda 1,016 237 Updated Jun 11, 2026

TurboQuant reference implementation — KV cache compression with engineering insights (ICLR 2026 paper reproduction)

Python 17 2 Updated Mar 28, 2026

A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It helps improve custom GPU operators with reproducible workflo…

Python 177 17 Updated Apr 22, 2026

A Super AI Lab with massive AI Doctors as Assistants. Best IDE for Research via AI Power.

JavaScript 986 102 Updated Jun 12, 2026

An official lightweight library for the RaBitQ algorithm and its applications in vector search.

C++ 223 58 Updated Jun 9, 2026

TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration

Python 1,560 181 Updated Mar 27, 2026

GLM-5: From Vibe Coding to Agentic Engineering

3,438 379 Updated May 15, 2026