Yanlin Qi (amy-77)

🎯 Focusing

PhD student at Université Paris Cité. Research interests: KV cache retrieval, efficient LLMs, vector search

35 followers · 145 following

diNo Research Group, LIPADE Lab
Paris
08:25 (UTC +02:00)
https://amy-77.github.io/
in/yanlin-qi-456177268

Stars

MiniMax-AI / MSA

Python · 283 stars · 24 forks · Updated Jun 15, 2026

HazyResearch / zoology

Understand and test language model architectures on synthetic tasks.

Python · 274 stars · 52 forks · Updated Mar 22, 2026

mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python · 7,231 stars · 399 forks · Updated Jul 11, 2024

NovaSky-AI / SkyRL

SkyRL: A Modular Full-stack RL Library for LLMs

Python · 1,997 stars · 351 forks · Updated Jun 15, 2026

OpenBMB / InfiniteBench

Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718

Python · 386 stars · 33 forks · Updated Sep 25, 2024

antirez / ds4

DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm

C · 13,826 stars · 1,219 forks · Updated Jun 11, 2026

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda · 1,216 stars · 196 forks · Updated Sep 2, 2025

NVlabs / LongLive

LongLive 2.0: Infra - Long Video Gen

Python · 2,336 stars · 210 forks · Updated Jun 13, 2026

yuzhenmao / IceCache

Implementation for IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs (ICLR 2026).

Python · 18 stars · 2 forks · Updated Jun 9, 2026

Relaxed-System-Lab / Flash-Sparse-Attention

🚀🚀 Efficient implementations of Native Sparse Attention

Python · 619 stars · 15 forks · Updated Sep 29, 2025

howarlii / SAQ

Segmented Code Adjustment Quantization (SAQ)

C++ · 25 stars · 6 forks · Updated Sep 22, 2025

marius-team / quake

Query-Adaptive Vector Search

C++ · 74 stars · 21 forks · Updated Mar 19, 2026

catswe / flash-attention-residuals

Triton kernels and PyTorch ops for Block Attention Residuals (AttnRes)

Python · 82 stars · 6 forks · Updated May 29, 2026

qtwang / software-agent-sdk

Forked from OpenHands/software-agent-sdk

A clean, modular SDK for building AI agents with OpenHands V1.

Python 1 Updated Apr 28, 2026

tile-ai / TileOPs

High-performance LLM operator library built on TileLang.

Python · 145 stars · 41 forks · Updated Jun 14, 2026

LMCache / LMCache

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python · 9,071 stars · 1,321 forks · Updated Jun 15, 2026

IBM / SPIRAL

This is an AAAI-2026 conference paper repo.

Jupyter Notebook · 11 stars · 2 forks · Updated Mar 1, 2026

QwenLM / FlashQLA

High-performance linear attention kernel library built on TileLang

Python · 540 stars · 47 forks · Updated May 7, 2026

alchaincyf / nuwa-skill

Why should the next employee you distill be a colleague? Distill anyone's way of thinking: mental models, decision heuristics, expressive DNA. Distill how anyone thinks.

Python · 24,358 stars · 3,569 forks · Updated Jun 14, 2026

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python · 6,498 stars · 604 forks · Updated Jun 15, 2026

deepseek-ai / TileKernels

A kernel library written in tilelang

Python · 1,587 stars · 138 forks · Updated Apr 23, 2026

MoonshotAI / FlashKDA

FlashKDA: high-performance Kimi Delta Attention kernels

Cuda · 449 stars · 38 forks · Updated May 26, 2026

rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda · 1,016 stars · 237 forks · Updated Jun 11, 2026

intel / ScalableVectorSearch

C++ · 228 stars · 41 forks · Updated Jun 11, 2026

scos-lab / turboquant

TurboQuant reference implementation — KV cache compression with engineering insights (ICLR 2026 paper reproduction)

Python · 17 stars · 2 forks · Updated Mar 28, 2026

KernelFlow-ops / cuda-optimized-skill

A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It helps improve custom GPU operators with reproducible workflo…

Python · 177 stars · 17 forks · Updated Apr 22, 2026

OpenLAIR / dr-claw

A Super AI Lab with massive AI Doctors as Assistants. Best IDE for Research via AI Power.

JavaScript · 986 stars · 102 forks · Updated Jun 12, 2026

VectorDB-NTU / RaBitQ-Library

An official lightweight library for the RaBitQ algorithm and its applications in vector search.

C++ · 223 stars · 58 forks · Updated Jun 9, 2026

0xSero / turboquant

TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration

Python · 1,560 stars · 181 forks · Updated Mar 27, 2026

zai-org / GLM-5

GLM-5: From Vibe Coding to Agentic Engineering

3,438 stars · 379 forks · Updated May 15, 2026