Starred repositories
[EMNLP'25 findings] This is the official repo for the paper, HiRAG: Retrieval-Augmented Generation with Hierarchical Knowledge.
Build Real-Time Knowledge Graphs for AI Agents
A Graph RAG System for Evidence-based Medical Information Retrieval [ACL 2025]
A simple yet fast user space network driver for Intel 10 Gbit/s NICs written from scratch
PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]
KV cache store for distributed LLM inference
Pipeline Parallelism Emulation and Visualization
Implementing DeepSeek R1's GRPO algorithm from scratch
A cheatsheet of modern C++ language and library features.
Flash VSCode is a minimal port of the flash.nvim Neovim plugin
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
Minimal reproduction of DeepSeek R1-Zero
FlashMLA: Efficient Multi-head Latent Attention Kernels
MoBA: Mixture of Block Attention for Long-Context LLMs
A fast, small C/C++ function call tracer for x86-64/Linux, supports clang & gcc, ftrace, threads, exceptions & shared libraries
A faster int-to-int hashmap implemented in C++.
A toy large model for recommender systems based on LLaMA2/SASRec/Meta's generative recommenders. Also includes notes and experiments on the official implementation of Meta's generative recommenders.
A curated list of awesome C/C++ performance optimization resources: talks, articles, books, libraries, tools, sites, blogs. Inspired by awesome.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Pip-compatible CodeBLEU metric implementation for Linux, macOS, and Windows
Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO
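A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language, aiming to give readers a thorough command of metaprogramming. (Work in progress)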
[NAACL 2025] Benchmark for Repository-Level Code Generation, focusing on Executability, Correctness from Test Cases, and Usage of Contexts from Cross-file Dependencies
🚴 Call stack profiler for Python. Shows you why your code is slow!