SuperCB


🏠

Working from home

CuiBo SuperCB


Learning Machine Learning Systems

52 followers · 178 following

rednote-hilab
Beijing


Lists (1)


MLsys

Starred repositories

NVlabs / GatedDeltaNet

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python · 400 stars · 23 forks · Updated Sep 15, 2025

hhy-huang / HiRAG

[EMNLP'25 Findings] The official repo for the paper HiRAG: Retrieval-Augmented Generation with Hierarchical Knowledge.

Python · 484 stars · 74 forks · Updated Nov 19, 2025

getzep / graphiti

Build Real-Time Knowledge Graphs for AI Agents

Python · 21,268 stars · 2,060 forks · Updated Dec 20, 2025

ImprintLab / Medical-Graph-RAG

A Graph RAG System for Evidence-based Medical Information Retrieval [ACL 2025]

Python · 685 stars · 113 forks · Updated Oct 18, 2025

emmericp / ixy

A simple yet fast user space network driver for Intel 10 Gbit/s NICs written from scratch

C · 1,284 stars · 138 forks · Updated Feb 19, 2022

HW-whistleblower / True-Story-of-Pangu

The real story of the hardship and darkness behind the development of Noah's Pangu large model.

11,366 stars · 1,348 forks · Updated Jul 9, 2025

NoakLiu / PiKV

PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]

Python · 48 stars · 7 forks · Updated Oct 19, 2025

bytedance / InfiniStore

KV cache store for distributed LLM inference

C++ · 377 stars · 32 forks · Updated Nov 13, 2025

Victarry / PP-Schedule-Visualization

Pipeline Parallelism Emulation and Visualization

Python · 74 stars · 7 forks · Updated Jun 12, 2025

policy-gradient / GRPO-Zero

Implementing DeepSeek R1's GRPO algorithm from scratch

Python · 1,716 stars · 81 forks · Updated Apr 18, 2025

AnthonyCalandra / modern-cpp-features

A cheatsheet of modern C++ language and library features.

21,323 stars · 2,252 forks · Updated Apr 5, 2025

cunbidun / flash.vscode

Flash VSCode is a minimal port of the flash.nvim Neovim plugin

TypeScript · 10 stars · 1 fork · Updated May 3, 2025

McGill-NLP / nano-aha-moment

Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"

Jupyter Notebook · 567 stars · 54 forks · Updated Oct 7, 2025

Jiayi-Pan / TinyZero

Minimal reproduction of DeepSeek R1-Zero

Python · 12,505 stars · 1,532 forks · Updated Apr 24, 2025

codefuse-ai / CodeFuse-Embeddings

Python · 209 stars · 37 forks · Updated Nov 19, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 11,929 stars · 922 forks · Updated Dec 15, 2025

MoonshotAI / MoBA

MoBA: Mixture of Block Attention for Long-Context LLMs

Python · 2,020 stars · 128 forks · Updated Apr 3, 2025

yosefk / funtrace

A fast, small C/C++ function call tracer for x86-64/Linux, supports clang & gcc, ftrace, threads, exceptions & shared libraries

C++ · 193 stars · 2 forks · Updated Mar 25, 2025

slow-steppers / NeighborHash

A faster int-to-int hashmap implemented in C++.

C++ · 50 stars · 9 forks · Updated Jan 6, 2025

codefuse-ai / RepoFuse

Python · 65 stars · 5 forks · Updated Jan 16, 2025

glb400 / Toy-RecLM

A toy large model for recommender systems based on LLaMA2/SASRec/Meta's generative recommenders. It also includes notes on, and experiments with, the official implementation of Meta's generative recommenders.

Python · 67 stars · 6 forks · Updated Apr 25, 2024

fenbf / AwesomePerfCpp

A curated list of awesome C/C++ performance optimization resources: talks, articles, books, libraries, tools, sites, blogs. Inspired by awesome.

CSS · 2,491 stars · 260 forks · Updated Sep 22, 2022

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ · 1,133 stars · 106 forks · Updated Dec 22, 2025

aorwall / moatless-tree-search

Python · 128 stars · 28 forks · Updated Jun 6, 2025

k4black / codebleu

pip-compatible CodeBLEU metric implementation available for Linux/macOS/Windows

Python · 127 stars · 28 forks · Updated Mar 31, 2025

agiresearch / AIOS

AIOS: AI Agent Operating System

Python · 4,880 stars · 643 forks · Updated Nov 24, 2025

ashvardanian / less_slow.cpp

Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO

C++ · 1,886 stars · 81 forks · Updated Sep 10, 2025

wuye9036 / CppTemplateTutorial

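A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language, aiming to help readers gain a thorough command of metaprogramming. (Work in progress)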

C++ · 10,494 stars · 1,620 forks · Updated Aug 20, 2024

FSoft-AI4Code / RepoExec

[NAACL 2025] Benchmark for Repository-Level Code Generation, focusing on Executability, Correctness from Test Cases, and Usage of Contexts from Cross-file Dependencies

Python · 39 stars · 4 forks · Updated Mar 7, 2025

joerick / pyinstrument

🚴 Call stack profiler for Python. Shows you why your code is slow!

Python · 7,540 stars · 256 forks · Updated Dec 21, 2025

Starred topics

anomaly-detection

polyhedral-model

embedded-machine-learning

Compiler

Emulator

Database