Stars
deepspeedai / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Python bindings and high-level abstractions for Linux io_uring-based asynchronous I/O.
Running large language models on a single GPU for throughput-oriented scenarios.
A Datacenter Scale Distributed Inference Serving Framework
Library providing helpers for the Linux kernel io_uring support
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
[IJAIT 2021] MABWiser: Contextual Multi-Armed Bandits Library
Dynamic resource changes for multi-dimensional parallelism training
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Accurate traffic splitting (multipath routing) technique for software switch (implemented on Open vSwitch)
🚨 Prediction of the Resource Consumption of Distributed Deep Learning Systems
eBPF implementation that runs on top of Windows
NASP-THU / multiverse
Forked from harnets/multiverse
GPU-accelerated LLM Training Simulator
Userspace/GPU eBPF VM with llvm JIT/AOT compiler
A Linux eBPF rootkit with a backdoor, C2, library injection, execution hijacking, persistence and stealth capabilities.
Userspace eBPF runtime for Observability, Network, GPU & General Extensions Framework
eBPF-based Security Observability and Runtime Enforcement
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
A high-throughput and memory-efficient inference and serving engine for LLMs
Linux Runtime Security and Forensics using eBPF
🗜️ Codebase-digest is your AI-friendly codebase packer and analyzer. Features 60+ coding prompts and generates structured overviews with metrics. Ideal for feeding projects to LLMs like GPT-4, Clau…
CUDA Templates and Python DSLs for High-Performance Linear Algebra
NumPy aware dynamic Python compiler using LLVM