eBPF XDP on GPU

C++ 15 1 Updated Oct 5, 2025

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 2,251 365 Updated Aug 14, 2025

LLM inference in C/C++

C++ 116,382 19,544 Updated Jun 13, 2026

Python bindings and high-level abstractions for Linux io_uring-based asynchronous I/O.

Python 1 Updated Apr 10, 2026

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,366 591 Updated Oct 28, 2024

A Datacenter Scale Distributed Inference Serving Framework

Rust 7,250 1,243 Updated Jun 13, 2026

Library providing helpers for the Linux kernel io_uring support

C 3,684 518 Updated Jun 11, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ 980 83 Updated May 28, 2026

[IJAIT 2021] MABWiser: Contextual Multi-Armed Bandits Library

Python 281 47 Updated Sep 5, 2024

Dynamic resources changes for multi-dimensional parallelism training

Go 31 5 Updated Aug 22, 2025

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 306 23 Updated May 1, 2025
Python 20 17 Updated Nov 27, 2025

Accurate traffic splitting (multipath routing) technique for software switch (implemented on Open vSwitch)

C 21 16 Updated Apr 7, 2025
Python 20 15 Updated Oct 10, 2024

🚨 Prediction of the Resource Consumption of Distributed Deep Learning Systems

Python 32 26 Updated Feb 6, 2023

eBPF implementation that runs on top of Windows

C 3,497 287 Updated Jun 12, 2026

GPU-accelerated LLM Training Simulator

Makefile 52 5 Updated Jun 26, 2025

Userspace/GPU eBPF VM with llvm JIT/AOT compiler

C++ 134 17 Updated May 25, 2026

AI/GPU flame graph

C++ 259 9 Updated Jun 9, 2026

A Linux eBPF rootkit with a backdoor, C2, library injection, execution hijacking, persistence and stealth capabilities.

C 1,962 241 Updated Apr 7, 2024

Userspace eBPF runtime for Observability, Network, GPU & General Extensions Framework

C++ 1,494 177 Updated Jun 8, 2026

eBPF-based Security Observability and Runtime Enforcement

C 4,750 560 Updated Jun 13, 2026

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python 8,813 1,301 Updated Jun 13, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 82,768 18,022 Updated Jun 13, 2026

Linux Runtime Security and Forensics using eBPF

Go 4,513 501 Updated Jun 9, 2026

🗜️ Codebase-digest is your AI-friendly codebase packer and analyzer. Features 60+ coding prompts and generates structured overviews with metrics. Ideal for feeding projects to LLMs like GPT-4, Clau…

Python 387 35 Updated Oct 21, 2024

Pipeline Parallelism for PyTorch

Python 785 87 Updated Aug 21, 2024

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,891 1,905 Updated Jun 11, 2026

NumPy aware dynamic Python compiler using LLVM

Python 11,043 1,278 Updated Jun 12, 2026