DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,265 983 Updated May 13, 2026

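A summary of C++ backend server development interview experiences and standard interview questions! (Covers both depth and breadth, unlike summary articles that only list concepts!)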

2,198 276 Updated Sep 9, 2024

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 1,040 111 Updated May 16, 2026

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 534 79 Updated May 5, 2026

Efficient reliable UDP unicast, UDP multicast, and IPC message transport

Java 8,638 1,038 Updated May 17, 2026

High-performance limit order book engine with C++ core and Python SDK. Processes 20M+ msgs/sec with µs latency. Supports real crypto/equity data replay, spread/imbalance/impact analytics, and backt…

C++ 47 20 Updated Aug 30, 2025

Free, open source, a high frequency trading and market making backtesting and trading bot, which accounts for limit orders, queue positions, and latencies, utilizing full tick data for trades and o…

Rust 4,074 790 Updated Dec 23, 2025

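Sharing AI Infra knowledge and code exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, large-model fundamentals 🧠, AI software and hardware 🔧, and more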

Jupyter Notebook 2,281 193 Updated May 8, 2026

NVIDIA Inference Xfer Library (NIXL)

C++ 1,035 319 Updated May 17, 2026

LaTeX résumé template for recommended graduate admission (保研) and job applications

TeX 34 4 Updated Mar 23, 2025

Material for gpu-mode lectures

Jupyter Notebook 6,082 611 Updated May 9, 2026

Supercharge Your LLM with the Fastest KV Cache Layer

Python 8,283 1,179 Updated May 18, 2026

The repo is finally unlocked. enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.

Rust 191,813 109,919 Updated May 16, 2026

High Performance LLM Inference Operator Library

C++ 849 84 Updated Apr 13, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 9,633 1,245 Updated May 13, 2026

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 42,572 7,583 Updated May 17, 2026

Fork of vLLM for developing the paper "Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference"

Python 8 2 Updated Mar 5, 2026

Efficient and easy multi-instance LLM serving

Python 549 49 Updated Mar 12, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 80,280 16,883 Updated May 18, 2026

Collects, organizes, and maintains practical rules for Surge / Quantumult (X) / Shadowrocket / Surfboard / clash (Premium).

JavaScript 11,265 1,793 Updated Mar 19, 2024
JavaScript 26 Updated Mar 4, 2026

Nano vLLM

Python 13,471 2,101 Updated Apr 26, 2026

Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM stan…

C++ 551 165 Updated May 14, 2026

The Replica Dataset v1 as published in https://arxiv.org/abs/1906.05797 .

C++ 1,260 111 Updated Jul 22, 2024

PyTorch package to compute Chamfer distance between point sets (point clouds).

Cuda 353 51 Updated Apr 10, 2024

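Quickly set up a personal VPN / circumventing internet restrictions / bypassing the firewall / tutorial / ssr / ss / bbr / proxy setup / self-hosted proxy service / unrestricted internet access / proxy service / VPN / latest 2023 tutorial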

Shell 2,394 247 Updated Apr 29, 2022

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

Python 113,325 13,272 Updated May 17, 2026

Official code release for ConceptGraphs

Python 868 124 Updated Oct 16, 2025

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Python 10,119 1,028 Updated Aug 12, 2024

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 33,506 4,009 Updated Mar 25, 2026