Stars
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
Efficient reliable UDP unicast, UDP multicast, and IPC message transport
High-performance limit order book engine with C++ core and Python SDK. Processes 20M+ msgs/sec with µs latency. Supports real crypto/equity data replay, spread/imbalance/impact analytics, and backt…
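A free, open-source high-frequency trading and market-making backtesting and trading bot that accounts for limit orders, queue positions, and latencies, utilizing full tick data for trades and o…
Sharing AI Infra knowledge and coding exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, large-model fundamentals 🧠, AI software and hardware 🔧, and more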
Supercharge Your LLM with the Fastest KV Cache Layer
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
DeepEP: an efficient expert-parallel communication library
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Fork of vLLM for developing the paper "Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference"
Efficient and easy multi-instance LLM serving
A high-throughput and memory-efficient inference and serving engine for LLMs
Collects, organizes, and maintains practical rules for Surge / Quantumult (X) / Shadowrocket / Surfboard / clash (Premium).
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM stan…
The Replica Dataset v1 as published in https://arxiv.org/abs/1906.05797 .
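PyTorch package to compute Chamfer distance between point sets (point clouds).
Quickly set up a personal VPN / circumvent internet censorship / bypass the firewall / tutorial / ssr / ss / bbr / proxy setup / self-hosted proxy service / free internet access / proxy service / VPN / latest 2023 tutorial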
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Official code release for ConceptGraphs
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image