- Vancouver, Canada
Stars
Offline optimization of your disaggregated Dynamo graph
Buddy-alloc is a memory allocator for no-std Rust, used for embedded environments.
Community maintained hardware plugin for vLLM on Ascend
Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.
Rust library for concurrent data access, using memory-mapped files, zero-copy deserialization, and wait-free synchronization.
Enforce the output format (JSON Schema, Regex etc) of a language model
A Virtual Machine Monitor for modern Cloud workloads. Features include CPU, memory and device hotplug, support for running Windows and Linux guests, device offload with vhost-user and a minimal com…
Official inference framework for 1-bit LLMs
Simple, safe way to store and distribute tensors
Serverless LLM Serving for Everyone.
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
SGLang is a high-performance serving framework for large language models and multimodal models.
QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead
A high-throughput and memory-efficient inference and serving engine for LLMs
A secure container runtime with CRI/OCI interface
A book-in-progress about the Linux kernel and its insides.
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.