Stars
profintegra / raptor-rag
Forked from parthsarthi03/raptor
The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Running large language models on a single GPU for throughput-oriented scenarios.
First Latency-Aware Competitive LLM Agent Benchmark
The simplest, highest-throughput Python interface to S3, GCS & Azure Storage, powered by Rust.
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Supercharge Your LLM with the Fastest KV Cache Layer
Part of the sai3 project, which delivers multi-protocol storage access for AI/ML workflows. This project provides a CLI, along with Rust and Python libraries, for AI/ML storage workflows. Supporting S…
MLPerf Client is a benchmark for Windows and macOS, focusing on client form factors in ML inference scenarios.
Cutting-edge tool that unlocks the full potential of semantic chunking
kioxia-jp / aisaq-diskann
Forked from microsoft/DiskANN
All-in-Storage Solution based on DiskANN for DRAM-free Approximate Nearest Neighbor Search
wvaske / mlperf-storage
Forked from mlcommons/storage
MLPerf™ Storage Benchmark Suite
Carefully crafted Alpine Docker image with glibc (~12MB)
An I/O benchmark for deep learning applications
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
Build userspace NVMe drivers and storage applications with CUDA support
Newsletter to help busy software engineers become good at system design 👇
This repository contains solutions to the assignments of the Generative Adversarial Networks (GANs) Specialization from deeplearning.ai on Coursera, taught by Sharon Zhou, Eda Zhou, and Eric Zelikman
Programming assignments and quizzes from all courses within the GANs specialization offered by deeplearning.ai
Tools to enable the development of mlperf storage
StyleGAN - Official TensorFlow Implementation
Collective Knowledge (CK), Collective Mind (CM/CMX) and MLPerf automations: community-driven projects to facilitate collaborative and reproducible research and to learn how to run AI, ML, and other…
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.