Stars
AI agents automatically running research on single-GPU nanochat training
Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞
Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search
Asterinas aims to be a production-grade Linux alternative—memory safe, high-performance, and more.
Provides pre-built flash-attention package wheels for Linux and Windows, built using GitHub Actions
Manage your dotfiles across multiple diverse machines, securely.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs
FalconFS is a high-performance distributed file system (DFS) designed for AI workloads.
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU [to appear in SIGMOD'26]
A low-latency, billion-scale, and updatable graph-based vector store on SSD.
PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]
Distributed KV cache scheduling & offloading libraries
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
FlashInfer: Kernel Library for LLM Serving
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Hackable and optimized Transformers building blocks, supporting a composable construction.