Stars
Democratizing Reinforcement Learning for LLMs
A blazingly fast JSON serializing & deserializing library
bpftop provides a dynamic real-time view of running eBPF programs. It displays the average runtime, events per second, and estimated total CPU % for each program.
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
Large Language Model (LLM) Systems Paper List
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
The official implementation of OSDI'25 paper BlitzScale
My learning notes and code for ML systems.
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
wolfecameron / nanoMoE
Forked from karpathy/nanoGPT
An extension of the nanoGPT repository for training small MoE models.
Convert PDF to markdown + JSON quickly with high accuracy
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
5ire is a cross-platform desktop AI assistant and MCP client. It is compatible with major service providers and supports a local knowledge base and tools via Model Context Protocol servers.
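12306 MCP Server is a high-performance train ticket query backend built on the Model Context Protocol (MCP). It serves real-time data from the official 12306 service through a standardized interface, covering core features such as remaining-ticket queries, station information, train stop lists, and transfer itineraries.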
Kimi K2 is the large language model series developed by the Moonshot AI team
Summary of the Specs of Commonly Used GPUs for Training and Inference of LLMs
Everything about the SmolLM and SmolVLM family of models
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
Disaggregated serving system for Large Language Models (LLMs).
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
Production-grade client-side tracing, profiling, and analysis for complex software systems.
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
Windows Precision Touchpad Driver Implementation for Apple MacBook / Magic Trackpad
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.