A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,842 12,096 Updated Dec 21, 2025

Deploy the SC2 system on Kubernetes.

Python 10 5 Updated May 7, 2025

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 2,201 267 Updated Dec 19, 2025

An instrumentation tool to monitor queue depths in tokio channels

Rust 11 Updated Oct 29, 2025

DDGS | Dux Distributed Global Search. A metasearch library that aggregates results from diverse web search services

Python 2,027 196 Updated Dec 19, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 722 73 Updated Nov 30, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 4,474 500 Updated Dec 13, 2025

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Python 203 11 Updated Sep 21, 2024

[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo

Python 55 7 Updated Aug 5, 2025

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Shell 47,463 3,329 Updated Dec 20, 2025

Tempo is a system for declarative, efficient, end-to-end compiled dynamic deep learning

Python 25 3 Updated Oct 21, 2025

Large Language Model (LLM) Systems Paper List

1,694 89 Updated Dec 13, 2025

Lightweight coding agent that runs in your terminal

Rust 54,384 6,902 Updated Dec 21, 2025

Analyze computation-communication overlap in V3/R1.

1,128 144 Updated Mar 21, 2025

Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of a codebase

Python 13,407 1,007 Updated Dec 19, 2025

A resilient distributed training framework

Python 96 9 Updated Apr 11, 2024

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 3,005 110 Updated Dec 19, 2025

Watches files and records, or triggers actions, when they change.

C++ 13,436 1,045 Updated Dec 20, 2025

Dynamic resources changes for multi-dimensional parallelism training

Go 30 4 Updated Aug 22, 2025

Fully open reproduction of DeepSeek-R1

Python 25,742 2,405 Updated Nov 24, 2025

Golang bindings for Nvidia Datacenter GPU Manager (DCGM)

C 143 42 Updated Dec 3, 2025

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 631 77 Updated Dec 4, 2025

Recipes to scale inference-time compute of open models

Python 1,121 131 Updated May 22, 2025

Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix etc. Blog: mlengineer.io.

11,696 1,923 Updated Aug 31, 2023

Use your Neovim like using Cursor AI IDE!

Lua 16,760 765 Updated Dec 20, 2025

A low-latency & high-throughput serving engine for LLMs

Python 457 58 Updated Oct 16, 2025

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Python 4,219 369 Updated Sep 11, 2025

Minimal, single page, smooth-scrolling theme for Hugo static site generator.

HTML 709 273 Updated Jan 30, 2025

Microsoft Azure Traces

Jupyter Notebook 1,049 172 Updated Dec 6, 2025