A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,999 12,136 Updated Dec 23, 2025

Deploy the SC2 system on Kubernetes.

Python 10 5 Updated May 7, 2025

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 2,209 267 Updated Dec 19, 2025

An instrumentation tool to monitor queue depths in tokio channels

Rust 11 Updated Oct 29, 2025

DDGS | Dux Distributed Global Search. A metasearch library that aggregates results from diverse web search services

Python 2,030 196 Updated Dec 19, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 725 73 Updated Nov 30, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 4,481 503 Updated Dec 23, 2025

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Python 203 11 Updated Sep 21, 2024

[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo

Python 56 7 Updated Aug 5, 2025

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Shell 48,098 3,373 Updated Dec 20, 2025

Tempo is a system for declarative, efficient, end-to-end compiled dynamic deep learning

Python 25 3 Updated Oct 21, 2025

Large Language Model (LLM) Systems Paper List

1,699 90 Updated Dec 22, 2025

Lightweight coding agent that runs in your terminal

Rust 54,548 6,930 Updated Dec 23, 2025

Analyze computation-communication overlap in V3/R1.

1,128 144 Updated Mar 21, 2025

Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of a codebase

Python 13,433 1,012 Updated Dec 19, 2025

A resilient distributed training framework

Python 96 9 Updated Apr 11, 2024

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 3,006 110 Updated Dec 22, 2025

Watches files and records, or triggers actions, when they change.

C++ 13,436 1,045 Updated Dec 22, 2025

Dynamic resources changes for multi-dimensional parallelism training

Go 30 4 Updated Aug 22, 2025

Fully open reproduction of DeepSeek-R1

Python 25,749 2,406 Updated Nov 24, 2025

Golang bindings for Nvidia Datacenter GPU Manager (DCGM)

C 144 42 Updated Dec 3, 2025

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 633 77 Updated Dec 4, 2025

Recipes to scale inference-time compute of open models

Python 1,120 131 Updated May 22, 2025

Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix etc. Blog: mlengineer.io.

11,723 1,929 Updated Aug 31, 2023

Use your Neovim like using Cursor AI IDE!

Lua 16,810 769 Updated Dec 22, 2025

A low-latency & high-throughput serving engine for LLMs

Python 458 58 Updated Oct 16, 2025

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Python 4,219 367 Updated Sep 11, 2025

Minimal, single page, smooth-scrolling theme for Hugo static site generator.

HTML 710 273 Updated Jan 30, 2025

Microsoft Azure Traces

Jupyter Notebook 1,050 172 Updated Dec 6, 2025