llsj14

CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies

Rust · 63,512 stars · 3,907 forks · Updated Jun 17, 2026

Open Source AI Platform - AI Chat with advanced features that works with every LLM

Python · 30,403 stars · 4,158 forks · Updated Jun 18, 2026

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python · 9,308 stars · 1,344 forks · Updated Jun 18, 2026

DSPy: The framework for programming—not prompting—language models

Python · 35,114 stars · 2,979 forks · Updated Jun 16, 2026

ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…

Python · 12,303 stars · 1,126 forks · Updated Jun 18, 2026

Common recipes to run vLLM

JavaScript · 864 stars · 306 forks · Updated Jun 18, 2026

Offline optimization of your disaggregated Dynamo graph

Python · 341 stars · 128 forks · Updated Jun 18, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust · 7,288 stars · 1,257 forks · Updated Jun 18, 2026

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda · 268 stars · 38 forks · Updated May 5, 2026

High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference

Python · 52 stars · 4 forks · Updated Jun 15, 2026

Python · 287 stars · 53 forks · Updated Jun 18, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 5,608 stars · 859 forks · Updated Jun 18, 2026

A collection of prompts, system prompts and LLM instructions

HTML · 5,140 stars · 696 forks · Updated Feb 21, 2026

Extracted system prompts from Anthropic - Claude Fable 5, Opus 4.8, Claude Code, Claude Design. OpenAI - ChatGPT 5.5 Thinking, GPT 5.5 Instant, Codex. Google - Gemini 3.5 Flash, 3.1 Pro, Antigravit…

JavaScript · 43,264 stars · 7,171 forks · Updated Jun 18, 2026

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda · 906 stars · 192 forks · Updated May 9, 2026

🚀 Efficient implementations for emerging model architectures

Python · 5,232 stars · 560 forks · Updated Jun 18, 2026

Material for gpu-mode lectures

Jupyter Notebook · 6,189 stars · 623 forks · Updated Jun 15, 2026

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python · 9,654 stars · 971 forks · Updated Jun 17, 2026

TPU inference for vLLM, with unified JAX and PyTorch support.

Python · 355 stars · 215 forks · Updated Jun 18, 2026

Easy, Fast, and Scalable Multimodal AI

Python · 126 stars · 10 forks · Updated Jun 2, 2026

[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)

Python · 221 stars · 13 forks · Updated Feb 11, 2026

A curriculum for learning about GPU performance engineering, from scratch to what the frontier AI labs do

827 stars · 102 forks · Updated Apr 27, 2026

DeepEP: an efficient expert-parallel communication library

Cuda · 9,741 stars · 1,289 forks · Updated Jun 15, 2026

An Open-Source Asynchronous Coding Agent

Python · 10,003 stars · 1,137 forks · Updated Jun 18, 2026

LLM-powered multiagent persona simulation for imagination enhancement and business insights.

Jupyter Notebook · 7,476 stars · 662 forks · Updated May 7, 2026

ToolOrchestra is an end-to-end RL training framework for orchestrating tools and agentic workflows.

Python · 740 stars · 102 forks · Updated Mar 25, 2026

ArcticInference: vLLM plugin for high-throughput, low-latency inference

Python · 451 stars · 64 forks · Updated Jun 17, 2026

Load Balancer Implementation for Kubernetes in Bare-Metal, Edge, and Virtualization

Go · 1,776 stars · 209 forks · Updated May 26, 2025

AWS Neuron Deep Learning Containers (DLCs) are a set of Docker images for training and serving models on AWS Trainium and Inferentia instances using AWS Neuron SDK.

Python · 22 stars · 12 forks · Updated May 22, 2026