View junuxyz's full-sized avatar


A simple, fast and robust program-aware agentic inference system.

Python 257 20 Updated Mar 16, 2026

rvLLM: High-performance LLM inference in Rust. Drop-in vLLM replacement.

Rust 370 37 Updated Apr 5, 2026

Semantic search over videos using Gemini Embedding 2 or Qwen3-VL.

Python 2,816 269 Updated Apr 3, 2026

MCP server and Claude Code skill for Excalidraw — programmatic canvas toolkit to create, edit, and export diagrams via AI agents with real-time canvas sync.

JavaScript 1,625 158 Updated Mar 6, 2026

A low-latency & high-throughput serving engine for LLMs

Python 489 63 Updated Jan 8, 2026

RBLN Model Zoo — Compile once. Deploy anywhere.

Python 33 7 Updated Mar 27, 2026

⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN SDK for efficient inference on RBLN NPUs.

Python 17 3 Updated Apr 4, 2026

A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python 272 20 Updated Apr 3, 2026

A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.

Python 153 9 Updated Mar 31, 2026

Large Language Model Text Generation Inference

Python 10,819 1,261 Updated Mar 21, 2026

Learning notes and hands-on experiments for understanding modern machine learning systems.

Python 3 Updated Apr 4, 2026

Local, Free CharacterAI with inference on Apple Silicon and ESP32 WebSocket transport

TypeScript 69 5 Updated Apr 2, 2026

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Python 319 36 Updated Jun 10, 2025

Systems for GenAI

162 12 Updated Mar 27, 2026

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Python 414 313 Updated Apr 3, 2026

SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models

Python 164 65 Updated Apr 5, 2026

Claude Code skills that turn any codebase into an interactive knowledge graph you can explore, search, and ask questions about (other platforms, e.g. Codex, are also supported).

TypeScript 7,810 631 Updated Apr 5, 2026

An LLM inference engine that runs on consumer hardware

Python 9 1 Updated Mar 30, 2026

A fun and educational take on vLLM

Python 189 10 Updated Jan 25, 2026

GLM-OCR: Accurate × Fast × Comprehensive

Python 5,431 482 Updated Mar 27, 2026

Complete solutions to Programming Massively Parallel Processors, 4th Edition

Jupyter Notebook 705 96 Updated Jun 18, 2025

Dashboard for InferenceX™, Open Source Continuous Inference

TypeScript 22 1 Updated Apr 4, 2026

A clean, single-file PyTorch implementation of Attention Residuals (Kimi Team, MoonshotAI, 2026), integrated with Grouped Query Attention (GQA), SwiGLU feed-forward networks, and Rotary Position Em…

Python 19 1 Updated Mar 16, 2026

A fast, helpful, and open-source document parser

TypeScript 3,740 235 Updated Apr 4, 2026

Give your agents the power of the Hugging Face ecosystem

Python 10,048 610 Updated Apr 3, 2026

A high-performance inference system for large language models, designed for production environments.

C++ 496 40 Updated Dec 19, 2025

MimikaStudio - A local-first application for macOS (Apple Silicon) + Agentic MCP Support

Dart 506 73 Updated Apr 1, 2026

Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Go 3,515 380 Updated Apr 5, 2026

Common recipes to run vLLM

Jupyter Notebook 552 193 Updated Apr 3, 2026