
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python 169 15 Updated Mar 25, 2026

A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.

Python 47 3 Updated Mar 25, 2026

Large Language Model Text Generation Inference

Python 10,809 1,261 Updated Mar 21, 2026

Learning notes and hands-on experiments for understanding modern machine learning systems.

Python 1 Updated Mar 25, 2026

Free, Local CharacterAI with inference on Apple Silicon and ESP32 WebSocket transport

TypeScript 36 4 Updated Mar 25, 2026

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Python 321 36 Updated Jun 10, 2025

Systems for GenAI

163 11 Updated Mar 25, 2026

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Python 410 300 Updated Mar 18, 2026

Qwen3-TTS with nano vLLM-style optimizations for fast text-to-speech generation. Achieved 3x faster

Python 98 21 Updated Mar 3, 2026

SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models

Python 128 41 Updated Mar 26, 2026

Claude Code skills that turn any codebase into an interactive knowledge graph you can explore, search, and ask questions about (multiple platforms, e.g., Codex, are supported).

TypeScript 6,308 487 Updated Mar 26, 2026

An LLM inference engine that runs on consumer hardware

Python 8 Updated Mar 25, 2026

a fun and educational take on vLLM

Python 184 8 Updated Jan 25, 2026

GLM-OCR: Accurate × Fast × Comprehensive

Python 3,736 296 Updated Mar 24, 2026

Complete solutions to Programming Massively Parallel Processors, 4th Edition

Jupyter Notebook 697 95 Updated Jun 18, 2025

Dashboard for InferenceX™, Open Source Continuous Inference

TypeScript 20 1 Updated Mar 26, 2026

A clean, single-file PyTorch implementation of Attention Residuals (Kimi Team, MoonshotAI, 2026), integrated with Grouped Query Attention (GQA), SwiGLU feed-forward networks, and Rotary Position Em…

Python 14 1 Updated Mar 16, 2026

A fast, helpful, and open-source document parser

TypeScript 2,213 136 Updated Mar 25, 2026

Give your agents the power of the Hugging Face ecosystem

Python 9,910 604 Updated Mar 25, 2026

A high-performance inference system for large language models, designed for production environments.

C++ 495 40 Updated Dec 19, 2025

MimikaStudio - A local-first application for macOS (Apple Silicon) + Agentic MCP Support

Dart 490 72 Updated Mar 25, 2026

Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Go 3,219 347 Updated Mar 26, 2026

Common recipes to run vLLM

Jupyter Notebook 521 177 Updated Mar 16, 2026

Community maintained hardware plugin for vLLM on Intel Gaudi

Python 34 121 Updated Mar 26, 2026

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python 301 55 Updated Mar 25, 2026

vLLM plugin for RBLN NPU

Python 44 8 Updated Mar 26, 2026

Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞

Python 8,812 939 Updated Mar 24, 2026

Our Clone of Orca used for experimentation

Python 13 4 Updated Oct 15, 2024

A lightweight, efficient transformer inference engine written in Rust. MiniLLM provides a clean, well-documented implementation of GPT-2 style transformer models with support for text generation.

Rust 4 Updated Oct 1, 2025