Stars
A simple, fast and robust program-aware agentic inference system.
rvLLM: High-performance LLM inference in Rust. Drop-in vLLM replacement.
Semantic search over videos using Gemini Embedding 2 or Qwen3-VL.
MCP server and Claude Code skill for Excalidraw — programmatic canvas toolkit to create, edit, and export diagrams via AI agents with real-time canvas sync.
A low-latency & high-throughput serving engine for LLMs
RBLN Model Zoo — Compile once. Deploy anywhere.
⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN SDK for efficient inference on RBLN NPUs.
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.
Large Language Model Text Generation Inference
Learning notes and hands-on experiments for understanding modern machine learning systems.
Local, Free CharacterAI with inference on Apple Silicon and ESP32 WebSocket transport
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance in only 2k lines of code (2% of vLLM).
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models
Claude Code skills that turn any codebase into an interactive knowledge graph you can explore, search, and ask questions about (multi-platform; e.g., Codex is supported).
An LLM inference engine that runs on consumer hardware
Complete solutions to Programming Massively Parallel Processors, 4th Edition
Dashboard for InferenceX™, Open Source Continuous Inference
A clean, single-file PyTorch implementation of Attention Residuals (Kimi Team, MoonshotAI, 2026), integrated with Grouped Query Attention (GQA), SwiGLU feed-forward networks, and Rotary Position Embeddings (RoPE).
A fast, helpful, and open-source document parser
Give your agents the power of the Hugging Face ecosystem
A high-performance inference system for large language models, designed for production environments.
MimikaStudio - A local-first application for macOS (Apple Silicon) + Agentic MCP Support
Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancing, cluster mode, guardrails, support for 1000+ models, and <100 µs overhead at 5k RPS.