Stars
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.
Large Language Model Text Generation Inference
Learning notes and hands-on experiments for understanding modern Machine Learning Systems.
Free, Local CharacterAI with inference on Apple Silicon and ESP32 WebSocket transport
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Qwen3-TTS with nano vLLM-style optimizations for fast text-to-speech generation. Achieves 3x faster generation.
SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models
Claude Code skills that turn any codebase into an interactive knowledge graph you can explore, search, and ask questions about (multi-platform: e.g., Codex is supported).
An LLM inference engine that runs on consumer hardware
Complete solutions to Programming Massively Parallel Processors, 4th Edition
Dashboard for InferenceX™, Open Source Continuous Inference
A clean, single-file PyTorch implementation of Attention Residuals (Kimi Team, MoonshotAI, 2026), integrated with Grouped Query Attention (GQA), SwiGLU feed-forward networks, and Rotary Position Em…
A fast, helpful, and open-source document parser
Give your agents the power of the Hugging Face ecosystem
A high-performance inference system for large language models, designed for production environments.
MimikaStudio - A local-first application for macOS (Apple Silicon) + Agentic MCP Support
Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancing, cluster mode, guardrails, support for 1000+ models & <100 µs overhead at 5k RPS.
Community maintained hardware plugin for vLLM on Intel Gaudi
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞
Our clone of Orca, used for experimentation
A lightweight, efficient transformer inference engine written in Rust. MiniLLM provides a clean, well-documented implementation of GPT-2 style transformer models with support for text generation.