tjtanaa

Organizations: @EmbeddedLLM


Stateful API logic for agentic applications using vLLM

Makefile 20 5 Updated Apr 1, 2026

FlyDSL (Flexible LaYout DSL) is the project's Python front end.

Python 139 31 Updated Apr 1, 2026

From Minimal GEMM to Everything

Cuda 192 10 Updated Feb 10, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)

Python 372 37 Updated Mar 26, 2026

amdgpu example code in hip/asm

C++ 58 29 Updated Mar 18, 2026

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Cuda 1,078 162 Updated Apr 1, 2026

A framework for efficient model inference with omni-modality models

Python 4,076 672 Updated Apr 1, 2026

A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports both GPU and CPU modes, with special optimizations for macOS Apple Silicon and ent…

JavaScript 413 57 Updated Mar 17, 2026

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 335 78 Updated Apr 1, 2026

Modular RDMA Interface

C++ 105 30 Updated Apr 1, 2026

Python 23 5 Updated Jul 11, 2025

A "standard library" of Triton kernels.

Python 22 4 Updated Oct 2, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 55,568 9,698 Updated Feb 11, 2026

Code for BLT research paper

Python 2,031 190 Updated Nov 3, 2025

QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.

C++ 38 8 Updated Aug 29, 2025

Submission for the SG Innovation Challenge

JavaScript 3 Updated Feb 25, 2025

LLM KV cache compression made easy

Python 1,004 126 Updated Apr 1, 2026

Accessible large language models via k-bit quantization for PyTorch.

Python 8,092 843 Updated Mar 31, 2026

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ 568 59 Updated Sep 13, 2025

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,954 81 Updated Mar 30, 2026

Efficient LLM Inference over Long Sequences

Python 392 22 Updated Jun 25, 2025

Master programming by recreating your favorite technologies from scratch.

Markdown 485,370 45,654 Updated Feb 21, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 25,312 5,108 Updated Apr 1, 2026

Open-source observability for your GenAI or LLM application, based on OpenTelemetry

Python 6,968 912 Updated Apr 1, 2026

LLM Serving Performance Evaluation Harness

Python 84 12 Updated Feb 25, 2025

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 3,305 128 Updated Apr 1, 2026

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,903 368 Updated Dec 17, 2025

Efficient and easy multi-instance LLM serving

Python 540 47 Updated Mar 12, 2026

Artifact of the OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"

Python 64 6 Updated Jun 5, 2024