tjtanaa

Organizations

@EmbeddedLLM

The best-benchmarked open-source AI memory system. And it's free.

Python 47,297 6,178 Updated Apr 17, 2026

Stateful API logic for agentic applications using vLLM

Python 23 8 Updated Apr 16, 2026

FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.

Python 154 42 Updated Apr 17, 2026

From Minimal GEMM to Everything

Cuda 199 10 Updated Feb 10, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)

Python 395 39 Updated Apr 9, 2026

amdgpu example code in hip/asm

C++ 58 29 Updated Apr 16, 2026

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Cuda 1,089 173 Updated Apr 17, 2026

A framework for efficient model inference with omni-modality models

Python 4,357 778 Updated Apr 17, 2026

A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports both GPU and CPU modes, with special optimizations for macOS Apple Silicon and ent…

JavaScript 430 60 Updated Apr 7, 2026

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 343 78 Updated Apr 16, 2026

Modular RDMA Interface

C++ 114 33 Updated Apr 17, 2026

A "standard library" of Triton kernels.

Python 22 4 Updated Oct 2, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 55,792 9,738 Updated Feb 11, 2026

Code for BLT research paper

Python 2,032 191 Updated Nov 3, 2025

QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.

C++ 38 8 Updated Aug 29, 2025

Submission for the SG Innovation Challenge

JavaScript 3 Updated Feb 25, 2025

LLM KV cache compression made easy

Python 1,042 133 Updated Apr 14, 2026

Accessible large language models via k-bit quantization for PyTorch.

Python 8,125 839 Updated Apr 17, 2026

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ 570 59 Updated Sep 13, 2025

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,967 83 Updated Apr 15, 2026

Efficient LLM Inference over Long Sequences

Python 393 22 Updated Jun 25, 2025

Master programming by recreating your favorite technologies from scratch.

Markdown 490,984 46,369 Updated Feb 21, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 25,965 5,422 Updated Apr 17, 2026

Open-source observability for your GenAI or LLM application, based on OpenTelemetry

Python 7,015 933 Updated Apr 17, 2026

LLM Serving Performance Evaluation Harness

Python 85 13 Updated Feb 25, 2025

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 3,406 134 Updated Apr 17, 2026

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,907 368 Updated Dec 17, 2025

Efficient and easy multi-instance LLM serving

Python 546 48 Updated Mar 12, 2026