tjtanaa

Organizations

@EmbeddedLLM

The best-benchmarked open-source AI memory system. And it's free.

Python 49,976 6,557 Updated Apr 27, 2026

Stateful API logic for agentic applications using vLLM

Python 24 9 Updated Apr 16, 2026

FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.

Python 171 48 Updated Apr 27, 2026

From Minimal GEMM to Everything

Cuda 201 10 Updated Feb 10, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)

Python 418 44 Updated Apr 9, 2026

amdgpu example code in hip/asm

C++ 60 29 Updated Apr 22, 2026

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Cuda 1,105 178 Updated Apr 27, 2026

A framework for efficient model inference with omni-modality models

Python 4,519 844 Updated Apr 27, 2026

A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports both GPU and CPU modes, with special optimizations for macOS Apple Silicon and ent…

JavaScript 443 61 Updated Apr 7, 2026

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 348 81 Updated Apr 27, 2026

Modular RDMA Interface

C++ 117 36 Updated Apr 27, 2026

A "standard library" of Triton kernels.

Python 22 3 Updated Oct 2, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 55,937 9,746 Updated Feb 11, 2026

Code for BLT research paper

Python 2,035 193 Updated Nov 3, 2025

QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.

C++ 38 8 Updated Aug 29, 2025

Submission for the SG Innovation Challenge

JavaScript 3 Updated Feb 25, 2025

LLM KV cache compression made easy

Python 1,053 135 Updated Apr 23, 2026

Accessible large language models via k-bit quantization for PyTorch.

Python 8,164 844 Updated Apr 20, 2026

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ 574 60 Updated Sep 13, 2025

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,978 83 Updated Apr 15, 2026

Efficient LLM Inference over Long Sequences

Python 394 22 Updated Jun 25, 2025

Master programming by recreating your favorite technologies from scratch.

Markdown 497,353 47,131 Updated Feb 21, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 26,553 5,590 Updated Apr 27, 2026

Open-source observability for your GenAI or LLM application, based on OpenTelemetry

Python 7,044 943 Updated Apr 27, 2026

LLM Serving Performance Evaluation Harness

Python 85 13 Updated Feb 25, 2025

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 3,448 136 Updated Apr 27, 2026

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,906 368 Updated Dec 17, 2025

Efficient and easy multi-instance LLM serving

Python 547 49 Updated Mar 12, 2026