View tjtanaa's full-sized avatar

Organizations

@EmbeddedLLM

A framework for efficient model inference with omni-modality models

Python 1,037 142 Updated Dec 20, 2025

A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports both GPU and CPU modes, with special optimizations for macOS Apple Silicon and ent…

Python 170 28 Updated Dec 19, 2025

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 302 55 Updated Dec 20, 2025

Modular RDMA Interface

C++ 67 15 Updated Dec 19, 2025

Python 22 4 Updated Jul 11, 2025

A "standard library" of Triton kernels.

Python 17 2 Updated Oct 2, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 51,385 8,963 Updated Nov 17, 2025

Code for BLT research paper

Python 2,019 188 Updated Nov 3, 2025

QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.

C++ 36 7 Updated Aug 29, 2025

Submission for the SG Innovation Challenge

JavaScript 3 Updated Feb 25, 2025

LLM KV cache compression made easy

Python 726 83 Updated Dec 15, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 7,839 801 Updated Dec 12, 2025

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ 539 49 Updated Sep 13, 2025

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,850 78 Updated Dec 6, 2025

Efficient LLM Inference over Long Sequences

Python 393 20 Updated Jun 25, 2025

Master programming by recreating your favorite technologies from scratch.

Markdown 450,486 42,254 Updated Oct 10, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 21,828 3,814 Updated Dec 21, 2025

Open-source observability for your GenAI or LLM application, based on OpenTelemetry

Python 6,703 850 Updated Dec 16, 2025

LLM Serving Performance Evaluation Harness

Python 82 11 Updated Feb 25, 2025

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 3,005 110 Updated Dec 19, 2025

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,869 371 Updated Dec 17, 2025

Efficient and easy multi-instance LLM serving

Python 517 44 Updated Sep 3, 2025

Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"

Python 64 5 Updated Jun 5, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 85 134 Updated Dec 20, 2025

Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine

Python 491 49 Updated Jul 23, 2025

A collection of best resources to learn System Design, Software architecture, and prepare for System Design Interviews

3,806 527 Updated Dec 10, 2025

[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Python 233 20 Updated Oct 14, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 924 45 Updated Oct 29, 2025

Learn System Design concepts and prepare for interviews using free resources.

Java 28,446 6,549 Updated Oct 15, 2025