View tjtanaa's full-sized avatar

Organizations

@EmbeddedLLM

282 results for source starred repositories

A framework for efficient model inference with omni-modality models

Python 1,555 200 Updated Dec 24, 2025

A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports both GPU and CPU modes, with special optimizations for macOS Apple Silicon and ent…

Python 176 29 Updated Dec 22, 2025

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 306 55 Updated Dec 23, 2025

Modular RDMA Interface

C++ 67 15 Updated Dec 24, 2025

Python 22 4 Updated Jul 11, 2025

A "standard library" of Triton kernels.

Python 17 2 Updated Oct 2, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 51,441 8,976 Updated Nov 17, 2025

Code for BLT research paper

Python 2,018 188 Updated Nov 3, 2025

QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.

C++ 36 7 Updated Aug 29, 2025

Submission for the SG Innovation Challenge

JavaScript 3 Updated Feb 25, 2025

LLM KV cache compression made easy

Python 729 83 Updated Dec 15, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 7,850 806 Updated Dec 12, 2025

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ 541 50 Updated Sep 13, 2025

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,856 78 Updated Dec 6, 2025

Efficient LLM Inference over Long Sequences

Python 394 20 Updated Jun 25, 2025

Master programming by recreating your favorite technologies from scratch.

Markdown 451,703 42,369 Updated Oct 10, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 21,939 3,860 Updated Dec 24, 2025

Open-source observability for your GenAI or LLM application, based on OpenTelemetry

Python 6,709 853 Updated Dec 21, 2025

LLM Serving Performance Evaluation Harness

Python 82 11 Updated Feb 25, 2025

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 3,013 111 Updated Dec 24, 2025

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,873 371 Updated Dec 17, 2025

Efficient and easy multi-instance LLM serving

Python 518 44 Updated Sep 3, 2025

Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"

Python 64 5 Updated Jun 5, 2024

Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine

Python 491 49 Updated Jul 23, 2025

A collection of best resources to learn System Design, Software architecture, and prepare for System Design Interviews

3,820 527 Updated Dec 10, 2025

[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Python 232 20 Updated Oct 14, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 925 45 Updated Oct 29, 2025

Learn System Design concepts and prepare for interviews using free resources.

Java 28,555 6,570 Updated Oct 15, 2025

Redis 6.0.20 6.2.18 7.0.15 7.2.8 7.4.3 8.0.0 for Windows

Batchfile 3,262 280 Updated Nov 21, 2025