Organizations

@cacheMon

Starred repositories

Develop software autonomously.

Python 2,218 213 Updated Jan 30, 2026

Docker configuration for running vLLM on dual DGX Sparks

Shell 1,008 178 Updated Apr 13, 2026

Ultra-lightweight sandbox platform for AI agents. Powered by BoxLite.

Rust 17 2 Updated Feb 13, 2026

Examples, end-2-end tutorials and apps built using Liquid AI Foundational Models (LFM) and the LEAP SDK

Jupyter Notebook 1,744 272 Updated Apr 8, 2026

🕳 bore is a simple CLI tool for making tunnels to localhost

Rust 11,045 487 Updated Feb 4, 2026

OpenAI API-compatible wrapper for Claude Code

Python 490 90 Updated Jan 6, 2026

DedupBench is a benchmarking tool for content-defined chunking techniques used in data deduplication. It currently supports eleven unique CDC techniques and five different vector instruction sets.

C++ 23 1 Updated Feb 20, 2026

slime is an LLM post-training framework for RL Scaling.

Python 5,276 714 Updated Apr 13, 2026

DAOS Storage Stack (client libraries, storage engine, control plane)

C 929 341 Updated Apr 13, 2026

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

Rust 3,980 340 Updated Mar 26, 2026

This is the user space repo for famfs, the fabric-attached memory file system

C 94 5 Updated Apr 8, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,368 183 Updated Mar 12, 2026

Python bindings for libCacheSim, designed for rapid experimentation with cache simulation models.

Python 7 3 Updated Feb 17, 2026

A framework for generating realistic LLM serving workloads

Python 113 11 Updated Oct 9, 2025

A single interface to use and evaluate different agent frameworks

Python 1,147 89 Updated Apr 13, 2026

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…

Python 43,172 7,215 Updated Apr 14, 2026

Zero-instrumentation LLM and AI agent (e.g. claude code, gemini-cli) observability in eBPF

C 289 43 Updated Apr 12, 2026

A comprehensive open-source cache trace dataset

Jupyter Notebook 24 6 Updated Aug 23, 2025

Lossless codec for numerical data

Rust 473 28 Updated Mar 22, 2026

A high-performance library for building cache simulators

C++ 296 103 Updated Apr 13, 2026

Nano vLLM

Python 12,862 1,919 Updated Apr 13, 2026

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

Python 423 63 Updated Jan 5, 2026

Huawei Cloud datasets

Jupyter Notebook 87 13 Updated Jan 8, 2026

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 662 79 Updated Apr 8, 2026

A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.

Kotlin 20,901 1,976 Updated Apr 8, 2026

Simple high-throughput inference library

Python 155 10 Updated May 14, 2025

PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for evaluation of training and inference platforms.

Python 154 67 Updated Apr 13, 2026

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Python 321 37 Updated Jun 10, 2025