View 1a1a11a's full-sized avatar

Organizations

@cacheMon

Starred repositories

Develop software autonomously.

Python 2,217 213 Updated Jan 30, 2026

Docker configuration for running vLLM on dual DGX Sparks

Shell 879 164 Updated Apr 6, 2026

Ultra-lightweight sandbox platform for AI agents. Powered by BoxLite.

Rust 15 2 Updated Feb 13, 2026

Examples, end-2-end tutorials and apps built using Liquid AI Foundational Models (LFM) and the LEAP SDK

Jupyter Notebook 1,669 259 Updated Apr 2, 2026

🕳 bore is a simple CLI tool for making tunnels to localhost

Rust 11,009 484 Updated Feb 4, 2026

OpenAI API-compatible wrapper for Claude Code

Python 478 88 Updated Jan 6, 2026

DedupBench is a benchmarking tool for content-defined chunking techniques used in data deduplication. It currently supports eleven unique CDC techniques and five different vector instruction sets.

C++ 23 1 Updated Feb 20, 2026

slime is an LLM post-training framework for RL Scaling.

Python 5,143 695 Updated Apr 5, 2026

DAOS Storage Stack (client libraries, storage engine, control plane)

C 926 340 Updated Apr 6, 2026

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

Rust 3,938 330 Updated Mar 26, 2026

This is the user space repo for famfs, the fabric-attached memory file system

C 94 5 Updated Mar 30, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,361 183 Updated Mar 12, 2026

Python bindings for libCacheSim, designed for rapid experimentation with cache simulation models.

Python 7 3 Updated Feb 17, 2026

A framework for generating realistic LLM serving workloads

Python 110 10 Updated Oct 9, 2025

A single interface to use and evaluate different agent frameworks

Python 1,138 89 Updated Apr 6, 2026

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…

Python 42,309 7,020 Updated Apr 5, 2026

Zero-instrumentation LLM and AI agent (e.g. claude code, gemini-cli) observability in eBPF

C 272 40 Updated Mar 31, 2026

A comprehensive open-source cache trace dataset

Jupyter Notebook 24 5 Updated Aug 23, 2025

Lossless codec for numerical data

Rust 472 29 Updated Mar 22, 2026

A high-performance library for building cache simulators

C++ 296 96 Updated Apr 4, 2026

Nano vLLM

Python 12,710 1,882 Updated Nov 3, 2025

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

Python 419 63 Updated Jan 5, 2026

Huawei Cloud datasets

Jupyter Notebook 86 13 Updated Jan 8, 2026

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 656 75 Updated Apr 15, 2025

A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.

Kotlin 17,553 1,618 Updated Apr 4, 2026

Simple high-throughput inference library

Python 155 10 Updated May 14, 2025

PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for evaluation of training and inference platforms.

Python 153 67 Updated Apr 1, 2026

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Python 319 36 Updated Jun 10, 2025