Organizations

@cacheMon

Starred repositories

DFlash: Block Diffusion for Flash Speculative Decoding

Python 5,093 368 Updated May 10, 2026

Minimal coding agent written in Rust, optimized for memory footprint and performance

Rust 1,271 90 Updated Jun 13, 2026

A collection of Rust implementations of state-of-the-art cache algorithms

Rust 2 1 Updated Jun 13, 2026

Develop software autonomously.

Python 2,224 209 Updated Jan 30, 2026

Docker configuration for running VLLM on dual DGX Sparks

Shell 1,601 290 Updated Jun 12, 2026

Examples, end-2-end tutorials and apps built using Liquid AI Foundational Models (LFM) and the LEAP SDK

Jupyter Notebook 2,075 340 Updated Jun 12, 2026

🕳 bore is a simple CLI tool for making tunnels to localhost

Rust 11,229 498 Updated Feb 4, 2026

OpenAI API-compatible wrapper for Claude Code

Python 560 113 Updated May 4, 2026

DedupBench is a benchmarking tool for content-defined chunking techniques used in data deduplication. It currently supports eleven unique CDC techniques and five different vector instruction sets.

C++ 24 1 Updated Feb 20, 2026

slime is an LLM post-training framework for RL Scaling.

Python 6,109 893 Updated Jun 13, 2026

DAOS Storage Stack (client libraries, storage engine, control plane)

C 945 349 Updated Jun 12, 2026

⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.

Rust 5,417 515 Updated Jun 11, 2026

This is the user space repo for famfs, the fabric-attached memory file system

C 96 6 Updated May 25, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C 1,383 189 Updated Jun 13, 2026

Python bindings for libCacheSim, designed for rapid experimentation with cache simulation models.

Python 7 3 Updated May 18, 2026

A framework for generating realistic LLM serving workloads

Python 152 14 Updated May 11, 2026

A single interface to use and evaluate different agent frameworks

Python 1,174 94 Updated Jun 8, 2026

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…

Python 50,263 8,856 Updated Jun 13, 2026

Zero-instrumentation system-level AI agent tracing in eBPF

C 446 63 Updated Jun 13, 2026

A comprehensive open-source cache trace dataset

Jupyter Notebook 25 6 Updated Aug 23, 2025

Lossless codec for numerical data

Rust 486 29 Updated Jun 12, 2026

A high-performance library for building cache simulators

C++ 332 109 Updated May 4, 2026

Nano vLLM

Python 14,012 2,209 Updated Apr 26, 2026

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

Python 445 66 Updated Jan 5, 2026

Huawei Cloud datasets

Jupyter Notebook 91 13 Updated Jan 8, 2026

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 712 83 Updated Apr 8, 2026

A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.

Kotlin 23,732 2,479 Updated Jun 12, 2026

Simple high-throughput inference library

Python 158 10 Updated Jun 10, 2026

PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for evaluation of training and inference platforms.

Python 155 67 Updated Jun 9, 2026