- Harvard University
- Cambridge
- http://jasony.me
- @1a1a11a
Starred repositories
🕳 bore is a simple CLI tool for making tunnels to localhost
OpenAI API-compatible wrapper for Claude Code
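Because the wrapper speaks the OpenAI wire format, any OpenAI SDK client can drive Claude Code unchanged. A minimal sketch using the official openai Python package; the base URL, API key, and model id below are illustrative assumptions, not values documented by the project:

```python
# Sketch: drive an OpenAI-compatible wrapper with the standard client.
# base_url, api_key, and model are assumed values for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="claude-sonnet",  # whichever model id the wrapper exposes
    messages=[{"role": "user", "content": "Summarize this repo in one line."}],
)
print(resp.choices[0].message.content)
```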
DedupBench is a benchmarking tool for content-defined chunking techniques used in data deduplication. It currently supports eleven unique CDC techniques and five different vector instruction sets.
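For context on what these techniques do, here is a dependency-free sketch of content-defined chunking with a Gear-style rolling hash; the gear table, mask, and size bounds are illustrative choices, not DedupBench's actual parameters:

```python
# Content-defined chunking (CDC) sketch with a Gear-style rolling hash.
# Parameters below are illustrative, not DedupBench's.
import random

random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]  # per-byte random values
MASK = (1 << 13) - 1           # expected chunk size ~8 KiB past the minimum
MIN_SIZE, MAX_SIZE = 2048, 65536

def chunk(data: bytes):
    """Yield chunks whose boundaries are chosen by content, not fixed offsets."""
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF  # roll the Gear hash
        size = i - start + 1
        if size < MIN_SIZE:
            continue
        # Cut when the low bits of the hash are all zero, or at the size cap.
        if (h & MASK) == 0 or size >= MAX_SIZE:
            yield data[start : i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]
```

Because boundaries are chosen by content rather than fixed offsets, an insertion early in a file shifts only nearby chunk boundaries, which is what lets deduplication find repeated chunks downstream.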
slime is an LLM post-training framework for RL Scaling.
DAOS Storage Stack (client libraries, storage engine, control plane)
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
cxl-micron-reskit / famfs
Forked from jagalactic/famfs. This is the user space repo for famfs, the fabric-attached memory file system.
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Python bindings for libCacheSim, designed for rapid experimentation with cache simulation models.
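As a sense of the experiments such bindings are meant to make quick, here is a plain-Python sketch (not the bindings' actual API) that replays a synthetic skewed trace through an LRU cache and reports the miss ratio:

```python
# Plain-Python cache-simulation sketch; not libCacheSim's API.
from collections import OrderedDict
import random

def lru_miss_ratio(trace, cache_size):
    cache, misses = OrderedDict(), 0
    for obj in trace:
        if obj in cache:
            cache.move_to_end(obj)         # hit: refresh recency
        else:
            misses += 1
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict the least recently used
            cache[obj] = True
    return misses / len(trace)

random.seed(42)
trace = [int(random.paretovariate(1.2)) for _ in range(100_000)]  # skewed popularity
print(f"LRU miss ratio at 100 objects: {lru_miss_ratio(trace, 100):.3f}")
```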
A framework for generating realistic LLM serving workloads
A single interface to use and evaluate different agent frameworks
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing, and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…
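The core of that interface is a single completion() call that routes to any backend by model string. A minimal sketch; the model id and the provider key in the environment are assumptions:

```python
# Minimal LiteLLM call; assumes a provider key (e.g. OPENAI_API_KEY)
# is set in the environment, and the model string is illustrative.
from litellm import completion

response = completion(
    model="gpt-4o-mini",  # swap for any supported provider/model string
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```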
Zero-instrumentation LLM and AI agent (e.g. claude code, gemini-cli) observability with eBPF
A comprehensive open-source cache trace dataset
a high performance library for building cache simulators
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
A tool for bandwidth measurements on NVIDIA GPUs.
A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.
Simple high-throughput inference library
PArametrized Recommendation and Ai Model benchmark (PARAM) is a repository for development of numerous uBenchmarks as well as end-to-end nets for evaluation of training and inference platforms.
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Composable building blocks to build LLM Apps
New file format for storage of large columnar datasets.
A C implementation of the SIEVE cache eviction algorithm, based on the research paper (https://junchengyang.com/publication/nsdi24-SIEVE.pdf)
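The algorithm itself is small: a FIFO queue with one visited bit per object and a hand that sweeps from the oldest end toward the newest, evicting the first unvisited object it finds. A Python rendering of the paper's description follows (the starred repo is the C implementation):

```python
# Python rendering of SIEVE as described in the NSDI'24 paper.
# FIFO order with lazy promotion: hits only set a visited bit; the
# hand sweeps tail -> head, clearing bits, and evicts the first
# unvisited object it finds.
class Node:
    __slots__ = ("key", "visited", "prev", "next")
    def __init__(self, key):
        self.key, self.visited = key, False
        self.prev = self.next = None  # prev: toward head (newer); next: toward tail (older)

class Sieve:
    def __init__(self, capacity):
        self.capacity = capacity
        self.table = {}               # key -> Node
        self.head = self.tail = None  # head = newest, tail = oldest
        self.hand = None              # where the last sweep stopped

    def _evict(self):
        node = self.hand or self.tail
        while node.visited:                # second chance for visited objects
            node.visited = False
            node = node.prev or self.tail  # move toward head, wrap at the top
        self.hand = node.prev              # resume the next sweep past the victim
        # Unlink the victim from the queue.
        if node.prev: node.prev.next = node.next
        else:         self.head = node.next
        if node.next: node.next.prev = node.prev
        else:         self.tail = node.prev
        del self.table[node.key]

    def access(self, key):
        """Return True on a hit, False on a miss (which inserts the key)."""
        if key in self.table:
            self.table[key].visited = True  # no queue movement on a hit
            return True
        if len(self.table) >= self.capacity:
            self._evict()
        node = Node(key)                    # insert at the head (newest end)
        node.next = self.head
        if self.head: self.head.prev = node
        self.head = node
        self.tail = self.tail or node
        self.table[key] = node
        return False

sieve = Sieve(capacity=2)
for k in ["a", "b", "a", "c", "a"]:
    print(k, "hit" if sieve.access(k) else "miss")
# a miss, b miss, a hit, c miss (b is evicted, not the visited a), a hit
```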