- Harvard University
- Cambridge
- http://jasony.me
- @1a1a11a
Starred repositories
DFlash: Block Diffusion for Flash Speculative Decoding
Minimal coding agent written in Rust, optimized for memory footprint and performance
A collection of Rust implementations of state-of-the-art cache algorithms
Docker configuration for running vLLM on dual DGX Sparks
Examples, end-to-end tutorials and apps built using Liquid AI Foundational Models (LFM) and the LEAP SDK
🕳 bore is a simple CLI tool for making tunnels to localhost
OpenAI API-compatible wrapper for Claude Code
DedupBench is a benchmarking tool for content-defined chunking techniques used in data deduplication. It currently supports eleven unique CDC techniques and five different vector instruction sets.
slime is an LLM post-training framework for RL Scaling.
DAOS Storage Stack (client libraries, storage engine, control plane)
⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.
cxl-micron-reskit / famfs (forked from jagalactic/famfs)
This is the user space repo for famfs, the fabric-attached memory file system
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Python bindings for libCacheSim, designed for rapid experimentation with cache simulation models.
A framework for generating realistic LLM serving workloads
A single interface to use and evaluate different agent frameworks
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…
Zero-instrumentation system-level AI agent tracing in eBPF
A comprehensive open-source cache trace dataset
A high-performance library for building cache simulators
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
A tool for bandwidth measurements on NVIDIA GPUs.
A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.
Simple high-throughput inference library
PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for evaluation of training and inference platforms.