nilsmatteson.com *** est. 2026 *** 0.88s session fork vs ~340s cold boot *** 14.3 GB/s weight restore *** new: sole-author preprint, "re-feeding is not replaying" (jun 2026) *** open to summer 2027 internships *** zero js on this entire site. the marquee is HTML baby

README.txt - Notepad

Nils Matteson

I build systems for LLM inference: GPU and CUDA, distributed systems, applied ML. I like problems where the deliverable is a number someone else can re-run.

The thing I keep pulling on: a transformer mid-generation is a multi-gigabyte data structure that the entire ecosystem treats as disposable. Treating it as a first-class artifact instead got me thaw (below), an open PR against vLLM core, and a sole-author research preprint measuring what the standard replay shortcut actually does to token-level credit estimates (also below).

B.S. Data Science, CS minor, UW-Madison, May 2026. M.S. CS, Northeastern Silicon Valley, San Jose, starting September 2026. The longer plan is research: measurement problems in ML systems, then a PhD.

Open to SWE/MLE internships for Summer 2027 and full-time roles in 2028: GPU inference, distributed systems, ML infrastructure.

mail nils@thaw.sh  ·  code github.com/matteso1  ·  resume.pdf

thaw v0.6.0

git for live LLM agent sessions. checkpoints, branches, diffs, and restores live vLLM/SGLang inference state (weights, KV cache, prefix-hash table, scheduler). a session forks in 0.88s median on an H100 instead of a ~340s cold boot, about 400x amortized. 16 releases on PyPI as thaw-vllm, currently 0.6.0, Apache-2.0. opened PR #44074 in vLLM (pluggable sleep-mode backend), following participation in RFC #34303. sole-author preprint: "Re-feeding Is Not Replaying" (June 2026).

0.88s fork vs ~340s cold boot (H100)
14.3 GB/s restore: weight restore, disk to GPU
0.29s / 55 GB/s hot-swap: 8B model reload after one-time pin
388 tests in CI: 155 Rust + 233 Python, no GPU required
Re-feeding Is Not Replaying.pdf - Acrobat Reader

Re-feeding Is Not Replaying: Measuring Replay Noise in Counterfactual Token-Credit Estimation
sole author · June 2026 · 10 pages · total compute under $10

Every published method that asks "which token caused the model's answer" rebuilds the model's state by re-feeding the transcript as a fresh prompt, and assumes the result is the same state. I measured that assumption on stock vLLM with a three-pass design: continuations resumed from the exact decode-time KV state, an identical second exact pass as a replica noise floor, and the re-feed. At low-margin decision tokens, re-feeding changes the credit estimate at rates 14 to 28 percentage points above the floor. The perturbation is consistent with mean-zero noise, so averaged quantities mostly survive; threshold-based critical-token selection does not. Rerunning under vLLM's batch-invariant kernels makes every pass bit-identical, which closes the causal attribution and validates the instrument in one move.
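A minimal sketch of the excess-over-floor arithmetic behind that comparison. This is not the analysis script from the repo: the `Pivot` record, the `tol` parameter, and the four data points are hypothetical and synthetic, chosen only to show how a re-feed change rate is read against the replica floor.

```python
from dataclasses import dataclass


@dataclass
class Pivot:
    """One decision token with a credit estimate from each pass (hypothetical record)."""
    exact: float    # continuation resumed from the decode-time KV state
    replica: float  # identical second exact pass, used as the noise floor
    refeed: float   # transcript re-fed as a fresh prompt


def change_rate(pairs, tol=0.0):
    """Fraction of pairs whose two estimates differ by more than tol."""
    pairs = list(pairs)
    changed = sum(1 for a, b in pairs if abs(a - b) > tol)
    return changed / len(pairs)


def excess_over_floor(pivots, tol=0.0):
    """Return (floor rate, re-feed rate, excess in percentage points)."""
    floor = change_rate(((p.exact, p.replica) for p in pivots), tol)
    refeed = change_rate(((p.exact, p.refeed) for p in pivots), tol)
    return floor, refeed, 100.0 * (refeed - floor)


# Synthetic pivots for illustration only; not measurements from the preprint.
pivots = [
    Pivot(0.50, 0.50, 0.50),
    Pivot(0.25, 0.25, 0.75),
    Pivot(0.75, 0.50, 0.50),
    Pivot(0.00, 0.00, 0.25),
]

floor, refeed, excess = excess_over_floor(pivots)
print(f"floor={floor:.2f} refeed={refeed:.2f} excess={excess:.1f} points")
```

The replica pass is what makes the re-feed number interpretable: any change rate it shows is run-to-run noise, so only the part of the re-feed rate above it is attributed to re-feeding.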

Headed to arXiv (cs.LG). Every per-pivot record, run log, and the analysis script that emits each number are public in the repo.

Exploring - C:\nils
Name                       Type                    Notes
work\thaw                  File Folder             0.88s fork vs ~340s cold boot
work\matteson-systems      File Folder             10,500 businesses scored at ~$0.03
work\sentinel              File Folder             Go log engine, LSM + Raft
work\madison-metro-ml      File Folder             conformal ETAs, calibrated 90% coverage
writing\project-gorgon.md  Markdown                speculative decoding at 0.66x of baseline, negative result
writing\madison-bus-eta.md Markdown                the transit-ETA writeup
writing\wattbot-rag.md     Markdown                RAG benchmarks, 282 questions
refeed-drift.pdf           Adobe Acrobat Document  the preprint, 10 pages, reproduces for under $10
agents.txt                 Text Document           facts for LLMs
9 object(s)
System Properties

System: B.S. Data Science, UW-Madison (May 2026)

System: M.S. CS, Northeastern Silicon Valley (Sep 2026 - May 2028)

Registered to: Nils Matteson, Madison WI -> San Jose CA

Status: seeking Summer 2027 SWE/MLE internship

Scheduled upgrade: Ph.D., ML systems (pending)

New Message - Outlook Express
To:
Subject: shoot you an electronic message on the net

"Every program attempts to expand until it can read mail. Those which cannot are replaced by ones which can."

--
Zawinski's Law, 1995

(you are reading this inside exactly such a program)

you are visitor
016,384
(counter is static. no js, no tracking.)
RUST INSIDE  ·  CUDA 12+  ·  vLLM  ·  APACHE-2.0  ·  Y2K COMPLIANT  ·  BEST VIEWED IN NETSCAPE 4.0