A portable skill file that gives an AI agent an autonomous, profile-driven experiment loop for kernel optimization — CUDA / Triton / ROCm / CPU / SIMD kernels, or any accelerator code with a measurable performance metric.
It is the kernel-focused sibling of auto-research: same no-install, pure-protocol approach and the same measurement-honesty discipline, with one defining addition — optimization is driven by profiler evidence, not guesswork.
Set a measurable performance goal once, point the agent at a profiler, and it runs unattended through this cycle until the budget is spent or no honest gain remains:
profile → profile-grounded hypothesis → edit → measure → keep wins / revert losses → repeat
The agent profiles the kernel first to find where the time actually goes, forms hypotheses that each cite a real profiler observation, and re-profiles after every structural win because the bottleneck moves: a memory-bound kernel can become compute-bound once the first limiter is fixed.
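The cycle above can be sketched as a small control loop. This is a minimal illustration, not the protocol in SKILL.md: `run_loop` and its callable parameters are hypothetical names, the metric is assumed to be "lower is better" (for example milliseconds), and the correctness gate and noise threshold are left out for brevity. It also re-profiles after every kept win, which is a simplification of "every structural win".

```python
from typing import Callable, Optional


def run_loop(
    budget: int,
    profile: Callable[[], str],
    propose: Callable[[str], Optional[str]],
    apply_and_measure: Callable[[str], float],
    revert: Callable[[], None],
    baseline: float,
) -> float:
    """Run up to `budget` experiments and return the best metric seen.

    All callables are hypothetical hooks supplied by the caller.
    Lower metric values are assumed to be better.
    """
    best = baseline
    summary = profile()  # profile before forming any hypothesis
    for _ in range(budget):
        hypothesis = propose(summary)  # derived from the current profile summary
        if hypothesis is None:  # no honest idea left, so stop early
            break
        value = apply_and_measure(hypothesis)
        if value < best:
            best = value  # keep the win
            summary = profile()  # the bottleneck may have moved
        else:
            revert()  # discard the losing edit
    return best
```

Passing the steps in as callables keeps the sketch independent of any particular profiler, build system, or version-control command.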
The most expensive mistake in kernel work is optimizing the wrong thing — hand-tuning a loop that the profiler would have shown is memory-bound. This skill makes a bottleneck claim earned: "this is memory-bound" / "occupancy is the limiter" / "this is the hotspot" only counts if it came from a profiler run captured to disk this session. Profiling honesty is enforced as a protocol rule, exactly like metric honesty.
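One way to picture an "earned" bottleneck claim is a check that the cited observation really appears in a profile summary on disk. The sketch below is an assumption about how such a check could look: the function name `is_grounded`, the summary file layout (one observation per line), and the metric names in the usage example are all hypothetical. It does not verify that the profile was captured during the current session.

```python
from pathlib import Path


def is_grounded(cited_line: str, profile_summary: Path) -> bool:
    """Return True only if `cited_line` appears verbatim in the summary file.

    A missing file or an empty citation counts as "no profiler basis".
    """
    cited = cited_line.strip()
    if not cited or not profile_summary.is_file():
        return False
    lines = {ln.strip() for ln in profile_summary.read_text().splitlines()}
    return cited in lines
```

For example, with a summary file containing the hypothetical line `dram_throughput 91%`, `is_grounded("dram_throughput 91%", path)` is true, while a citation that is not in the file is rejected and the next action would be to profile rather than edit.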
- Earned numbers: a metric counts only if read from a real command's output, parsed from `METRIC name=value` lines. No estimating, no fabricating.
- Earned bottlenecks: every hypothesis must cite a line from the current profile summary. No profiler basis ⇒ the next action is to profile, not edit.
- Correctness gate: a fast wrong kernel is worthless, so a reference comparison must pass before any speed result counts.
- Git as the record: every experiment is a commit; losing runs are reverted with `git reset --hard HEAD~1`. Branch history is the result log.
- Noise discipline: GPU jitter is large, so gains are kept only when the delta clears a median-absolute-deviation (MAD) threshold over repeated runs.
- Long-run resilience: on-disk state, a heartbeat, and resume-from-disk survive session restarts and context compaction.
- Anti-stall: when progress plateaus, re-profile and pivot to a different limiter rather than twiddling the same knobs.
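The "earned numbers" and "noise discipline" rules can be illustrated together. The sketch below parses `METRIC name=value` lines from command output and keeps a change only when the median improvement clears a MAD-based threshold. The function names, the exact regular expression, the "lower is better" convention, and the multiplier `k=3.0` are assumptions made for illustration; the protocol's actual threshold is defined in SKILL.md.

```python
import re
from statistics import median

# Assumed line shape: "METRIC <name>=<number>", nothing else on the line.
METRIC_RE = re.compile(
    r"^METRIC\s+([\w.-]+)=([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)$"
)


def parse_metrics(output: str) -> dict[str, float]:
    """Collect metrics from METRIC lines only; every other line is ignored."""
    metrics: dict[str, float] = {}
    for line in output.splitlines():
        match = METRIC_RE.match(line.strip())
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def mad(samples: list[float]) -> float:
    """Median absolute deviation of the samples around their median."""
    center = median(samples)
    return median([abs(x - center) for x in samples])


def is_real_win(baseline: list[float], candidate: list[float], k: float = 3.0) -> bool:
    """Keep a change only if the median gain exceeds k times the larger MAD.

    Assumes lower values are better (for example latency in milliseconds).
    """
    delta = median(baseline) - median(candidate)
    noise = max(mad(baseline), mad(candidate))
    return delta > k * noise
```

Using the median and MAD rather than the mean and standard deviation keeps a single outlier run from deciding whether a change is kept.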
Drop the skill where your agent looks for skills. For Claude Code:
mkdir -p ~/.claude/skills
cp -r skills/autoresearch-kernel ~/.claude/skills/
Then ask the agent to run an autoresearch-kernel loop on a kernel with a measurable metric and a profiler available.
skills/autoresearch-kernel/SKILL.md    the full protocol
LICENSE                                MIT
README.md
This is a protocol, not a sandbox. The METRIC-only, profiler-grounded, and
git-only rules make cheating visible and effortful — they don't make it
impossible. A profile is a snapshot of one shape on one device, so a win on the
profiled shape can lose on another; confirm the metric on every in-scope shape
before claiming victory. The single most valuable thing you can build to make
the discipline real is a pair of wrappers — .auto/measure.sh that emits real
METRIC lines and .auto/profile.sh that writes a real profiler report.
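The advice to confirm the metric on every in-scope shape can be sketched as a simple comparison. This is an illustrative helper, not part of the skill: the name `regressions`, the use of shape tuples as dictionary keys, and the "lower is better" convention are assumptions.

```python
def regressions(
    baseline: dict[tuple[int, ...], float],
    candidate: dict[tuple[int, ...], float],
) -> list[tuple[int, ...]]:
    """Return the in-scope shapes where the candidate is missing or not faster.

    Keys are problem shapes; values are measured metrics, lower is better.
    An empty result means the win holds on every in-scope shape.
    """
    return [
        shape
        for shape, base in baseline.items()
        if shape not in candidate or candidate[shape] >= base
    ]
```

A change would be claimed as a win only when `regressions(...)` returns an empty list; any shape it returns is one where the profiled improvement did not carry over or was never measured.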
This skill stands on prior work in autonomous, no-human-in-the-loop research loops:
- auto-research — the pure-skill loop this edition is built on.
- autoresearch-cli — a CLI contract layer for Claude Code that enforces honest measurement, commits, reverts, and bookkeeping for autonomous optimization research.
- Deli_AutoResearch — a protocol framework for long-horizon agent tasks, with the conventions for state persistence, stall detection, and layered watchdogs that guard against cognitive loops, stalling, and runtime fragility.
MIT — see LICENSE.