Dominic789654/auto-kernel-research

auto-kernel-research

A portable skill file that gives an AI agent an autonomous, profile-driven experiment loop for kernel optimization — CUDA / Triton / ROCm / CPU / SIMD kernels, or any accelerator code with a measurable performance metric.

It is the kernel-focused sibling of auto-research: same no-install, pure-protocol approach and the same measurement-honesty discipline, with one defining addition — optimization is driven by profiler evidence, not guesswork.

The idea

Set a measurable performance goal once, point the agent at a profiler, and it runs unattended through this cycle until the budget is spent or no honest gain remains:

profile → profile-grounded hypothesis → edit → measure → keep wins / revert losses → repeat

The agent profiles the kernel first to find where the time actually goes, forms hypotheses that each cite a real profiler observation, and re-profiles after every structural win because the bottleneck moves: a memory-bound kernel can become compute-bound once the first limiter is fixed.
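The cycle above can be sketched as a small control loop. This is a minimal illustration rather than the protocol in SKILL.md: the `Hypothesis` class, `run_loop`, and every callable passed to it are hypothetical names chosen for this example, and the real loop is carried out by the agent with git and a profiler.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    description: str
    profiler_evidence: str  # the profile line that motivates the change


def run_loop(profile, propose, apply_edit, measure, is_win, revert, budget):
    """Run up to `budget` experiments and return (best_metric, log).

    All callables are supplied by the caller (hypothetical interface):
      profile()            -> current profile summary
      propose(summary)     -> Hypothesis, or None when no idea remains
      apply_edit(hyp)      -> applies the change (e.g. edits and commits)
      measure()            -> metric value for the current code
      is_win(best, cand)   -> True if cand is an honest improvement
      revert()             -> undoes the last change
    """
    log = []
    summary = profile()  # profile first, before any edit
    best = measure()
    for _ in range(budget):
        hyp = propose(summary)
        if hyp is None:  # no profile-grounded idea left
            break
        apply_edit(hyp)
        candidate = measure()
        if is_win(best, candidate):
            best = candidate
            summary = profile()  # re-profile: the bottleneck may have moved
            log.append((hyp.description, "kept"))
        else:
            revert()
            log.append((hyp.description, "reverted"))
    return best, log
```

Passing the steps in as callables keeps the sketch independent of any particular profiler or build system.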

Why profile-first

The most expensive mistake in kernel work is optimizing the wrong thing: for example, hand-tuning a loop that a profiler would have shown to be memory-bound. This skill requires every bottleneck claim to be earned. "This is memory-bound", "occupancy is the limiter", or "this is the hotspot" only counts if it came from a profiler run captured to disk this session. Profiling honesty is enforced as a protocol rule, exactly like metric honesty.
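One way to make "earned" checkable is to require that a cited line appears verbatim in a profile summary written during the current session. The sketch below assumes a plain-text summary file and uses the file's modification time as a stand-in for "captured this session". The function name, the file layout, and the mtime check are all assumptions for illustration, not part of the skill.

```python
from pathlib import Path


def is_earned(citation: str, profile_path: Path, session_start: float) -> bool:
    """Return True only if `citation` is a line of a fresh on-disk profile.

    session_start is a Unix timestamp (hypothetical bookkeeping value);
    a summary file older than that is treated as stale.
    """
    wanted = citation.strip()
    if not wanted:
        return False  # an empty citation is no evidence
    if not profile_path.is_file():
        return False  # nothing on disk: profile first
    if profile_path.stat().st_mtime < session_start:
        return False  # profile predates this session
    lines = (line.strip() for line in profile_path.read_text().splitlines())
    return wanted in lines
```

If the check fails, the next action under the protocol is to profile, not to edit.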

What it enforces

  • Earned numbers — a metric counts only if read from a real command's output, parsed from METRIC name=value lines. No estimating, no fabricating.
  • Earned bottlenecks — every hypothesis must cite a line from the current profile summary. No profiler basis ⇒ the next action is to profile, not edit.
  • Correctness gate — a fast wrong kernel is worthless, so a reference comparison must pass before any speed result counts.
  • Git as the record — every experiment is a commit; losing runs are reverted with git reset --hard HEAD~1. Branch history is the result log.
  • Noise discipline — GPU timing jitter can be large, so a gain is kept only when the delta clears a median-absolute-deviation (MAD) threshold over repeated runs.
  • Long-run resilience — on-disk state, a heartbeat, and resume-from-disk survive session restarts and context compaction.
  • Anti-stall — when progress plateaus, re-profile and pivot to a different limiter rather than twiddling the same knobs.
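The "earned numbers" and "noise discipline" rules can be illustrated together. The sketch below parses `METRIC name=value` lines from command output and keeps a change only when the median improvement exceeds a multiple of the MAD. The regular expression, the multiplier `k=3.0`, and the lower-is-better convention are assumptions for this example; the source states only that the delta must clear a MAD threshold.

```python
import re
import statistics

# Assumed line shape: "METRIC <name>=<number>", one metric per line.
METRIC_RE = re.compile(r"^METRIC\s+(\w+)=(\S+)$")


def parse_metrics(output: str) -> dict[str, float]:
    """Collect metrics from real command output; ignore everything else."""
    metrics = {}
    for line in output.splitlines():
        match = METRIC_RE.match(line.strip())
        if not match:
            continue
        try:
            metrics[match.group(1)] = float(match.group(2))
        except ValueError:
            continue  # a non-numeric value is not a metric
    return metrics


def mad(samples: list[float]) -> float:
    """Median absolute deviation of the samples around their median."""
    med = statistics.median(samples)
    return statistics.median([abs(x - med) for x in samples])


def is_real_gain(baseline: list[float], candidate: list[float], k: float = 3.0) -> bool:
    """True if the candidate's median is lower by more than k * MAD.

    Assumes lower is better (e.g. latency) and uses the noisier of the
    two sample sets as the noise estimate.
    """
    delta = statistics.median(baseline) - statistics.median(candidate)
    noise = max(mad(baseline), mad(candidate))
    return delta > k * noise
```

For example, baseline latencies of `[100, 101, 99, 100, 102]` against candidate latencies of `[90, 91, 89, 90, 92]` give a delta of 10 and a MAD of 1, so the change is kept. A candidate that is only 1 unit faster is rejected as indistinguishable from noise.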

Install

Drop the skill where your agent looks for skills. For Claude Code:

mkdir -p ~/.claude/skills
cp -r skills/autoresearch-kernel ~/.claude/skills/

Then ask the agent to run an autoresearch-kernel loop on a kernel with a measurable metric and a profiler available.

Repo layout

skills/autoresearch-kernel/SKILL.md   the full protocol
LICENSE                               MIT
README.md

Honest limits

This is a protocol, not a sandbox. The METRIC-only, profiler-grounded, and git-only rules make cheating visible and effortful — they don't make it impossible. A profile is a snapshot of one shape on one device, so a win on the profiled shape can lose on another; confirm the metric on every in-scope shape before claiming victory. The single most valuable thing you can build to make the discipline real is a pair of wrappers — .auto/measure.sh that emits real METRIC lines and .auto/profile.sh that writes a real profiler report.
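A measurement wrapper of the kind described above might look like the following Python sketch, which applies the correctness gate before any timing and then emits `METRIC` lines. The function name, the metric names, and the tolerances are hypothetical. It times a plain Python callable, so a real GPU wrapper would also need to synchronize the device around each timed run, because kernel launches are typically asynchronous.

```python
import math
import statistics
import time


def measure(candidate, reference, inputs, repeats=20, rel_tol=1e-6):
    """Return METRIC lines for `candidate`, or raise if it is incorrect.

    candidate and reference take the same positional inputs and return
    sequences of floats (an assumption made for this sketch).
    """
    expected = reference(*inputs)
    actual = candidate(*inputs)
    same_length = len(actual) == len(expected)
    if not same_length or not all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=1e-9)
        for a, e in zip(actual, expected)
    ):
        # A fast wrong kernel is worthless: report no timing at all.
        raise AssertionError("correctness gate failed; timing not reported")

    samples_us = []
    for _ in range(repeats):
        start = time.perf_counter()
        candidate(*inputs)
        samples_us.append((time.perf_counter() - start) * 1e6)

    return [
        f"METRIC latency_us_median={statistics.median(samples_us):.3f}",
        f"METRIC runs={repeats}",
    ]
```

Calling this once per in-scope shape and printing the returned lines gives the agent numbers it can only obtain by actually running the code.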

Acknowledgements

This skill stands on prior work in autonomous, no-human-in-the-loop research loops:

  • auto-research — the pure-skill loop this edition is built on.
  • autoresearch-cli — a CLI contract layer for Claude Code that enforces honest measurement, commits, reverts, and bookkeeping for autonomous optimization research.
  • Deli_AutoResearch — a protocol framework for long-horizon agent tasks, with the conventions for state persistence, stall detection, and layered watchdogs that guard against cognitive loops, stalling, and runtime fragility.

License

MIT — see LICENSE.
