dotnet-test: add Rule #0 (Confirm the Test Target) to ruby/powershell extensions #702
Merged
Evangelink merged 3 commits on Jun 1, 2026
… extensions
When the prompt does not name a specific file ("test the repository", "one
core module", "comprehensive suite"), agents frequently target the wrong
code — typically the largest upstream module that already has rich existing
tests — instead of the newly-added module the user actually wants tested.
Recent benchmark runs (top5-{ruby,powershell}-*-{simple,complex}) showed
this is a dominant failure mode for Ruby/PowerShell: the agent burns 50+
turns writing tests for files the verifier never measures, while the real
target (a small untracked `lib/string_utils.rb` or `tools/StringUtils.psm1`)
sits one `git status` away.
Adds a Rule #0 to both ruby.md and powershell.md, ahead of the existing
"Rule #1: Investigate the Repo First". The rule:
- Tells the agent to use git history (`git status -s`,
`git ls-files --others --exclude-standard`,
`git log --diff-filter=A --name-only -5`) to find the actual target
rather than guessing from repo size.
- Documents a Test Placement Contract — RSpec scopes to `spec/`,
Pester scopes to whatever directory the harness passes to
`Invoke-Pester -Path`; tests placed elsewhere are invisible.
- Adds a First-Test Sanity Loop: write one test, run --dry-run /
-PassThru to confirm discovery > 0, fix LoadError / Import-Module
issues before expanding. Catches placement mistakes on turn 1.
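The git-based discovery step above can be sketched in Ruby roughly as follows. This is an illustration only, not the text added to ruby.md: the helper names `candidate_targets` and `discover_targets` are hypothetical, and the sketch assumes that untracked (`??`) and newly added (`A`) entries in `git status -s` output are the likeliest test targets.

```ruby
require "open3"

# Hypothetical helper: pick likely test targets out of `git status -s` output.
# Assumption: untracked ("??") and newly added ("A") files are the best candidates.
def candidate_targets(status_output)
  status_output.each_line.filter_map do |line|
    line = line.chomp
    next if line.length < 4

    code = line[0, 2] # two-character status code, e.g. "??", "A ", " M"
    path = line[3..]  # path starts after the separating space
    path if code == "??" || code.include?("A")
  end
end

# Hypothetical wrapper: runs the read-only git command in the given repo.
def discover_targets(repo_dir = ".")
  out, status = Open3.capture2("git", "-C", repo_dir, "status", "-s")
  status.success? ? candidate_targets(out) : []
end
```

Note that `git status -s` collapses a wholly untracked directory into a single entry such as `?? lib/`, so a real check would probably also consult `git ls-files --others --exclude-standard`, as the rule suggests.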
Both files validate cleanly under skill-validator. The existing Rule #1
and all subsequent sections are unchanged — this is a pure prepend.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new “Rule #0: Confirm the Test Target” to the Ruby and PowerShell code-testing extensions to reduce a common failure mode where agents write tests for the wrong (often large, already-tested) parts of a repo when prompts are vague.
Changes:
- Prepend a Ruby Rule #0 describing git-based target discovery, test placement expectations (RSpec/Minitest), and a first-test discovery sanity check.
- Prepend a PowerShell Rule #0 describing git-based target discovery, Pester test placement expectations, and a first-test discovery sanity check.
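For the Ruby side, the placement expectation can be illustrated with a small path check. This is a sketch under stated assumptions: it assumes RSpec's default search path `spec` and the default `*_spec.rb` file naming, both of which a project or harness may override, and the helper name `discoverable_spec?` is hypothetical.

```ruby
require "pathname"

# Hypothetical helper: would a default RSpec run pick up this file?
# Assumption: default path "spec" and default "*_spec.rb" naming are in effect.
def discoverable_spec?(path, default_path: "spec")
  clean = Pathname.new(path).cleanpath.to_s # normalizes "./spec/x" to "spec/x"
  clean.start_with?("#{default_path}/") && clean.end_with?("_spec.rb")
end
```

A file such as `test/string_utils_spec.rb` fails this check even though its name looks right, which is the kind of placement mistake the rule is meant to catch early.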
Show a summary per file
| File | Description |
|---|---|
| plugins/dotnet-test/skills/code-testing-extensions/extensions/ruby.md | Adds Rule #0 to guide correct Ruby test target selection and early spec discovery validation. |
| plugins/dotnet-test/skills/code-testing-extensions/extensions/powershell.md | Adds Rule #0 to guide correct PowerShell/Pester test target selection and early test discovery validation. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 1
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
nohwnd approved these changes on May 29, 2026
- Rephrase Rule #0's discovery preamble in ruby.md and powershell.md to call out the commands as the read-only exception to Rule #1, removing the apparent contradiction between 'before planning' and Rule #1's 'before writing any test or running any command'.
- Broaden the spec_helper require grep to match leading whitespace and `require_relative`, avoiding false negatives that send the agent to the wrong target.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/evaluate
Skill Validation Results
Model: claude-opus-4.6 | Judge: claude-opus-4.6
JanKrivanek approved these changes on Jun 1, 2026
Summary
When the test-generation prompt does not name a specific file ("test the repository", "one core module", "comprehensive suite"), agents frequently target the wrong code — typically the largest upstream module that already has rich existing tests — instead of the newly-added module the user actually wants tested.
Recent benchmark runs showed this is a dominant failure mode for Ruby/PowerShell: the agent burns 50+ turns writing tests for files the verifier never measures, while the real target (a small untracked `lib/string_utils.rb` or `tools/StringUtils.psm1`) sits one `git status` away.
Change
Adds Rule #0: Confirm the Test Target to both `ruby.md` and `powershell.md` extensions, ahead of the existing "Rule #1: Investigate the Repo First". The rule:
- Tells the agent to use git history (`git status -s`, `git ls-files --others --exclude-standard`, `git log --diff-filter=A --name-only -5`) to find the actual target rather than guessing from repo size.
- Documents a Test Placement Contract: RSpec scopes to `spec/`, Pester scopes to whatever directory the harness passes to `Invoke-Pester -Path`; tests placed elsewhere are invisible.
- Adds a First-Test Sanity Loop: write one test, run `--dry-run` / `-PassThru` to confirm discovery > 0, fix `LoadError` / `Import-Module` issues before expanding. Catches placement mistakes on turn 1.
This complements an msbench-side benchmark fix that aligns the affected prompts with their verifier scope; the skill change provides defense-in-depth for real-world vague prompts.
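The Ruby half of the First-Test Sanity Loop might look like the sketch below. It is not taken from the PR: the helper names are hypothetical, and it assumes `rspec` is on the PATH and that its JSON formatter reports a `summary.example_count` field. Unparseable output (for example, when a `LoadError` prevents a clean run) is treated as zero discovered examples.

```ruby
require "json"
require "open3"

# Hypothetical helper: read the example count from `rspec --format json` output.
# Assumption: the JSON document carries a "summary" object with "example_count".
def discovered_example_count(json_text)
  data = JSON.parse(json_text)
  return 0 unless data.is_a?(Hash)

  data.fetch("summary", {}).fetch("example_count", 0)
rescue JSON::ParserError
  0 # unparseable output is treated as "nothing discovered"
end

# Hypothetical wrapper: dry-run a single spec file and count what RSpec found.
def dry_run_count(spec_path)
  out, _status = Open3.capture2("rspec", "--dry-run", "--format", "json", spec_path)
  discovered_example_count(out)
end
```

If `dry_run_count("spec/string_utils_spec.rb")` returns 0 after the first test is written, the placement or the require path is the first thing to check before writing more tests.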
Validation
`dotnet run --project eng/skill-validator/src -- check --plugin ./plugins/dotnet-test` → ✅ All checks passed (22 skills, 11 agents, 1 plugin).