Add extension files for Python, TypeScript, and PowerShell #617
First-draft language extension files following the dotnet.md pattern. These guide the polyglot test agent on build/test/lint/fix for each language.

- Python: pytest-focused, environment/runner detection (Poetry/PDM/uv/Hatch), public-API testing philosophy, common errors, mocking guidelines.
- TypeScript: package-manager detection (npm/pnpm/yarn/bun), Jest+Vitest+Mocha support, ESM/CJS guidance, TS-specific considerations, framework detection.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
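As a rough illustration of the environment/runner detection mentioned above, the sketch below maps lock files to a command prefix. The helper name `detect_python_runner`, the check order, and the `python -m` fallback are assumptions made for this example, not the contents of python.md.

```python
from pathlib import Path

# Lock files that identify a runner (checked in insertion order).
LOCKFILE_RUNNERS = {
    "poetry.lock": "poetry",
    "pdm.lock": "pdm",
    "uv.lock": "uv",
}


def detect_python_runner(repo_root: str) -> list[str]:
    """Return a command prefix for running tools in the repo's environment.

    Hypothetical helper: falls back to a bare interpreter call when no
    known runner is detected.
    """
    root = Path(repo_root)
    for lockfile, runner in LOCKFILE_RUNNERS.items():
        if (root / lockfile).is_file():
            return [runner, "run"]
    pyproject = root / "pyproject.toml"
    # Hatch has no lock file of its own, so look for its config table instead.
    if pyproject.is_file() and "[tool.hatch" in pyproject.read_text(encoding="utf-8"):
        return ["hatch", "run"]
    return ["python", "-m"]
```

Appending `["pytest"]` to the result gives, for example, `poetry run pytest` in a Poetry repo and `python -m pytest` when nothing is detected.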
Key changes:

- Add 'Rule #1: Investigate the repo first' as top section
- Remove generic templates (anchor to wrong conventions)
- Remove testing philosophy (unactionable for agent)
- Fix command inconsistency (parameterize with <prefix>/<exec>)
- Move dependency install to 'last resort' section
- Add error-driven fixer playbook with concrete fixes
- Fix Mocha --grep (test name, not file filter)
- Fix ESM guidance (don't change package.json type field)
- Add monorepo/workspace guidance (Nx, Turborepo)
- Add framework notes (React, Express, NestJS)
- Add Jest mock hoisting warning
- Cut from 224+289 lines to 117+136 lines (~57% reduction)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
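One way to read the `<prefix>`/`<exec>` parameterization is as a pair of command stems chosen from the lock file. The sketch below assumes that `<prefix>` runs a package.json script and `<exec>` runs a locally installed binary. That reading, the `NodeCommands` type, and the check order are assumptions for illustration, not the wording of typescript.md.

```python
from pathlib import Path
from typing import NamedTuple


class NodeCommands(NamedTuple):
    prefix: list[str]    # runs a package.json script, e.g. prefix + ["test"]
    exec_cmd: list[str]  # runs a local binary, e.g. exec_cmd + ["jest"]


NPM = NodeCommands(["npm", "run"], ["npx"])

# Lock file name -> commands; the first match wins.
LOCKFILES = [
    ("pnpm-lock.yaml", NodeCommands(["pnpm", "run"], ["pnpm", "exec"])),
    ("yarn.lock", NodeCommands(["yarn", "run"], ["yarn", "run"])),
    ("bun.lockb", NodeCommands(["bun", "run"], ["bunx"])),
    ("bun.lock", NodeCommands(["bun", "run"], ["bunx"])),
    ("package-lock.json", NPM),
]


def detect_node_commands(repo_root: str) -> NodeCommands:
    """Pick script and exec commands from the lock file found in the repo root."""
    root = Path(repo_root)
    for lockfile, commands in LOCKFILES:
        if (root / lockfile).is_file():
            return commands
    # Assumed default when no lock file is present.
    return NPM
```

With this shape, a single template such as `exec_cmd + ["vitest", "run"]` stays consistent across package managers instead of hard-coding `npx` in some places and `pnpm` in others.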
Key sections:

- Repo investigation first (shell target detection)
- Pester v5 discovery vs run phase rules
- Import patterns for modules, library scripts, and executable scripts
- Cross-platform guidance (pwsh vs powershell.exe, Join-Path, casing)
- Non-terminating error handling with Should -Throw
- Mock scoping, -ModuleName, PesterBoundParameters
- TestDrive: for file-based tests
- Non-obvious assertion gotchas (Contain vs Be, Throw needs scriptblock)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Reword 'do not add' to 'use existing, only introduce if none exist'
- Add test framework detection table to dotnet.md (MSTest/xUnit/NUnit)
- Tailor wording per language (Python defaults to pytest, TS follows scripts.test, PS defaults to Pester)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
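The detection table itself is not reproduced in this conversation. As a hedged sketch of the kind of lookup such a table supports, the snippet below scans `.csproj` files for `PackageReference` entries and matches them against package-name prefixes. The prefix mapping and the helper name `detect_dotnet_frameworks` are assumptions for illustration and may not match what dotnet.md lists.

```python
import re
from pathlib import Path

# Assumed package-name prefixes; the actual table in dotnet.md may differ.
FRAMEWORK_PREFIXES = {
    "mstest": "MSTest",
    "xunit": "xUnit",
    "nunit": "NUnit",
}

PACKAGE_REF = re.compile(r'<PackageReference\s+Include="([^"]+)"', re.IGNORECASE)


def detect_dotnet_frameworks(repo_root: str) -> set[str]:
    """Return the test frameworks referenced by any .csproj under repo_root."""
    found = set()
    for csproj in Path(repo_root).rglob("*.csproj"):
        text = csproj.read_text(encoding="utf-8", errors="ignore")
        for package in PACKAGE_REF.findall(text):
            for prefix, framework in FRAMEWORK_PREFIXES.items():
                if package.lower().startswith(prefix):
                    found.add(framework)
    return found
```

An empty result corresponds to the "only introduce if none exist" case, where no existing framework was found to reuse.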
Rule #1 now searches for ALL test file formats (not just test_*.py), explicitly calls out custom frameworks like UTscapy, and emphasizes adopting repo conventions fully rather than layering pytest on top. Test Commands section now has a custom framework block before the pytest block, and Test File Naming defers to repo conventions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
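To illustrate what "searching for all test file formats" might look like, here is a minimal sketch that counts test files per convention before choosing a framework. The pattern table and the function name `count_test_files` are assumptions for this example. The `*.uts` pattern reflects the file extension UTscapy uses for its test campaigns.

```python
from collections import Counter
from pathlib import Path

# Assumed convention -> glob patterns; extend with whatever the repo uses.
TEST_PATTERNS = {
    "python-test-modules": ["test_*.py", "*_test.py"],
    "utscapy": ["*.uts"],
}


def count_test_files(repo_root: str) -> Counter:
    """Count test files per convention so the dominant one can be adopted."""
    root = Path(repo_root)
    counts = Counter()
    for convention, patterns in TEST_PATTERNS.items():
        # Use a set so a file matching two patterns is counted once.
        files = {path for pattern in patterns for path in root.rglob(pattern)}
        counts[convention] = len(files)
    return counts
```

`count_test_files(".").most_common(1)` then gives the convention with the most files. A real investigation would likely also exclude directories such as virtual environments, which this sketch does not do.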
Ran benchmarks against the branch. Results by language:

TypeScript (clear improvement on batnoter instances):
C# (big jump on contoso-university):
Python: the initial run showed a rubric regression on scapy (92% → 67%) because python.md was too pytest-centric. The agent still discovered UTscapy from task context, but the pytest-heavy guidance crowded out domain-specific patterns (corrupt_bits, ContextManagerCaptureOutput). Fixed by making python.md framework-adaptive: `Rule #1` now searches for all test formats, the test commands section leads with custom frameworks before pytest, and file naming defers to repo conventions. After the fix:
C++: fmt-chrono-simple errors in both baseline and extension runs (broken instance).
/evaluate
Skill Validation Results

Model: claude-opus-4.6 | Judge: claude-opus-4.6
First draft of extension files for Python, TypeScript, and PowerShell, following the same idea as the existing `dotnet.md` and `cpp.md`, but for the polyglot pipeline.

These are initial ideas, not polished. The plan is to run Atlas benchmarks against them, see what helps and what doesn't, and iterate.
Each file focuses on what an LLM would get wrong without guidance — repo discovery, command detection, common errors, mocking pitfalls. Generic language knowledge the model already has is kept to a minimum.
What's here
- `python.md` (~117 lines): pytest, environment/runner detection (Poetry/PDM/uv/Hatch), import layout, error playbook
- `typescript.md` (~136 lines): Jest/Vitest/Mocha, package manager detection, ESM/CJS, monorepo/framework notes
- `powershell.md` (~110 lines): Pester v5, discovery vs run phase pitfalls, cross-platform, mock scoping
- `dotnet.md`: added test framework detection table (MSTest/xUnit/NUnit)
- `SKILL.md`: updated with new entries

Not here yet

- `python-examples.md` / `typescript-examples.md` / `powershell-examples.md` (following the `dotnet-examples.md` pattern): worth adding once we see benchmark results
- `dotnet.md` could use "investigate repo first" and structural alignment with the newer files; left that for a follow-up