
Add extension files for Python, TypeScript, and PowerShell #617

Merged
JanKrivanek merged 7 commits into dotnet:main from nohwnd:extension-inceptions
May 13, 2026

nohwnd commented May 6, 2026

Member

First draft of extension files for Python, TypeScript, and PowerShell — same idea as the existing dotnet.md and cpp.md, but for the polyglot pipeline.

These are initial ideas, not polished. The plan is to run Atlas benchmarks against them, see what helps and what doesn't, and iterate.

Each file focuses on what an LLM would get wrong without guidance — repo discovery, command detection, common errors, mocking pitfalls. Generic language knowledge the model already has is kept to a minimum.

What's here

  • python.md (~117 lines) — pytest, environment/runner detection (Poetry/PDM/uv/Hatch), import layout, error playbook
  • typescript.md (~136 lines) — Jest/Vitest/Mocha, package manager detection, ESM/CJS, monorepo/framework notes
  • powershell.md (~110 lines) — Pester v5, discovery vs run phase pitfalls, cross-platform, mock scoping
  • dotnet.md — added test framework detection table (MSTest/xUnit/NUnit)
  • SKILL.md — updated with new entries
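
The environment/runner detection mentioned for python.md could be sketched roughly as follows. This is an illustrative heuristic, not the contents of python.md: the function name `detect_test_prefix`, the marker list, and the `python -m` fallback are assumptions made for this example.

```python
from pathlib import Path

# (runner, lock file, pyproject.toml table prefix). Heuristic markers only;
# the actual guidance in python.md may use different or additional signals.
RUNNERS = [
    ("poetry", "poetry.lock", "[tool.poetry"),
    ("pdm", "pdm.lock", "[tool.pdm"),
    ("uv", "uv.lock", "[tool.uv"),
    ("hatch", None, "[tool.hatch"),
]


def detect_test_prefix(repo: Path) -> list[str]:
    """Return the command prefix to place in front of `pytest` (hypothetical helper)."""
    pyproject = repo / "pyproject.toml"
    text = pyproject.read_text(encoding="utf-8") if pyproject.is_file() else ""
    for name, lockfile, table in RUNNERS:
        has_lock = lockfile is not None and (repo / lockfile).is_file()
        if has_lock or table in text:
            return [name, "run"]
    # No known runner found: fall back to the interpreter on PATH.
    return ["python", "-m"]
```

Appending `["pytest"]` to the result gives, for example, `uv run pytest` in a repo with a `uv.lock`, and `python -m pytest` when no runner is detected.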

Not here yet

  • No python-examples.md / typescript-examples.md / powershell-examples.md (following dotnet-examples.md pattern) — worth adding once we see benchmark results
  • dotnet.md could use "investigate repo first" and structural alignment with the newer files — left that for a follow-up

nohwnd and others added 5 commits May 5, 2026 16:11
First-draft language extension files following the dotnet.md pattern.
These guide the polyglot test agent on build/test/lint/fix for each language.

Python: pytest-focused, environment/runner detection (Poetry/PDM/uv/Hatch),
public-API testing philosophy, common errors, mocking guidelines.

TypeScript: package-manager detection (npm/pnpm/yarn/bun), Jest+Vitest+Mocha
support, ESM/CJS guidance, TS-specific considerations, framework detection.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Key changes:
- Add 'Rule #1: Investigate the repo first' as top section
- Remove generic templates (anchor to wrong conventions)
- Remove testing philosophy (unactionable for agent)
- Fix command inconsistency (parameterize with <prefix>/<exec>)
- Move dependency install to 'last resort' section
- Add error-driven fixer playbook with concrete fixes
- Fix Mocha --grep (test name, not file filter)
- Fix ESM guidance (don't change package.json type field)
- Add monorepo/workspace guidance (Nx, Turborepo)
- Add framework notes (React, Express, NestJS)
- Add Jest mock hoisting warning
- Cut from 224+289 lines to 117+136 lines (~51% reduction)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Key sections:
- Repo investigation first (shell target detection)
- Pester v5 discovery vs run phase rules
- Import patterns for modules, library scripts, and executable scripts
- Cross-platform guidance (pwsh vs powershell.exe, Join-Path, casing)
- Non-terminating error handling with Should -Throw
- Mock scoping, -ModuleName, PesterBoundParameters
- TestDrive: for file-based tests
- Non-obvious assertion gotchas (Contain vs Be, Throw needs scriptblock)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Reword 'do not add' to 'use existing, only introduce if none exist'
- Add test framework detection table to dotnet.md (MSTest/xUnit/NUnit)
- Tailor wording per language (Python defaults to pytest, TS follows
  scripts.test, PS defaults to Pester)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
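
The detection table added to dotnet.md is not reproduced on this page. One plausible way to express the same idea, assuming detection is based on the NuGet package references in a test project file, is sketched below; the prefix mapping and function name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Package-name prefix (lowercase) -> framework. Assumed mapping for illustration.
FRAMEWORK_PREFIXES = {
    "mstest": "MSTest",
    "xunit": "xUnit",
    "nunit": "NUnit",
}


def detect_dotnet_framework(csproj_xml: str) -> Optional[str]:
    """Return the test framework referenced by a .csproj document, if any."""
    root = ET.fromstring(csproj_xml)
    for element in root.iter():
        # endswith() tolerates old-style project files that declare an XML namespace.
        if not element.tag.endswith("PackageReference"):
            continue
        package = element.get("Include", "").lower()
        for prefix, framework in FRAMEWORK_PREFIXES.items():
            if package.startswith(prefix):
                return framework
    return None
```

A project that references none of these packages yields `None`, which is the case where the "only introduce if none exist" wording above would apply.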
nohwnd and others added 2 commits May 11, 2026 15:10
Rule #1 now searches for ALL test file formats (not just test_*.py),
explicitly calls out custom frameworks like UTscapy, and emphasizes
adopting repo conventions fully rather than layering pytest on top.

Test Commands section now has a custom framework block before the
pytest block, and Test File Naming defers to repo conventions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
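
To make the "search for all test file formats" idea concrete, here is a minimal sketch that surveys a repo for several naming conventions before committing to one. The pattern list is an assumption for illustration (in particular, treating `*.uts` as the UTscapy campaign format) and is not taken from python.md.

```python
from collections import Counter
from pathlib import Path
from typing import Optional

# Label -> glob pattern. Illustrative only; real repos may use other conventions.
PATTERNS = {
    "pytest-style (test_*.py)": "test_*.py",
    "pytest-style (*_test.py)": "*_test.py",
    "UTscapy campaign (*.uts)": "*.uts",
}


def survey_test_files(repo: Path) -> Counter:
    """Count files matching each known test naming convention."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = sum(1 for _ in repo.rglob(pattern))
    return counts


def dominant_convention(repo: Path) -> Optional[str]:
    """Return the most common convention, or None if no test files were found."""
    label, count = survey_test_files(repo).most_common(1)[0]
    return label if count > 0 else None
```

A real implementation would likely skip directories such as virtual environments, which `rglob` walks into here.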

nohwnd commented May 12, 2026

Member Author

Ran benchmarks against the branch. Results by language:

TypeScript — clear improvement on batnoter instances:

  • batnoter-simple: coverage +24.3% → +24.5%, tests 114 → 132
  • batnoter-complex: coverage +32.8% → +43.9% (+11pp), tests 121 → 167, 17 test files vs ~6

C# — big jump on contoso-university:

  • contoso-university-simple: coverage +47.6% → +82.4% (+35pp), tests 81 → 145
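
The percentage-point deltas quoted for the TypeScript and C# instances can be recomputed from the before/after coverage gains. The snippet below is only a worked check of that arithmetic; the helper name `pp_delta` is made up for this example.

```python
def pp_delta(before: float, after: float) -> float:
    """Difference between two percentages, in percentage points (one decimal)."""
    return round(after - before, 1)


# Coverage gains (baseline, with extension files) as reported in this comment.
results = {
    "batnoter-simple": (24.3, 24.5),
    "batnoter-complex": (32.8, 43.9),
    "contoso-university-simple": (47.6, 82.4),
}

for name, (before, after) in results.items():
    print(f"{name}: {pp_delta(before, after):+.1f}pp")
# batnoter-simple: +0.2pp
# batnoter-complex: +11.1pp
# contoso-university-simple: +34.8pp
```

These match the rounded "+11pp" and "+35pp" figures given in the bullets.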

Python — initial run showed a rubric regression on scapy (92% → 67%) because python.md was too pytest-centric. The agent still discovered UTscapy from task context but the pytest-heavy guidance crowded out domain-specific patterns (corrupt_bits, ContextManagerCaptureOutput).

Fixed by making python.md framework-adaptive: Rule #1 now searches for all test formats, the test commands section leads with custom frameworks before pytest, and file naming defers to repo conventions. After the fix:

  • scapy-2ad271f8 rubric: 67% → 92% (restored to baseline)
  • scapy-2ad27213: flipped to resolved (mutation now passes)
  • Overall Python resolve rate: 25% → 50%

C++ — fmt-chrono-simple errors in both baseline and extension runs (broken instance).

@JanKrivanek

Member

/evaluate

@github-actions

Contributor

Skill Validation Results

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

▶ Sessions Visualisation -- interactive replay of all evaluation sessions

@JanKrivanek

Member

Runs on SWE Atlas, with opus 4.6

@JanKrivanek JanKrivanek merged commit 6e7adc5 into dotnet:main May 13, 2026
35 of 37 checks passed
2 participants