feat(benchmarks): add generation benchmark #240
Linked Issue
Closes #207
Description
This PR imports the Python-based generation benchmark suite into the repository.
Placed the files in a dedicated benchmarks/generation directory to keep the Python environment isolated from the main TS project. This benchmark allows for running the token efficiency and accuracy tests documented in the generation/readme.md update.
Type of Change
Changes Made
benchmarks/generation/ directory.
src/), Pydantic models, and prompt templates.
data/*.gold.json and *.toon).
requirements.txt and a dedicated README.md with a benchmark description and setup instructions for the Python environment.
SPEC Compliance
Testing
Pre-submission Checklist
Breaking Changes
Additional Context
This benchmark suite is self-contained and does not affect the core TypeScript build process. It requires an API key (e.g., Nebius, OpenAI) to run.
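For reviewers who want to try the suite locally, the API key requirement could be handled along these lines. This is only a minimal sketch: the variable names `NEBIUS_API_KEY` and `OPENAI_API_KEY` and the helper `find_api_key` are hypothetical and not taken from this PR; the README in benchmarks/generation/ is the authority on the actual configuration.

```python
import os

# Hypothetical variable names, used for illustration only.
# The benchmark's own README defines the real configuration.
CANDIDATE_KEY_VARS = ("NEBIUS_API_KEY", "OPENAI_API_KEY")


def find_api_key(env=os.environ, names=CANDIDATE_KEY_VARS):
    """Return (variable_name, key) for the first non-empty variable.

    Raises RuntimeError when none of the candidate variables is set,
    so a run fails early instead of part-way through a benchmark.
    """
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return name, value
    raise RuntimeError("No API key found; set one of: " + ", ".join(names))
```

Checking for the key before any model call keeps a missing credential from surfacing as a confusing mid-run failure.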