feat(benchmarks): add generation benchmark #240

vetertann · 2025-12-06T15:27:37Z

Linked Issue

Closes #207

Description

This PR imports the Python-based generation benchmark suite into the repository.

The files live in a dedicated benchmarks/generation directory to keep the Python environment isolated from the main TS project. The benchmark runs the token efficiency and accuracy tests documented in the generation/readme.md update.

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Test coverage improvement

Changes Made

Created benchmarks/generation/ directory.
Added Python source code (src/), Pydantic models, and prompt templates.
Added Gold Standard datasets (data/*.gold.json and *.toon).
Added run results as *.csv files.
Added requirements.txt and a dedicated README.md with a benchmark description and setup instructions for the Python environment.
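
For reviewers unfamiliar with gold-standard scoring, here is a minimal sketch of how an accuracy check against the gold JSON files could work. It is an illustration only, not the code in src/: the function name `exact_match_accuracy` and the exact-match rule are assumptions, and the actual suite may score outputs differently.

```python
import json


def exact_match_accuracy(gold: list[str], generated: list[str]) -> float:
    """Return the fraction of generated JSON documents that parse and
    equal their gold counterpart (hypothetical scoring rule)."""
    if len(gold) != len(generated):
        raise ValueError("gold and generated must have the same length")
    if not gold:
        return 0.0

    hits = 0
    for gold_text, generated_text in zip(gold, generated):
        try:
            parsed = json.loads(generated_text)
        except json.JSONDecodeError:
            # Output that is not valid JSON counts as a miss.
            continue
        # Compare parsed values, so whitespace and key order are ignored.
        if parsed == json.loads(gold_text):
            hits += 1
    return hits / len(gold)
```

Comparing parsed values rather than raw strings means formatting differences do not count as errors, while a reordered array still does.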

SPEC Compliance

This PR implements/fixes spec compliance
Spec section(s) affected: N/A (Tooling only)
Spec version: N/A

Testing

All existing tests pass
Added new tests for changes
Tests cover edge cases and spec compliance

Pre-submission Checklist

My code follows the project's coding standards
I have run code formatting/linting tools
I have added tests that prove my fix/feature works
New and existing tests pass locally
I have updated documentation if needed
I have reviewed the TOON specification for relevant sections

Breaking Changes

No breaking changes
Breaking changes (describe migration path below)

Additional Context

This benchmark suite is self-contained and does not affect the core TypeScript build process. It requires an API key (e.g., Nebius, OpenAI) to run.

feat(benchmarks): add generation benchmark suite

f25edb6
