[codex] Generalize prebuilt-index reproduction doc generation beyond MS MARCO #3233

@lintool

PR #3197 adds generated markdown docs for MS MARCO prebuilt-index reproductions, but the current implementation is still fairly bespoke. Follow up by refactoring the doc generation path so it is more config-driven and can cover the remaining prebuilt-index reproduction configs.

Requested cleanup from review:

  • Evaluate whether individual docgen templates are still needed, or whether the markdown can be generated directly from the reproduction YAML configs.
  • Reduce duplication across the generateMsMarcoXReport methods in GenerateReproductionDocsFromPrebuiltIndexes2Test; these methods are structurally similar and should likely share a generic generator.
  • Move hardcoded summary/table mappings into a static definition or into the YAML configs, rather than baking them into the Java generator logic.
  • Add generated docs for BEIR and BRIGHT configs, which are present under src/main/resources/reproduce/from-prebuilt-indexes/configs but are not covered by the current generated markdown docs.
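
One possible shape for the shared generator requested above is sketched here. Everything in it is hypothetical: the class `GenericReportSketch`, the records `Condition` and `ReportSpec`, the `SPECS` table, and the metric names are illustrative placeholders, not Anserini's actual classes or the real contents of the reproduction YAML configs. The point is only that the per-collection differences live in a static definition (which could equally be loaded from YAML), while a single `generate` method renders the markdown.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Anserini code base.
final class GenericReportSketch {
  // One table row: a display label plus scores keyed by metric name.
  record Condition(String label, Map<String, Double> scores) {}

  // Per-collection definition kept as data, not generator logic.
  record ReportSpec(String title, List<String> metrics) {}

  // Assumed mappings for illustration; the real ones would come from the configs.
  static final Map<String, ReportSpec> SPECS = Map.of(
      "msmarco", new ReportSpec("MS MARCO", List.of("MRR@10", "R@1K")),
      "beir", new ReportSpec("BEIR", List.of("nDCG@10", "R@100")),
      "bright", new ReportSpec("BRIGHT", List.of("nDCG@10")));

  // Renders a markdown summary table for any collection listed in SPECS.
  static String generate(String collection, List<Condition> conditions) {
    ReportSpec spec = SPECS.get(collection);
    if (spec == null) {
      throw new IllegalArgumentException("Unknown collection: " + collection);
    }
    StringBuilder sb = new StringBuilder();
    sb.append("# ").append(spec.title()).append(" Reproductions\n\n");
    sb.append("| Condition |");
    for (String metric : spec.metrics()) {
      sb.append(" ").append(metric).append(" |");
    }
    sb.append("\n|---|");
    for (int i = 0; i < spec.metrics().size(); i++) {
      sb.append("---|");
    }
    sb.append("\n");
    for (Condition c : conditions) {
      sb.append("| ").append(c.label()).append(" |");
      for (String metric : spec.metrics()) {
        Double value = c.scores().get(metric);
        // A missing score is rendered as "-" rather than failing the whole report.
        sb.append(" ")
          .append(value == null ? "-" : String.format(Locale.ROOT, "%.4f", value))
          .append(" |");
      }
      sb.append("\n");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Dummy score, used only to show the output shape.
    List<Condition> rows = List.of(new Condition("bm25", Map.of("nDCG@10", 0.5)));
    System.out.print(generate("beir", rows));
  }
}
```

With this layout, covering a new collection means adding one `ReportSpec` entry (or one YAML stanza) rather than another `generateMsMarcoXReport`-style method.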

Acceptance criteria:

  • Doc generation supports MS MARCO, BEIR, and BRIGHT prebuilt-index reproduction configs.
  • Repeated report-generation logic is consolidated into a generic config-driven path where practical.
  • Any remaining hardcoded mappings are minimal and clearly isolated.
  • Generated docs are refreshed and covered by the existing doc generation test workflow.
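
The last criterion could be checked with a staleness comparison along these lines. This is a minimal sketch under stated assumptions: `DocFreshnessSketch` and its methods are hypothetical names, and the existing doc generation test workflow may compare files differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of the existing test workflow.
final class DocFreshnessSketch {
  // True when the checked-in file exists and matches the freshly generated markdown.
  static boolean isUpToDate(Path checkedIn, String generated) throws IOException {
    if (!Files.exists(checkedIn)) {
      return false;
    }
    String onDisk = Files.readString(checkedIn, StandardCharsets.UTF_8);
    return normalize(onDisk).equals(normalize(generated));
  }

  // Normalizes Windows line endings so the check behaves the same on any checkout.
  static String normalize(String s) {
    return s.replace("\r\n", "\n");
  }
}
```

A test could call `isUpToDate` once per generated doc and fail with the file name when it returns false, which makes a stale doc easy to spot in CI output.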
