Harden npm launcher and scale corpus-generate workflow #23

Open
TheColby wants to merge 5 commits into main from codex/add-cli-option-for-data-augmentation-z91dl5

@TheColby
Owner

Motivation

  • Make the npm launcher robust in real-world installs by reliably locating Python, capturing child output, and only bootstrapping deps when truly missing.
  • Enable large-scale, interruption-safe audio corpus generation with parallelism, sharding, and resume/checkpoint semantics to support very large --variants-per-input runs.
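
The sharding and resume semantics named above can be illustrated with a minimal sketch. It is not the code from this PR: the helper names (plan_shard, load_completed, run_with_resume), the round-robin shard assignment, and the one-JSON-object-per-line checkpoint layout are all assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def plan_shard(tasks, num_shards, shard_index):
    """Return the tasks owned by one shard (assumed round-robin by position)."""
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index must be in [0, num_shards)")
    return [t for i, t in enumerate(tasks) if i % num_shards == shard_index]


def load_completed(checkpoint_path):
    """Read the set of task keys already recorded in the checkpoint file."""
    path = Path(checkpoint_path)
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as fh:
        return {json.loads(line)["key"] for line in fh if line.strip()}


def run_with_resume(tasks, checkpoint_path, worker):
    """Run worker(task) for every task not yet checkpointed.

    Each success is appended and flushed immediately, so an interrupted run
    loses at most the task in flight. Failures are counted, not recorded,
    which means a later resume retries them.
    """
    done = load_completed(checkpoint_path)
    processed, failed = [], []
    with Path(checkpoint_path).open("a", encoding="utf-8") as fh:
        for task in tasks:
            if task in done:
                continue  # finished in an earlier run
            try:
                worker(task)
            except Exception:
                failed.append(task)
                continue
            fh.write(json.dumps({"key": task}) + "\n")
            fh.flush()
            processed.append(task)
    return processed, failed


if __name__ == "__main__":
    # Hypothetical task keys: one per (input, variant) pair.
    tasks = [f"input.wav#{i}" for i in range(6)]
    shard = plan_shard(tasks, num_shards=2, shard_index=1)
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "checkpoint.jsonl"
        first, _ = run_with_resume(shard, ckpt, worker=lambda t: None)
        second, _ = run_with_resume(shard, ckpt, worker=lambda t: None)
        print(len(first), len(second))  # prints: 3 0
```

In this sketch, shards are disjoint by construction, so separate machines can each run one shard index with its own checkpoint file and need no coordination.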

Description

  • Hardened npm/verbx.js to search for a Python executable ($PYTHON, python3, python, and py on Windows), capture the child's stdout/stderr so ModuleNotFoundError is detected accurately, and print clearer bootstrap diagnostics when invoking -m pip install --user.
  • Extended verbx batch corpus-generate in src/verbx/cli.py with the flags --jobs, --checkpoint-file, --resume, --num-shards, and --shard-index, and added streaming JSONL manifest writes, shard-aware planning, and per-run resume/failure accounting.
  • Refactored variant rendering into _generate_corpus_variant with deterministic per-variant RNG seeding so parallel generation is reproducible, and used ThreadPoolExecutor to parallelize per-input variant generation.
  • Updated README.md and docs/AI_AUGMENTATION.md to describe launcher fallback/bootstrap behavior and the new corpus-generation flags, and added unit tests covering shard dry-run output, checkpoint/resume, and npm-launcher behavior.
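
As a rough illustration of why per-variant seeding keeps parallel runs reproducible, the sketch below derives each variant's seed from a run seed, the input name, and the variant index, then renders variants in a ThreadPoolExecutor. It is a sketch under stated assumptions, not the PR's _generate_corpus_variant: the function names, the SHA-256 seed derivation, and the gain_db/delay_ms parameters are hypothetical.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor


def variant_seed(base_seed, input_name, variant_index):
    """Derive a stable 64-bit seed from run seed, input name, and variant index."""
    payload = f"{base_seed}:{input_name}:{variant_index}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")


def generate_variant(base_seed, input_name, variant_index):
    """Stand-in for rendering: draw hypothetical parameters from a private RNG."""
    # A private Random instance avoids sharing global RNG state across threads.
    rng = random.Random(variant_seed(base_seed, input_name, variant_index))
    return {
        "input": input_name,
        "variant": variant_index,
        "gain_db": round(rng.uniform(-12.0, 0.0), 3),   # hypothetical parameter
        "delay_ms": round(rng.uniform(1.0, 250.0), 3),  # hypothetical parameter
    }


def generate_all(base_seed, input_name, variants, jobs):
    """Generate all variants for one input using `jobs` worker threads."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [
            pool.submit(generate_variant, base_seed, input_name, i)
            for i in range(variants)
        ]
        # Collect in submission order so output order is independent of timing.
        return [f.result() for f in futures]


if __name__ == "__main__":
    serial = generate_all(1234, "input.wav", variants=8, jobs=1)
    parallel = generate_all(1234, "input.wav", variants=8, jobs=4)
    print(serial == parallel)  # prints: True
```

Because each variant's random stream depends only on its own identity, the result does not change with the number of workers or the order in which threads finish.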

Testing

  • Ran static checks with python -m ruff check src/verbx/cli.py tests/test_cli.py tests/test_npm_launcher.py, which passed.
  • Ran python -m pytest tests/test_cli.py -k "corpus_generate" -q, which reported 4 passed, 104 deselected.
  • Ran python -m pytest tests/test_npm_launcher.py -q, which reported 2 passed.

Codex Task
