Switch trajectory storage from YAML to JSONL format #204

Merged
corbt merged 1 commit into main from yaml-to-jsonl-trajectories on Jul 10, 2025

corbt (Collaborator) commented Jul 6, 2025

Summary

  • Migrate trajectory storage from YAML to JSONL format for a large loading-speed improvement
  • ~160x faster loading: 1.9 seconds vs 300 seconds (benchmarked by @kyle)
  • ~6% smaller files

Why this change?

Kyle originally suggested using YAML for human readability, but the performance cost turned out to be too high. This PR switches to JSONL (newline-delimited JSON) format while maintaining full backward compatibility.
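
As a rough illustration of the format: JSONL stores one complete JSON object per line, so a file can be written and parsed record by record with Python's built-in `json` module. The sketch below assumes trajectories are plain dicts; `write_jsonl` and `read_jsonl` are hypothetical helper names, not the functions in `trajectory_logging.py`.

```python
import json
from pathlib import Path
from typing import Any, Iterable


# Hypothetical helpers for illustration; not ART's trajectory_logging API.
def write_jsonl(path: Path, records: Iterable[dict[str, Any]]) -> None:
    """Write each record as one compact JSON object followed by a newline."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes newlines inside strings, so every record
            # occupies exactly one physical line of the file.
            f.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_jsonl(path: Path) -> list[dict[str, Any]]:
    """Parse a JSONL file line by line, ignoring blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each line is independent, a reader can also stop early or process a file incrementally without loading the whole document first.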

Changes

  • Update trajectory_logging.py to serialize as JSONL (one JSON object per line)
  • Update backend to save new files with .jsonl extension
  • Add backward compatibility to read both YAML and JSONL formats
  • Consolidate duplicate load_trajectories implementations (removed the art-e-specific version)
  • Add new aggregate_trajectories module for step-level metric aggregation
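
The PR does not describe the interface of the new aggregate_trajectories module. As a hypothetical sketch of what step-level metric aggregation can look like, the function below groups trajectories by training step and averages each numeric metric; the `step` and `metrics` keys and the name `aggregate_by_step` are assumptions, not ART's schema or API.

```python
from collections import defaultdict
from statistics import mean
from typing import Any


def aggregate_by_step(trajectories: list[dict[str, Any]]) -> dict[int, dict[str, float]]:
    """Average each metric across all trajectories recorded at the same step.

    Hypothetical sketch: assumes every trajectory has an integer "step" and a
    "metrics" dict mapping metric names to numbers. Not ART's actual API.
    """
    # step -> metric name -> list of observed values
    grouped: dict[int, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for traj in trajectories:
        for name, value in traj.get("metrics", {}).items():
            grouped[traj["step"]][name].append(float(value))
    # Emit steps in ascending order with one mean per metric.
    return {
        step: {name: mean(values) for name, values in metrics.items()}
        for step, metrics in sorted(grouped.items())
    }
```

For example, two step-0 trajectories with rewards 1.0 and 0.0 aggregate to `{0: {"reward": 0.5}}`.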

Backward Compatibility

  • All loading functions can read both .yaml and .jsonl files
  • Existing YAML trajectories don't need to be converted
  • New trajectories will be saved as JSONL
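
One way to keep both formats loadable is to dispatch on the file extension. A minimal sketch, assuming a legacy .yaml file holds either one trajectory mapping or a list of them; `load_trajectory_file` and `load_trajectory_dir` are hypothetical names, not ART's `load_trajectories`.

```python
import json
from pathlib import Path
from typing import Any


# Hypothetical loader for illustration; not ART's load_trajectories.
def load_trajectory_file(path: Path) -> list[dict[str, Any]]:
    """Load trajectories from a legacy .yaml file or a new .jsonl file."""
    suffix = path.suffix.lower()
    if suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            # One JSON object per line; skip blank lines such as a trailing newline.
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".yaml":
        # Imported lazily so JSONL-only environments do not need PyYAML.
        import yaml

        with path.open(encoding="utf-8") as f:
            data = yaml.safe_load(f)
        # Assumption: a legacy file holds a single mapping or a list of them.
        if data is None:
            return []
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported trajectory file extension: {path.suffix!r}")


def load_trajectory_dir(directory: Path) -> list[dict[str, Any]]:
    """Read every .yaml and .jsonl file under a directory, in sorted path order."""
    trajectories: list[dict[str, Any]] = []
    for path in sorted(directory.rglob("*")):
        if path.is_file() and path.suffix.lower() in (".jsonl", ".yaml"):
            trajectories.extend(load_trajectory_file(path))
    return trajectories
```

Importing `yaml` inside the YAML branch means a directory that contains only JSONL files can be loaded without PyYAML installed.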

Trade-offs

  • Files are less human-readable (JSON instead of YAML)
  • We'll add better observability tools to compensate for this
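
In the meantime, individual records remain easy to inspect. For a whole file, `python -m json.tool --json-lines FILE` (Python 3.8+) pretty-prints every line from the shell; the hypothetical helper below does the same for a single record, addressed by a 1-based line number.

```python
import json
from itertools import islice
from pathlib import Path


# Hypothetical inspection helper; not part of ART.
def show_record(path: Path, line_number: int = 1) -> str:
    """Return the JSON object on a given 1-based line of a JSONL file, indented."""
    if line_number < 1:
        raise ValueError("line_number is 1-based")
    with path.open(encoding="utf-8") as f:
        # Skip the preceding lines without reading the rest of the file.
        line = next(islice(f, line_number - 1, None), None)
    if line is None:
        raise IndexError(f"{path} has fewer than {line_number} lines")
    return json.dumps(json.loads(line), indent=2, sort_keys=True)
```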

Test plan

  • Test that new trajectories are saved as JSONL
  • Test that old YAML trajectories can still be loaded
  • Test that all trajectory loading functions work with both formats
  • Verify performance improvement

🤖 Generated with Claude Code

corbt force-pushed the yaml-to-jsonl-trajectories branch from 45e03c8 to 4a1e99b on July 6, 2025 06:57
corbt requested a review from bradhilton on July 6, 2025 06:58
corbt force-pushed the yaml-to-jsonl-trajectories branch from a6fe2a3 to 4a1e99b on July 6, 2025 07:24
corbt changed the title from "Switch trajectory storage from YAML to JSONL format" to "[WIP] Switch trajectory storage from YAML to JSONL format" on Jul 7, 2025
corbt changed the title from "[WIP] Switch trajectory storage from YAML to JSONL format" to "Switch trajectory storage from YAML to JSONL format" on Jul 8, 2025
This change migrates our trajectory logging from YAML to JSONL format,
achieving a 160x speedup in loading (1.9s vs 300s) and ~6% smaller files.

Changes:
- Update trajectory_logging.py to serialize as JSONL (one JSON object per line)
- Update backend to save files with .jsonl extension
- Add backward compatibility to read both YAML and JSONL formats
- Consolidate duplicate load_trajectories implementations
- Add new aggregate_trajectories module for step-level aggregation

My bad for originally suggesting YAML - the human readability wasn't worth
the performance cost. We'll add better observability tools to compensate
for the reduced readability of JSONL files.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
corbt force-pushed the yaml-to-jsonl-trajectories branch from 4a1e99b to 46e9aea on July 10, 2025 00:45
corbt merged commit 5522bde into main on Jul 10, 2025
2 checks passed
surajpatildev pushed a commit to meetkiara/ART that referenced this pull request May 20, 2026
Switch trajectory storage from YAML to JSONL format