Constraint validation, timeout docs, single-case runner #2

bernalde · 2025-11-24T17:03:46Z

Title: Add constraint validation, solver-timeout docs, single-case runner, and tests

Summary

This PR introduces post-solve constraint validation for Pyomo benchmark runs, a lightweight
single-case runner to debug individual parameter combinations, documentation of the IPOPT
timeout semantics, a Future Work roadmap, and unit tests for the new validation and adapter
behavior. The changes are scoped to benchmarking utilities and do not modify core physics.

Key Changes

benchmarks/scripts/validate.py: Added validate_constraints() to check dryness, product
temperature, and ramp-rate constraints and return detailed violation metrics.
benchmarks/scripts/run_single_case.py: New utility to run one scipy baseline + one Pyomo run
with full terminal output for debugging (supports --tee, --solver-timeout, ramp overrides).
benchmarks/src/adapters.py: Ensure CLI ramp overrides are applied before extracting
ramp_constraints metadata; pass solver_timeout through to optimizers.
lyopronto/pyomo_models/optimizers.py: Add solver_timeout parameter usage; set IPOPT
max_cpu_time (CPU seconds). Note: IPOPT <=3.13 does not support max_wall_time.
benchmarks/notebooks/grid_analysis_SIMPLE.ipynb: Add JSONL v2 schema explanation and an
aggregated constraint-metrics summary cell for analysis convenience.
docs/FUTURE_WORK.md: Roadmap capturing next steps (wall-time guard, logging refactor, tests).
tests/test_benchmarks_validate.py: New unit tests for validate_constraints and adapter ramp
override logic.
.gitignore: Ignore generated benchmark artifacts under benchmarks/results/.

Why

Earlier cases showed that Pyomo solvers reported success even when constraints (drying target or
ramp limits) were violated; this PR fixes that by validating trajectories post-solve and marking
failed runs appropriately in analysis. It also supplies developer tooling to reproduce and
diagnose problematic parameter combinations faster.

Testing / Validation Steps

Local (quick):

conda activate lyopronto
pytest tests/test_benchmarks_validate.py -q
pytest tests -q  # full suite (takes ~5 minutes)

Manual scenario reproduction:

# Run a single-case debug with verbose IPOPT output
python benchmarks/scripts/run_single_case.py \
  --task both --R0 0.4 --A1 20 --method fd --n-elements 1000 \
  --ramp-Tsh-max 40 --ramp-Pch-max 0.05 --solver-timeout 180 --tee

Review Checklist

Code compiles and tests pass (CI will run full suite)
Confirm validate_constraints metrics are included in JSONL output when running
grid_cli.py generate ... (this is done by grid_cli integration)
Documentation updated: README and notebook reflect behavior and schema
Decide fate of legacy benchmark scripts deleted from repo (separate cleanup PR recommended)

Risks and Notes

IPOPT max_cpu_time limits CPU time, not wall-clock time. Upgrading to IPOPT 3.14 enables
max_wall_time. For environments where wall time is essential, use a process-level wrapper
(e.g., timeout) or upgrade the solver.
Staged solve prints can be noisy; consider centralizing logging behind a verbosity flag.

Files Changed (high level)

benchmarks/scripts/validate.py (new/updated)
benchmarks/scripts/run_single_case.py (new)
benchmarks/src/adapters.py (updated)
lyopronto/pyomo_models/optimizers.py (updated)
benchmarks/notebooks/grid_analysis_SIMPLE.ipynb (updated)
docs/FUTURE_WORK.md (new)
tests/test_benchmarks_validate.py (new)
.gitignore (updated)

Suggested Reviewers

@team-lyo (benchmark maintainers)
@physics-dev (physics/ops team for solver semantics)

Notes for merging

Target branch: pyomo (this PR should be merged into the pyomo integration branch)
After merge: open follow-up PRs for test expansion and logging refactor as described in
docs/FUTURE_WORK.md.

Generated by automation; edit before submitting the PR if needed.

- Grid: grid.jsonl + ratio CSV/PNG for product.A1 x ht.KC - Batch: series_baseline.jsonl and summary JSON - Includes objective_time_hr and solver metadata in records

Critical Bug Fix: - Fix IPOPT warmstart options being unconditionally enabled - Now properly conditional on warmstart_scipy parameter - Affects optimize_Tsh_pyomo, optimize_Pch_pyomo, optimize_Pch_Tsh_pyomo - Prevents state leakage between independent benchmark runs Benchmark Infrastructure: - Add comprehensive CLI-based benchmarking system (grid_cli.py) - Add adapter layer for scipy/Pyomo optimizer integration - Add JSONL-based result schema with structured validation - Add Makefile for common benchmark workflows Benchmark Analysis: - Add Jupyter notebook for visualization (grid_analysis.ipynb) - Add baseline 3×3 Tsh grid results (A1 × KC parameter sweep) - Add heatmaps showing objective parity and speedup - Demonstrate ~5-10% objective difference, ~250× average speedup Documentation: - docs/BENCHMARK_WARMSTART_FIX.md - Root cause and fix details - docs/BENCHMARK_REFACTOR_COMPLETE.md - Architecture overview - benchmarks/README.md - Usage guide and CLI reference - benchmarks/results/README.md - Version control policy Version Control: - .gitignore for generated benchmark data - Keep representative baseline examples for documentation - All generated data is reproducible via CLI Verification: - Regenerated benchmarks confirm no warmstart state leakage - FD: -5.23% objective difference, 289× speedup - Collocation: -9.14% objective difference, 267× speedup - First-run overhead (~1s) is expected (JIT/library initialization)

- Document CLI-based benchmarking workflow - Add quick-start examples for parameter grid benchmarks - Reference Makefile shortcuts and detailed documentation

Critical Fix: - Original schedule was too short (35 hours), causing all Pch benchmarks to fail - Product dried only 70% before schedule expired with baseline A1=16 - Higher resistance products (A1=18, A1=20) require even longer times Root Cause: - Drying time scales with product resistance (A1 parameter) - A1=16: ~70 hours needed - A1=18: ~78 hours needed - A1=20: ~86 hours needed - Original schedule [300, 1800] minutes = 35 hours total (insufficient) Solution: - Increased schedule to [300, 5700] minutes = 100 hours total - Provides margin for parameter grid variations - Applied to both scipy and Pyomo Pch adapters Verification: - All 27 benchmark combinations now pass (9 scipy + 9 FD + 9 colloc) - 3×3 parameter grid: A1 ∈ {16, 18, 20} × KC ∈ {2.75e-4, 3.3e-4, 4.0e-4} Results (baseline_Pch_3x3.jsonl): - Scipy: 76±7 hr objective, 50±4 s wall time - FD: 11±1 hr objective (-85%), 0.2 s wall time (1244× speedup) - Collocation: 11±1 hr objective (-86%), <0.1 s wall time (1187× speedup) Note: Pyomo finds dramatically better pressure trajectories (~7× faster drying) than scipy's sequential optimization approach. This is expected and demonstrates the value of simultaneous optimization.

Introduce single-case runner (run_single_case.py) for interactive Pyomo vs scipy debugging. Add Future Work roadmap (docs/FUTURE_WORK.md). Enhance analysis notebook with JSONL schema & constraint metrics summary. Extend IPOPT configuration with explicit CPU-time note. Update .gitignore to exclude generated benchmark artifacts. Update README benchmarking workflow.

…behavior

review-notebook-app · 2025-11-24T17:03:53Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Copilot

Pull request overview

This PR introduces post-solve constraint validation for Pyomo benchmark runs, a lightweight single-case runner for debugging individual parameter combinations, documentation of IPOPT timeout semantics, a future work roadmap, and unit tests for validation and adapter behavior. The changes are focused on benchmarking utilities and do not modify core physics code.

Key Changes

Added constraint validation (validate_constraints()) to verify dryness, temperature, and ramp-rate constraints post-solve
Created run_single_case.py for debugging individual parameter combinations with full terminal output
Updated adapter logic to pass solver_timeout and apply CLI ramp overrides before extracting metadata
Enhanced optimizers.py with solver_timeout parameter, ramp-rate constraints, and metadata tracking
Added new example (example_ramp_constraints.py) demonstrating ramp-rate constraint usage
Created comprehensive tests in test_benchmarks_validate.py

Reviewed changes

Copilot reviewed 61 out of 112 changed files in this pull request and generated 33 comments.

Show a summary per file

File	Description
`tests/test_benchmarks_validate.py`	New test file for constraint validation and adapter behavior (118 lines)
`requirements.txt`	Added seaborn>=0.12.0 dependency
`lyopronto/pyomo_models/optimizers.py`	Added solver timeout, ramp constraints, metadata tracking, treat_n_elements_as_effective parameter
`examples/example_ramp_constraints.py`	New example demonstrating ramp-rate constraints with 4 scenarios
`docs/archive/*.md`	Multiple archive documentation files (historical reference)
`benchmarks/src/__init__.py`	New package initialization for benchmarks core modules
`benchmarks/results/.gitignore`	Added to ignore generated benchmark results

Copilot · 2025-11-24T17:11:06Z

benchmarks/src/adapters.py

+    meta: Dict[str, Any] = {}
+    model_ref = None
+    results_ref = None
+    solver_opts = {}


Variable solver_opts is not used.

Copilot · 2025-11-24T17:11:06Z

benchmarks/src/analyze_benchmark.py

+        return pd.DataFrame()
+
+    # Get parameter names from first row
+    param1_name = method_df.iloc[0]['param1_name']


Variable param1_name is not used.

Copilot · 2025-11-24T17:11:06Z

benchmarks/src/analyze_benchmark.py

+
+    # Get parameter names from first row
+    param1_name = method_df.iloc[0]['param1_name']
+    param2_name = method_df.iloc[0]['param2_name']


Variable param2_name is not used.

Copilot · 2025-11-24T17:11:07Z

benchmarks/cleanup.py

+
+    # Expected patterns
+    valid_prefixes = ['traj_', 'heatmap_obj_', 'heatmap_speed_', 'nominal_']
+    valid_suffixes = ['_fd.png', '_colloc.png', '_combined.png', '.png']


Variable valid_suffixes is not used.

Copilot · 2025-11-24T17:11:07Z

benchmarks/scripts/grid_cli.py

+        return 0
+
+    base_scen = copy.deepcopy(SCENARIOS[scenario_name])
+    vial = base_scen["vial"]


Variable vial is not used.

Copilot · 2025-11-24T17:11:12Z

benchmarks/scripts/run_single.py

+from pathlib import Path
+from typing import Any, Dict
+
+import numpy as np


Import of 'np' is not used.

Copilot · 2025-11-24T17:11:13Z

benchmarks/src/schema.py

+"""
+from __future__ import annotations
+import sys, platform, datetime, json, hashlib
+from typing import Any, Dict, List, Tuple


Import of 'Tuple' is not used.
Import of 'List' is not used.

Copilot · 2025-11-24T17:11:13Z

benchmarks/scripts/diagnostics.py

+        )
+        if result.returncode == 0:
+            return result.stdout.strip()[:12]  # Short SHA
+    except (subprocess.TimeoutExpired, FileNotFoundError, Exception):


'except' clause does nothing but pass and there is no explanatory comment.

Copilot · 2025-11-24T17:11:13Z

lyopronto/pyomo_models/optimizers.py

+            'ncp': None if use_finite_differences else n_collocation,
+            'treat_effective': treat_n_elements_as_effective if not use_finite_differences else False,
+        }
+    except Exception:


'except' clause does nothing but pass and there is no explanatory comment.

Copilot · 2025-11-24T17:11:13Z

benchmarks/src/schema.py

+            import numpy as _np
+            if isinstance(o, _np.ndarray):
+                return o.tolist()
+        except Exception:


'except' clause does nothing but pass and there is no explanatory comment.

bernalde added 7 commits November 14, 2025 19:13

Benchmarks: add baseline grid and batch results

1ad8de3

- Grid: grid.jsonl + ratio CSV/PNG for product.A1 x ht.KC - Batch: series_baseline.jsonl and summary JSON - Includes objective_time_hr and solver metadata in records

docs: Add benchmarking section to README

1f796f6

- Document CLI-based benchmarking workflow - Add quick-start examples for parameter grid benchmarks - Reference Makefile shortcuts and detailed documentation

Add unit tests: validate_constraints and pyomo_adapter ramp override …

e5aa75d

…behavior

Persist local benchmark refactor changes, validation, and artifacts

9ab6da8

bernalde self-assigned this Nov 24, 2025

bernalde requested review from Copilot and parkyr November 24, 2025 17:04

Copilot started reviewing on behalf of bernalde November 24, 2025 17:05 View session

Copilot finished reviewing on behalf of bernalde November 24, 2025 17:06

Copilot AI reviewed Nov 24, 2025

View reviewed changes

bernalde mentioned this pull request Nov 24, 2025

Pyomo: slim PR — remove benchmark artifacts #3

Open

Constraint validation, timeout docs, single-case runner #2

Are you sure you want to change the base?

Constraint validation, timeout docs, single-case runner #2

Uh oh!

Conversation

bernalde commented Nov 24, 2025

Summary

Key Changes

Why

Testing / Validation Steps

Review Checklist

Risks and Notes

Uh oh!

review-notebook-app bot commented Nov 24, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes

Reviewed changes

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants