-
Notifications
You must be signed in to change notification settings - Fork 283
add input design documentation to book source #3716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add input design documentation to book source #3716
Conversation
| Multi-site ensembles and sensitivity analyses that sample over input files use | ||
| an `input_design` data.frame to keep parameter draws and input files aligned | ||
| across runs. The design is created up front (typically via | ||
| `generate_joint_ensemble_design()` inside `runModule.run.write.configs()`) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd drop the "inside" bit. I think the preferred usage would be to generate the design first then pass it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dropped and clarified
| `write.sa.configs()`. It is not saved automatically to `samples.Rdata`, so keep | ||
| your copy if you need to reuse it. | ||
|
|
||
| - **Parameter column:** `param` gives the 1-based index of the posterior draw to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear what a "1-based index" is. You could just call this an "index" or "index (i.e. row number)"
| across runs. The design is created up front (typically via | ||
| `generate_joint_ensemble_design()` inside `runModule.run.write.configs()`) and | ||
| passed through to `run.write.configs()`, `write.ensemble.configs()`, and | ||
| `write.sa.configs()`. It is not saved automatically to `samples.Rdata`, so keep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK write.sa.configs doesn't currently use the input design, though we've discussed a future direction of pulling out the SA design code into the design function and then merging write.ensemble.configs & write.sa.configs, but that might not happen for months.
|
|
||
| - **Parameter column:** `param` gives the 1-based index of the posterior draw to | ||
| use for each run. `run.write.configs()` reads it when building ensemble | ||
| samples so leave it in the design even though the downstream config writers do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "even though the downstream config writers do not reference it directly" sure sounds like a bug that should be fixed, not a "feature" that should be documented.
| samples so leave it in the design even though the downstream config writers do | ||
| not reference it directly. | ||
| - **Input columns:** any name that matches a tag under `run/inputs` (for example | ||
| `met`, `soil`, `veg`, `poolinitcond`). Values are 1-based indices into that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop "1-based" throughout PR
| - **Input columns:** any name that matches a tag under `run/inputs` (for example | ||
| `met`, `soil`, `veg`, `poolinitcond`). Values are 1-based indices into that | ||
| input’s `path` list. Leaving a column out keeps that input fixed across runs. | ||
| - **Row count and order:** include at least one row per run you plan to write. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the "at least" -- you can't specify multiple rows per run.
| - **Row count and order:** include at least one row per run you plan to write. | ||
| For ensembles this means `ensemble.size` rows; for sensitivity analysis it | ||
| should cover the median run (row 1) plus every trait/quantile combination in | ||
| the order they are written. Extra rows are ignored; too few will leave later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text "Extra rows are ignored; too few will leave later
runs without input paths and lead to confusing readme/config files, so size
the table to cover every planned run" also sounds like a bug. I'd prefer the code thrown an error before starting any runs if it detects a mismatch between the design size and ensemble size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added validation in runModule.run.write.configs and run.write.configs
| |------:|----:|-----:| | ||
| | 1 | 1 | 1 | | ||
| | 2 | 2 | 1 | | ||
| | 3 | 2 | 2 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd make the met column be 1,2,1,2 as that's an easier design to understand
base/workflow/R/run.write.configs.R
Outdated
| #' @param input_design data.frame design matrix linking parameter draws and any | ||
| #' sampled inputs across runs. Include a `param` column whose values select | ||
| #' rows from `trait.samples`/`ensemble.samples` plus optional columns named for | ||
| #' `settings$run$inputs` tags (e.g. `met`, `soil`) with 1-based indices into | ||
| #' each input's `path` list. Provide at least one row per planned run (median + | ||
| #' all SA members and/or `ensemble.size`). Usually generated by | ||
| #' `runModule.run.write.configs()` via `generate_joint_ensemble_design()`, but | ||
| #' custom designs may be supplied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since run.write.configs is typically called internally while runModule.rn.write.configs is user-facing, I suggest moving the full details there and having the "as documented in..." here.
| #' @param input_design data.frame design matrix linking parameter draws and any | |
| #' sampled inputs across runs. Include a `param` column whose values select | |
| #' rows from `trait.samples`/`ensemble.samples` plus optional columns named for | |
| #' `settings$run$inputs` tags (e.g. `met`, `soil`) with 1-based indices into | |
| #' each input's `path` list. Provide at least one row per planned run (median + | |
| #' all SA members and/or `ensemble.size`). Usually generated by | |
| #' `runModule.run.write.configs()` via `generate_joint_ensemble_design()`, but | |
| #' custom designs may be supplied. | |
| #' @param input_design data frame containing the design matrix describing parameter and input indices, as | |
| #' documented in \code{runModule.run.write.configs()}. |
| #' @param input_design design matrix describing parameter and input indices, as | ||
| #' documented in \code{run.write.configs()}. Defaults to the object returned by | ||
| #' \code{generate_joint_ensemble_design()} when NULL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #' @param input_design design matrix describing parameter and input indices, as | |
| #' documented in \code{run.write.configs()}. Defaults to the object returned by | |
| #' \code{generate_joint_ensemble_design()} when NULL. | |
| #' @param input_design data.frame design matrix linking parameter draws and any | |
| #' sampled inputs across runs. Include a `param` column whose values select | |
| #' rows from `trait.samples`/`ensemble.samples` plus optional columns named for | |
| #' `settings$run$inputs` tags (e.g. `met`, `soil`) with 1-based indices into | |
| #' each input's `path` list. Provide at least one row per planned run (median + | |
| #' all SA members and/or `ensemble.size`). Usually generated by | |
| #' `generate_joint_ensemble_design()` but custom designs may be supplied. | |
| #' If NULL, `generate_joint_ensemble_design()` will be called internally. |
CHANGELOG.md
Outdated
| - Added documentation for the `input_design` design matrix that coordinates | ||
| parameter draws and input file selections across ensemble runs. The matrix | ||
| requires a `param` column with parameter sample indices and one row per run | ||
| (#3677). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is covered by the block above
| - Added documentation for the `input_design` design matrix that coordinates | |
| parameter draws and input file selections across ensemble runs. The matrix | |
| requires a `param` column with parameter sample indices and one row per run | |
| (#3677). |
| and passed to `runModule.run.write.configs()`. It is not saved automatically to | ||
| `samples.Rdata`, so keep your copy if you need to reuse it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR, but flagging for improvement later: Needing this sentence is a good indicator the current behavior is unintuitive and the design should be saved somewhere by default (though not in samples.Rdata, as established in other threads).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree. Relying on the user to manually persist the design matrix is fragile.
This reminds me a potential gap fill could be done here -- #3708
Since we are moving toward a runs_manifest.csv to track run metadata (like pfts and traits, ), the natural home for the input design seems to be that same manifest. We could extend the manifest schema to include columns for the inputs (e.g. param_index, met_index, ...); that way runs_manifest.csv becomes the complete 'recipe' .
This would eliminate the need for a separate saved object entirely for that.
|
|
||
| # Validate design matrix size for MultiSettings | ||
| if (!is.null(settings$ensemble$size) && nrow(input_design) != settings$ensemble$size) { | ||
| stop("input_design has ", nrow(input_design), " rows but settings$ensemble$size is ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@infotroph do you think these need to be switched to PEcAn's logger?
de5bf23
into
PecanProject:release/v1.10.0
Description
Add comprehensive documentation for the input_design design matrix used to coordinate parameter draws and input file selection
closes : #3677
Motivation and Context
Review Time Estimate
Types of changes
Checklist: