Toy pipeline for neural simulation-based inference (NSBI) using pyhf with signal morphing via Lagrange interpolation. Three basis signals are generated at fixed parameter nodes; one classifier per node is trained and used to build likelihood-ratio estimates; CLs upper limits are then computed with several approaches and compared.
bash run.sh

Creates background, data, and basis signal samples (Gaussian mixtures, 5 features) and produces validation plots.
python generate_distributions.py [--n_bkg 1000000] [--n_sig 100000] [--nodes 0 5 10]
Output: dataframes/background.parquet, dataframes/data.parquet, dataframes/signal_{0,5,10}.parquet, plots/
Trains one binary classifier (signal vs background) per basis node using k-fold cross-validation. Models are saved per fold.
python train_classifiers.py [--n_epochs 50] [--n_folds 2]
Output: models/classifier_signal_{0,5,10}_fold{0,1}.pt, models/loss_*.pdf
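
The fold bookkeeping behind k-fold training can be sketched as follows. This is a minimal illustration, not the code in train_classifiers.py: the modulo-based fold assignment and the helper names `assign_folds` and `fold_masks` are assumptions made for the example.

```python
import numpy as np

def assign_folds(n_events, n_folds=2):
    """Hypothetical fold assignment: event i belongs to fold i % n_folds."""
    return np.arange(n_events) % n_folds

def fold_masks(folds, k):
    """Return (train_mask, holdout_mask) for the model of fold k.

    The fold-k model trains on every event outside fold k and is later
    evaluated only on fold k, so no event is scored by a model that saw it.
    """
    holdout = folds == k
    return ~holdout, holdout

folds = assign_folds(n_events=10, n_folds=2)
for k in range(2):
    train, holdout = fold_masks(folds, k)
    print(f"fold {k}: train on {train.sum()} events, hold out {holdout.sum()}")
```

With this layout every event is held out by exactly one fold model, which is what makes out-of-fold predictions possible at evaluation time.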
Evaluates each classifier on all samples using out-of-fold predictions. Computes per-event classifier scores, likelihood ratios, and per-event optimal observable scores for every signal point. Also builds nD ratio-space histograms for the sufficient statistic method.
python evaluate_classifiers.py [--n_folds 2] [--n_points 21] [--nodes 0 5 10] [--n_bins_sufficient N] [--min_bkg 5.0]
| Argument | Default | Meaning |
|---|---|---|
| `--n_bins_sufficient` | auto | Bins per dimension for nD ratio histograms; auto-determined from `--min_bkg` if not provided |
| `--min_bkg` | 5.0 | Minimum expected background yield per nD bin |
Columns written per eval dataframe:
- `weight`
- `score_{clf}`, `ratio_{clf}`: basis classifier score and ratio s/(1-s)
- `score_opt_v{v:.2f}`, `ratio_opt_v{v:.2f}`: per-event optimal score/ratio at each signal point
Output:
- `eval_dataframes/background_eval.parquet`, `eval_dataframes/data_eval.parquet`, `eval_dataframes/signal_{0,5,10}_eval.parquet`
- `ratio_templates/background.npz`, `ratio_templates/data.npz`, `ratio_templates/signal_{0,5,10}.npz`
- `plots/score_{clf}.pdf`, `plots/score_opt_v{v}.pdf`
Loads eval dataframes and the pre-built nD ratio histograms from ratio_templates/, writes pyhf workspaces, then computes 95% CL upper limits on the signal strength mu as a function of the signal parameter v using three methods.
python calculate_limits.py [--n_points 21] [--n_bins 10] [--mu_max 100] [--nodes 0 5 10]
| Method | Description |
|---|---|
| Basis (per-classifier) | Lagrange-morph the 1D score histogram for each basis classifier |
| Optimal (per-parameter) | Use pre-computed per-event optimal scores from eval dataframes |
| Sufficient (nD Lagrange) | Single pyhf workspace in nD ratio space; lagrange_morphing modifier computes sum_k L_k(v) T_k at runtime; v is pinned per scan point |
Output:
- `workspaces/score_basis_{clf}_{v}.json`: one workspace per (classifier, signal point)
- `workspaces/optimal_per_parameter_{v}.json`: one workspace per signal point
- `workspaces/sufficient.json`: single nD morphing workspace
- `results/limits.npz`: all computed limits
- `plots/limits_overview.pdf`: expected medians for all methods
- `plots/limits_optimal.pdf`, `plots/limits_sufficient.pdf`: Brazil band plots
- `plots/limits_basis_{clf}.pdf`: Brazil band per basis classifier
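
For orientation, a 95% CL upper limit is the value of mu at which CLs falls to 0.05. The sketch below reads that crossing off a scan by linear interpolation. It is an illustration only and not necessarily how calculate_limits.py extracts its limits: the helper name `limit_from_scan`, the 101-point grid, and the exponential toy curve are all assumptions, and the toy curve is a made-up stand-in for real pyhf CLs values.

```python
import numpy as np

def limit_from_scan(mu_values, cls_values, level=0.05):
    """Interpolate the mu at which CLs crosses `level`.

    Assumes CLs decreases monotonically along the scan; np.interp needs
    increasing x-values, so both arrays are reversed.
    """
    mu_values = np.asarray(mu_values, dtype=float)
    cls_values = np.asarray(cls_values, dtype=float)
    return float(np.interp(level, cls_values[::-1], mu_values[::-1]))

# Scan from 0 to the default --mu_max of 100 (grid size is an assumption).
mu_scan = np.linspace(0.0, 100.0, 101)
toy_cls = np.exp(-mu_scan / 10.0)   # made-up stand-in, not a real pyhf result
print(limit_from_scan(mu_scan, toy_cls))
```

If the scan never reaches the chosen level, `np.interp` silently returns an endpoint of the grid, so a real implementation would want to check that the crossing lies inside the scanned range.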
The optimal observable for distinguishing signal at parameter value v from background is the likelihood ratio:
r_opt(x; v) = p(x | signal, v) / p(x | bkg) = sum_k L_k(v) r_k(x)
s_opt(x; v) = r_opt / (1 + r_opt)
where L_k(v) are the Lagrange polynomial weights for the basis nodes and r_k(x) = s_k(x)/(1-s_k(x)) is the ratio from basis classifier k. Pre-computing s_opt per event and histogramming gives the exact optimal approach.
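
The formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation in utils.py: the function names, the `eps` clipping, and the toy score values are assumptions made for the example.

```python
import numpy as np

def lagrange_weights(v, nodes):
    """Lagrange basis polynomials L_k(v) for the given parameter nodes."""
    nodes = np.asarray(nodes, dtype=float)
    weights = np.ones(len(nodes))
    for k, vk in enumerate(nodes):
        for j, vj in enumerate(nodes):
            if j != k:
                weights[k] *= (v - vj) / (vk - vj)
    return weights

def optimal_score(scores, v, nodes, eps=1e-6):
    """Combine basis classifier scores (n_events, n_nodes) into s_opt(x; v)."""
    s = np.clip(scores, eps, 1.0 - eps)          # avoid division by zero at s = 1
    ratios = s / (1.0 - s)                       # r_k(x) = s_k / (1 - s_k)
    r_opt = ratios @ lagrange_weights(v, nodes)  # sum_k L_k(v) r_k(x)
    return r_opt / (1.0 + r_opt)

nodes = [0.0, 5.0, 10.0]
scores = np.array([[0.2, 0.5, 0.8],   # toy scores: one row per event,
                   [0.6, 0.4, 0.3]])  # one column per basis classifier
print(lagrange_weights(2.5, nodes))       # weights always sum to 1
print(optimal_score(scores, 5.0, nodes))  # at a node, reproduces that node's score
```

Between nodes some Lagrange weights are negative, so r_opt is not guaranteed to stay positive for every event; the sketch does not guard against that case.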
One pyhf workspace encodes all basis signal templates in the nD ratio space (r_0, r_1, ..., r_{n-1}). The custom lagrange_morphing modifier (registered in pyhf.modifiers) holds the n basis nD templates and, at evaluation time, computes:
signal_template(v) = sum_k L_k(v) T_k
The workspace is built once; limits for each v are obtained by pinning v via fixed_params.
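
Outside pyhf, the morphing sum itself is a weighted combination of histogram arrays. The sketch below shows it on random toy templates; it does not reproduce the lagrange_morphing modifier or its pyhf interface. The helper names are hypothetical, and `clip_and_renorm` is only one guess at what a clip-and-renormalise step could look like (zero out negative bins, keep the total yield).

```python
import numpy as np

def lagrange_weights(v, nodes):
    """Lagrange basis polynomials L_k(v) for distinct parameter nodes."""
    nodes = np.asarray(nodes, dtype=float)
    return np.array([
        np.prod([(v - vj) / (vk - vj) for vj in nodes if vj != vk])
        for vk in nodes
    ])

def morph_template(v, nodes, templates):
    """signal_template(v) = sum_k L_k(v) T_k, templates shaped (n_nodes, ...)."""
    return np.tensordot(lagrange_weights(v, nodes), templates, axes=1)

def clip_and_renorm(template):
    """Assumed clean-up step: zero out negative bins, keep the total yield.

    Not safe if every bin is clipped to zero (division by zero).
    """
    total = template.sum()
    clipped = np.clip(template, 0.0, None)
    return clipped * (total / clipped.sum())

nodes = [0.0, 5.0, 10.0]
rng = np.random.default_rng(0)
templates = rng.random((3, 2, 2, 2))   # toy: 3 basis templates, 2 bins per ratio axis

at_node = morph_template(5.0, nodes, templates)     # equals templates[1]
between = morph_template(2.5, nodes, templates)     # weighted mix of all three
print(np.allclose(at_node, templates[1]), between.shape)
```

Because the weights sum to one but can be negative between nodes, individual morphed bins can dip below zero even when every basis template is non-negative, which is why some clean-up of the morphed histogram may be needed.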
| File | Description |
|---|---|
| `utils.py` | Constants, Lagrange weights, MLP classifier, model loading/scoring, data loading, histogram helpers (quantile edges, ratio histograms, clip-and-renorm) |
| `limits_utils.py` | Argument parser, pyhf workspace builders (`_make_simple_spec`, `make_lagrange_ratio_spec`), histogram computation, all plotting functions (score distributions, limit overviews, Brazil bands, optimal observable) |
nsbi_tooling/
generate_distributions.py
train_classifiers.py
evaluate_classifiers.py
calculate_limits.py
utils.py
limits_utils.py
run.sh
pyhf/ # editable pyhf install (local fork)
src/pyhf/modifiers/
lagrange_morphing.py # custom modifier: sum_k L_k(v) T_k
dataframes/ # raw generated samples
models/ # trained classifier weights
eval_dataframes/ # per-event scores / ratios
ratio_templates/ # nD ratio histograms (background + basis signals + data)
workspaces/ # pyhf workspace JSON files
results/ # saved limit arrays (.npz)
plots/ # output plots