malin-horstmann/nsbi_tooling

NSBI Tooling

Toy pipeline for neural simulation-based inference (NSBI) using pyhf, with signal morphing via Lagrange interpolation. Three basis signals are generated at fixed parameter nodes; one classifier per node is trained against background and used to build likelihood-ratio estimates; CLs upper limits are then computed with several approaches and compared.

Quick start

bash run.sh

Pipeline

1. generate_distributions.py — Generate samples

Creates background, data, and basis signal samples (Gaussian mixtures, 5 features) and produces validation plots.

python generate_distributions.py [--n_bkg 1000000] [--n_sig 100000] [--nodes 0 5 10]

Output: dataframes/background.parquet, dataframes/data.parquet, dataframes/signal_{0,5,10}.parquet, plots/


2. train_classifiers.py — Train basis classifiers

Trains one binary classifier (signal vs background) per basis node using k-fold cross-validation. Models are saved per fold.

python train_classifiers.py [--n_epochs 50] [--n_folds 2]

Output: models/classifier_signal_{0,5,10}_fold{0,1}.pt, models/loss_*.pdf
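
The k-fold scheme means every event can later be scored by a model that never saw it during training. A minimal sketch of that idea follows; the index-based fold rule, the function names, and the toy linear model standing in for the MLP are all assumptions for illustration, not the code in train_classifiers.py.

```python
import numpy as np

def fold_ids(n_events, n_folds):
    """Assign each event to a fold by its index (hypothetical rule)."""
    return np.arange(n_events) % n_folds

def out_of_fold_scores(x, y, n_folds, fit, predict):
    """Score every event with the model whose training set excluded it."""
    folds = fold_ids(len(x), n_folds)
    scores = np.empty(len(x))
    for f in range(n_folds):
        held_out = folds == f
        model = fit(x[~held_out], y[~held_out])  # train on the other folds
        scores[held_out] = predict(model, x[held_out])
    return scores

# Toy stand-in for the MLP: a logistic function along the class-mean difference.
def fit(x, y):
    mu_s, mu_b = x[y == 1].mean(axis=0), x[y == 0].mean(axis=0)
    return mu_s - mu_b, 0.5 * (mu_s + mu_b)

def predict(model, x):
    direction, midpoint = model
    return 1.0 / (1.0 + np.exp(-(x - midpoint) @ direction))

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, (1000, 5)),   # background-like
                    rng.normal(1.0, 1.0, (1000, 5))])  # signal-like
y = np.concatenate([np.zeros(1000), np.ones(1000)])
scores = out_of_fold_scores(x, y, n_folds=2, fit=fit, predict=predict)
```

With n_folds=2 each model is trained on half of the events and applied to the other half, which matches the two fold files per node listed in the output above.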


3. evaluate_classifiers.py — Score events and build ratio templates

Evaluates each classifier on all samples using out-of-fold predictions. Computes per-event classifier scores, likelihood ratios, and per-event optimal observable scores for every signal point. Also builds nD ratio-space histograms for the sufficient statistic method.

python evaluate_classifiers.py [--n_folds 2] [--n_points 21] [--nodes 0 5 10]
                               [--n_bins_sufficient N] [--min_bkg 5.0]

| Argument | Default | Meaning |
| --- | --- | --- |
| --n_bins_sufficient | auto | Bins per dimension for nD ratio histograms; auto-determined from --min_bkg if not provided |
| --min_bkg | 5.0 | Minimum expected background yield per nD bin |

Columns written per eval dataframe:

  • weight
  • score_{clf}, ratio_{clf} — basis classifier score and ratio s/(1-s)
  • score_opt_v{v:.2f}, ratio_opt_v{v:.2f} — per-event optimal score/ratio at each signal point
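
The score and ratio columns are related by r = s/(1-s) and s = r/(1+r). The conversion might look like the sketch below; the function names and the clipping constant eps are assumptions added to keep the ratio finite when a score saturates at 0 or 1, and the interpretation of r as a density ratio assumes signal and background carry equal total weight in training.

```python
import numpy as np

def score_to_ratio(score, eps=1e-6):
    """Map a classifier score s in [0, 1] to the ratio s / (1 - s).

    eps is a hypothetical guard against division by zero at s = 1.
    """
    s = np.clip(np.asarray(score, dtype=float), eps, 1.0 - eps)
    return s / (1.0 - s)

def ratio_to_score(ratio):
    """Inverse map r -> r / (1 + r)."""
    r = np.asarray(ratio, dtype=float)
    return r / (1.0 + r)

ratios = score_to_ratio([0.0, 0.5, 0.75, 1.0])
```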

Output:

  • eval_dataframes/background_eval.parquet, eval_dataframes/data_eval.parquet, eval_dataframes/signal_{0,5,10}_eval.parquet
  • ratio_templates/background.npz, ratio_templates/data.npz, ratio_templates/signal_{0,5,10}.npz
  • plots/score_{clf}.pdf, plots/score_opt_v{v}.pdf

4. calculate_limits.py — Compute and plot CLs limits

Loads eval dataframes and the pre-built nD ratio histograms from ratio_templates/, writes pyhf workspaces, then computes 95% CL upper limits on the signal strength mu as a function of the signal parameter v using three methods.

python calculate_limits.py [--n_points 21] [--n_bins 10] [--mu_max 100] [--nodes 0 5 10]

| Method | Description |
| --- | --- |
| Basis (per-classifier) | Lagrange-morph the 1D score histogram for each basis classifier |
| Optimal (per-parameter) | Use pre-computed per-event optimal scores from eval dataframes |
| Sufficient (nD Lagrange) | Single pyhf workspace in nD ratio space; lagrange_morphing modifier computes sum_k L_k(v) T_k at runtime; v is pinned per scan point |

Output:

  • workspaces/score_basis_{clf}_{v}.json — one workspace per (classifier, signal point)
  • workspaces/optimal_per_parameter_{v}.json — one workspace per signal point
  • workspaces/sufficient.json — single nD morphing workspace
  • results/limits.npz — all computed limits
  • plots/limits_overview.pdf — expected medians for all methods
  • plots/limits_optimal.pdf, plots/limits_sufficient.pdf — Brazil band plots
  • plots/limits_basis_{clf}.pdf — Brazil band per basis classifier

Methods in detail

Per-parameter optimal observable

The optimal observable for distinguishing signal at parameter value v from background is the likelihood ratio:

r_opt(x; v) = p(x | signal, v) / p(x | bkg) = sum_k L_k(v) r_k(x)
s_opt(x; v) = r_opt / (1 + r_opt)

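where L_k(v) are the Lagrange polynomial weights for the basis nodes and r_k(x) = s_k(x)/(1-s_k(x)) is the ratio from basis classifier k. Pre-computing s_opt per event and histogramming it yields, for each v, a 1D observable that is optimal to the extent that the classifier ratios r_k and the Lagrange interpolation are accurate.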
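
As an illustration of the formulas above, the following sketch evaluates the Lagrange weights for the default nodes (0, 5, 10) and combines per-event basis ratios into r_opt and s_opt. The function names are hypothetical and need not match those in utils.py.

```python
import numpy as np

def lagrange_weights(v, nodes):
    """L_k(v) = prod_{j != k} (v - v_j) / (v_k - v_j) for each node v_k."""
    nodes = np.asarray(nodes, dtype=float)
    weights = np.ones(len(nodes))
    for k, v_k in enumerate(nodes):
        for j, v_j in enumerate(nodes):
            if j != k:
                weights[k] *= (v - v_j) / (v_k - v_j)
    return weights

def optimal_ratio(basis_ratios, v, nodes):
    """r_opt(x; v) = sum_k L_k(v) r_k(x); basis_ratios has shape (n_events, n_nodes)."""
    return np.asarray(basis_ratios, dtype=float) @ lagrange_weights(v, nodes)

def optimal_score(r_opt):
    """s_opt = r_opt / (1 + r_opt)."""
    return r_opt / (1.0 + r_opt)

nodes = (0.0, 5.0, 10.0)
w_mid = lagrange_weights(2.5, nodes)   # weights between the first two nodes
basis_ratios = np.array([[1.0, 2.0, 4.0],
                         [0.5, 0.5, 0.5]])
r_opt = optimal_ratio(basis_ratios, 2.5, nodes)
s_opt = optimal_score(r_opt)
```

Between nodes some weights are negative (here the weight of the node at 10), so the interpolated ratio is not guaranteed to stay positive for every event and may need explicit handling.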

Sufficient ratio workspace with Lagrange signal morphing

One pyhf workspace encodes all basis signal templates in the nD ratio space (r_0, r_1, ..., r_{n-1}). The custom lagrange_morphing modifier (registered in pyhf.modifiers) holds the n basis nD templates T_k and at evaluation time computes:

signal_template(v) = sum_k L_k(v) T_k

The workspace is built once; limits for each v are obtained by pinning v via fixed_params.
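
The morphing step itself is a weighted sum of histograms. The sketch below reproduces it with plain NumPy on randomly filled 3D templates; it does not use the lagrange_morphing modifier from the local pyhf fork, and the function names, template shapes, and yields are assumptions for illustration.

```python
import numpy as np

def lagrange_weights(v, nodes):
    """Lagrange basis polynomials L_k(v) evaluated at v, one per node."""
    nodes = np.asarray(nodes, dtype=float)
    return np.array([
        np.prod([(v - v_j) / (v_k - v_j) for v_j in nodes if v_j != v_k])
        for v_k in nodes
    ])

def morph_template(basis_templates, v, nodes):
    """signal_template(v) = sum_k L_k(v) T_k, contracted over the node axis."""
    templates = np.asarray(basis_templates, dtype=float)
    return np.tensordot(lagrange_weights(v, nodes), templates, axes=1)

nodes = (0.0, 5.0, 10.0)
rng = np.random.default_rng(1)
# One 4x4x4 histogram per basis node (toy yields, not real templates).
basis_templates = rng.poisson(20.0, size=(3, 4, 4, 4)).astype(float)

at_node = morph_template(basis_templates, 5.0, nodes)      # recovers T_1
in_between = morph_template(basis_templates, 7.5, nodes)   # interpolated
```

At a basis node the morphed template reduces to that node's template, and because the sum is linear the total yield interpolates in the same way as the individual bins.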


Shared modules

| File | Description |
| --- | --- |
| utils.py | Constants, Lagrange weights, MLP classifier, model loading/scoring, data loading, histogram helpers (quantile edges, ratio histograms, clip-and-renorm) |
| limits_utils.py | Argument parser, pyhf workspace builders (_make_simple_spec, make_lagrange_ratio_spec), histogram computation, all plotting functions (score distributions, limit overviews, Brazil bands, optimal observable) |

Directory structure

nsbi_tooling/
  generate_distributions.py
  train_classifiers.py
  evaluate_classifiers.py
  calculate_limits.py
  utils.py
  limits_utils.py
  run.sh
  pyhf/                       # editable pyhf install (local fork)
    src/pyhf/modifiers/
      lagrange_morphing.py    # custom modifier: sum_k L_k(v) T_k
  dataframes/                 # raw generated samples
  models/                     # trained classifier weights
  eval_dataframes/            # per-event scores / ratios
  ratio_templates/            # nD ratio histograms (background + basis signals + data)
  workspaces/                 # pyhf workspace JSON files
  results/                    # saved limit arrays (.npz)
  plots/                      # output plots
