NSBI Tooling

Toy pipeline for neural simulation-based inference (NSBI) using pyhf with signal morphing via Lagrange interpolation. Three basis signals are generated at fixed parameter nodes; one classifier per node is trained and used to build likelihood-ratio estimates; CLs upper limits are then computed with several approaches and compared.

Quick start

bash run.sh

Pipeline

1. `generate_distributions.py` — Generate samples

Creates background, data, and basis signal samples (Gaussian mixtures, 5 features) and produces validation plots.

python generate_distributions.py [--n_bkg 1000000] [--n_sig 100000] [--nodes 0 5 10]

Output: dataframes/background.parquet, dataframes/data.parquet, dataframes/signal_{0,5,10}.parquet, plots/
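
As a rough illustration of what a Gaussian-mixture basis signal can look like, the sketch below draws 5-feature samples at the three default nodes. The component means, widths, fractions, and the linear dependence on v are invented for the example, and `sample_gaussian_mixture` and `toy_signal` are hypothetical names; the actual script may parametrise its samples differently.

```python
import numpy as np

N_FEATURES = 5  # the section above states 5 features


def sample_gaussian_mixture(n_events, means, sigmas, fractions, rng):
    """Draw n_events from a mixture of isotropic Gaussians.

    means:     (n_components, N_FEATURES) component centres
    sigmas:    (n_components,) per-component widths
    fractions: (n_components,) mixture fractions summing to 1
    """
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Pick a mixture component for every event, then smear around its centre.
    component = rng.choice(len(fractions), size=n_events, p=fractions)
    return rng.normal(loc=means[component], scale=sigmas[component, None])


def toy_signal(v, n_events, rng):
    """Hypothetical signal whose component centres shift linearly with v."""
    means = [np.full(N_FEATURES, 0.1 * v), np.full(N_FEATURES, 1.0 + 0.1 * v)]
    return sample_gaussian_mixture(n_events, means, [1.0, 0.5], [0.7, 0.3], rng)


rng = np.random.default_rng(0)
signals = {v: toy_signal(v, 1000, rng) for v in (0, 5, 10)}
```

Each entry of `signals` is an array of shape (1000, 5); wrapping it in a dataframe with a weight column would bring it closer to the parquet files listed above.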

2. `train_classifiers.py` — Train basis classifiers

Trains one binary classifier (signal vs background) per basis node using k-fold cross-validation. Models are saved per fold.

python train_classifiers.py [--n_epochs 50] [--n_folds 2]

Output: models/classifier_signal_{0,5,10}_fold{0,1}.pt, models/loss_*.pdf

3. `evaluate_classifiers.py` — Score events and build ratio templates

Evaluates each classifier on all samples using out-of-fold predictions. Computes per-event classifier scores, likelihood ratios, and per-event optimal observable scores for every signal point. Also builds nD ratio-space histograms for the sufficient statistic method.

python evaluate_classifiers.py [--n_folds 2] [--n_points 21] [--nodes 0 5 10]
                               [--n_bins_sufficient N] [--min_bkg 5.0]
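
Out-of-fold prediction means that every event is scored by the model whose training set excluded it. The sketch below shows the bookkeeping with stand-in models; `assign_folds` and `out_of_fold_scores` are hypothetical helpers, and the modulo-based fold assignment is an assumption rather than the scheme used by the scripts.

```python
import numpy as np


def assign_folds(n_events, n_folds, seed=0):
    """Randomly assign each event to one of n_folds folds of near-equal size."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n_events) % n_folds


def out_of_fold_scores(x, fold_id, models):
    """Score every event with the model that did not train on its fold.

    models[k] is assumed to have been trained on all folds except fold k.
    """
    scores = np.empty(len(x))
    for k, model in enumerate(models):
        held_out = fold_id == k
        scores[held_out] = model(x[held_out])
    return scores


# Stand-in "models": plain callables returning a score in (0, 1).
# In the pipeline these would be the per-fold classifiers stored in models/.
models = [
    lambda x: 1.0 / (1.0 + np.exp(-x.sum(axis=1))),
    lambda x: 1.0 / (1.0 + np.exp(-0.5 * x.sum(axis=1))),
]

x = np.random.default_rng(1).normal(size=(10, 5))
fold_id = assign_folds(len(x), n_folds=2)
scores = out_of_fold_scores(x, fold_id, models)
```

For this to be meaningful, the fold assignment at evaluation time has to match the one used in training, for example by reusing the same seed or by storing the fold index with each event.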

Argument	Default	Meaning
`--n_bins_sufficient`	auto	Bins per dimension for nD ratio histograms; auto-determined from `--min_bkg` if not provided
`--min_bkg`	5.0	Minimum expected background yield per nD bin
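
One plausible way to derive the number of bins per dimension from `--min_bkg` is to pick the largest value for which the average background yield per nD bin stays above the threshold. This is only a sketch under a uniform-occupancy assumption; `auto_bins_per_dim` is a hypothetical name, and the script may instead inspect the actual minimum bin content.

```python
def auto_bins_per_dim(total_bkg, n_dims, min_bkg=5.0):
    """Largest n such that total_bkg / n**n_dims >= min_bkg.

    Assumes background spreads evenly over the n**n_dims bins, which is
    roughly true only for quantile-style binning of weakly correlated axes.
    """
    n = 1
    # Integer search avoids floating-point trouble with fractional powers.
    while total_bkg / (n + 1) ** n_dims >= min_bkg:
        n += 1
    return n


# Three ratio dimensions (one per basis node) and 1e6 expected background events.
n_bins = auto_bins_per_dim(total_bkg=1_000_000, n_dims=3, min_bkg=5.0)
```

With these inputs the rule gives 58 bins per dimension, since 58^3 bins hold about 5.1 events each on average while 59^3 bins would drop below 5.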

Columns written per eval dataframe:

weight
score_{clf}, ratio_{clf} — basis classifier score and ratio s/(1-s)
score_opt_v{v:.2f}, ratio_opt_v{v:.2f} — per-event optimal score/ratio at each signal point
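
The ratio s/(1-s) follows from the fact that a classifier trained with cross-entropy on balanced signal and background samples approximates s = p_sig / (p_sig + p_bkg). A minimal sketch of the conversion in both directions is given here; the clipping constant `eps` is an assumption added to keep the ratio finite when a score saturates at 1.

```python
import numpy as np


def score_to_ratio(score, eps=1e-6):
    """Convert a classifier score s in [0, 1] to the ratio s / (1 - s)."""
    s = np.clip(np.asarray(score, dtype=float), eps, 1.0 - eps)
    return s / (1.0 - s)


def ratio_to_score(ratio):
    """Inverse map r / (1 + r), giving back a score in [0, 1)."""
    r = np.asarray(ratio, dtype=float)
    return r / (1.0 + r)


# Column name for a signal point, following the pattern listed above.
column = f"ratio_opt_v{5:.2f}"  # "ratio_opt_v5.00"
ratios = score_to_ratio([0.2, 0.5, 0.8, 1.0])
```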

Output:

eval_dataframes/background_eval.parquet, eval_dataframes/data_eval.parquet, eval_dataframes/signal_{0,5,10}_eval.parquet
ratio_templates/background.npz, ratio_templates/data.npz, ratio_templates/signal_{0,5,10}.npz
plots/score_{clf}.pdf, plots/score_opt_v{v}.pdf

4. `calculate_limits.py` — Compute and plot CLs limits

Loads eval dataframes and the pre-built nD ratio histograms from ratio_templates/, writes pyhf workspaces, then computes 95% CL upper limits on the signal strength mu as a function of the signal parameter v using three methods.

python calculate_limits.py [--n_points 21] [--n_bins 10] [--mu_max 100] [--nodes 0 5 10]

Method	Description
Basis (per-classifier)	Lagrange-morph the 1D score histogram for each basis classifier
Optimal (per-parameter)	Use pre-computed per-event optimal scores from eval dataframes
Sufficient (nD Lagrange)	Single pyhf workspace in nD ratio space; `lagrange_morphing` modifier computes sum_k L_k(v) T_k at runtime; v is pinned per scan point

Output:

workspaces/score_basis_{clf}_{v}.json — one workspace per (classifier, signal point)
workspaces/optimal_per_parameter_{v}.json — one workspace per signal point
workspaces/sufficient.json — single nD morphing workspace
results/limits.npz — all computed limits
plots/limits_overview.pdf — expected medians for all methods
plots/limits_optimal.pdf, plots/limits_sufficient.pdf — Brazil band plots
plots/limits_basis_{clf}.pdf — Brazil band per basis classifier
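
For orientation, a single-channel workspace of the kind written for the basis and optimal methods could look like the sketch below. It follows the public pyhf JSON layout (channels, observations, measurements, version) but is not the repository's `_make_simple_spec`; `simple_spec_sketch`, the channel name, and the absence of background uncertainties are assumptions made for brevity.

```python
import json


def simple_spec_sketch(signal, background, observed, mu_max=100.0, channel="score"):
    """Build a minimal pyhf-style workspace dict with one channel.

    The signal carries a free normalisation factor "mu" (the parameter of
    interest); the background is taken as exactly known in this sketch.
    """
    return {
        "channels": [
            {
                "name": channel,
                "samples": [
                    {
                        "name": "signal",
                        "data": [float(x) for x in signal],
                        "modifiers": [
                            {"name": "mu", "type": "normfactor", "data": None}
                        ],
                    },
                    {
                        "name": "background",
                        "data": [float(x) for x in background],
                        "modifiers": [],
                    },
                ],
            }
        ],
        "observations": [{"name": channel, "data": [float(x) for x in observed]}],
        "measurements": [
            {
                "name": "measurement",
                "config": {
                    "poi": "mu",
                    # Widen the default range of mu so large limits are reachable.
                    "parameters": [
                        {"name": "mu", "bounds": [[0.0, float(mu_max)]], "inits": [1.0]}
                    ],
                },
            }
        ],
        "version": "1.0.0",
    }


spec = simple_spec_sketch([1.0, 2.0, 4.0], [50.0, 30.0, 10.0], [52, 29, 12])
spec_json = json.dumps(spec, indent=2)
```

With pyhf installed, a dictionary of this shape can be passed to `pyhf.Workspace(spec)`, and the resulting model and data are the inputs to a CLs scan over mu.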

Methods in detail

Per-parameter optimal observable

The optimal observable for distinguishing signal at parameter value v from background is the likelihood ratio:

r_opt(x; v) = p(x | signal, v) / p(x | bkg) = sum_k L_k(v) r_k(x)
s_opt(x; v) = r_opt / (1 + r_opt)

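where L_k(v) = prod_{j != k} (v - v_j) / (v_k - v_j) are the Lagrange polynomial weights for the basis nodes v_k and r_k(x) = s_k(x)/(1-s_k(x)) is the ratio from basis classifier k. The second equality holds when the signal density is a polynomial in v of degree at most N-1 for N nodes, which is the case that Lagrange interpolation reproduces exactly. Pre-computing s_opt per event and histogramming it then gives the optimal observable for each v, up to classifier accuracy and binning.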
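
The two formulas above can be sketched in a few lines of NumPy. The function names are hypothetical and need not match those in `utils.py`; clipping the scores and flooring r_opt at zero are assumptions added because Lagrange weights can be negative between nodes.

```python
import numpy as np


def lagrange_weights(v, nodes):
    """Lagrange basis polynomials L_k(v) for the given nodes."""
    nodes = np.asarray(nodes, dtype=float)
    weights = np.ones(len(nodes))
    for k, v_k in enumerate(nodes):
        for j, v_j in enumerate(nodes):
            if j != k:
                weights[k] *= (v - v_j) / (v_k - v_j)
    return weights


def optimal_score(basis_scores, v, nodes, eps=1e-6):
    """s_opt(x; v) from basis classifier scores of shape (n_events, n_nodes)."""
    s = np.clip(np.asarray(basis_scores, dtype=float), eps, 1.0 - eps)
    r_k = s / (1.0 - s)                       # per-classifier ratios r_k(x)
    r_opt = r_k @ lagrange_weights(v, nodes)  # sum_k L_k(v) r_k(x)
    r_opt = np.clip(r_opt, 0.0, None)         # guard against negative ratios
    return r_opt / (1.0 + r_opt)


nodes = [0, 5, 10]
weights_mid = lagrange_weights(2.5, nodes)    # [0.375, 0.75, -0.125]
s_opt = optimal_score([[0.2, 0.6, 0.9]], v=5, nodes=nodes)
```

At a node the weights reduce to a one-hot vector, so `s_opt` at v = 5 simply reproduces the score of the classifier trained at that node.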

Sufficient ratio workspace with Lagrange signal morphing

One pyhf workspace encodes all basis signal templates in the nD ratio space (r_0, r_1, ..., r_{N-1}), where N is the number of basis nodes. The custom lagrange_morphing modifier (registered in pyhf.modifiers) holds the N basis nD templates T_k and at evaluation time computes:

signal_template(v) = sum_k L_k(v) T_k

The workspace is built once; limits for each v are obtained by pinning v via fixed_params.
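
Outside of pyhf, the morphing step itself amounts to a weighted sum over the leading axis of a stack of templates. The sketch below is not the code of the `lagrange_morphing` modifier; `morph_templates` is a hypothetical name, and flooring negative bins at zero is an assumption.

```python
import numpy as np


def lagrange_weights(v, nodes):
    """Vectorised Lagrange basis polynomials L_k(v)."""
    nodes = np.asarray(nodes, dtype=float)
    n = len(nodes)
    denom = nodes[:, None] - nodes[None, :]   # v_k - v_j
    np.fill_diagonal(denom, 1.0)              # skip the j == k factor
    numer = np.where(np.eye(n, dtype=bool), 1.0, (v - nodes)[None, :])
    return np.prod(numer / denom, axis=1)


def morph_templates(v, nodes, templates, floor=0.0):
    """signal_template(v) = sum_k L_k(v) T_k for templates of shape (N, ...)."""
    templates = np.asarray(templates, dtype=float)
    weights = lagrange_weights(v, nodes)
    morphed = np.tensordot(weights, templates, axes=1)  # contract over k
    return np.clip(morphed, floor, None)                # keep yields non-negative


nodes = [0.0, 5.0, 10.0]
base = np.random.default_rng(0).uniform(1.0, 2.0, size=(4, 4, 4))
# Toy basis templates whose yields grow linearly with the node value.
templates = np.stack([base * (1.0 + 0.1 * v_k) for v_k in nodes])
morphed = morph_templates(2.5, nodes, templates)
```

Because these toy templates depend linearly on v, the quadratic interpolation through three nodes recovers `base * 1.25` at v = 2.5 exactly.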

Shared modules

File	Description
`utils.py`	Constants, Lagrange weights, MLP classifier, model loading/scoring, data loading, histogram helpers (quantile edges, ratio histograms, clip-and-renorm)
`limits_utils.py`	Argument parser, pyhf workspace builders (`_make_simple_spec`, `make_lagrange_ratio_spec`), histogram computation, all plotting functions (score distributions, limit overviews, Brazil bands, optimal observable)
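
As an illustration of the quantile-edge helper mentioned for `utils.py`, the following sketch chooses 1D score bin edges so that the background is spread roughly evenly across bins. The name `quantile_edges`, the use of unweighted events, and pinning the outer edges to 0 and 1 are assumptions.

```python
import numpy as np


def quantile_edges(bkg_scores, n_bins):
    """Bin edges in [0, 1] with approximately equal background per bin."""
    edges = np.quantile(bkg_scores, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = 0.0, 1.0  # cover the full score range
    return edges


rng = np.random.default_rng(0)
bkg_scores = rng.beta(2.0, 5.0, size=1000)  # toy background-like score shape
edges = quantile_edges(bkg_scores, n_bins=10)
counts, _ = np.histogram(bkg_scores, bins=edges)
```

Equal-population bins keep the statistical uncertainty on the background comparable from bin to bin, which is one reason to prefer them over uniform-width bins for sharply peaked score distributions.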

Directory structure

nsbi_tooling/
  generate_distributions.py
  train_classifiers.py
  evaluate_classifiers.py
  calculate_limits.py
  utils.py
  limits_utils.py
  run.sh
  pyhf/                       # editable pyhf install (local fork)
    src/pyhf/modifiers/
      lagrange_morphing.py    # custom modifier: sum_k L_k(v) T_k
  dataframes/                 # raw generated samples
  models/                     # trained classifier weights
  eval_dataframes/            # per-event scores / ratios
  ratio_templates/            # nD ratio histograms (background + basis signals + data)
  workspaces/                 # pyhf workspace JSON files
  results/                    # saved limit arrays (.npz)
  plots/                      # output plots
