root --> h5 - In this step we convert the files from ROOT to h5 format.
1. converter:
python converter.py -c config.yaml
The output files are named output_JZ_?_.h5, one per JZ slice (see the second item under "Things you might need to modify")
converter.py, config.yaml - under the folder preparing_samples
Things you might need to modify:
- the paths of the different jetjet sample folders
- the output file name (for each JZ slice)
- n_jets - in the config file
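Because the converter is run once per JZ slice, it can help to check that every slice produced its file before moving on to merging. This is a minimal sketch that assumes the "?" in output_JZ_?_.h5 is the JZ slice number. The helper names (expected_name, missing_slices, missing_in_dir) are hypothetical and are not part of converter.py:

```python
from pathlib import Path


def expected_name(jz):
    """Converter output name for one JZ slice (assumes '?' is the slice number)."""
    return f"output_JZ_{jz}_.h5"


def missing_slices(existing_names, slices):
    """Return the JZ slices whose expected output name is not in existing_names."""
    existing = set(existing_names)
    return [jz for jz in slices if expected_name(jz) not in existing]


def missing_in_dir(out_dir, slices):
    """Same check, using the files actually present in out_dir."""
    names = [p.name for p in Path(out_dir).iterdir() if p.is_file()]
    return missing_slices(names, slices)
```

For example, missing_in_dir("path/to/output", slices) returns the slices that still need to be converted. Pass in the slice numbers you actually ran over.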
Ready-made train, val, and test samples (MC21, after all stages) can be found here:
/storage/agrp/dreyet/GNtau/samples/v04/output/
/eos/user/e/edreyer/GNtau/samples/v04/output/
2. merging:
python merge_ds.py -c merging.yaml
merge_ds.py, merging.yaml - under the folder preparing_samples
Things you might need to modify:
- the output file name, at the top of merging.yaml
- total_size: 40e6
- the fraction of each JZ slice; note that you might also need to change the paths of the JZ slices in this file
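The per-slice fractions and total_size together determine how many jets each JZ slice contributes. The rough sketch below shows that arithmetic. It assumes that total_size counts jets and that the fractions are meant to sum to 1, so check merge_ds.py for how it actually interprets them. The function jets_per_slice and the slice names are hypothetical:

```python
def jets_per_slice(total_size, fractions, tol=1e-6):
    """Split total_size jets across JZ slices according to their fractions.

    fractions maps a slice name to its fraction of the merged sample.
    Assumption: the fractions should sum to 1 (within tol).
    """
    total_fraction = sum(fractions.values())
    if abs(total_fraction - 1.0) > tol:
        raise ValueError(f"fractions sum to {total_fraction}, expected 1")
    # Round each share to a whole number of jets.
    return {name: int(round(total_size * f)) for name, f in fractions.items()}
```

Because each share is rounded separately, the per-slice counts can differ from total_size by a few jets.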
3. labeling:
python label.py
label.py - under the folder preparing_samples
Things you might need to modify:
- line 173: datasets - you can choose between the jet and tau samples
- if you created new tau samples, you need to run this stage twice, changing the dataset between runs
- line 182: for tau samples, you can also modify the total_jets number
Congrats! Now you have ready-to-go ntuples.
For preprocessing, you can follow the UPP tutorial.
Please add these 2 files to the UPP configs folder:
tau-variables.yaml, tau.yaml - under the folder umami
Important things that are not written in the tutorial:
- make sure you have activated the UPP environment
- you should have an ntuples folder organized as follows:
- ntuples
- qcd - contains jetjet samples after labeling
- tau - contains tautau samples after labeling
- to create the train, val, and test files all at once, run:
preprocess --config configs/tau.yaml --split all
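Before running preprocess, a quick check of the ntuples layout described above can save you a failed run. This is a sketch that assumes the labeled files end in .h5. The helper layout_problems is hypothetical and is not part of UPP:

```python
from pathlib import Path

# Layout described above: ntuples/qcd (jetjet) and ntuples/tau (tautau).
REQUIRED_SUBDIRS = ("qcd", "tau")


def layout_problems(ntuples_dir):
    """Return human-readable problems with the ntuples folder layout."""
    root = Path(ntuples_dir)
    problems = []
    for sub in REQUIRED_SUBDIRS:
        folder = root / sub
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
        # Assumption: labeled samples are stored as .h5 files.
        elif not any(folder.glob("*.h5")):
            problems.append(f"no .h5 files in: {folder}")
    return problems
```

layout_problems("ntuples") returns an empty list when both folders exist and each one contains at least one .h5 file.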
Congrats! Now you are ready to train.
Training is done with salt; the tutorial can be found in this folder under tutorial-salt.
Do only once: replace your predictionwriter.py with the predictionwriter.py file under the folder salt.
- You might need to run some or all of the following (you will see them in the tutorial):
- screen / submit a job
- conda activate salt
- python -m pip install -e .
- Go to the run directory and launch:
salt fit -c GN2TauA.yaml --force
A stands for all; you can modify the config file as you wish.
GN2TauA.yaml, GNTau.yaml, GNTauJ.yaml, GNTauC.yaml - under the folder salt
Things you might need to modify:
- the train, val, test, norm_dict, and class_dict paths - point them to the UPP output files
- write_tracks: True - this determines whether the output of the aux task (the GNTau track predictions) is written out
- the variables that we want to train on:
- Comment out the variables you do not want to use
- Modify the numbers as necessary; for example, if you change the track variables, update line 141, input_size: 51, to the total number of jet and track variables
- the number of epochs: according to Dmitrii, the latest training overfitted, and 20 epochs should have been enough
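The new input_size can be derived from the variable lists. This minimal sketch assumes that input_size equals the number of jet variables plus the number of track variables, as the note above suggests. It also assumes that variables are listed as YAML items such as "- d0" and that disabled variables are commented out. The helper names and the variable names are hypothetical:

```python
def active_vars(yaml_lines):
    """Variable names from YAML list lines such as '  - d0'.

    Anything after '#' is dropped first, so commented-out entries
    ('  # - d0') are skipped and inline comments are ignored.
    """
    names = []
    for line in yaml_lines:
        content = line.split("#", 1)[0].strip()
        if content.startswith("- "):
            names.append(content[2:].strip())
    return names


def input_size(jet_lines, track_lines):
    """Assumed input_size: active jet variables plus active track variables."""
    return len(active_vars(jet_lines)) + len(active_vars(track_lines))


if __name__ == "__main__":
    # Hypothetical variable blocks, for illustration only.
    jet_lines = ["  - pt", "  - eta"]
    track_lines = ["  - d0", "  - z0SinTheta", "  # - qOverP"]
    print(input_size(jet_lines, track_lines))  # prints 4
```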
Congrats! Now you have a trained model.
Evaluation is also done with salt; run the following:
salt test --config logs/GN2Tau_20230726-T175600/config.yaml --data.test_file /storage/agrp/zivka/umami_Tau/UPP/umami-preprocessing/upp/output/pp_output_test.h5
config file: can be found in the training output folder under logs
data.test_file: can be the path of the test file that was used during training (the UPP output)
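If you have several training runs, you can take the config path from the newest folder under logs. This sketch assumes that run folders follow the GN2Tau_20230726-T175600 naming pattern, so the timestamp suffix sorts chronologically. The functions latest_run and salt_test_command are hypothetical helpers that only rebuild the salt test command shown above:

```python
from pathlib import Path


def latest_run(logs_dir, prefix="GN2Tau_"):
    """Newest run folder under logs_dir, assuming names like
    GN2Tau_20230726-T175600 (fixed-width timestamp after the prefix)."""
    logs = Path(logs_dir)
    if not logs.is_dir():
        raise FileNotFoundError(f"logs folder not found: {logs}")
    runs = [p for p in logs.glob(prefix + "*") if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no {prefix}* runs in {logs}")
    # The timestamp suffix sorts lexicographically in time order.
    return max(runs, key=lambda p: p.name[len(prefix):])


def salt_test_command(run_dir, test_file):
    """Argument list for: salt test --config <run>/config.yaml --data.test_file <file>."""
    config = Path(run_dir) / "config.yaml"
    return ["salt", "test", "--config", str(config), "--data.test_file", str(test_file)]
```

You can pass the returned list to subprocess.run(cmd, check=True) from the environment where salt is installed.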
Currently, all plots are made in notebooks:
1. ROC and the eff/rej plots
2. Confusion matrices for track classification
- RNN: confusion_matrix.ipynb
- GNTau: confusion_matrix_NOT_RNN.ipynb (if the output of the aux task is available)
3. JZ slices
4. Variables distribution plots
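For the eff/rej plots, the usual quantity is the background rejection, 1 / (background efficiency), evaluated at a fixed signal efficiency. This minimal sketch shows that calculation. It assumes that higher discriminant scores are more tau-like, with tau jets as signal and QCD jets as background. The notebooks may define the discriminant and working points differently, and rejection_at_efficiency is a hypothetical helper:

```python
def rejection_at_efficiency(sig_scores, bkg_scores, target_eff):
    """Background rejection (1 / background efficiency) at the score cut that
    keeps approximately target_eff of the signal jets (score >= cut passes)."""
    if not 0 < target_eff <= 1:
        raise ValueError("target_eff must be in (0, 1]")
    if len(sig_scores) == 0 or len(bkg_scores) == 0:
        raise ValueError("need at least one signal and one background score")
    ordered = sorted(sig_scores, reverse=True)
    # Number of signal jets to keep; the cut is the lowest score among them.
    n_keep = max(1, round(target_eff * len(ordered)))
    cut = ordered[n_keep - 1]
    bkg_eff = sum(score >= cut for score in bkg_scores) / len(bkg_scores)
    # No background passes: rejection is unbounded for this sample size.
    return float("inf") if bkg_eff == 0 else 1.0 / bkg_eff
```

Scanning target_eff over a range of values and plotting the results gives a rejection-vs-efficiency (ROC-style) curve.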