SciPost Submission Page
Full Event Particle-Level Unfolding with Variable-Length Latent Variational Diffusion
by Alexander Shmakov, Kevin Greif, Michael James Fenton, Aishik Ghosh, Pierre Baldi, Daniel Whiteson
This is not the latest submitted version.
Submission summary
Authors (as registered SciPost users): Michael James Fenton · Kevin Greif
Submission information
Preprint Link: https://arxiv.org/abs/2404.14332v1 (pdf)
Date submitted: 2024-04-24 20:16
Submitted by: Fenton, Michael James
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
Approaches: Experimental, Computational
Abstract
The measurements performed by particle physics experiments must account for the imperfect response of the detectors used to observe the interactions. One approach, unfolding, statistically adjusts the experimental data for detector effects. Recently, generative machine learning models have shown promise for performing unbinned unfolding in a high number of dimensions. However, all current generative approaches are limited to unfolding a fixed set of observables, making them unable to perform full-event unfolding in the variable dimensional environment of collider data. A novel modification to the variational latent diffusion model (VLD) approach to generative unfolding is presented, which allows for unfolding of high- and variable-dimensional feature spaces. The performance of this method is evaluated in the context of semi-leptonic top quark pair production at the Large Hadron Collider.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Reports on this Submission
Strengths
1. Innovative Approach: The paper presents a novel modification of the Variational Latent Diffusion (VLD) model, already considered by the authors in the context of unfolding, extending it to handle variable-dimensional feature spaces, which is a significant advancement in generative unfolding methods.
2. Comprehensive Evaluation: The performance of the method is thoroughly evaluated in the context of semi-leptonic top quark pair production at the LHC, providing clear and detailed results.
3. Addressing High-Dimensional Data: The method effectively addresses the challenge of unfolding unbinned distributions in high-dimensional spaces, which is a substantial improvement over traditional techniques.
4. Potential Impact: The proposed method pushes the performance frontier of generative unfolding methods, setting a new standard for full-event unfolding.
Weaknesses
1. Sparse Data Regions: As acknowledged by the authors, the method shows some mis-modeling of kinematic distributions at the edges of the selection due to a lack of training samples. This highlights a limitation in handling extremely sparse regions of the data space, although this limitation is arguably irreducible.
2. Computational Complexity: The training and inference processes for the VLD model are computationally intensive, which might limit its practical applicability for extremely large datasets or for use in environments with limited computational resources.
3. Specificity of top pair production: While the method is tested on semi-leptonic top quark pair production, its performance on other processes or in different experimental setups is not explored, leaving open, as expected, questions about its generalizability.
4. Iterative Prior Adjustment: The paper mentions that the dependence on the training set prior might be mitigated through an iterative method, but this approach is not demonstrated in the current work. Further exploration and validation of this iterative adjustment would strengthen the findings.
Report
The paper presents a significant advancement in the field of experimental particle physics by introducing a modified VLD model capable of full-event unfolding with variable-dimensional feature spaces. The methodological innovation is substantial, addressing key challenges in high-dimensional data unfolding and offering a generative approach that mitigates some limitations of discriminative methods.
The comprehensive evaluation on semi-leptonic top pair production at the LHC provides convincing evidence of the effectiveness of the method, although the results feature areas of the distributions where the model struggles, particularly in regions with sparse data. The computational demands of the model and its performance on other types of data are also areas that could benefit from further exploration.
The paper certainly meets the criteria for acceptance in this journal. It presents original and significant contributions to the field, demonstrates clear and thorough methodology, and discusses the results and limitations transparently. The clarity of the writing is excellent.
Requested changes
1. All metrics considered by the authors (reported in Ref. [18], and which it would be useful to report here) seem to be sums of 1D distances computed on the 1D marginal distributions corresponding to the features of interest. Such metrics are therefore not naturally expected to be sensitive to correlations among features. I suggest that the authors consider adding at least one metric with some expected sensitivity to correlations. One option, which does not require overly intensive calculations and is still based on 1D distances, could be the Sliced Wasserstein Distance.
2. To better understand and visualize the performance beyond the 1D marginal distributions, on which the evaluation metrics are computed, I suggest that the authors add corner plots.
3. The code and dataset need to be made public (for instance on GitHub and Zenodo) to allow full reproducibility of the analysis and broader usage of the method.
Recommendation
Ask for minor revision
Strengths
- The method and architecture are well described, and many details are given
- It fills a gap in the literature: it is the first method to unfold full-event (variable-dimensional) collider data
Weaknesses
- One of the main results of the paper, that the unfolding performance is independent of the training data distribution, could be explained and investigated in more detail
Report
The manuscript describes a novel machine learning architecture to unfold full particle physics events, i.e. variable-dimensional collider data. Unfolding describes the inversion of detector effects, allowing for long-term usage of experimental data without the need for expensive detector simulation. The manuscript is very well written and provides many details on the architecture. In particular, I like the illustrative Figure 1 and the many subsections in Section 3 that describe all the individual elements of the model.
My biggest concern is about the observations in Section 4.3 in combination with the statement in Section 6: "This lack of prior dependence strongly motivates the use of VLD for unfolding." It is true that, on the dataset considered here, the model performs well on a distribution different from the one seen in training. But it is not clear that this generalizes beyond this example. I understand that the authors cannot look at many processes within the scope of this manuscript, but I would at least like to see some explanation or investigation of why such a behavior would be expected. If the authors want to keep the sentence in the conclusion, they need to provide more explanation / tests / examples to support the claim.
Apart from this, I have a few minor proposals that could enhance the manuscript, see the list below.
I therefore think that the manuscript should be published in SciPost Physics after the concerns have been addressed.
Requested changes
- One of the strengths of generative unfolding, event-by-event uncertainties (since a given detector-level input can be unfolded several times), could be mentioned in the introduction, since this is an additional reason to use this method over discriminative methods.
- In section 3.1, is O_P_0 treated differently by the position-equivariant transformer with respect to the other O_P_i? Please explain.
- In section 3.3, why is y_0 needed? Please explain.
- In section 3.4, how is it ensured that the ordering is constant over t? Or is that not a problem to be concerned about?
- In section 4.1, have both coordinate representations P^cart and P^Polar been used simultaneously (concatenated)?
- In sections 4.2 and 4.3, to better visualize the correlations between the observables and how well these are learned by the model, I suggest to add corner plots (for example to an appendix). Maybe this also explains the performance on down-stream observables a bit more.
- In sections 4.2 and 4.3, in addition to the metrics shown in the tables, I think it would be nice to see how well a neural classifier (see discussion in 2305.16774) would be able to distinguish unfolded from true events.
- In figures 5c and 7b (c), I suggest to zoom in on the bottom panel a bit.
- In the tables, I suggest to add a test of 'truth vs truth' to get a better feeling for the natural spread of the metrics. (i.e. is a distance of 0.04 a lot?)
- The authors refer a lot to reference 18, which is fine. However, it would be nice if the definitions of the metrics used in the tables were replicated here as well, rather than being kept only in Appendix C of Ref. 18.
- In table 3, the errors of VAE and diffusion seem to add perfectly to the Unfolding error. Is that by construction or a non-trivial cross check on how the metrics are evaluated?
- Please make your code and training data available via git / zenodo / others.
Recommendation
Ask for minor revision
Author: Kevin Greif on 2024-10-18 [id 4874]
(in reply to Report 1 on 2024-06-07)
Dear reviewer,
Thank you very much for the detailed and helpful review. We would like to apologise for the delay in resubmission. Our lead author was away on internship for the summer. We hope the new version of the paper addresses your concerns, and we have some responses for you below.
Sincerely, Kevin for the team
"My biggest concern is about the observations in Section 4.3 in combination with the statement in Section 6: "This lack of prior dependence strongly motivates the use of VLD for unfolding." It is true that, on the dataset considered here, the model performs well on a distribution different from the one seen in training. But it is not clear that this generalizes beyond this example. I understand that the authors cannot look at many processes within the scope of this manuscript, but I would at least like to see some explanation or investigation of why such a behavior would be expected. If the authors want to keep the sentence in the conclusion, they need to provide more explanation / tests / examples to support the claim."
Response - We agree with the reviewer's conclusions here. Though we show good performance on one alternative distribution, there are many other possible distributions we could consider, so we should not make broad claims about the level of prior dependence of this method. We have removed the relevant statement from the conclusion.
"One of the strengths of generative unfolding, event-by-event uncertainties (since a given detector-level input can be unfolded several times), could be mentioned in the introduction, since this is an additional reason to use this method over discriminative methods."
Response - We do believe that the ability to unfold a single event multiple times is an additional benefit of generative models. We have added a sentence to this effect in Section 4.2, along with a citation to a paper that demonstrates how the use of generative models can improve the statistical power of particle physics datasets. However, we are not sure that “event-by-event uncertainties” have an application in a physics analysis, so we elected to leave this detail out.
"In section 3.1, is O_P_0 treated differently by the position-equivariant transformer with respect to the other O_P_i? Please explain."
Response - The transformer is position-equivariant with respect to the inputs. This means that if the O_P were shuffled, the outputs x would also shuffle in the same way. It is then only important that the vector describing the event-level quantities is given a fixed position, in this case the first one. This way we know which w to apply the event predictor to after the particle decoder (see Figure 1).
"In section 3.3, why is y_0 needed? Please explain."
Response - y_0 is a learned vector that is added to the set of observables to serve as a carrier for the multiplicity information. Since the multiplicity depends on all of the inputs but is itself a single value, we need an additional vector that extracts contextual information and feeds it to the multiplicity predictor.
"In section 3.4, how is it ensured that the ordering is constant over t? Or is that not a problem to be concerned about?"
Response - The ordering is only enforced during network training in the diffusion loss term. Since we are training, we can make use of the truth-level information, so the objects are ordered by truth pT, which doesn't change at high levels of noise. During inference, everything is fully equivariant.
"In section 4.1, have both coordinate representations P^cart and P^Polar been used simultaneously (concatenated)?"
Response - Yes, for the input to the detector encoder the representations are concatenated and used together.
"In sections 4.2 and 4.3, to better visualize the correlations between the observables and how well these are learned by the model, I suggest to add corner plots (for example to an appendix). Maybe this also explains the performance on down-stream observables a bit more."
Response - We have added corner plots to the appendix and they are quite interesting to look at, but we do not notice any large difference between the predicted and true correlations. The only visible differences between the predicted and true plots are in the top quark mass distributions, but these differences are already captured in Figure 10. The shapes of the rest of the unfolded distributions are very accurate.
"In sections 4.2 and 4.3, in addition to the metrics shown in the tables, I think it would be nice to see how well a neural classifier (see discussion in 2305.16774) would be able to distinguish unfolded from true events."
Response - We expect that a neural classifier would be able to distinguish unfolded from true events easily, for example using the kinematics of the top quarks in Section 4.2, which are mismodeled. We agree that achieving the level of precision where the unfolded events are not distinguishable from the true events is an important milestone, but we are confident we are currently not at this precision.
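For context, the classifier two-sample test referenced above (2305.16774) is straightforward to set up. The sketch below is illustrative only: the `two_sample_auc` helper and the toy Gaussian samples standing in for true and unfolded events are hypothetical, not taken from the paper. An AUC consistent with 0.5 means the two samples are statistically indistinguishable, while any excess quantifies residual mismodelling.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

def two_sample_auc(true_events, unfolded_events, seed=0):
    """Classifier two-sample test: train a small network to separate
    true from unfolded events and report the held-out ROC AUC."""
    X = np.vstack([true_events, unfolded_events])
    y = np.concatenate([np.zeros(len(true_events)),
                        np.ones(len(unfolded_events))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y)
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=200,
                        random_state=seed)
    clf.fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# Toy stand-ins: identical distributions should give AUC ~ 0.5,
# while a shifted mean is detected (AUC well above 0.5).
rng = np.random.default_rng(0)
same = two_sample_auc(rng.normal(size=(2000, 4)),
                      rng.normal(size=(2000, 4)))
shifted = two_sample_auc(rng.normal(size=(2000, 4)),
                         rng.normal(0.5, 1.0, size=(2000, 4)))
print(round(same, 2), round(shifted, 2))
```

In practice one would feed the concatenated per-event feature vectors of truth and unfolded samples into such a test; the resulting AUC gives a single correlation-sensitive summary of how far the unfolding is from the indistinguishability milestone discussed above.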
"In figures 5c and 7b (c), I suggest to zoom in on the bottom panel a bit."
Response - We thank the reviewer for the suggestion and rescaled the log ratio plots across the paper to fit better for the observables.
" In the tables, I suggest to add a test of 'truth vs truth' to get a better feeling for the natural spread of the metrics. (i.e. is a distance of 0.04 a lot?)"
Response - Any difference from 0 in the distance metrics between truth and itself would only result from statistical uncertainties. With the statistics we have, these uncertainties are small enough that the metrics evaluate to 0 within numerical precision. We have added a sentence in Section 4.2 stating that the statistical uncertainties in these distributions are much smaller than the uncertainties obtained from sampling the generative model many times.
"The authors refer a lot to reference 18, which is fine. However, it would be nice if the definitions of the metrics used in the tables were replicated here as well, rather than being kept only in Appendix C of Ref. 18."
Response - We agree with this suggestion and have added the definitions of the metrics to Appendix B. Note these are copied directly from the appendix of Ref. 18.
"In table 3, the errors of VAE and diffusion seem to add perfectly to the Unfolding error. Is that by construction or a non-trivial cross check on how the metrics are evaluated?"
Response - This is a slightly non-trivial cross check. Since our particle-level decoder is unconditional it can’t move the unfolded distribution any closer to the truth. Ideally it would add zero distance, but it adds some in practice. The sum of this distance, and the distance evaluated in the VAE latent space, is the total distance within the statistical errors. If you look closely there are some places where the VAE and diffusion errors do not add perfectly to the total, due to the statistical errors (e.g. bottom row energy distance in Table 3).
"Please make your code and training data available via git / zenodo / others."
Response - The code is available at https://github.com/Alexanders101/LVD/tree/main and the data at https://zenodo.org/records/13364827. We have also added these links to the paper.
Author: Kevin Greif on 2024-10-18 [id 4875]
(in reply to Report 2 on 2024-07-01)
Dear reviewer,
Thank you very much for the detailed and helpful review. We would like to apologise for the delay in resubmission. Our lead author was away on internship for the summer. We hope the new version of the paper addresses your concerns, and we have some responses for you below.
Sincerely, Kevin for the team
All metrics considered by the authors (reported in Ref. [18], and which it would be useful to report here) seem to be sums of 1D distances computed on the 1D marginal distributions corresponding to the features of interest. Such metrics are therefore not naturally expected to be sensitive to correlations among features. I suggest that the authors consider adding at least one metric with some expected sensitivity to correlations. One option, which does not require overly intensive calculations and is still based on 1D distances, could be the Sliced Wasserstein Distance.
Response - The top-quark kinematics presented in Figure 10 depend sensitively on the correlations present between the directly predicted features (the kinematics of the jets, leptons, and the missing transverse momentum). They are more dependent on the correlations than something like the Sliced Wasserstein Distance, since they have a quadratic dependence on the directly predicted features instead of a linear one. Therefore we think it is enough to look at the 1D marginal distances in these particular distributions, especially because we see limited performance in the top mass distributions.
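For readers weighing this exchange, the Sliced Wasserstein Distance the referee proposes is cheap to estimate from random 1D projections. The sketch below is purely illustrative: the `sliced_wasserstein` helper and the toy Gaussian samples are hypothetical, not from the paper. The two toy samples share identical standard-normal 1D marginals and differ only in their correlation, so marginal-only metrics cannot separate them while the sliced distance can.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_wasserstein(x, y, n_projections=128, rng=None):
    """Approximate the Sliced Wasserstein Distance between two samples
    of shape (n_events, n_features) by averaging the 1D Wasserstein
    distance over random unit-vector projections."""
    rng = np.random.default_rng(rng)
    dim = x.shape[1]
    # Random directions on the unit sphere.
    directions = rng.normal(size=(n_projections, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project both samples onto each direction and average 1D distances.
    return float(np.mean([wasserstein_distance(x @ d, y @ d)
                          for d in directions]))

# Toy check: identical N(0,1) marginals, different correlations.
rng = np.random.default_rng(0)
corr = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=20000)
uncorr = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=20000)
print(sliced_wasserstein(corr, uncorr, rng=1))
```

Because the projections mix features linearly, the metric picks up exactly the correlation information that sums of per-feature 1D distances discard, at roughly the cost of `n_projections` ordinary 1D Wasserstein evaluations.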
To better understand and visualize the performance beyond the 1D marginal distributions, on which the evaluation metrics are computed, I suggest that the authors add corner plots.
Response - We have added corner plots to the appendix and they are quite interesting to look at, but we do not notice any large difference between the predicted and true correlations. The only visible differences between the predicted and true plots are in the top quark mass distributions, but these differences are already captured in Figure 10. The shapes of the rest of the unfolded distributions are very accurate.
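For readers unfamiliar with the suggestion, a corner plot tiles all 1D marginals along the diagonal and the pairwise 2D panels below it, so correlation mismodelling is visible at a glance. The minimal matplotlib sketch below is illustrative only: the `corner_plot` helper and the toy Gaussian samples standing in for truth and unfolded events are hypothetical (dedicated packages such as `corner` produce a more polished version of the same layout).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for script use
import matplotlib.pyplot as plt

def corner_plot(truth, pred, labels):
    """Overlay truth and unfolded samples: 1D histograms on the
    diagonal, 2D scatter panels in the lower triangle."""
    n = truth.shape[1]
    fig, axes = plt.subplots(n, n, figsize=(2.2 * n, 2.2 * n))
    for i in range(n):
        for j in range(n):
            ax = axes[i, j]
            if j > i:
                ax.axis("off")  # upper triangle left empty
            elif i == j:
                bins = np.histogram_bin_edges(truth[:, i], bins=40)
                ax.hist(truth[:, i], bins=bins, histtype="step",
                        label="truth")
                ax.hist(pred[:, i], bins=bins, histtype="step",
                        label="unfolded")
            else:
                ax.scatter(truth[:, j], truth[:, i], s=1, alpha=0.2)
                ax.scatter(pred[:, j], pred[:, i], s=1, alpha=0.2)
            if i == n - 1:
                ax.set_xlabel(labels[j])
            if j == 0:
                ax.set_ylabel(labels[i])
    axes[0, 0].legend()
    return fig

rng = np.random.default_rng(0)
truth = rng.multivariate_normal([0, 0, 0], np.eye(3), size=5000)
pred = rng.multivariate_normal([0, 0, 0], np.eye(3), size=5000)
fig = corner_plot(truth, pred, ["pT", "eta", "phi"])
fig.savefig("corner.png")
```

Comparing such overlaid panels between truth and unfolded samples is exactly how a residual correlation difference, like the top-mass effect mentioned in the response, would show up visually.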
The code and dataset need to be made public (for instance on GitHub and Zenodo) to allow full reproducibility of the analysis and broader usage of the method.
Response - The code is available at https://github.com/Alexanders101/LVD/tree/main and the data at https://zenodo.org/records/13364827. We have also added this information to the paper.