Abstract: We present a formalism for estimating the expected change in the probability distribution of an object's predicted label under all small perturbations of the object. We first derive an analytical estimate of the expected probability change as a function of the input noise. We then conduct three empirical studies. In the first, experimental results on image classification show that the proposed measure can distinguish non-robust label predictions from robust ones, even when all are predicted with high confidence. The second study shows that the proposed robustness measure is almost always higher for predictions on corrupted images than for predictions on the original versions of the same images. The final study shows that the proposed measure is lower for models trained with adversarial training approaches.
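For context on the analytical estimate mentioned above, a standard second-order Taylor argument (a sketch under the assumption of isotropic Gaussian input noise; the paper's exact derivation may differ) relates the expected change in a smooth class probability $p$ to the diagonal of its input Hessian:

\[
\mathbb{E}_{\varepsilon \sim \mathcal{N}(0,\sigma^2 I)}\!\left[p(x+\varepsilon)\right] - p(x)
\;\approx\; \frac{\sigma^2}{2}\,\operatorname{tr}\!\left(\nabla_x^2\, p(x)\right)
\;=\; \frac{\sigma^2}{2} \sum_i \frac{\partial^2 p(x)}{\partial x_i^2},
\]

which is consistent with the input-Hessian-diagonal computation discussed in the changes below.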
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: 1. We revised the paper for clarity.
2. We expanded the limitations section, noting that most open-source software does not support computing the diagonal of the Hessian matrix with respect to the input, which makes using LO for robustness difficult for many state-of-the-art models (a sketch of this computation follows below).
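The Hessian-diagonal computation referenced in item 2 can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch example (the function names and toy model are our own illustration, not the paper's code): it computes the exact diagonal of the input Hessian via one double-backward pass per input coordinate, which is precisely what makes the computation costly for state-of-the-art models.

```python
# A minimal sketch (not the authors' code) of computing the diagonal of the
# Hessian of a scalar model output with respect to the input in PyTorch.
import torch


def hessian_diagonal(f, x):
    """Return diag(d^2 f / dx^2) for a scalar-valued function f at input x.

    Loops over input coordinates, so cost grows linearly with the input
    dimension on top of each backward pass.
    """
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(f(x), x, create_graph=True)
    flat_grad = grad.reshape(-1)
    diag = torch.stack([
        torch.autograd.grad(flat_grad[i], x, retain_graph=True)[0].reshape(-1)[i]
        for i in range(flat_grad.numel())
    ])
    return diag.reshape(x.shape)


# Hypothetical usage: probability assigned to the predicted class of one image.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
image = torch.randn(1, 28, 28)

def predicted_class_prob(inp):
    # Add a batch dimension and take the top softmax probability (a scalar).
    return torch.softmax(model(inp.unsqueeze(0)), dim=-1).max()

hess_diag = hessian_diagonal(predicted_class_prob, image)
print(hess_diag.shape)  # torch.Size([1, 28, 28])
```

The per-coordinate loop makes the exact diagonal impractical for large inputs, which is why mainstream autodiff libraries do not offer it as a built-in primitive.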
Assigned Action Editor: ~Jinwoo_Shin1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1102