Lecture 11-14

Trustworthy Computer Vision?

• Face Recognition
• Self-Driving Perception
• Safe Control
• Medical Diagnosis


Failure Mode I: Data Violate Assumptions
Assumption: training data is a good representation of the testing data.
(Figure: training → model → testing pipeline; in the real world, test data can look very different from training data.)
Failure Mode I: Data Violate Assumptions
• Degraded Visual Environments (DVEs): low resolution, rain, low light, haze …
• … cause degradations for visual understanding: reduced contrast, occluded details, abnormal illumination, faded surfaces, and color shift …
• The problem is related to, but not limited to, image restoration
Failure Mode I: Data Violate Assumptions
Distribution shift: synthetic data (training) vs. real-world data (testing)
Failure Mode II: Exploration into an Unseen Domain
(Figure: an agent explores beyond the state space it sees in the training data, entering uncertain regions.)
Key: Extrapolation and Model Confidence
• What is the model's confidence in the new domain?
• If I fail, I should fail gently
• Check constraint satisfaction in the new domain
• Collect more data to help
Failure Mode III: Malicious Adversary
Research Question:
How can we produce robust extrapolation under various unexpected distribution shifts in computer vision?

We will go through many possible answers:
• Data-level:
  • Enhancing images
• Model-level:
  • Uncertainty quantification
  • Domain adaptation and generalization
  • Adversarial defense
Visual Degradation
• Degradation before data acquisition: heavy rain/snow, underwater, low light, haze/sandstorm
• Degradation during data acquisition: downsampling, motion blur, system noise, optical distortion
• Degradation after data acquisition: scratches, watermark, mildew, compression loss
Restoration and Enhancement: Tons of Tasks
Underwater enhancement, dehazing, inpainting, super-resolution, rain removal, denoising, low-light enhancement, …
Learning to Enhance Images
• Data-driven training of “end-to-end” models (usually assuming “pairs”)
• Prior/physical information can still be helpful
(Figure: low-quality image/video → data-driven solution, learning a feature representation + feature mapping from big data such as video surveillance → high-quality image/video)

Image Denoising
• Simplest Low-Level Vision Problem
• Noisy measurement: 𝑦 = 𝑥 + 𝑒
Image Denoising
• Simplest Low-Level Vision Problem
• Estimate the clean image: 𝒙̂ = 𝑓(𝑦), where 𝑓 is some “magic” denoising algorithm
Image Denoising – Conventional Methods
• Collaborative Filtering
  • Non-local Means, BM3D, etc.
• Piece-wise Smoothness
  • Total Variation, Tikhonov Regularization, etc.
• Sparsity
  • Discrete Cosine Transform (DCT), Wavelets, etc.
  • Dictionary Learning: K-SVD, OMP, Lasso, etc.
  • Analysis K-SVD, Transform Learning, etc.
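To make these priors concrete, below is a minimal sketch (not from the lecture) applying two classical denoisers to a synthetic noisy measurement 𝑦 = 𝑥 + 𝑒; it assumes scikit-image and OpenCV are installed, and the noise level and regularization weights are illustrative.

```python
# Minimal sketch, assuming scikit-image and OpenCV; parameters are illustrative.
import numpy as np
import cv2
from skimage import data, img_as_float
from skimage.util import random_noise
from skimage.restoration import denoise_tv_chambolle
from skimage.metrics import peak_signal_noise_ratio as psnr

x = img_as_float(data.camera())                     # clean image x
y = random_noise(x, mode="gaussian", var=0.01)      # noisy measurement y = x + e

# Piece-wise smooth prior: total-variation regularization
x_tv = denoise_tv_chambolle(y, weight=0.1)

# Self-similarity / collaborative-filtering prior: non-local means (8-bit input)
y_u8 = (np.clip(y, 0, 1) * 255).astype(np.uint8)
x_nlm = cv2.fastNlMeansDenoising(y_u8, None, h=10) / 255.0

print("noisy PSNR:", psnr(x, y, data_range=1))
print("TV    PSNR:", psnr(x, x_tv, data_range=1))
print("NLM   PSNR:", psnr(x, x_nlm, data_range=1))
```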
Conventional vs. Deep Learning
• Conventional (shallow model):
  • Equivalently one free layer
  • Unsupervised: no training corpus needed, data efficient
  • Inverse problem view: assumptions & understanding of the data; regularizers & structures of the model; flexible
• Deep learning (deep model):
  • Multiple free layers
  • Supervised: training corpus needed, data inefficient
  • Inverse problem view: little assumption, almost a “free” model; few works until recently
Image Denoising by Deep Learning
• Natural idea: train a denoising autoencoder that regresses clean images from noisy ones
• It is not easy for deep networks to outperform classical methods such as BM3D!
  • BM3D is shown to be better at dealing with self-repeating regular structures
• How to outperform BM3D with a deep network denoiser? Some verified tips:
  • The model is rich enough, i.e., enough hidden layers with sufficiently many hidden units
  • The patch size is large enough, i.e., a patch contains enough information to fit a complicated denoising function that covers the long tail
  • The training set is large enough
• Other benefits of a deep network denoiser:
  • The testing speed of deep networks is much faster than BM3D, K-SVD, etc., benefiting from GPU acceleration
  • Deep networks can generalize to other noise types, if such noise is correctly supplied during training
• Recent works show great progress!
  • Check out the Git repo: https://github.com/wenbihan/reproducible-image-denoising-state-of-the-art
Image Denoising by Deep Learning
• Reference: “Image denoising: Can plain Neural Networks compete with BM3D?”
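As a companion to the tips above, here is a minimal, hedged PyTorch sketch of a DnCNN-style residual denoiser trained on synthetic {clean, noisy} pairs; the depth, width, and noise level are illustrative choices, not the exact configuration of the referenced paper.

```python
# Minimal sketch, assuming PyTorch; depth/width/noise level are illustrative.
import torch
import torch.nn as nn

class DenoiseCNN(nn.Module):
    def __init__(self, channels=1, features=64, depth=8):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1),
                       nn.BatchNorm2d(features), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, y):
        # Residual learning: predict the noise, then return x_hat = y - e_hat
        return y - self.body(y)

model = DenoiseCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on synthetic pairs (x: clean patches, y = x + e)
x = torch.rand(16, 1, 40, 40)
y = x + 0.1 * torch.randn_like(x)
loss = loss_fn(model(y), x)
opt.zero_grad(); loss.backward(); opt.step()
```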
Image Deblurring
• Blurred measurement: 𝑦 = 𝑀 ⊗ 𝒙
Image Deblurring
• Estimate the sharp (latent) image: 𝒙̂ = 𝑓(𝑦), where 𝑓 is some “magic” deblurring algorithm
Image Deblurring
• Non-blind image deblurring
  • Suppose you know the blur kernel 𝑀
  • 𝒙̂ = 𝑓(𝑦, 𝑀)
  • All training data need to have the same 𝑀 as the testing data (see the code sketch below)
• Blind image deblurring – a more challenging yet practical problem
  • Estimate both the image and the blur kernel: {𝒙̂, 𝑀̂} = 𝑓(𝑦)
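For the non-blind case, a minimal sketch using classical Wiener deconvolution (scikit-image) illustrates how a known kernel 𝑀 enters the estimator 𝒙̂ = 𝑓(𝑦, 𝑀); the box kernel and noise level below are assumptions for illustration, not the lecture's setup.

```python
# Minimal non-blind deblurring sketch, assuming scikit-image/SciPy and a known kernel M.
import numpy as np
from scipy.signal import convolve2d
from skimage import data, img_as_float
from skimage.restoration import wiener

x = img_as_float(data.camera())
M = np.ones((5, 5)) / 25.0                           # known blur kernel M (box blur)
y = convolve2d(x, M, mode="same", boundary="symm")   # blurred measurement y = M ⊗ x
y += 0.005 * np.random.randn(*y.shape)               # small sensor noise

# Non-blind: the kernel M is handed to the restoration algorithm
x_hat = wiener(y, M, balance=0.01)
```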
Image Deblurring by Deep Learning
Reference: “Deep convolutional neural network for image deconvolution”
• Key Technical Features:
• Treat deblurring as a deconvolution task; the deconvolution operation can be approximated by a convolutional network with very large filter sizes
• Concatenate the deconvolution CNN module with another denoising CNN module to suppress artifacts and reject outliers
DeblurGAN V2 (2019)
Image Super-Resolution
• Low-resolution measurement: 𝑦 = 𝐷(𝑀 ⊗ 𝒙), i.e., blur by kernel 𝑀 and then downsample by 𝐷
Image Super-Resolution
• Estimate the high-resolution image: 𝒙̂ = 𝑓(𝑦), where 𝑓 is some “magic” super-resolution algorithm
Image Super Resolution by Deep Learning
Reference: “Image super-resolution using deep convolutional networks”
• Key Technical Features:
• Learns an end-to-end mapping from low- to high-resolution images as a deep CNN
• Closely mimics the traditional SR pipeline: LR feature extraction -> coupled LR-HR feature space mapping -> HR image reconstruction
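Below is a minimal SRCNN-style sketch in PyTorch mirroring that three-stage pipeline (bicubic upsampling, then feature extraction, mapping, and reconstruction); the 9-1-5 layer sizes follow the commonly cited configuration, but the rest is illustrative rather than the paper's exact recipe.

```python
# Minimal SRCNN-style sketch, assuming PyTorch; sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRCNN(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)  # LR feature extraction
        self.map     = nn.Conv2d(64, 32, kernel_size=1)                   # LR -> HR feature mapping
        self.recon   = nn.Conv2d(32, channels, kernel_size=5, padding=2)  # HR reconstruction

    def forward(self, lr, scale=3):
        # Bicubic upsampling first, then CNN refinement (as in the original pipeline)
        up = F.interpolate(lr, scale_factor=scale, mode="bicubic", align_corners=False)
        return self.recon(F.relu(self.map(F.relu(self.extract(up)))))

sr = SRCNN()(torch.rand(1, 1, 32, 32))   # -> 1 x 1 x 96 x 96 super-resolved output
```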
Image Super-Resolution by Deep Learning (2013–2017)
(Figure: super-resolution results of “148026” (B100) with scale factor ×3, from the VDSR paper)
New Trends?
• New topic: dehazing, deraining, low light enhancement, etc.

• New goal: human perception vs. machine consumption

• New setting: from supervised to unsupervised training (no “GT”)


• … or relying on “synthetic pairs”

• New domain: medical images, infrared images, remote sensing images, etc.

• New concern: “All-in-one” adaptivity, efficient implementation, etc.


Shortage of Real-World Generalization
• Most SOTA algorithms are trained with {clean, corrupted} paired data
• Such paired training data is usually collected by synthesis (assuming known
degradation model), which typically oversimplifies the real-world degradations
• As a result, the trained model “overfits” the simpler degradation process and
generalizes poorly to real visual degradations

• Real-world collection of paired data?
  • Can be done at small scale and/or in controlled lab environments
  • e.g., some recent datasets for light enhancement and raindrop removal
  • Very difficult to “scale up”, sometimes even impossible
EnlightenGAN: Deep Light Enhancement without Paired Supervision

Goal: Light enhancement made automatic, adaptive, and artifact-free


From Supervised to Unsupervised Enhancement
• EnlightenGAN is the first work that successfully introduces unpaired
training to low-light image enhancement.
• It only needs one low-light set A and another normal-light set B to train, while
A and B could consist of completely different images!

• What makes Unpaired Training unique and attractive?


• It removes the dependency on paired training data
• Hence enabling us to train with massive images from different domains
• It also avoids overfitting any specific data generation/imaging protocol
• …that previous works implicitly rely on, leading to stronger generalization.
• It makes EnlightenGAN particularly easy and flexible to adapt
  • when enhancing real-world low-light images from completely different/unseen domains
Model Architecture

Paper: https://arxiv.org/abs/1906.06972 (pre-print, full version in TIP 2021)


Code: https://github.com/VITA-Group/EnlightenGAN
Comparison with the State of the Art
[New!] Frustratingly Easy Adaptation to New Data
Pre-Processing for Improving Classification

• We apply our pretrained EnlightenGAN as a pre-processing step on the testing set of the ExDark dataset, then pass the enhanced images through an ImageNet-pretrained ResNet-50 classifier.
• Enhancement improves the classification accuracy from 22.02% (top-1) and 39.46% (top-5) to 23.94% (top-1) and 40.92% (top-5).
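A hedged sketch of that “enhance, then classify” pipeline follows; `enlighten` is a placeholder standing in for the pretrained EnlightenGAN generator from the repo linked above, and the image path is hypothetical.

```python
# Sketch under stated assumptions: `enlighten` is a placeholder for EnlightenGAN's
# generator; the image path is hypothetical; ResNet-50 comes from torchvision.
import torch
from torchvision import models, transforms
from PIL import Image

def enlighten(img: Image.Image) -> Image.Image:
    # Placeholder: swap in the real EnlightenGAN forward pass from the repo above.
    return img

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
classifier = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

def classify(img: Image.Image) -> int:
    with torch.no_grad():
        return classifier(preprocess(img).unsqueeze(0)).argmax(dim=1).item()

img = Image.open("exdark_sample.jpg").convert("RGB")   # hypothetical low-light image
pred_raw = classify(img)                  # classify the dark image directly
pred_enhanced = classify(enlighten(img))  # classify after enhancement
```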
Uncertainty & Robustness for Out-of-Distribution Generalization

What do we mean by Uncertainty?
Return a distribution over predictions rather than a single prediction.
● Classification: output a label along with its confidence.
● Regression: output a mean along with its variance.
Good uncertainty estimates quantify when we can trust the model’s predictions.
Image credit: Eric Nalisnick
What do we mean by Out-of-Distribution Robustness?

I.I.D.: p_test(y, x) = p_train(y, x)

O.O.D.: p_test(y, x) ≠ p_train(y, x)


Examples of dataset shift:

● Covariate shift. Distribution of features p(x) changes and p(y|x) is fixed.


● Open-set recognition. New classes may appear at test time.
● Label shift. Distribution of labels p(y) changes and p(x|y) is fixed.
ImageNet-C: Varying Intensity for Dataset Shift
(Figure: from the I.I.D. test set to increasing dataset shift / corruption severity)
Image source: Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, Hendrycks & Dietterich, 2019.
Neural networks do not generalize under covariate shift

● Accuracy drops with


increasing shift on
ImageNet-C

● But do the models


know that they are
less accurate?

Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift, Ovadia et al. 2019
Neural networks do not know when they don’t know

● Accuracy drops with


increasing shift on
ImageNet-C

● Quality of uncertainty
degrades with shift
-> “overconfident
mistakes”
Models assign high-confidence predictions to OOD inputs
(Figure: the ideal behavior is high uncertainty (low confidence) far from the training data, so we only trust the model when x* is close to p_train(x, y); deep neural networks instead stay low-uncertainty (high-confidence) on OOD inputs.)
Image source: “Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness”, Liu et al. 2020
Self-driving cars

Dataset shift:
● Time of day / lighting (daylight vs. night)
● Geographical location (downtown vs. suburban)
● Changing conditions (weather / construction)
Image credit: Sun et al., Waymo Open Dataset
Open Set Recognition

● Example: classification of genomic sequences
● High accuracy on known classes is not sufficient
● Need to be able to detect inputs that do not belong to one of the known classes

Image source: https://ai.googleblog.com/2019/12/improving-out-of-distribution-detection.html


Sources of uncertainty: Model uncertainty

● Many models can fit the training data well


● Also known as epistemic uncertainty
● Model uncertainty is “reducible”
○ Vanishes in the limit of infinite data (subject to
model identifiability)
● Models can be from the same hypothesis class (e.g., linear classifiers) or belong to different hypothesis classes.
Sources of uncertainty: Data uncertainty

● Labeling noise (ex: human disagreement)


● Measurement noise (ex: imprecise tools)
● Missing data (ex: partially observed
features, unobserved confounders)

● Also known as aleatoric uncertainty


● Data uncertainty is “irreducible*”
○ Persists even in the limit of infinite data
○ *Could be reduced with additional
features/views
Image source: Battleday et al. 2019 “Improving machine
classification using human uncertainty measurements”
How do we measure the quality of uncertainty?

Calibration Error = |Confidence - Accuracy|

Of all the days when the model predicted rain with 80% probability, on what fraction did we actually observe rain?

● 80% implies perfect calibration


● Less than 80% implies model is overconfident
● Greater than 80% implies model is under-confident
How do we measure the quality of uncertainty?

Expected Calibration Error [Naeini+ 2015]:

● Bin the probabilities into B bins.


● Compute the within-bin accuracy and within-bin predicted confidence.
● Average the calibration error across bins (weighted by number of points in each bin).
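A minimal NumPy sketch of that recipe follows (equal-width bins, weighted by bin mass); `probs` and `labels` are assumed to be an array of softmax outputs and the corresponding ground-truth classes.

```python
# Minimal ECE sketch, assuming `probs` is (N, K) softmax outputs and `labels` is (N,).
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    conf = probs.max(axis=1)                 # per-example predicted confidence
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)    # points whose confidence falls in this bin
        if mask.any():
            acc_bin = correct[mask].mean()   # within-bin accuracy
            conf_bin = conf[mask].mean()     # within-bin confidence
            ece += mask.mean() * abs(acc_bin - conf_bin)   # weight by bin mass
    return ece
```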
How do we measure the quality of uncertainty?

Expected Calibration Error [Naeini+ 2015]:
Confidence > Accuracy => overconfident
Confidence < Accuracy => under-confident
Image source: Guo+ 2017 “On calibration of modern neural networks”
How do we measure the quality of uncertainty, practically?

Evaluate the model on out-of-distribution (OOD) inputs which do not belong to any of the existing classes, using scores such as:
● Max confidence
● Entropy of p(y|x)

(Example: a CIFAR-10 classifier evaluated on CIFAR-10 (i.i.d. test inputs) vs. SVHN (o.o.d. test inputs): is confidence on i.i.d. inputs higher than on o.o.d. inputs?)
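A minimal sketch computing the two OOD scores above from softmax outputs; the random logits are stand-ins for the CIFAR-10 / SVHN evaluation described in the example.

```python
# Minimal OOD-score sketch; the random tensors are stand-ins for real classifier logits.
import torch
import torch.nn.functional as F

def ood_scores(logits):
    p = F.softmax(logits, dim=1)
    max_conf = p.max(dim=1).values                              # max softmax probability
    entropy = -(p * torch.log(p.clamp_min(1e-12))).sum(dim=1)   # predictive entropy
    return max_conf, entropy

# In-distribution inputs should get higher confidence / lower entropy than OOD ones.
conf_iid, ent_iid = ood_scores(torch.randn(8, 10))   # stand-in for CIFAR-10 logits
conf_ood, ent_ood = ood_scores(torch.randn(8, 10))   # stand-in for SVHN logits
```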


A Simple Baseline for Improving Uncertainty Calibration

Problem: standard (point-estimate) training results in just one prediction per example
*No model uncertainty*

How do we get uncertainty?
● Probabilistic approach
  ○ Estimate a full distribution over the model parameters
● Intuitive approach: ensembling
  ○ Obtain multiple good settings of the model parameters
Image source: Ranganath+ 2016
Ensemble Learning

● Placing a prior distribution over the model (the Bayesian approach) often involves the complication of approximate inference.


● Ensemble learning offers an alternative strategy to aggregate the predictions over a
collection of models.
● Often winner of competitions!
● There are two considerations: the collection of models to ensemble; and the
aggregation strategy.

A popular approach is to average the predictions of independently trained models, forming a mixture distribution.

Many approaches exist: bagging, boosting, decision trees, stacking

[Dietterich 2000]
Simple Baseline: Deep Ensembles

Idea: Just re-run standard SGD training but


with different random seeds and average
the predictions

● A well-known trick for getting better


accuracy and Kaggle scores
● Beyond accuracy – it is good for
robustness and uncertainty too!!
● The mean of predictions is often
more accurate, and the variance of
those predictions reflects “confidence”
● We rely on the fact that the loss landscape is non-convex and that SGD has noise
[Lakshminarayanan+ 2017]
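A minimal deep-ensemble sketch: train the same architecture from several random seeds and average the softmax outputs; `make_model` and the training loop below are placeholders, not the lecture's exact setup.

```python
# Minimal deep-ensemble sketch; `make_model` and `train` are illustrative placeholders.
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

def train(model, seed):
    torch.manual_seed(seed)   # different seed -> different init and different SGD noise
    # ... standard training loop goes here ...
    return model

ensemble = [train(make_model(), seed) for seed in range(5)]

def ensemble_predict(x):
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=1) for m in ensemble])
    return probs.mean(dim=0), probs.var(dim=0)   # mean = prediction, variance ~ confidence

mean_p, var_p = ensemble_predict(torch.rand(4, 1, 28, 28))
```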
Deep Ensembles work surprisingly well in practice
Are there
simpler options?

Deep Ensembles are consistently among the best performing methods, especially under dataset shift
An Old Friend Wears A New Hat: (Monte Carlo) Dropout!
Are there even
simpler options?

[Gal+ 2015]
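A minimal MC-dropout sketch: keep dropout active at test time and average several stochastic forward passes; the toy network and sample count are illustrative.

```python
# Minimal MC-dropout sketch; any network trained with dropout would do.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 10))

def mc_dropout_predict(x, n_samples=20):
    net.train()            # keep dropout ON at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(net(x), dim=1) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.var(dim=0)   # predictive mean and spread

mean_p, var_p = mc_dropout_predict(torch.rand(4, 32))
```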
Softmax temperature re-scaling (beats them all!)
Table source: Guo+ 2017 “On calibration of modern neural networks”
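A minimal temperature-scaling sketch: fit one scalar T on held-out validation logits by minimizing the NLL of softmax(logits / T), then divide logits by T at test time; the tensors below are random stand-ins for real validation data.

```python
# Minimal temperature-scaling sketch; validation tensors are random stand-ins.
import torch
import torch.nn.functional as F

val_logits = torch.randn(1000, 10)             # stand-in: validation logits
val_labels = torch.randint(0, 10, (1000,))     # stand-in: validation labels

log_T = torch.zeros(1, requires_grad=True)     # optimize log T so that T stays positive
opt = torch.optim.LBFGS([log_T], lr=0.1, max_iter=50)

def closure():
    opt.zero_grad()
    loss = F.cross_entropy(val_logits / log_T.exp(), val_labels)
    loss.backward()
    return loss

opt.step(closure)
T = log_T.exp().item()
calibrated_probs = F.softmax(val_logits / T, dim=1)   # predictions unchanged, confidence rescaled
```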
How do we measure the quality of robustness, practically?
Measure generalization to a large collection of real-world shifts. A large collection of tasks
encourages general robustness to shifts (ex: GLUE for NLP).

● Novel textures in object recognition.


● Covariate shift (e.g. corruptions).
● Different sub-populations (e.g. geographical location).

Examples: different renditions (ImageNet-R), nearby video frames (ImageNet-Vid-Robust, YTBB-Robust), multiple objects and poses (ObjectNet).
Inductive Priors & Knowledge for Robustness

What about inductive biases to assist OOD generalization? (Image source: Dumoulin & Visin 2016)

● Hypothesis: “Representations should be invariant with


respect to dataset shift.”
● Data augmentation extends the dataset in order to
encourage invariances.
● More examples: contrastive learning, equivariant
architectures.

Data augmentation requires two considerations:


1. Set of base augmentation operations. (Ex: color distortions, word substitution)
2. Combination strategy (Ex: Sequence of K randomly selected ops.)
Composing a set of base augmentations

Composing base operations and ‘mixing’ them can improve accuracy and calibration under shift.

[Hendrycks+ 2020]
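A minimal AugMix-style sketch: sample several short chains of base operations, mix them with Dirichlet weights, and blend the mixture with the original image; the base ops and hyperparameters are illustrative, not the official AugMix configuration.

```python
# Minimal AugMix-style sketch; ops and hyperparameters are illustrative, not official.
import numpy as np
from PIL import Image, ImageEnhance, ImageOps

BASE_OPS = [
    lambda im: ImageOps.autocontrast(im),
    lambda im: ImageOps.equalize(im),
    lambda im: im.rotate(np.random.uniform(-15, 15)),
    lambda im: ImageEnhance.Brightness(im).enhance(np.random.uniform(0.7, 1.3)),
]

def augmix(image, width=3, depth=2, alpha=1.0):
    ws = np.random.dirichlet([alpha] * width)          # mixing weights over chains
    m = np.random.beta(alpha, alpha)                   # skip-connection weight
    mixed = np.zeros_like(np.asarray(image, dtype=np.float32))
    for w in ws:
        aug = image
        for op in np.random.choice(BASE_OPS, size=depth):
            aug = op(aug)                              # random chain of base ops
        mixed += w * np.asarray(aug, dtype=np.float32)
    out = (1 - m) * np.asarray(image, dtype=np.float32) + m * mixed
    return Image.fromarray(np.uint8(np.clip(out, 0, 255)))
```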
AugMix improves robustness & calibration under shift

Data augmentation can provide complementary benefits to marginalization.

[Hendrycks+ 2020]
Synthetic Data: Towards Infinite Training Data Variations
Takeaways
● Uncertainty & robustness are critical problems in AI and machine learning.

● Benchmark models with calibration error and a large collection of OOD shifts.

● Probabilistic ML, ensemble learning, and optimization provide a foundation.

● The best methods: ensemble multiple predictions; impose priors and inductive
biases; and “lower your temperature” when using softmax

● Synthetic data can remarkably help capture more variation

● Much future progress is expected – a key knob for making ML “real”


ML Predictions Are (Mostly) Accurate but Brittle
(Figure: “pig” (91%) + 0.005 × adversarial noise (NOT random) = “airliner” (99%))

[Szegedy Zaremba Sutskever Bruna Erhan Goodfellow Fergus 2013]


[Biggio Corona Maiorca Nelson Srndic Laskov Giacinto Roli 2013]
But also: [Dalvi Domingos Mausam Sanghai Verma 2004][Lowd Meek 2005]
[Globerson Roweis 2006][Kolcz Teo 2009][Barreno Nelson Rubinstein Joseph Tygar 2010]
[Biggio Fumera Roli 2010][Biggio Fumera Roli 2014][Srndic Laskov 2013]
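A minimal FGSM-style sketch of the perturbation illustrated above: take one signed-gradient step that increases the classifier's loss; `model`, the input, and the epsilon are assumptions for illustration, not the exact attack from any of the cited papers.

```python
# Minimal FGSM sketch, assuming a differentiable classifier `model` and inputs in [0, 1].
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.005):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step in the direction that increases the loss; the noise is NOT random
    x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()
    return x_adv
```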
Three commandments of Secure/Safe ML
I. Thou shalt not train on data you don’t fully trust
(because of data poisoning)

II. Thou shalt not let anyone use your model (or observe its
outputs) unless you completely trust them
(because of model stealing and black-box attacks)

III. Thou shalt not fully trust the predictions of your model
(because of adversarial examples)
A Possible By-Product of ML Bias-Variance Trade-Off
-> Other Difficulties such as Robust Overfitting (ICML 2020) etc.
Adversarial Examples Beyond Pixel Perturbations …
Further Read: https://adversarial-ml-tutorial.org/
