Object Detection: The Viola-Jones Face Detector
Augusto Morgan
Institute of Computing - University of Campinas
augusto.morgan@students.ic.unicamp.br
June 9, 2014
Augusto Morgan (IC)
Viola-Jones Face Detector
June 9, 2014
1 / 22
Overview
Object Detection
Viola-Jones Face Detector
Haar-like features and the integral image
AdaBoost
Cascade of Boosted Classifiers
Haar-like Features Extended Set
Object Detection
How can we detect objects in an image?
We can use a classifier:
Given an image, is it the object we are looking for or not?
But what if the image contains many other objects?
We are interested in finding where in the image the objects are.
Sliding Window
We can use the classifier on small portions of the image!
We slide a small sub-window across the image and apply the classifier at each position.
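A minimal single-scale sketch of the sliding-window scan. The classifier is passed in as a plain function; the names `sliding_windows` and `detect`, the list-of-rows image representation and the default 4-pixel step are illustrative assumptions (only the 24x24 window size comes from later slides).

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[int]]  # grayscale image stored as a list of pixel rows


def sliding_windows(image: Image, win: int, step: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (x, y, sub_window) for every win x win sub-window, moving `step` pixels at a time."""
    height, width = len(image), len(image[0])
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield x, y, [row[x:x + win] for row in image[y:y + win]]


def detect(image: Image, classify: Callable[[Image], bool],
           win: int = 24, step: int = 4) -> List[Tuple[int, int]]:
    """Apply a binary classifier to each sub-window; return the top-left corners it accepts."""
    return [(x, y) for x, y, sub in sliding_windows(image, win, step) if classify(sub)]
```

For scale: with a 1-pixel step, a 384x288 image already has 361 x 265 = 95,665 positions for a single 24x24 window size, and the scan is repeated at several scales.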
Problems?
Viola-Jones Real-Time Face Detector
Proposed in 2001 by Paul Viola and Michael Jones
It discards a great number of negative sub-windows before spending much processing time on them, achieving high frame rates.
How does it achieve that?
Haar wavelet function
The classifier used in the paper is based on Haar-like features.
Haar wavelet function:
$$
\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2}, \\ -1, & \tfrac{1}{2} < t \le 1, \\ 0, & \text{otherwise.} \end{cases}
$$
Figure: Haar wavelet
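A direct transcription of the Haar wavelet definition, as a quick sanity check of the positive-half / negative-half pattern that the rectangle features imitate. The function name is illustrative; note that at exactly t = 1/2 neither of the first two cases applies, so the function returns 0 there.

```python
def haar_psi(t: float) -> int:
    """Haar wavelet psi(t): +1 for 0 <= t < 1/2, -1 for 1/2 < t <= 1, 0 otherwise."""
    if 0 <= t < 0.5:
        return 1
    if 0.5 < t <= 1:
        return -1
    return 0  # outside [0, 1], and also at exactly t = 1/2 under this definition
```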
Haar-like Features
Rectangles whose score is based on positive (white) and negative (black) areas.
Three kinds of features: 2, 3 and 4 rectangles.
Each feature is calculated by:
$$
f(i) = \sum I_{\text{White}} - \sum I_{\text{Black}}
$$
Figure: The different types of Haar-like features
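A direct (slow) evaluation of f(i) that touches every pixel, shown for a horizontal two-rectangle feature. The (x, y, width, height) rectangle encoding, the hypothetical helper `two_rect_horizontal`, and placing the white rectangle on the left are assumptions for illustration, since the exact layouts are only given in the figure.

```python
from typing import List, Tuple

Image = List[List[int]]
Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def pixel_sum(image: Image, rect: Rect) -> int:
    """Sum of the pixels inside `rect`, computed directly (one addition per pixel)."""
    x, y, w, h = rect
    return sum(sum(row[x:x + w]) for row in image[y:y + h])


def haar_feature(image: Image, white: List[Rect], black: List[Rect]) -> int:
    """f = sum over the white rectangles minus sum over the black rectangles."""
    return sum(pixel_sum(image, r) for r in white) - sum(pixel_sum(image, r) for r in black)


def two_rect_horizontal(x: int, y: int, w: int, h: int) -> Tuple[List[Rect], List[Rect]]:
    """Hypothetical helper: a 2w x h feature, white block on the left, black block on the right."""
    return [(x, y, w, h)], [(x + w, y, w, h)]
```

The cost of this version grows with the feature's area, which is the problem the integral image removes.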
Problem: the number of Haar-like features is too large!
For a 24x24 pixel window there are more than 160,000 distinct Haar-like features.
Note: this set is overcomplete.
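Where does a number like 160,000 come from? This sketch counts every position and every integer stretch (width and height independently) of five assumed base shapes inside a 24x24 window. Under this counting convention the total is 162,336, consistent with the "more than 160,000" above; other conventions give somewhat different totals.

```python
def count_placements(base_w: int, base_h: int, win: int = 24) -> int:
    """Number of positions and integer scalings of a base_w x base_h feature in a win x win window."""
    total = 0
    for w in range(base_w, win + 1, base_w):        # feature widths: multiples of base_w
        for h in range(base_h, win + 1, base_h):    # feature heights: multiples of base_h
            total += (win - w + 1) * (win - h + 1)  # positions of a w x h rectangle
    return total


# Assumed base shapes (in cells): 2-rectangle horizontal/vertical,
# 3-rectangle horizontal/vertical, and the 4-rectangle checkerboard.
BASE_SHAPES = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]

total = sum(count_placements(bw, bh) for bw, bh in BASE_SHAPES)
print(total)  # 162336
```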
The Integral Image
A new intermediate representation of the image, similar to the summed-area table used in computer graphics.
Each pixel (x, y) contains the sum of the original pixels above and to the left of (x, y), inclusive:
$$
ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')
$$
It can be computed in one pass over
the original image.
Figure: The integral image
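A minimal single-pass sketch in Python, keeping a running sum of the current row. For convenience it stores the integral image with an extra row and column of zeros, so `ii[y + 1][x + 1]` holds ii(x, y) from the formula; the padding is an implementation choice of this sketch that removes boundary checks, not part of the original method.

```python
from typing import List

Image = List[List[int]]


def integral_image(image: Image) -> List[List[int]]:
    """Zero-padded integral image: ii[y + 1][x + 1] = sum of image[y'][x'] for y' <= y, x' <= x."""
    height, width = len(image), len(image[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0  # cumulative sum of row y up to column x
        for x in range(width):
            row_sum += image[y][x]
            # everything in the rows above (same columns) plus this row's prefix
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii
```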
Feature Calculation using the Integral Image
The sum of any rectangle can be calculated from the integral image with four array references:
$$
\text{Sum}(R) = ii(A) - ii(B) - ii(D) + ii(C)
$$
Figure: The sum of one rectangle using the integral image
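Each feature can then be calculated with a few array references: six for a two-rectangle feature, eight for a three-rectangle feature and nine for a four-rectangle feature, since adjacent rectangles share corners.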
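A sketch of the four-reference rectangle sum and of a two-rectangle feature that reads only six distinct corners. It uses a zero-padded integral image (`ii[y][x]` = sum of rows above y and columns left of x), and names corners by position rather than by the figure's letters A to D. The horizontal white-left/black-right layout of `two_rect_feature` is a hypothetical choice for illustration.

```python
from typing import List

Image = List[List[int]]


def integral_image(image: Image) -> List[List[int]]:
    """Zero-padded integral image: ii[y][x] = sum of image rows < y and columns < x."""
    ii = [[0] * (len(image[0]) + 1) for _ in range(len(image) + 1)]
    for y, row in enumerate(image):
        running = 0  # prefix sum of the current row
        for x, value in enumerate(row):
            running += value
            ii[y + 1][x + 1] = ii[y][x + 1] + running
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left pixel (x, y), using four array references."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Left w x h block (white) minus the adjacent right block (black).

    The blocks share their middle edge, so only six distinct corners are read.
    """
    tl, tm, tr = ii[y][x], ii[y][x + w], ii[y][x + 2 * w]
    bl, bm, br = ii[y + h][x], ii[y + h][x + w], ii[y + h][x + 2 * w]
    white = bm - tm - bl + tl
    black = br - tr - bm + tm
    return white - black
```

The cost no longer depends on the rectangle's size: a 2x2 block and a 20x20 block both take four lookups.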
Advantages and Drawbacks
Rectangular features are very simple and coarse.
However, they are very fast to compute!
They can be evaluated at any scale without computing a Gaussian pyramid and an integral image for each of its levels, which speeds up multiscale detection.
Feature strategies that need the pyramid to be computed for multiscale detection pay that extra cost, and so run slower than this approach.
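To illustrate the multiscale point, this sketch scales a feature's rectangles instead of resizing the image: one integral image serves every scale, and a feature with n rectangles always costs n four-reference lookups. The (x, y, w, h, weight) encoding, rounding to whole pixels, and leaving out any normalization for the larger area are assumptions of the sketch.

```python
from typing import List, Tuple

Image = List[List[int]]
WeightedRect = Tuple[int, int, int, int, int]  # (x, y, w, h, weight): +1 white, -1 black


def integral_image(image: Image) -> List[List[int]]:
    """Zero-padded integral image: ii[y][x] = sum of image rows < y and columns < x."""
    ii = [[0] * (len(image[0]) + 1) for _ in range(len(image) + 1)]
    for y, row in enumerate(image):
        running = 0
        for x, value in enumerate(row):
            running += value
            ii[y + 1][x + 1] = ii[y][x + 1] + running
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Four-reference rectangle sum."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def scale_feature(feature: List[WeightedRect], s: float) -> List[WeightedRect]:
    """Scale rectangle positions and sizes by s (rounded to whole pixels), keeping the weights."""
    return [(round(x * s), round(y * s), round(w * s), round(h * s), wt)
            for x, y, w, h, wt in feature]


def evaluate(ii: List[List[int]], feature: List[WeightedRect], ox: int, oy: int) -> int:
    """Feature value for a detector window whose top-left corner is (ox, oy)."""
    return sum(wt * rect_sum(ii, ox + x, oy + y, w, h) for x, y, w, h, wt in feature)
```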
Training the Classifier
Given the features and a set of positive and negative examples, any classification method could be used to train a classifier.
There is, however, a huge number of features.
A very small number of these features can be combined to create an effective classifier.
How do we find these features?
A weak classifier
The weak classifier used in the paper takes as input a sub-window (x) and consists of a feature (f), a threshold (θ) and a polarity (p) indicating the direction of the inequality:
$$
h(x, f, \theta, p) = \begin{cases} 1, & p\, f(x) < p\, \theta, \\ 0, & \text{otherwise.} \end{cases}
$$
The weak classifier can be viewed as a single-node decision tree, a decision stump.
For each feature, the weak learner picks the optimal threshold (and polarity): the one that minimizes the weighted number of misclassified examples.
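A sketch of how the optimal threshold for one feature can be found with a single sorted sweep: keep the positive and negative weight below the candidate threshold, and read off the error of both polarities at once. The function names are illustrative, and placing candidate thresholds midway between distinct feature values (so no example sits exactly on a threshold) is an assumption of this sketch.

```python
from typing import Sequence, Tuple


def weak_classify(fx: float, theta: float, p: int) -> int:
    """h(x, f, theta, p): 1 if p * f(x) < p * theta, else 0 (fx is the feature value f(x))."""
    return 1 if p * fx < p * theta else 0


def best_stump(values: Sequence[float], labels: Sequence[int],
               weights: Sequence[float]) -> Tuple[float, int, float]:
    """Return (theta, p, weighted_error) of the best stump for one feature.

    labels are 1 (face) or 0 (non-face); weights are non-negative example weights.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    t_pos = sum(w for w, y in zip(weights, labels) if y == 1)  # total positive weight
    t_neg = sum(w for w, y in zip(weights, labels) if y == 0)  # total negative weight
    distinct = sorted(set(values))
    # Candidates: midway between consecutive distinct values, plus one beyond each end.
    thresholds = ([distinct[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
                  + [distinct[-1] + 1.0])
    s_pos = s_neg = 0.0  # positive / negative weight strictly below the current threshold
    k = 0
    best = (thresholds[0], 1, float("inf"))
    for theta in thresholds:
        while k < len(order) and values[order[k]] < theta:
            if labels[order[k]] == 1:
                s_pos += weights[order[k]]
            else:
                s_neg += weights[order[k]]
            k += 1
        # p = +1: below the threshold means face. Errors: negatives below + positives above.
        err_plus = s_neg + (t_pos - s_pos)
        # p = -1: above the threshold means face. Errors: positives below + negatives above.
        err_minus = s_pos + (t_neg - s_neg)
        for p, err in ((1, err_plus), (-1, err_minus)):
            if err < best[2]:
                best = (theta, p, err)
    return best
```

After sorting, each candidate threshold costs constant time, so the search for one feature is dominated by the sort.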
AdaBoost
AdaBoost is used to boost the performance of a simple learning algorithm.
It combines weak classification functions to create a more powerful one.
At each round the examples are re-weighted to emphasize those which
were incorrectly classified by the previous weak classifier.
The final strong classifier is a weighted combination of weak classifiers
followed by a threshold.
AdaBoost
We can see the AdaBoost procedure as a greedy feature selection process:
AdaBoost is actually selecting a small set of good features.
Seen this way, at each round the weak learning algorithm selects the single rectangle feature that best separates the positive and negative examples.
Training
Training is done in multiple rounds.
All examples start with the same weight.
At each round it searches over a large set of features and thresholds, choosing the feature/threshold pair that minimizes the weighted error.
The weights are then updated so that the wrongly classified examples gain relative weight, and the process is repeated.
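A compact sketch of this training loop. The update rule follows the Viola-Jones formulation of AdaBoost: with weighted error ε, correctly classified examples are multiplied by β = ε / (1 − ε), the stump's vote weight is α = log(1/β), and the strong classifier accepts when the weighted vote reaches half the total α. The precomputed feature matrix, the uniform initial weights (as on the slide) and the brute-force stump search are simplifications for illustration.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index j, threshold theta, polarity p)


def stump_predict(fx: float, theta: float, p: int) -> int:
    """Weak classifier h: 1 if p * f(x) < p * theta, else 0."""
    return 1 if p * fx < p * theta else 0


def train_adaboost(features: List[List[float]], labels: Sequence[int],
                   rounds: int) -> List[Tuple[Stump, float]]:
    """features[j][i] is feature j evaluated on example i; labels are 1 (face) or 0 (non-face)."""
    n = len(labels)
    w = [1.0 / n] * n  # all examples start with the same weight
    strong: List[Tuple[Stump, float]] = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]  # normalize the weights to a distribution
        best = None
        # Exhaustive search over features, thresholds and polarities (brute force, for clarity).
        for j, vals in enumerate(features):
            for theta in sorted(set(vals)):
                for p in (1, -1):
                    err = sum(wi for wi, fx, y in zip(w, vals, labels)
                              if stump_predict(fx, theta, p) != y)
                    if best is None or err < best[0]:
                        best = (err, (j, theta, p))
        err, (j, theta, p) = best
        err = max(err, 1e-10)  # avoid dividing by zero when the stump is perfect
        beta = err / (1.0 - err)
        # Correct examples are multiplied by beta (< 1 when err < 0.5), so after the next
        # normalization the misclassified ones carry relatively more weight.
        w = [wi * beta if stump_predict(features[j][i], theta, p) == labels[i] else wi
             for i, wi in enumerate(w)]
        strong.append(((j, theta, p), math.log(1.0 / beta)))
    return strong


def predict(strong: List[Tuple[Stump, float]], x: Sequence[float]) -> int:
    """Strong classifier: 1 if sum_t alpha_t * h_t(x) >= 0.5 * sum_t alpha_t."""
    score = sum(alpha * stump_predict(x[j], theta, p) for (j, theta, p), alpha in strong)
    return 1 if score >= 0.5 * sum(alpha for _, alpha in strong) else 0
```

The brute-force search here is much slower than the sorted sweep; with feature values presorted once, each round costs O(KN), which is where the O(MKN) total comes from.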
Considerations
Huge set of possible features and related thresholds (NK, where N is the number of examples and K is the number of features).
For 20,000 samples and 160,000 features (the number for a 24x24 pixel sub-window), that gives 3.2 billion distinct weak classifiers!
If using M rounds, AdaBoost takes O(MKN).
For each sub-window, all the selected weak classifiers are evaluated and combined to get the final answer.
What if we could eliminate subwindows earlier?
The Attentional Cascade
The insight is that smaller, and therefore more efficient, boosted classifiers
can be constructed which reject many of the negative sub-windows while
detecting almost all positive instances.
This can be done by adjusting (lowering) the threshold of the boosted classifier produced by AdaBoost, to minimize false negatives.
Figure: The first features selected by AdaBoost
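A sketch of the threshold adjustment, assuming we already have the boosted classifier's scores (its weighted sum of weak votes) on a set of positive and negative validation sub-windows. The function names and the validation-set setup are assumptions; the idea is to pick the highest threshold that still accepts the required fraction of faces, then measure how many non-faces get through.

```python
import math
from typing import Sequence, Tuple


def stage_threshold(pos_scores: Sequence[float], min_detection_rate: float) -> float:
    """Largest threshold T such that at least `min_detection_rate` of positive scores are >= T."""
    ranked = sorted(pos_scores, reverse=True)
    keep = math.ceil(min_detection_rate * len(ranked))  # number of positives that must pass
    return ranked[max(keep, 1) - 1]


def rates(pos_scores: Sequence[float], neg_scores: Sequence[float],
          threshold: float) -> Tuple[float, float]:
    """(detection rate, false positive rate) of the rule: accept when score >= threshold."""
    det = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    fpr = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return det, fpr
```

Asking for a 100% detection rate forces the threshold down to the weakest positive score, which is exactly how a stage trades a high false positive rate for no missed faces.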
The Attentional Cascade
Their first classifier, using only 2 features, achieved a 100% hit rate with a 50% false positive rate.
Far from acceptable as a detector, but with a few operations it discards around 50% of the non-face sub-windows. And this is only the first classifier.
A cascade of classifiers is built this way: a positive output from each stage triggers the next one, so the more complex classifiers are used only on the sub-windows that are more likely to be faces.
Since the great majority of sub-windows of an image are negative, the
cascade tries to eliminate as many sub-windows as possible at the earliest
stage possible.
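The early-rejection logic can be sketched as follows. Each stage is modeled as a hypothetical callable returning True when the sub-window should be passed on; a sub-window is a detection only if every stage accepts it. Because every stage must be passed, the overall detection rate D and false positive rate F of the cascade are the products of the per-stage rates.

```python
from math import prod
from typing import Callable, Sequence, Tuple

Stage = Callable[[object], bool]  # hypothetical stage: True means "maybe a face, keep going"


def cascade_accepts(window: object, stages: Sequence[Stage]) -> bool:
    """Run the stages in order and stop at the first rejection."""
    for stage in stages:
        if not stage(window):
            return False  # rejected early: the later, more expensive stages never run
    return True


def cascade_rates(detection_rates: Sequence[float],
                  false_positive_rates: Sequence[float]) -> Tuple[float, float]:
    """Overall (D, F) of a cascade: the per-stage rates multiply."""
    return prod(detection_rates), prod(false_positive_rates)
```

For example, ten stages that each keep 99% of faces and pass 30% of non-faces give D = 0.99^10 ≈ 0.90 and F = 0.3^10 ≈ 5.9 × 10^-6.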
The Attentional Cascade
Figure: The Classifier Cascade
Finally, a post-processing step merges multiple detections of the same face, so that each face is reported only once.
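The merging rule is not spelled out here. One simple scheme, assumed for illustration: treat two detections as the same face when their rectangles overlap, group them transitively (a small union-find), and replace each group by the average of its corners. The box format and function names are assumptions of the sketch.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left and bottom-right corners


def overlap(a: Box, b: Box) -> bool:
    """True if the two rectangles share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_detections(boxes: List[Box]) -> List[Box]:
    """Group overlapping boxes (transitively) and return one averaged box per group."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlap(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # average each corner coordinate over the boxes in the group
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]
```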
Haar-like Features Extended Set
Proposed by Rainer Lienhart and Jochen Maydt in 2002.
Same principle, more variability.
Figure: The extended Haar-like feature set
Rotated Summed Area Table
Figure: The rotated integral image
References
Viola, P. and Jones, M., CVPR 2001, Rapid Object Detection using a Boosted Cascade of Simple Features.
Viola, P. and Jones, M., International Journal of Computer Vision, v. 57, 2004, Robust Real-Time Face Detection.
Lienhart, R. and Maydt, J., IEEE ICIP 2002, An Extended Set of Haar-like Features for Rapid Object Detection.
Weisstein, E. W., MathWorld, A Wolfram Web Resource, Haar Function. http://mathworld.wolfram.com/HaarFunction.html