
Biometric Security

Face Recognition
Challenges

● Pose, illumination, expression (PIE) variations
● Aging: longitudinal changes
● Occlusion
Typical FR system: steps involved
● Face detection and segmentation
● Feature extraction
● Matching

We will look at a popular handcrafted method and a popular learning-based method for these steps
Face Detection
Goal: Detect and isolate the face region from an image.

Key Methods:

● Viola-Jones Face Detector:


○ Uses Haar-like features and AdaBoost for rapid face detection.
○ Viola, P., & Jones, M. (2001). "Rapid Object Detection Using a Boosted Cascade of
Simple Features."
● Deep Learning-Based Detectors:
○ Multi-task cascaded CNN (MTCNN). Reference: K. Zhang et al., Joint Face Detection
and Alignment using Multi-task Cascaded Convolutional Networks, 2016.
○ Models like R-CNN and Faster R-CNN, which are more generally used for object detection.
Face detection: Viola-Jones algorithm
Haar cascades

Real-time face detector

1. Haar features
2. Integral image
3. AdaBoost
4. Cascading

Viola, Paul, and Michael Jones. "Rapid object detection using a boosted cascade of simple features." Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Vol. 1. IEEE, 2001.
Face Detection Framework

[Figure: sliding detection window → extract features f → binary classifier → face (yes) / non-face (no)]

What features work best?
Viola-Jones face detector
● Scans through the input image with detection windows of different sizes and decides
whether each window contains a face or not.
○ A classifier operates on features derived using multiple rectangular filters
(Haar-like features):
■ two-rectangle, three-rectangle, and four-rectangle filters
■ e.g., a two-rectangle filter placed over the upper face region detects the two
eyes and the bridge of the nose
■ If the combination of filter responses exceeds a threshold, a face is said to
have been detected.

[Figure: detection window with a rectangular filter (+1 for the white region, -1 for the
black region), shown next to the Haar wavelet function]

People call them Haar-like features, since they are similar to 2D Haar wavelets.

Detecting Faces of Different Sizes using Haar features
Compute Haar features at different scales to detect faces of different sizes.
Haar features

V_A = ∑(pixel intensities in white area) − ∑(pixel intensities in black area)

Features are computed at different scales and orientations to detect faces at different
scales and orientations.
Integral image
● Pre-compute the integral image S from the original image (a 2D array for a grayscale
image), where S(x, y) is the sum of all pixels above and to the left of (x, y).

● Rapidly computed in a single pass over the image.

● The sum of pixel values within a rectangle D can be calculated in how many array
accesses?

Four: S(x4, y4) − S(x2, y2) − S(x3, y3) + S(x1, y1)

Independent of the filter size

Constant-time computation, O(1)
● The integral image alone does not reduce the computational complexity to a level
that is sufficient for real-time face detection.
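A minimal NumPy sketch of both ideas (function and variable names are mine):

import numpy as np

def integral_image(img):
    # S[y, x] = sum of img[0:y, 0:x]; pad with a leading row/column of
    # zeros so rectangle sums need no boundary checks.
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def rect_sum(S, x1, y1, x2, y2):
    # Sum of pixels in the rectangle [x1, x2) x [y1, y2):
    # exactly four array accesses, independent of rectangle size.
    return S[y2, x2] - S[y1, x2] - S[y2, x1] + S[y1, x1]

img = np.arange(16, dtype=np.int64).reshape(4, 4)
S = integral_image(img)
assert rect_sum(S, 1, 1, 3, 3) == img[1:3, 1:3].sum()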
AdaBoost for feature selection
● Each round of boosting selects one feature from the set of potential features.
○ T features are selected in decreasing order of their discriminative power.
○ The final classifier is a weighted linear combination of the T best weak
classifiers,
○ where the weights are proportional to the discriminative power of the features
(inversely proportional to the training error rate).
AdaBoost
AdaBoost assigns different weights to training samples, selects the best features, and
combines the resulting weak classifiers into a strong classifier.
AdaBoost for feature selection

For the next round, reweight the examples according to their errors, and choose another
filter/threshold combination (see the sketch below).
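As an illustration of one boosting round, here is a brute-force decision-stump sketch (my
own simplification, not the paper's exact procedure):

import numpy as np

def adaboost_round(F, y, w):
    """One boosting round: pick the feature/threshold stump with the lowest
    weighted error, then reweight the examples.

    F: (n_samples, n_features) matrix of Haar feature responses
    y: labels in {-1, +1} (face / non-face)
    w: current sample weights (sum to 1)
    """
    best = None
    for j in range(F.shape[1]):                       # brute-force search
        for thresh in np.unique(F[:, j]):
            for polarity in (1, -1):
                pred = polarity * np.sign(F[:, j] - thresh + 1e-12)
                err = np.sum(w[pred != y])            # weighted error
                if best is None or err < best[0]:
                    best = (err, j, thresh, polarity)
    err, j, thresh, polarity = best
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))  # classifier weight
    pred = polarity * np.sign(F[:, j] - thresh + 1e-12)
    w = w * np.exp(-alpha * y * pred)                 # upweight mistakes
    return (j, thresh, polarity, alpha), w / w.sum()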
AdaBoost intuition
Cascade classifier: attention on promising windows
● A cascade of boosted classifiers can be constructed which rejects negative sub-windows early on.
○ A small threshold is set at each stage to avoid false negatives.
○ Successively more complex classifiers follow; this dramatically increases the speed of the detector.
○ Each stage in the cascade reduces the false positive rate, at the cost of a slight decrease in the detection rate.
Face detection results using Viola-Jones
Viola-Jones: Merits
1. Real-time Performance
2. AdaBoost Feature Selection
3. Integral Image Representation
4. Cascade Classifier Structure
5. Robust to Illumination
Demerits
1. High False Positive Rate
2. Limited to Frontal Faces
3. Rigid Detection Window Size
4. Not Rotation-Invariant
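For a quick hands-on reference, OpenCV ships pretrained Haar cascades for a Viola-Jones-style
detector; a minimal usage sketch (the file names are placeholders):

import cv2

# Pretrained frontal-face Haar cascade bundled with OpenCV
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("input.jpg")                       # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor controls the size step between detection windows;
# minNeighbors trades false positives against missed faces.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)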
Multitask cascaded CNN (MTCNN)
● Three stages
○ P-Net (Proposal)
○ R-Net (Refinement)
○ O-Net (Output)

● Three tasks
○ face/non-face classification,
○ bounding box regression
○ facial landmark localization

Zhang, Kaipeng, et al. "Joint face detection and alignment using multitask cascaded convolutional networks." IEEE Signal Processing Letters 23.10 (2016): 1499-1503.
Loss functions at each stage for different tasks
○ face/non-face classification: binary cross-entropy loss

○ bounding box regression: Euclidean loss

○ facial landmark localization: Euclidean loss

○ Overall loss:

$$\min \sum_{i=1}^{N} \sum_{j \in \{\text{det},\,\text{box},\,\text{landmark}\}} \alpha_j \, \beta_i^{\,j} \, L_i^{\,j}$$

where α_j encodes the task importance and β_i^j ∈ {0, 1} is the sample-type indicator
(whether sample i contributes to task j).


Stages
Stage 1: Proposal network
● Fully convolutional network (FCN) - no FC layers
● used to obtain candidate windows and their bounding box regression vectors.
Stage 2: Refine network
● All candidates from the P-Net are fed into the Refine Network.
● The R-Net further reduces the number of candidates, performs calibration with bounding box
regression and employs non-maximum suppression (NMS) to merge overlapping candidates.
● The R-Net outputs whether the input is a face or not, a 4-element vector giving the
bounding box for the face, and a 10-element vector for facial landmark localization.
Stage 3: Output network
Output Network aims to describe the face in more detail and output the five facial landmarks’
positions for eyes, nose and mouth.
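For reference, one widely used PyTorch implementation of MTCNN is the facenet-pytorch
package; a minimal usage sketch (the image path is a placeholder):

from PIL import Image
from facenet_pytorch import MTCNN  # pip install facenet-pytorch

mtcnn = MTCNN(keep_all=True)          # keep all detected faces, not just one
img = Image.open("group_photo.jpg")   # placeholder path

# boxes: (N, 4) bounding boxes; probs: face confidences;
# landmarks: (N, 5, 2) points for the eyes, nose, and mouth corners
boxes, probs, landmarks = mtcnn.detect(img, landmarks=True)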
Training data
The paper uses four kinds of data annotation during training:
(i) Negatives: regions whose Intersection-over-Union (IoU) ratio with any
ground-truth face is less than 0.3;
(ii) Positives: IoU above 0.65 with a ground-truth face;
(iii) Part faces: IoU between 0.4 and 0.65 with a ground-truth face; and
(iv) Landmark faces: faces labeled with the positions of 5 landmarks.
● Negatives and positives are used for the face classification task,
● positives and part faces are used for bounding box regression, and
● landmark faces are used for facial landmark localization.
Result

Aside: Intersection over Union (IoU)
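A minimal IoU sketch (the (x1, y1, x2, y2) corner format is my assumption):

def iou(a, b):
    # Boxes as (x1, y1, x2, y2) with x1 < x2, y1 < y2
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# e.g., iou((0, 0, 2, 2), (1, 1, 3, 3)) == 1/7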
Feature extraction and Matching
(state-of-the-art techniques)
Face recognition
There are two ways in which deep learning-based face recognition is done:

1. End-to-end classifier network

● Applicable in closed-set face recognition ONLY

2. Extraction of feature embeddings

● Applicable to both closed-set and open-set settings
Deep learning based loss functions
Softmax loss

Contrastive loss

Triplet loss

Center loss

Angular margin loss (SphereFace)

Large margin cosine loss (CosFace)

Additive angular margin softmax loss (ArcFace)


Development of loss functions
Softmax loss / cross-entropy loss function
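With W_j and b_j the weights and biases of the last fully connected layer, x_i the deep
feature of the ith sample, and y_i its label, the standard softmax (cross-entropy) loss
over N samples and C classes is:

$$\mathcal{L}_{\text{softmax}} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{W_{y_i}^{\top} x_i + b_{y_i}}}{\sum_{j=1}^{C} e^{W_j^{\top} x_i + b_j}}$$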
Metric learning
● Many algorithms (e.g., k-NN, clustering, retrieval) rely heavily on a good distance metric. Instead of using predefined
distances like Euclidean or cosine, metric learning learns a distance function tailored to the data and task.
● Metric Learning aims to learn data embeddings/feature vectors in a way that reduces intra-class distance between feature
vectors and increases the inter-class distance between the feature vectors corresponding to different faces.

● Open-set FR is essentially a metric learning problem, where the key is to learn discriminative large-margin features
Euclidean distance based loss
● Euclidean-distance based loss is a metric learning method that embeds
images into a Euclidean space in which intra-class variance is reduced and
inter-class variance is enlarged.

● Euclidean-distance based Loss functions:


● Contrastive loss (pairs)
● Triplet loss (triplets)
Contrastive loss
● Classic loss function for metric learning, and the most intuitive.

With f_θ the embedding network (a NN with parameters θ), D_ij = ||f_θ(x_i) − f_θ(x_j)||_2
the Euclidean distance between training samples x_i and x_j (indexed in subscript),
labels y_i, 𝟙[·] the identity function (1 if the condition is true), and margin α:

$$\mathcal{L} = \sum_{i,j} \underbrace{\mathbb{1}[y_i = y_j]\, D_{ij}^2}_{\text{matching samples}} \;+\; \underbrace{\mathbb{1}[y_i \neq y_j]\, \max(0,\, \alpha - D_{ij})^2}_{\text{non-matching samples}}$$

● Distance between two samples of different classes should be at least α (the margin).

Chopra, Sumit, Raia Hadsell, and Yann LeCun. "Learning a similarity metric discriminatively, with application to face verification." Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Vol. 1. IEEE, 2005.
Example

Notice how the white dots that were outside weren’t moved farther away from the margin.
Triplet Loss
● Representation (triplet)
○ Anchor: an image of a specific identity
○ Positive: another image of the same identity
○ Negative: any image of any other identity
Triplet loss
● Triplet constraint: over the set of all possible triplets in the training set
(cardinality N), with f(x^a) the embedding of the anchor, f(x^p) the embedding of
the positive, f(x^n) the embedding of the negative, and margin α:

$$\|f(x_i^a) - f(x_i^p)\|_2^2 + \alpha \;<\; \|f(x_i^a) - f(x_i^n)\|_2^2$$

● Loss function (a hinge loss: penalize if the expression is positive):

$$\mathcal{L} = \sum_{i=1}^{N} \max\!\left(0,\; \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha\right)$$

The distance between the (anchor, negative) pair must be higher than the distance between
the (anchor, positive) pair by at least α (the margin).
Triplet Loss
● Embedding: f(x) ∈ R^d, where the embedding is constrained to the unit hypersphere,
||f(x)||_2 = 1.
● Intuition
○ Intra-class distance should be low irrespective of imaging conditions
○ Inter-class distance should be high

Weinberger, Kilian Q., and Lawrence K. Saul. "Distance metric learning for large margin nearest neighbor classification." Journal of Machine Learning Research 10.2 (2009).
Triplet mining (selection)
● Hard positives
● Hard negatives

● To ensure fast convergence, it is crucial to select triplets that violate the
triplet constraint.
○ If only very easy samples are selected: no training signal, or very slow
convergence.
○ Selecting the hardest negatives can lead to bad local minima early in
training => a collapsed model (i.e., f(x) = 0).

To mitigate this:

● Select semi-hard triplets: negatives that lie inside the margin α
○ triplets where the negative is farther from the anchor than the positive, but
still produces a positive loss
○ Usually done within a minibatch (see the sketch below)
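A minimal in-batch semi-hard mining sketch (NumPy; names are mine, and labels is assumed
to be a NumPy array):

import numpy as np

def semihard_triplets(emb, labels, alpha=0.2):
    """Pick (anchor, positive, semi-hard negative) index triplets in a batch.

    Semi-hard: d(a, p) < d(a, n) < d(a, p) + alpha, i.e. the negative is
    farther than the positive but still inside the margin.
    """
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    triplets = []
    for a in range(len(labels)):
        for p in np.where((labels == labels[a]) & (np.arange(len(labels)) != a))[0]:
            mask = (labels != labels[a]) & (d[a] > d[a, p]) & (d[a] < d[a, p] + alpha)
            negs = np.where(mask)[0]
            if len(negs):
                # hardest among the semi-hard negatives
                triplets.append((a, p, negs[np.argmin(d[a, negs])]))
    return triplets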
Limitations of Triplet & Contrastive Loss

● Sampling issues:
○ Require careful selection of samples for training to be effective, and may lead to
mode collapse or overfitting if the samples are not chosen carefully.
○ Sample mining is time-consuming and computationally expensive.
● Expansion issues:
○ With larger numbers of classes, it is difficult to ensure that samples with the same
label are pulled together into a common region of the embedding space.
○ During mini-batch training, these loss functions can only enforce the structure
locally, for the samples in the batch, not globally, leading to suboptimal embeddings.
● Euclidean distances are less meaningful in high dimensions.
○ They fail to capture nonlinearity in data because they represent an isotropic distance metric.
○ These distances don't capture the class structure.
Center loss
One of the first successful attempts to solve the first two issues.

With x_i the deep feature belonging to the y_i-th class, and c_{y_i} the y_i-th class
centre of the deep features:

$$\mathcal{L}_C = \frac{1}{2} \sum_{i=1}^{m} \|x_i - c_{y_i}\|_2^2$$

This term is added to the softmax loss as a regularizer.

What could be the issue with the regularizing term?

Wen, Yandong, et al. "A discriminative feature learning approach for deep face recognition." Computer Vision–ECCV 2016: 14th European
Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14. Springer International Publishing, 2016.
Center loss
● Calculation of centers is expensive.
○ For each batch, the class centers need to be computed ahead of time.

● There's still no guarantee of large inter-class variability, since clusters
closer to zero benefit less from the regularization term.

● To make it "fair" for each class, enforce class centers to be at the same
distance from the origin?
○ Map them to a hypersphere!
Revisiting Softmax

● When minimizing the softmax loss, what exactly are you achieving?

○ For the ith sample, we are trying to increase the value of the dot product between Wi and x.
○ We learn features that are close in direction (small angle) to their Wi's.
○ Classes are separated in angular space, especially when the weights or features are normalized.
SphereFace (Angular Softmax or A-Softmax)
● Features learnt by the original softmax loss have an intrinsic angular distribution.
○ The softmax loss does not enforce a strict angular margin, as it optimizes the
magnitudes of the vectors Wi and x as well as the cosine of the angle between Wi and x.
● SphereFace takes advantage of this angular distribution by applying an angular
margin to separate features:
○ Instead of learning features in an unconstrained space, SphereFace normalizes feature
vectors to lie on a hypersphere.
○ This ensures that the classifier operates on the angles between features rather than
their magnitudes, which is more effective for face recognition.

Liu, Weiyang, et al. "Sphereface: Deep hypersphere embedding for face recognition." Proceedings of the IEEE conference on computer vision and pattern
recognition. 2017.
Comparison of angular margin among softmax loss, modified softmax loss, and A-Softmax loss.
Experiment:

● A CNN learns 2-D features on a subset of the CASIA face dataset.
● The output dimension of the FC1 layer is set to 2 so the learned features can be visualized.
● Yellow dots represent the first-class face features, while purple dots represent the
second-class face features.
● One can see that features learned by the original softmax loss cannot be classified
simply via angles, while those of the modified softmax loss can. A-Softmax loss further
increases the angular margin of the learned features.
Modified softmax loss
● Modifications to the softmax loss (modified softmax loss):
○ Bias vector b = 0, for simplification
○ Normalize the weights so that ||Wj|| = 1
SphereFace (Angular Softmax or A-Softmax)
● Modifications to the softmax loss:
○ Bias vector b = 0, for simplification
○ Normalize the weights so that ||Wi|| = 1
○ Margin parameter m, to enforce a stricter decision boundary

● Key idea: take the angle to the target class and magnify it, i.e., multiply it by the number m.
● This emphasizes that even the magnified angle to the target class must remain smaller
than the angles to the other classes, forcing a larger angular gap.

Liu, Weiyang, et al. "Sphereface: Deep hypersphere embedding for face recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
SphereFace (Angular Softmax or A-Softmax)

● The integer m (m >= 1) controls the size of the angular margin and hence the
decision boundary.
○ Binary-case decision boundary: ||x|| (cos(m·θ1) − cos(θ2)) = 0
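For reference, the full A-Softmax loss from the SphereFace paper, with θ_{y_i,i} the
angle between W_{y_i} and the feature x_i:

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} -\log \frac{e^{\|x_i\| \, \psi(\theta_{y_i,i})}}{e^{\|x_i\| \, \psi(\theta_{y_i,i})} + \sum_{j \neq y_i} e^{\|x_i\| \cos(\theta_{j,i})}}$$

where ψ(θ) = (−1)^k cos(mθ) − 2k for θ ∈ [kπ/m, (k+1)π/m], k ∈ {0, ..., m−1}, a
monotonically decreasing extension of cos(mθ) to [0, π].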
Geometrical interpretation
● Geometrically, this means we assign the sample to class j if the projection of the
feature vector z onto the class center vector Wj is the largest, i.e., if the angle
between Wj and z is the smallest among all class center vectors.
Decision boundaries
In a softmax model with two classes, the probability of class i ∈ {1, 2} given input x:

$$p_i = \frac{e^{w_i^{\top} x + b_i}}{e^{w_1^{\top} x + b_1} + e^{w_2^{\top} x + b_2}}$$

The decision boundary is where the model is equally likely to predict class 1 or class 2,
i.e., p_1 = p_2.

So the decision boundary is defined by the equation:

$$(w_1 - w_2)^{\top} x + (b_1 - b_2) = 0$$

The decision boundary is a hyperplane perpendicular to w1 − w2.

With normalized weights and zero biases, this hyperplane lies along the angular bisector
of the two weight vectors.
Decision boundaries
Visualization of features with different m
3D features projected on the unit hypersphere, 6 classes.
Experiment:
● Trained a CNN using the A-Softmax loss.
● Data: the 6 individuals that have the most samples in CASIA-WebFace.
● Output feature dimension (FC1) set to 3.
● Observations:
○ Larger m leads to a more discriminative distribution on the sphere and a larger
angular margin.
○ Class 1 (blue) and class 2 (dark green) are used to construct positive and negative
pairs to evaluate the angle distribution of features from the same class and from
different classes. The angle distribution of positive and negative pairs (the second
row of Fig. 5 in the paper) quantitatively shows that the angular margin becomes larger
as m increases and every class becomes more distinct from the others.
CosFace / Large margin cosine loss (LMCL)
● CosFace is a margin-based softmax loss function used primarily in face recognition tasks.
○ It aims to enhance the discriminative power of learned face embeddings by maximizing
the decision margin in the angular (cosine) space.
● Modifications to softmax:
○ Normalize features and weights
○ Scale the output by s
○ Add a cosine margin m to the target class (see the formulation below)
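With these modifications, the loss takes the standard LMCL form, where θ_j is the angle
between the normalized weight W_j and the normalized feature of sample i:

$$\mathcal{L}_{\text{LMCL}} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s(\cos\theta_{y_i} - m)}}{e^{s(\cos\theta_{y_i} - m)} + \sum_{j \neq y_i} e^{s \cos\theta_j}}$$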
Scale s and margin m
● Both s and m are hyperparameters and need to be carefully tuned.
● Typical range of s (scale factor):
○ If s is too small: logits are not separated enough, leading to poor gradient flow.
○ If s is too large: logits become overly peaked, so the softmax saturates and
gradients vanish.
● Typical value of m: 0.35
CosFace / Large margin cosine loss (LMCL)

[Figure: feature distributions in Euclidean space (top row) and in angular space (bottom
row); the margin m should be less than 1]

A toy experiment of different loss functions on 8 identities with 2D features. The first
row maps the 2D features onto the Euclidean space, while the second row projects the 2D
features onto the angular space. The gap becomes evident as the margin term m increases.
ArcFace (Additive angular margin softmax)
● ArcFace (Deng et al. 2019) is very similar to CosFace
● However, instead of defining the margin in the cosine space, it defines the margin directly in the angle space.
● The setting is identical to CosFace, with the requirements that the last layer weights and feature vector should be
normalized, and last layer biases should be equal to zero
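Written out, with θ_j the angle between the normalized weight W_j and the normalized
feature of sample i:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s \cos(\theta_{y_i} + m)}}{e^{s \cos(\theta_{y_i} + m)} + \sum_{j \neq y_i} e^{s \cos\theta_j}}$$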

where s is the scaling parameter and m is referred to as the margin parameter.

Deng, Jiankang, et al. "Arcface: Additive angular margin loss for deep face recognition." Proceedings of the IEEE/CVF conference on computer vision and
pattern recognition. 2019.
ArcFace (Additive angular margin softmax)
Handcrafted Feature Extraction
Local Binary Pattern
Local Binary Pattern (LBP) is a texture descriptor in computer vision that encodes
local texture information by comparing each pixel's intensity to its neighbors,
resulting in a binary pattern for each pixel.
Similar textures have similar LBP histograms.
Local Binary Pattern
1. Choose a pixel in the grayscale image (often called the center pixel).
2. Compare this center pixel to its surrounding neighbors (usually 8 surrounding
pixels in a 3x3 grid).
3. For each neighbor:
a. If the neighbor's value is greater than or equal to the center pixel → assign a 1.
b. If it's less, assign a 0.
4. You'll end up with a binary pattern of 8 bits (one for each neighbor).
5. Convert this binary number to decimal, and that number becomes the LBP
value for the center pixel.
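A minimal sketch of the basic 3x3 LBP described above (NumPy; the bit ordering is my
choice, and conventions vary):

import numpy as np

def lbp_3x3(img):
    """Basic LBP: compare each pixel to its 8 neighbors in a 3x3 grid."""
    img = img.astype(np.int32)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = img[1:-1, 1:-1]
    # 8 neighbor offsets, clockwise from the top-left of the 3x3 window
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2),
               (2, 2), (2, 1), (2, 0), (1, 0)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[dy:dy + h - 2, dx:dx + w - 2]
        # neighbor >= center -> 1, else 0; pack into an 8-bit code
        out |= ((neighbor >= center).astype(np.uint8) << bit)
    return out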
LBP Image Extraction
How to Map LBP to Uniform LBP
Let’s assume LBP(8,1) for simplicity.

Step 1: Generate all possible 8-bit binary patterns (0–255)

Step 2: Count the number of transitions (0→1 or 1→0) in the circular pattern

Step 3: If transitions ≤2, it's a uniform pattern → assign a unique label (0-57)

Step 4: Else → assign the "non-uniform" label (58)
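The transition count in Step 2 can be checked with a small helper (names are mine):

def is_uniform(pattern):
    # Count circular 0->1 / 1->0 transitions in an 8-bit LBP code.
    bits = [(pattern >> i) & 1 for i in range(8)]
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    return transitions <= 2

# 58 uniform patterns exist among the 256 8-bit codes; the rest share label 58.
assert sum(is_uniform(p) for p in range(256)) == 58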


Neighborhood in LBP

● Most basic: 3x3 grid → 8 neighbors around a center pixel.

● Generalized: P neighbors on a circle of radius R around the center pixel.
○ Notation: LBP(P, R)
○ Example: LBP(8,1), LBP(16,2)

● For non-integer positions, interpolation is used to get the pixel value.
Multiscale LBP
Local Binary Pattern (LBP) images encoded at different scales. From left to right:
original image; images encoded using the LBP(8,1), LBP(8,3), and LBP(8,5) operators,
respectively.
LBP for Face Recognition
In the training set there are k classes. For each class we have n training images.
In this example there are 40 classes with 9 images in each class. Each image is
partitioned into 16 cells, and in each cell we extract LBP features.
Calculating Distance and Matching
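One common choice for comparing the concatenated per-cell histograms is the chi-square
distance; a minimal sketch (the grid size and per-cell normalization are my assumptions):

import numpy as np

def lbp_descriptor(lbp_img, grid=(4, 4), bins=256):
    """Concatenate LBP histograms over a grid of cells (4x4 = 16 cells)."""
    h, w = lbp_img.shape
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = lbp_img[i * h // grid[0]:(i + 1) * h // grid[0],
                           j * w // grid[1]:(j + 1) * w // grid[1]]
            hist, _ = np.histogram(cell, bins=bins, range=(0, bins))
            feats.append(hist / max(hist.sum(), 1))  # normalize per cell
    return np.concatenate(feats)

def chi_square(p, q, eps=1e-10):
    # Chi-square distance between two histograms; smaller = more similar.
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

A probe image is then matched to the gallery identity whose descriptor has the smallest
chi-square distance.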
Principal Component Analysis
● By projecting the data onto the subspace E', the dimensionality of the Eigen
coefficients becomes d', which is smaller than the original data dimension d.
● Typically, it is possible to account for most of the variability in the data by
selecting only a few Eigenvectors corresponding to the largest Eigenvalues, taken
in descending order.
● To perform face matching, the Eigen coefficients ω1 and ω2 corresponding to the
gallery and probe face images, respectively, can be computed, and the Euclidean
distance between the two Eigen coefficients can be used as a measure of dissimilarity
between the two face images.
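A minimal eigenfaces-style sketch of this pipeline (NumPy; names are mine):

import numpy as np

def fit_pca(X, d_prime):
    """X: (n_samples, d) flattened face images. Returns the mean and the
    top-d' basis E' (rows are eigenvectors of the covariance matrix)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # SVD of the centered data: rows of Vt are already sorted by
    # decreasing eigenvalue (singular value squared).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mu, Vt[:d_prime]

def project(x, mu, E):
    return E @ (x - mu)   # Eigen coefficients, dimension d'

def dissimilarity(x_gallery, x_probe, mu, E):
    # Euclidean distance between the two coefficient vectors
    return np.linalg.norm(project(x_gallery, mu, E) - project(x_probe, mu, E))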
