NISTIR 8197

Face Recognition Prize Challenge 2017
Patrick Grother
Mei Ngan
Kayee Hanaoka
Information Technology Laboratory, NIST
Chris Boehnen
Intelligence Advanced Research Projects Activity (IARPA)
Lars Ericson
Science Applications International Corporation (SAIC)
November 2017
ACKNOWLEDGEMENTS
The authors would like to thank the Intelligence Advanced Research Projects Activity for supporting this
work, and for administering the $50 000 prize fund.
We are grateful to staff at Noblis for their development and curation of imagery used in this study.
Similarly, we thank the staff of SAIC for collection of imagery used in this study. We thank the DHS S&T Homeland Security Advanced Research Projects Agency Air Entry/Exit Re-engineering (AEER) program for their support of that work.
DISCLAIMER
Specific hardware and software products identified in this report were used in order to perform the evaluations described in this document. In no case does identification of any commercial product, trade name, or vendor imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products and equipment identified are necessarily the best available for the purpose.
EXECUTIVE SUMMARY
. Overview: This report documents NIST's execution of the Intelligence Advanced Research Projects Activity (IARPA) Face Recognition Prize Challenge (FRPC) 2017. The FRPC was conducted to assess the capability of contemporary face recognition algorithms to recognize faces in photographs collected without tight quality constraints, e.g. non-ideal images collected from individuals who are unaware of, and not cooperating with, the collection. Such images are characterized by variations in head orientation, facial expression, and illumination, and by occlusion and reduced resolution.
. Background: Face recognition has recently been revolutionized by the availability of advanced machine learning algorithms, free software implementations thereof, fast processors, vast web-scraped ground-truthed face image databases, open performance benchmarks, and a vibrant academic literature covering both machine learning and face recognition. The new convolutional neural network technologies have largely been developed to exhibit invariance to the pose, illumination and expression variations characteristic of photojournalism and social media images. The initial research [7, 9] employed large numbers of images of relatively few (O(10^4)) individuals to learn invariance. Inevitably, much larger populations (O(10^7)) were employed for training [8], but the benchmark remained verification at very high false match rates - LFW with an EER metric [3]. A large-scale identification benchmark duly followed [6], yet its primary metric, rank one hit rate, contrasts with the high-threshold discrimination task required in large-population governmental applications of face recognition, namely credential de-duplication, law enforcement and intelligence searches. There, only one or two images of at least O(10^7) individuals must be recognized with very low false positive identification rates. The FRPC was conducted with both a large population (O(10^6)) and low false positive rate metrics.
From a field of 16 commercial and academic entries, the FRPC awarded prizes in three categories.
.. Verification accuracy: The $20 000 prize is awarded to the algorithm that can most accurately verify the identity of faces appearing in photojournalism images. The verification task is the fundamental biometric operation: to determine whether two images are of the same face or not. The award criterion is to produce the lowest false non-match rate (FNMR) at a false match rate (FMR) of 0.001. The winner of this prize is NTechLab, which achieved FNMR = 0.22, well ahead of the second-place developer, Yitu Technology.
.. Identification accuracy: The $25 000 prize is awarded to the algorithm that can most accurately retrieve a face cropped from a video frame when searching a gallery composed of N = 691282 faces from cooperative portrait photos, while simultaneously producing a false positive outcome in only 1 in 1000 searches, i.e. to produce the lowest false negative identification rate (FNIR) at a false positive identification rate (FPIR) of 0.001. The winner of this prize is Yitu Technology, whose algorithm produces superior FNIR values at the low false positive rates required in one-to-many applications in which many searches do not have a corresponding enrolled entry.
.. Identification speed: A $5 000 prize is awarded to the algorithm that executes a one-to-many search in the shortest possible time while still maintaining high accuracy. The formal criterion is to produce the lowest median search duration while also producing FNIR less than two times that of the most accurate algorithm. The winner is NTechLab, one of whose algorithms returns candidates from a gallery of N = 691282 identities in just 590 ± 50 microseconds. This is achieved using one process running on a single core of a conventional c. 2016 server-class CPU. Search time is sub-linear, such that a 30-fold increase in the gallery size N incurs only a 3-fold increase in search duration. This is achieved, however, using a proprietary fast-search data structure that takes almost 11 hours to build from N = 691282 input templates.
Readers might also consider reports from NIST’s Face Recognition Vendor Test (FRVT) which remains open to new algorithm devel-
opers. Comments and questions on FRPC and FRVT should be directed to frpc@nist.gov and frvt@nist.gov, respectively.
Contents

Acknowledgements
Disclaimer
Executive Summary

List of Tables

1 Verification algorithm summary
2 Identification algorithm summary

List of Figures

1 Verification image examples
2 Verification performance: FNMR vs. FMR tradeoff
3 Boarding gate video clip examples
4 Passenger loading bridge video clip examples
5 Enrollment image examples
6 Identification accuracy at gate: FNIR vs. population size
7 Identification accuracy on concourse: FNIR vs. population size
8 Identification accuracy on concourse: FNIR vs. population size
9 Identification accuracy at gate: FNIR vs. rank
10 Identification accuracy at gate: FNIR vs. FPIR
11 Timing performance: duration vs. enrolled population size
12 Off-angle search examples
13 Performance summary: FNMR vs. yaw angle
14 Performance summary: FMR vs. yaw angle
15 Performance summary: TPIR vs. pitch angle
16 Performance summary: TPIR vs. yaw angle
17 Performance summary: FNMR vs. yaw angle
Background: Verification is perhaps the most common application of biometrics, being widely deployed in applications such as access control and authentication. While such uses usually involve cooperative subjects, the FRPC includes a verification task using non-cooperative and unconstrained photojournalism imagery because one-to-one comparison of single images presents the simplest way to assess core algorithmic efficacy.
# | Developer | Config data size¹ | Template size (bytes) | Template creation time² | GPU | Median comparison durations (ns)³
6 | HB Innovation | 273006 | 520 ± 0 (5) | 298 ± 19 (8) | No | 5283 ± 484 (11) / 4888 ± 71 (12)
7 | Imperial College London | 274821 | 2048 ± 0 (11) | 1367 ± 10 (15) | Yes | 1911 ± 51 (6) / 1888 ± 42 (7)
8 | Innovatrics | 0 | 276 ± 0 (4) | 152 ± 12 (4) | Yes | 4002 ± 77 (10) / 3665 ± 126 (11)
9 | Morpho | 794266 | 788 ± 0 (6) | 254 ± 5 (7) | Yes | 3112 ± 63 (7) / 3171 ± 126 (9)
10 | Neurotechnology | 413202 | 4780 ± 0 (15) | 1560 ± 44 (16) | No | 73520 ± 1921 (16) / 72674 ± 1429 (16)
11 | NTechLab | 657997 | 4825 ± 1 (16) | 943 ± 16 (13) | No | 55004 ± 80 (15) / 55042 ± 93 (15)
12 | Rank One | 0 | 144 ± 0 (1) | 37 ± 1 (2) | No | 30366 ± 177 (13) / 307 ± 41 (1)
13 | Smilart UG | 107947 | 1024 ± 0 (7) | 58 ± 0 (3) | Yes | 3443 ± 66 (9) / 3442 ± 69 (10)
14 | VisionLabs | 343661 | 204 ± 0 (2) | 943 ± 8 (12) | No | 1013 ± 40 (5) / 1030 ± 34 (6)
15 | Vocord | 918293 | 1280 ± 0 (10) | 195 ± 0 (5) | Yes | 3271 ± 94 (8) / 2413 ± 99 (8)
16 | Yitu | 2226850 | 4136 ± 0 (14) | 703 ± 1 (10) | No | 33991 ± 62 (14) / 34048 ± 134 (14)
Notes
1 The size of configuration data does not capture static data included in the libraries. We do not include the size of the libraries because some algorithms include common ancillary libraries for image processing (e.g. OpenCV) or numerical computation (e.g. BLAS).
2 The median template creation times are measured on Intel Xeon E5-2630 v4 CPUs @ 2.20 GHz or, in the case of GPU-enabled implementations, on NVidia Tesla K40m GPUs equipped with 12 GB of memory.
3 The median comparison durations, in nanoseconds, are estimated using std::chrono::high_resolution_clock, which on the machine in (2) counts clock ticks of duration 1 nanosecond. Precision is somewhat worse than that, however. The ± value is the median absolute deviation times 1.48, to give consistency with 1σ of a Normal distribution.
Table 1: Summary of 1:1 verification algorithms evaluated in this report. The parenthesized numbers give each algorithm's ranking for the quantity in that column.
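Note 3's spread estimate is simple to reproduce. The following is an illustrative sketch, not part of the FRPC test harness (the function name and sample durations are ours), showing the median ± 1.48 × MAD summary used throughout the tables:

```python
import statistics

def median_and_spread(durations_ns):
    """Return the median and a robust 1-sigma-like spread.

    The spread is 1.48 times the median absolute deviation (MAD),
    which matches one standard deviation of a Normal distribution,
    as described in note 3.
    """
    med = statistics.median(durations_ns)
    mad = statistics.median(abs(d - med) for d in durations_ns)
    return med, 1.48 * mad

# Example: a hypothetical batch of measured comparison durations (ns).
durations = [3050, 3110, 3120, 3145, 3190, 3300, 4020]
med, spread = median_and_spread(durations)
print(f"{med:.0f} ± {spread:.0f} ns")
```

Unlike the mean and standard deviation, this summary is insensitive to the occasional very slow measurement caused, for example, by operating system scheduling noise.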
Participation: The participants electing to submit algorithms to the FRPC verification track are listed in Table 1.
Images: The photojournalism set uses 141331 face images of 3548 adults. The images are closely cropped from the parent images as shown in Figure 1. The images were primarily collected by professional photographers and, as such, are captured, and selected, to not exhibit exposure and focus problems. All of the images are live capture; none are scanned. Resolution varies widely, as these images were posted to the internet with varying resampling and compression practices. The primary difficulty for face recognition is unconstrained yaw and pitch pose variation, with some images extending to profile view. Additionally, faces can be occluded, including by hair and hands.
The images are cropped prior to passing them to the algorithm. The cropping is done per human-annotated rectangular
bounding boxes. The algorithm must further localize the face and extract features, returning a recognition template.
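For illustration only, a cropping step of this kind might look as follows with the Pillow library; the file name and box coordinates are hypothetical, and the actual FRPC preprocessing code is not published:

```python
from PIL import Image

def crop_face(path, box):
    """Crop a face per a human-annotated bounding box.

    box is (left, upper, right, lower) in pixel coordinates, the
    convention used by Pillow's Image.crop.
    """
    with Image.open(path) as img:
        face = img.crop(box)
        face.load()  # force pixel data before the file is closed
    return face

# Hypothetical annotation: face occupies a 300x400 pixel region.
face = crop_face("press_photo.jpg", (120, 80, 420, 480))
face.save("press_photo_face.jpg")
```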
The templates from the images are used in N_G = 7 846 208 genuine and N_I = 39 942 674 impostor comparisons. The impostor trials are zero-effort, meaning any template is compared with any other template; no effort is made to pair on such variables as sex, age, race, or appearance. While zero-effort impostors are easier to correctly reject, the technique is ubiquitous when assessing core recognition accuracy.
Accuracy metrics: Scores from the genuine comparisons are used in the false non-match rate (FNMR) computation, which states the proportion of genuine scores below threshold, T:

\[
\mathrm{FNMR}(T) = 1 - \frac{1}{N_G}\sum_{i=1}^{N_G} H(s_i - T) \tag{1}
\]
Figure 1: Examples of “in the wild” photojournalism stills. The top row gives the full original images; the second row gives the
manually specified face region that is cropped and passed to the algorithms. The source images in this figure are published on the
internet under Creative Commons licenses.
where the step function H(x) is 1 if x ≥ 0 and 0 otherwise. In cases where an algorithm fails to produce a template from
an input image - the so-called failure to enroll outcome - the FNMR computation proceeds by assigning a low score, −∞,
to any comparison involving that template. This simulates false rejection of a user.
Scores from the impostor comparisons are used in the false match rate (FMR) computation, which states the proportion of impostor scores at or above T:

\[
\mathrm{FMR}(T) = \frac{1}{N_I}\sum_{i=1}^{N_I} H(s_i - T) \tag{2}
\]
In cases where an algorithm fails to produce a template from an input image, a low score is again assigned as the result
of any comparison involving that template. This practice actually benefits (reduces) FMR.
Figure of Merit: The prize is awarded to the algorithm that achieves the lowest false non-match rate at a threshold set to
achieve a false match rate of 0.001. This is the most common way to state recognition accuracy, and it serves as a simple
way to compare core algorithm recognition capability.
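Equations 1 and 2 and this figure of merit can be sketched in a few lines. This is our illustrative reimplementation, not the FRPC scoring code; it assumes genuine and impostor scores are available as arrays, with comparisons involving failure-to-enroll templates already entered as −∞:

```python
import numpy as np

def fnmr_at_fmr(genuine, impostor, target_fmr=0.001):
    """FNMR at the threshold giving the target FMR (eqs. 1 and 2).

    Scores of -inf (failure to enroll) count as false non-matches
    in eq. 1 and as correct rejections in eq. 2.
    """
    impostor = np.sort(impostor)[::-1]    # descending
    k = int(target_fmr * len(impostor))   # allowed false matches
    t = impostor[k]                       # threshold at the k-th impostor score
    fnmr = np.mean(genuine < t)           # genuine scores below threshold
    return fnmr, t

# Toy data: five genuine scores (one failure to enroll) and
# ten thousand synthetic impostor scores.
genuine = np.array([0.9, 0.8, 0.7, 0.2, -np.inf])
impostor = np.random.default_rng(0).uniform(0.0, 0.5, 10000)
fnmr, t = fnmr_at_fmr(genuine, impostor)
print(f"FNMR = {fnmr:.2f} at threshold {t:.3f} (FMR = 0.001)")
```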
Prize winner: By consulting Figure 2, the most accurate verification algorithm on this dataset is developed by NTechLab
http://ntechlab.com/.
Discussion: The NTechLab algorithm gives FNMR = 0.22 with FMR = 0.001. This FNMR would be intolerably high for
an access control application, but is achieved with images of non-cooperating subjects that have very few of the image
quality constraints that are engineered into, for example, border crossing gates. In particular, as discussed later in section
4, the winning algorithm here has superior capability at recognizing individuals whose head orientations vary widely.
Figure 2: Verification performance on the photojournalism (WILD) dataset: FNMR vs. FMR tradeoff. Legend, FNMR @ FMR = 0.001 by algorithm: 0.89 ibug_0_gpu; 0.86 vocord_0_gpu; 0.83 cyberextruder_0_cpu; 0.72 neurotechnology_0_cpu; 0.68 smilart_0_gpu; 0.67 digitalbarriers_0_cpu; 0.60 rankone_0_cpu; 0.56 innovatrics_0_gpu; 0.50 hbinno_0_cpu; 0.35 yitu_0_cpu; 0.22 ntechlab_0_cpu.
# | Developer | Variant | GPU | Config data size¹ | Template size (bytes) | Template creation time² | Median search duration (µs)³ at N = 16000 / 48000 / 160000 / 320000 / 691282
17 | Neurotechnology | 1 | No | 413202 | 4780 ± 0 (22) | 1391 ± 76 (24) | 25648 ± 75 (21) / 99376 ± 281 (21) / 539105 ± 853 (21) / 1651293 ± 2866 (23) / 6471346 ± 19941 (23)
18 | NTechLab | 0 | No | 875851 | 5784 ± 1 (24) | 626 ± 21 (16) | 7912 ± 38 (16) / 24932 ± 98 (16) / 91830 ± 418 (16) / 192315 ± 868 (16) / 433640 ± 1840 (16)
19 | NTechLab | 1 | No | 288973 | 987 ± 0 (9) | 361 ± 21 (11) | 208 ± 13 (1) / 344 ± 47 (1) / 508 ± 56 (1) / 558 ± 50 (1) / 592 ± 51 (1)
20 | Rank One | 0 | No | 0 | 144 ± 0 (3) | 70 ± 26 (1) | 804 ± 298 (5) / 5729 ± 1214 (8) / 12165 ± 4110 (5) / 17908 ± 5222 (5) / 32512 ± 9976 (5)
21 | Vocord | 0 | Yes | 918293 | 1280 ± 0 (12) | 191 ± 4 (9) | 21738 ± 20 (19) / 66094 ± 61 (19) / 219715 ± 120 (19) / 438762 ± 226 (19) / 947782 ± 467 (19)
22 | Vocord | 1 | Yes | 1089798 | 896 ± 0 (8) | 78 ± 5 (3) | 15224 ± 13 (18) / 46034 ± 42 (18) / 153935 ± 128 (18) / 307279 ± 217 (18) / 664020 ± 388 (18)
23 | Yitu | 0 | No | 2226850 | 4136 ± 0 (19) | 844 ± 18 (18) | 25641 ± 40 (20) / 77823 ± 257 (20) / 286455 ± 9072 (20) / 1071714 ± 3395 (20) / 1320849 ± 17367 (20)
24 | Yitu | 1 | No | 2262178 | 2260 ± 0 (16) | 436 ± 11 (12) | 270 ± 57 (2) / 506 ± 128 (2) / 2144 ± 622 (2) / 5006 ± 1558 (2) / 11885 ± 3663 (2)
Notes
1 The size of configuration data does not capture static data included in the libraries. We do not include the size of the libraries because some algorithms include common ancillary libraries for image processing (e.g. OpenCV) or numerical computation (e.g. BLAS).
2 The median template creation times are measured on Intel Xeon E5-2630 v4 CPUs @ 2.20 GHz or, in the case of GPU-enabled implementations, on NVidia Tesla K40m GPUs equipped with 12 GB of memory.
3 The median impostor search durations, in microseconds, are estimated using std::chrono::high_resolution_clock, which on the machine in (2) counts clock ticks of duration 1 nanosecond. Precision is somewhat worse than that, however. The ± value is the median absolute deviation times 1.48, to give consistency with 1σ of a Normal distribution.
4 Four entries appear for 3DiVi, whom NIST asked to submit a CPU variant of their main GPU submission. This allowed NIST to expedite testing. The report includes timing results for both CPU and GPU variants. Accuracy numbers are included only once, as accuracy is identical for the CPU and GPU implementations.

Table 2: Summary of 1:N identification algorithms evaluated in this report. The parenthesized numbers give each algorithm's ranking for the quantity in that column.
Figure 3: Enrollment (left) and non-cooperative video-frame search examples from a boarding gate process. The algorithm received the enrollment image as is, and faces cropped from the video search frames. The images are from subject 79195746 in the DHS S&T AEER dataset. He consented to release of his images in public reports. For those individuals who did not consent to publication, their faces were masked (yellow circles).
Background: This section documents the one-to-many identification experiments performed under the FRPC. Generically, one-to-many biometric identification is more difficult than one-to-one verification because a search of an N-person database must correctly reject either N − 1 or N identities depending, respectively, on whether or not the search has a mated enrollment. Reflecting this difficulty, and the implied computational expense, identification algorithms are by far the largest revenue segment of the face recognition marketplace.
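The difficulty can be made concrete with a standard back-of-the-envelope argument; the independence assumption below is ours, not a result from this report. At a fixed threshold T, a non-mated search survives only if all N enrollment comparisons stay below threshold, so

```latex
\mathrm{FPIR}(N, T) \;\approx\; 1 - \bigl(1 - \mathrm{FMR}(T)\bigr)^{N}
\;\approx\; N \cdot \mathrm{FMR}(T)
\qquad \text{for } \mathrm{FMR}(T) \ll 1/N .
```

Holding FPIR at 0.001 with N = 691282 would, under this approximation, require a per-comparison FMR near 1.4 × 10⁻⁹, far below the FMR = 0.001 operating point used in the verification track; this is why one-to-many accuracy degrades as N grows.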
Participation: The participants electing to submit algorithms to the FRPC identification track are listed in Table 2.
Experimental design: The identification experiments proceed by searching non-cooperative face images against enrollment galleries built from cooperative portrait images.
. Enrolled portraits: The portrait images are either visa images, mugshot images, or dedicated portraits collected from test subjects. These were typically collected using an SLR camera, ample two-point lighting, and a standard uniform grey background. We defined five galleries containing, respectively, N = {16000, 48000, 160000, 320000, 691282} images and people, i.e. exactly one image per person. These galleries include 825 portraits of the people who appear in the mated search sets described next. Examples of the portraits appear in Figure 5.

Figure 4: Example images from the ceiling-mounted camera for the free-movement scenarios, from videos collected on an aircraft boarding ramp. The images are from subject S1115 in the DHS S&T-provided AEER dataset. The subject gave written opt-in permission to allow public release of all imagery. Where consent from individuals in the background was not obtained, their faces were masked (yellow circle).

Figure 5: Examples of enrollment images collected with an SLR camera (subjects S1155, S2880, and S1848, each of whom granted permission). The face images in this figure are from the DHS S&T-provided AEER dataset. The included subjects consented to release of their images in public reports.
. Mated search images: The non-cooperative face images are faces cropped from video clips collected in surveillance settings. Examples of the cropped faces and the parent video frames are shown in Figures 3 and 4.

. Non-mated search images: A separate set of N_I = 79403 faces, cropped from video known not to contain any of the enrolled identities, is used to estimate false positive accuracy.
Accuracy metrics: Scores from the mated searches are used in the false negative identification rate (FNIR) computation. FNIR is defined as the proportion of mated searches that fail to produce the enrolled mate in the top R ranks with score above threshold, T. FNIR is therefore known as a miss rate. Its value will generally increase with the size of the enrolled database, N, because the recognition algorithm is tasked with assigning a low score to all N − 1 non-mated enrollments. For each of M mated searches the algorithm returns 1 ≤ r ≤ L candidates with hypothesized identities and similarity scores. If the identity of the search face is ID_i and that of the r-th candidate is ID_r, then

\[
\mathrm{FNIR}(N, R, T) = 1 - \frac{1}{M}\sum_{i=1}^{M}\sum_{r=1}^{R} H(s_{ir} - T)\,\delta(\mathrm{ID}_i, \mathrm{ID}_r) \tag{3}
\]

where s_{ir} is the r-th highest score from the i-th search, the step function H(x) is 1 if x ≥ 0 and 0 otherwise, and the function δ(x, y) is 1 if x = y, and 0 otherwise.

In cases where an algorithm fails to produce a template from an input image (the so-called failure-to-enroll outcome), the FNIR computation proceeds by assigning a low score, −∞, and a high rank, L + 1. This simulates a miss.
Scores from the non-mated searches are used in the false positive identification rate (FPIR) computation, which states the proportion of non-mated searches yielding any candidate at or above a threshold T:

\[
\mathrm{FPIR}(T) = \frac{1}{N_I}\sum_{i=1}^{N_I} H(s_i - T) \tag{4}
\]
In cases where an algorithm fails to produce a template from an input image, a low score is again assigned as the result
of any comparison involving that template. This practice actually benefits (reduces) FPIR.
Figure of Merit: The prize is awarded to the algorithm that achieves the lowest FNIR when the threshold is set to produce FPIR at or below 0.001. This was determined using N = 691282, and probes from the travel concourse dataset. This criterion differs substantially from many benchmarks and academic studies, which try to maximize "rank one hit rate", i.e. to minimize FNIR(N, 1, 0). The criterion here, instead, seeks to minimize FNIR(N, L, T) by demanding that mated candidates exceed a score threshold that is adopted to minimize false positives. Use of a high threshold is an imperative in the many operations that feature high search volumes and a low prior probability that the search is mated. An example is a casino "watch list" surveillance application, in which card sharps are a small minority of the customer base.
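Equations 3 and 4 and this figure of merit can be sketched as follows. This is our illustrative reimplementation, not the FRPC scoring code; the names and toy data are ours, and each search is assumed to return a candidate list sorted by decreasing score:

```python
import numpy as np

def fpir_threshold(nonmated_top_scores, target_fpir=0.001):
    """Threshold at which the target fraction of non-mated searches
    yields any candidate at or above threshold (eq. 4)."""
    s = np.sort(nonmated_top_scores)[::-1]
    k = int(target_fpir * len(s))
    return s[k]

def fnir(mated_searches, threshold, max_rank):
    """Eq. 3: fraction of mated searches whose mate is absent from
    the top max_rank candidates with score at or above threshold.

    Each search is (true_id, [(candidate_id, score), ...]) sorted by
    decreasing score; a failed search carries an empty candidate list.
    """
    misses = 0
    for true_id, candidates in mated_searches:
        hit = any(cid == true_id and s >= threshold
                  for cid, s in candidates[:max_rank])
        misses += not hit
    return misses / len(mated_searches)

# Toy data: one search hits at rank 1; the other's mate scores too low.
searches = [("A", [("A", 0.95), ("B", 0.40)]),
            ("C", [("D", 0.50), ("C", 0.40)])]
t = fpir_threshold(np.random.default_rng(1).uniform(0.0, 0.6, 79403))
print(f"T = {t:.3f}, FNIR = {fnir(searches, t, max_rank=20):.2f}")
```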
Prize winner: By consulting Figure 7, the most accurate identification algorithm on this dataset is developed by Yitu.
Using probes from the travel concourse dataset to search in a dataset of N = 691282 portraits, the first Yitu algorithm
gives FNIR = 0.204 with FPIR = 0.001.
Discussion: The Yitu algorithms would win this prize at all tested gallery sizes. The algorithms, however, would not win had the figure-of-merit been a zero-threshold, rank-based metric. As can be seen in Figure 8, the first NTechLab algorithm gives the lowest FNIR(N, R, 0) for R = 1 and all N values, i.e. the NTechLab algorithm places more correct mates at rank 1, but does not do so with a score high enough to survive a threshold. Figure 10 shows that the error tradeoff characteristic of NTechLab is superior to Yitu's at very low thresholds (high FPIR), but Yitu has a flatter response and quickly dominates all other algorithms for FPIR below about 0.88. Whether this would be sustained at very low values of FPIR, for example below 0.0001, is unknown given the limited test size N_I = 79403.
Figure 6: For the boarding gate dataset, the curves show false negative identification rates (FNIR) versus enrolled population size N when the threshold is set to a high value, yielding FPIR = 0.001. Legend, FNIR(N=691282, FPIR=0.001) by algorithm: 1.00 vocord_0_gpu; 1.00 innovatrics_0_cpu; 0.99 innovatrics_1_gpu; 0.94 neurotechnology_0_cpu; 0.94 neurotechnology_1_cpu; 0.92 smilart_0_gpu; 0.90 rankone_0_cpu; 0.90 cyberextruder_1_cpu; 0.89 ibug_0_gpu; 0.86 cyberextruder_0_cpu; 0.84 vocord_1_gpu; 0.82 digitalbarriers_0_cpu; 0.75 digitalbarriers_1_cpu; 0.67 3divi_0_gpu; 0.67 3divi_1_gpu.
Figure 7: For the travel concourse dataset, the curves show false negative identification rates (FNIR) versus enrolled population size N when the threshold is set to a high value, yielding FPIR = 0.001. Legend, FNIR(N=691282, FPIR=0.001) by algorithm: 1.00 innovatrics_0_cpu; 0.99 vocord_0_gpu; 0.98 innovatrics_1_gpu; 0.89 neurotechnology_0_cpu; 0.89 neurotechnology_1_cpu; 0.87 ibug_0_gpu; 0.87 smilart_0_gpu; 0.81 cyberextruder_1_cpu; 0.77 vocord_1_gpu; 0.77 rankone_0_cpu; 0.77 digitalbarriers_0_cpu; 0.75 cyberextruder_0_cpu; 0.61 digitalbarriers_1_cpu; 0.60 3divi_0_gpu; 0.60 3divi_1_gpu; 0.40 morpho_1_gpu; 0.37 ntechlab_1_cpu; 0.34 deepsense_1_cpu; 0.32 morpho_0_gpu.
Figure 8: For the travel concourse dataset, the curves show false negative identification rates (FNIR) at rank 1 versus population size, N. The threshold is set to zero. Legend, FNIR(R=1, N=691282) by algorithm: 0.47 smilart_0_gpu; 0.45 cyberextruder_1_cpu; 0.41 cyberextruder_0_cpu; 0.39 rankone_0_cpu; 0.39 digitalbarriers_0_cpu; 0.39 digitalbarriers_1_cpu; 0.27 innovatrics_0_cpu; 0.24 innovatrics_1_gpu; 0.23 3divi_1_gpu; 0.21 3divi_0_gpu; 0.19 morpho_1_gpu; 0.17 ibug_0_gpu; 0.16 morpho_0_gpu; 0.14 deepsense_0_cpu; 0.14 deepsense_1_cpu; 0.07 vocord_0_gpu.
Figure 9: For the boarding gate dataset, the curves show false negative identification rates (FNIR) versus rank when the threshold is set to zero. Legend, FNIR(R=1, N=691282) by algorithm: 0.68 cyberextruder_1_cpu; 0.63 cyberextruder_0_cpu; 0.60 rankone_0_cpu; 0.57 smilart_0_gpu; 0.50 digitalbarriers_0_cpu; 0.37 innovatrics_0_cpu; 0.36 innovatrics_1_gpu; 0.34 morpho_1_gpu; 0.33 3divi_1_gpu; 0.32 3divi_0_gpu; 0.31 ibug_0_gpu; 0.29 morpho_0_gpu; 0.26 neurotechnology_1_cpu; 0.25 deepsense_0_cpu; 0.25 neurotechnology_0_cpu; 0.23 vocord_1_gpu; 0.18 vocord_0_gpu; 0.16 ntechlab_1_cpu; 0.13 yitu_0_cpu.
Figure 10: Identification accuracy at the boarding gate: false negative identification rate (FNIR) versus false positive identification rate (FPIR), with panels for enrolled population sizes including N = 160000, 320000, and 691282.
Background: Prior tests have documented search speeds spanning up to three orders of magnitude. Given the implications for hardware procurement, it becomes essential to measure speed, and to invest in slow algorithms only if they offer tangible accuracy advantages. Further, given very large operational databases, the scalability of algorithms is important. It has been reported previously [2] that search duration can scale sublinearly with enrolled population size N. There has also been considerable recent research on indexing, exact [4] and approximate [1, 4] nearest neighbor search, and fast search [5].
Figure of merit: The FRPC therefore included a prize for the fastest search algorithm, subject to the requirement that it also has competitive accuracy. Formally, the prize went to the algorithm with the lowest template search duration that gave FNIR no larger than twice the best FNIR. The false negative identification rate in question here is FNIR(N, L, T) with N = 691282 and T set to give FPIR = 0.001. The figure of merit did not include the time taken to prepare the search template, which is independent of N and which dominates total time up to some crossover population size, beyond which search duration dominates.
Participation: The challenge was open to all participants in the identification accuracy challenge, as listed in Table 2.
Prize winner: Figure 11 charts the speed measurements presented earlier in Table 2. By consulting the figure, the fastest identification algorithm is the second algorithm developed by NTechLab http://ntechlab.com/. Its search duration grows sub-linearly, fitting neither a logarithmic nor a power-law model exactly. It is faster, but somewhat less accurate, than its linear sibling.
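The sub-linearity claim can be checked by fitting simple scaling models to Table 2; the sketch below uses the NTechLab variant-1 median search durations (microseconds) and ordinary least squares in log space. The model choice is ours, and neither model is claimed to be the one NTechLab actually implements:

```python
import numpy as np

# Gallery sizes and NTechLab variant-1 median search durations
# from Table 2 (microseconds).
N = np.array([16000, 48000, 160000, 320000, 691282], dtype=float)
d = np.array([208, 344, 508, 558, 592], dtype=float)

# Power law d = a * N^b  =>  log d = log a + b log N (linear fit).
b, log_a = np.polyfit(np.log(N), np.log(d), 1)
print(f"power law: d ~ {np.exp(log_a):.2f} * N^{b:.2f}")

# Logarithmic model d = c0 + c1 * log N.
c1, c0 = np.polyfit(np.log(N), d, 1)
resid_pow = d - np.exp(log_a) * N**b
resid_log = d - (c0 + c1 * np.log(N))
print("RMS residuals:",
      f"power {np.sqrt(np.mean(resid_pow**2)):.1f} us,",
      f"log {np.sqrt(np.mean(resid_log**2)):.1f} us")
```

The fitted power-law exponent is roughly 0.3, consistent with the executive summary's observation that a 30-fold increase in N incurs only a 3-fold increase in search duration.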
Note that we did not differentiate between CPU- and GPU-based implementations; developers were free to submit algorithms using either kind of hardware. For those algorithms listed in Table 2 as CPU, the search duration is measured on an Intel Xeon E5-2630 v4 CPU running at 2.20 GHz. For GPU algorithms, the hardware is an NVidia Tesla K40m equipped with 12 GB of memory. However, the FRPC test infrastructure did not record whether search was actually conducted on the CPU or the GPU; it could have been either.
Invariably the most influential parameter on recognition outcomes has been the orientation of the head in one photo-
graph relative to that in a prior image.
. Verification dependence on yaw in pairs of images: Using wild photographs and yaw estimates obtained from an automated, government-owned pose-estimation tool, we quantify the dependence of face recognition accuracy on yaw. The ability of algorithms to compensate for viewing angle is summarized in Figure 13, which shows false non-match rate as a function of the yaw angle, θ, of the face in the enrollment and verification images. These vary over ±90 degrees. Each panel encodes false non-match rate (FNMR) for an algorithm at a particular threshold, set to give a false match rate of 0.001 for images of frontal pose, i.e. those with |θ| ≤ 15. The FNMR values are generally lowest for frontal pairs, then for pairs with the same yaw angle, and they increase with the difference in yaw.
At this fixed threshold, Figure 14 shows how FMR itself varies with the pair of yaw angles. This figure is relevant in applications where a global threshold is set and pose varies widely. It would not be relevant in cases where a specific pair of poses is designed-in and a dedicated threshold could be set. In all panels the center cell has FMR = 0.001, by design. The results for other yaw angles show different behaviors. First, the more accurate algorithms often have weak dependence of FMR on yaw angles (prevalence of grey). Others give consistently low FMR when angles differ (prevalence of blue), consistent with an inability to match. A final class of algorithms gives higher FMR when yaw angles differ (prevalence of red in the periphery). This is unexpected and undesirable.
Figure 11: Timing performance: duration (milliseconds, log scale) of the template creation (templateTime), search (searchTime), and total (totalTime) phases versus enrolled population size N (one image per identity), with panels per algorithm including ntechlab_0_cpu, ntechlab_1_cpu, rankone_0_cpu, smilart_0_gpu, and vocord_0_gpu.
Figure 12: Approximate examples of the images passed to the FRPC identification algorithms for the off-angle experiments summarized by the figures that follow.
. Identification with frontal enrollment: It is often the case that a cooperative enrollment photograph, collected to be an authoritative reference image placed in a credential (passport, driving license) or database (e.g. mugshot database), will conform to a standard prescription of frontal pose, with roll, pitch and yaw all being zero degrees. Accuracy is then determined by the pose relative to that. In the general case the three head angles - roll, pitch and yaw - can vary independently, taking on values up to (and beyond) 90 degrees from a frontal (0, 0, 0) view. The relative yaw angle can then extend to ±90 degrees, while pitch is usually constrained by the range of motion of the neck to, say, ±60 degrees. Roll alone is not usually considered a serious impediment to face recognition, since an implementation that detects eyes can perform an in-plane rotation to remove roll, as sketched after this section. However, compound rotation of the head, as might be seen if a non-cooperative subject were lying down, has presented severe challenges to face recognition.
Using dedicated, controlled, non-frontal search images of the kind shown in Figure 12, for enrolled mates present in galleries of size N = {16000, 48000, 160000, 320000, 691282}, we plot both rank 1 accuracy, 1 − FNIR(N, 1, 0), and high-threshold accuracy, 1 − FNIR(N, L, T), against yaw angle relative to a zero-degree frontal. The results are shown in three figures as follows. The first two, Figures 15 and 16, show the sensitivity of rank one hit rate to pitch and yaw, respectively. Many algorithms give excellent accuracy with same-day frontal images, but degrade markedly at pitch of ±40 degrees. Similarly with yaw, most, but not all, algorithms fail to identify profile-view probes. Figure 17 shows yaw dependence again, but for FNIR at a high threshold, as would be set in a surveillance application. This exposes earlier declines in accuracy, as yaw depresses similarity scores below the threshold.
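The in-plane roll removal mentioned above is a standard alignment step. A minimal sketch using Pillow follows; the eye coordinates are hypothetical detector outputs, and this is not the preprocessing used by any FRPC participant:

```python
import math
from PIL import Image

def remove_roll(img, left_eye, right_eye):
    """Rotate the image so the inter-eye line is horizontal.

    left_eye, right_eye: (x, y) pixel coordinates from a detector.
    Rotation is about the midpoint of the eyes; this removes roll
    but cannot compensate yaw or pitch.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    roll_deg = math.degrees(math.atan2(dy, dx))
    center = ((left_eye[0] + right_eye[0]) / 2,
              (left_eye[1] + right_eye[1]) / 2)
    # Positive angle rotates counterclockwise, raising the lower eye.
    return img.rotate(roll_deg, center=center, resample=Image.BILINEAR)

img = Image.open("probe.jpg")
aligned = remove_roll(img, left_eye=(210, 305), right_eye=(298, 322))
```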
Figure 13: The heatmaps show FNMR as a function of the yaw of the face in the enrollment and verification images, in bins spanning [−90, −75] to (75, 90] degrees, with panels for morpho_0_gpu, neurotechnology_0_cpu, ntechlab_0_cpu, and rankone_0_cpu; the color scale encodes FNMR from 0.00 to 1.00. The threshold is the same in all cells, and is set to the value that yields FMR = 0.001 on near-frontal pairs, i.e. where yaw is in the interval (−15, 15]. Poor algorithms give generally red figures.
Figure 14: The heatmaps show FMR dependence on the yaw of the face in the enrollment and verification images, in bins spanning [−90, −75] to (75, 90] degrees, with panels for morpho_0_gpu, neurotechnology_0_cpu, ntechlab_0_cpu, and rankone_0_cpu. The threshold is the same in all cells, and is set to the value that yields FMR = 0.001 on near-frontal pairs, i.e. where yaw is in the interval (−15, 15]. Thus the center of each panel is grey. The desired behavior is that FMR does not vary with yaw. The yaw values are produced by an automated pose estimator, and are themselves noisy. The figure assumes that the pose estimates are not systematically incorrect.
Figure 15: The points show zero-threshold, rank one, true positive identification rates (TPIR = 1 − FNIR), a.k.a. hit rates, versus the pitch difference between the frontal enrollment image and the probe (probe pitch angles of −40, −20, −10, +20, and +40 degrees), for gallery sizes N = 16000, 48000, 160000, 320000, and 691282.
Figure 16: The points show zero-threshold, rank one, true positive identification rates (TPIR = 1 − FNIR), a.k.a. hit rates, versus the yaw difference between the frontal enrollment image and the probe (probe yaw angles of 22, 45, 60, 80, and 90 degrees), for gallery sizes N = 16000, 48000, 160000, 320000, and 691282.
Figure 17: The points show high-threshold true positive identification rates (TPIR = 1 − FNIR), a.k.a. hit rates, versus the yaw angle of the probe face (22, 45, 60, 80, and 90 degrees). The threshold is set to a high value, as would be used in a surveillance application.
References
[1] Artem Babenko and Victor Lempitsky. Efficient indexing of billion-scale datasets of deep descriptors. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[2] Patrick Grother and Mei Ngan. Performance of face identification algorithms. Interagency Report 8009, Face Recognition Vendor Test (FRVT), May 2014.
[3] Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
[4] Masato Ishii, Hitoshi Imaoka, and Atsushi Sato. Fast k-nearest neighbor search for face identification using bounds of residual score. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pages 194–199, Los Alamitos, CA, USA, May 2017. IEEE Computer Society.
[5] Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. CoRR, abs/1702.08734, 2017.
[6] Ira Kemelmacher-Shlizerman, Steven M. Seitz, Daniel Miller, and Evan Brossard. The MegaFace benchmark: 1 million faces for recognition at scale. CoRR, abs/1512.00596, 2015.
[7] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In British Machine Vision Conference, 2015.
[8] Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. CoRR, abs/1503.03832, 2015.
[9] Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, and Lior Wolf. DeepFace: Closing the gap to human-level performance in face verification. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '14, pages 1701–1708, Washington, DC, USA, 2014. IEEE Computer Society.