0% found this document useful (0 votes)
9 views29 pages

Ls 7

Uploaded by

mraj03060
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views29 pages

Ls 7

Uploaded by

mraj03060
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 29

Computational Visual Media

https://doi.org/10.1007/s41095-022-0317-1 Vol. 9, No. 4, December 2023, 657–685

Review Article

3D face recognition: A comprehensive survey in 2022

Yaping Jing1 , Xuequan Lu1 ( ), and Shang Gao1

c The Author(s) 2023.

Abstract In the past ten years, research on face Thus, 3D face recognition has become an active
recognition has shifted to using 3D facial surfaces, as research topic in recent years.
3D geometric information provides more discriminative Face recognition mainly involves the process of
features. This comprehensive survey reviews 3D extracting feature representations from the input
face recognition techniques developed in the past face, matching the extracted features with existing
decade, both conventional methods and deep learning databases and predicting the personal identity of
methods. These methods are evaluated with detailed the input face. Therefore, using rich facial features
descriptions of selected representative works. Their is critical to the recognition result. In 3D face
advantages and disadvantages are summarized in terms recognition, 3D face data are used for training and
of accuracy, complexity, and robustness to facial testing purposes. Compared to 2D images, 3D faces
variations (expression, pose, occlusion, etc.). A review contain richer geometric information, that can provide
of 3D face databases is also provided, and a discussion
more discriminative features and help face recognition
of future research challenges and directions of the topic.
systems overcome the inherent defects and drawbacks
Keywords 3D face recognition; 3D face databases; of 2D face recognition, such as facial expression,
deep learning; local features; global features occlusion, and pose variation. Furthermore, 3D data
are relatively unchanged after scaling, rotation, and
1 Introduction illumination change [5]. Most 3D scanners can acquire
both 3D meshes/point clouds and corresponding
Face recognition has become a commonly used textures. This allows us to integrate advanced 2D
biometric technology. It is widely applied in public face recognition algorithms into 3D face recognition
surveillance, authentication, security, intelligence, systems for better results.
and many other systems [1]. During recent decades, One of the main challenges to 3D face recognition
many 2D face recognition techniques have achieved is the acquisition of 3D training images—this cannot
strong results in controlled environments. The be accomplished by crawling the Web, unlike for
accuracy of 2D face recognition has been greatly 2D face images. It requires special hardware instead.
enhanced especially by the emergence of deep learning. According to the technologies used, collection systems
However, these techniques are still challenged by the can be broadly divided into active acquisition and
intrinsic limitations of 2D images, due to variations passive acquisition [6]. An active collection system
in illumination, pose, and expression, occlusion, actively emits invisible light (e.g., an infrared laser
disguises, time delays, and image quality [2]. 3D face beam) to illuminate the target face and obtain the
recognition can outperform 2D face recognition [3] shape features of the target by measuring reflectivity.
with greater recognition accuracy and robustness, as it is A passive acquisition system consists of several
less sensitive to pose, illumination, and expression [4]. cameras placed apart. It matches points observed
to those from other cameras and calculates the 3D
1 The School of Information Technology, Deakin University,
Waurn Ponds, VIC, Australia. E-mail: Y. Jing, jingyap@
position of the matched point. The 3D surface is
deakin.edu.au; X. Lu, xuequan.lu@deakin.edu.au ( ); formed from a set of matched points.
S. Gao, shang.gao@deakin.edu.au. Since 2000, many researchers have begun to
Manuscript received: 2022-04-14; accepted: 2022-09-29 assess 3D face recognition algorithms on large-scale

657
658 Y. Jing, X. Lu, S. Gao

databases and published related 3D face databases,


e.g., Face Recognition Vendor Tests (FRVT-2000) [7],
FRVT-2002 [8], the Face Recognition Grand
Challenge (FRGC) [9], and FRVT-2006 [10]. This
suggests that there is a close relationship between
large datasets and 3D face recognition techniques, so
we also summarize existing public 3D face databases
and their data augmentation methods as well as
reviewing recognition technologies.
Several relevant surveys have been conducted
by researchers from different perspectives. In
2006, Ref. [3] reviewed research trends in 3D face
recognition. Ref. [11] summarized the associated
literature up to the year 2007. Ref. [12] studied
various algorithms for expression invariant 3D face
recognition and evaluated the complexity of existing Fig. 1 Taxonomy of 3D face recognition methods.
3D face databases. Later, Ref. [2] categorized
face recognition algorithms into single-modal and and hybrid. As for deep learning-based methods,
multi-modal ones in 2014. Ref. [1] studied 3D nearly all use pre-trained networks and then fine-tune
face recognition techniques, comprehensively covering these networks with converted data (e.g., 2D maps
conventional methods. Recently, Refs. [13] and from 3D faces). Popular deep learning-based face
[6] each presented a review of 3D face recognition recognition networks include VGGNet [16], ResNet
algorithms, but only a few deep learning-based [17], ANN [18], and recent lightweight CNNs such as
methods were covered. Refs. [14] and [15] reviewed MobileNetV2 [19].
deep learning-based face recognition methods in 2018, The structure of this survey is as follows. Section 2
but their focus was mainly on 2D face recognition. introduces widely used 3D face databases and
In this paper, we focus on 3D face recognition. datasets. Section 3 covers data preprocessing and
Compared to the existing literature, the main augmentation. Sections 4 and 5 respectively review
contributions of our work are follows: conventional 3D face recognition methods and deep
• this is the first survey paper to comprehensively learning-based methods. Section 6 compares these
cover conventional methods and deep learning- methods and discusses future research directions,
based methods for 3D face recognition; followed by conclusions in Section 7.
• unlike existing surveys, it pays special attention to
deep learning-based 3D face recognition methods;
2 3D face databases
• it covers the latest and most advanced
developments in 3D face recognition, providing a Large-scale 3D face databases and datasets are
clear progress chart for the topic; and essential for the development of 3D face recognition.
• it provides a comprehensive comparison of They are used to train feature extraction algorithms
existing methods using available databases, and and evaluate their output. To meet this demand,
suggests future research challenges and directions. many research institutions and researchers have
According to the feature extraction methods created various 3D face databases. Table 1 lists
adopted, 3D face recognition techniques can be current prominent 3D face databases and compares
divided into two categories: conventional methods their data formats, number of persons contained
and deep learning-based methods (see Fig. 1). To (IDs), image variations (e.g., expression, pose, and
extract face features, conventional methods always occlusion), and scanning devices. Four different 3D
use traditional algorithms, either linear or nonlinear, data formats are in use: point clouds (Fig. 2(a)),
e.g., principle component analysis (PCA). They can 3D meshes (Fig. 2(b)), range images (Fig. 2(c)), and
be further divided into three types: local, global, depth maps plus 3D video.
3D face recognition: A comprehensive survey in 2022 659

Table 1 3D face databases. Datatype: M=mesh, P=point cloud, R=range image, V=video, 3DV=3D video; Expression: S=smile/happiness,
M=multiple expressions; Pose: LR=slight left/right turn, UD=slight up/down turn, M=multiple poses
Data
Year type IDs Scans Texture Expression Pose Occlusion Scanner

Standard CCD black and white


3DRMA [21] 2000 M 120 720 Yes — LR, UD —
camera

FSU [22] 2003 M 37 222 No — — — Minolta Vivid 700

Minolta Vi-700 laser range


GavabDB [23] 2004 M 61 427 No S ±30◦ —
scanner

FRGC v2 [9] 2005 R 466 4007 Yes S ±15◦ — Minolta Vivid 3D scanner

±45 ,
UND [24] 2005 R 275 670 Yes — — Minolta Vivid 900
±60◦

S, surprise, InSpeck 3D MEGA Capturor


ZJU-3DFED [25] 2006 M 40 360 No — —
sad DF

Stereo photography, 3DMD


BU3D-FE [26] 2006 M 100 2500 Yes M — —
digitizer

CASIA [27] 2006 R 123 4059 No M ±90◦ — Minolta Vivid 910

Minolta Vivid 700 red laser


FRAV3D [28] 2006 M 105 1696 Yes M M —
light scanner

ND-2006 [29] 2007 R 888 13,450 Yes M ±15◦ — Minolta Vivid 910

Inspeck Mega Capturor II 3D


Bosphorus [30] 2008 P 105 4666 Yes M M 4
scanner

UoY [31] 2008 M 350 5000 Yes M UD — Stereo vision 3D camera

SHREC08 [32] 2008 R 61 427 No M UD — —

BJUT-3D [33] 2009 M 500 1200 Yes — — — Cyberware 3D laser scanner

Texas-3D [34] 2010 R 118 1149 Yes M ±10◦ — MU-2 stereo imaging system

S, anger,
UMBDB [20] 2011 R 143 1473 Yes — 7 Minolta Vivid 900 laser scanner
bored

3D-TEC [35] 2011 R 214 428 Yes S — — Minolta scanner

SHREC11 [36] 2011 R 130 780 No — M — Escan laser scanner

Konica Minolta Vivid 910


NPU3D [37] 2012 M 300 10,500 No M M 4
non-contact 3D laser scanner

Di3D (Dimensional Imaging)


BU4D-FE [38] 2013 3DV 101 60,600 Yes — — —
dynamic system

N, S,
KinectFaceDB [39] 2014 R 52 936 Yes LR Multiple Kinect
surprise

Random
Lock3DFace [40] 2016 R 509 5711 Yes M ±90◦ Kinect
cover

Semi-
Half
F3D-FD [41] 2018 R 2476 — Yes — lateral Vectra M1 scanner
face
with ear

LS3DFace [42] 2018 P 1853 31,860 Yes — — — —

4DFAB [43] 2018 V 180 1800k+ Yes M M — DI4D capturing system

WFFD [44] 2020 V 241 285 Yes — — — —

SIAT-3DFE [45] 2020 3D 500 8000 Yes M — 2 Structured light (CASZM-3D)

FaceScape [46] 2020 Y 938 18,760 Yes M — — 68 DSLR cameras

CAS-AIR-3D Face [47] 2021 V 3093 24,713 No S, surprise ±90◦ Glasses Intel RealSense SR305
660 Y. Jing, X. Lu, S. Gao

landmarks is Texas-3D [34], whose 3D images have


been preprocessed, and 25 manual landmarks added.
Therefore, it provides a good option for researchers to
focus specifically on developing 3D face recognition
algorithms, without considering initial preprocessing
of 3D images [34].
BU3D-FE [26] (Binghamton University 3D Facial
Expression) is a database specifically developed for
3D facial expression recognition. It contains 100
identities with 6 expression types: anger, happiness,
sadness, surprise, disgust, and fear [26]. The
FRAV3D [28] database involves 81 males and 24
females; three kinds of images (3D meshes, 2.5D range
Fig. 2 3D face data representations. Reproduced with permission
from Ref. [20], c IEEE 2011.
data, and 2D color images) were captured using the
MINOLTA VIVID-700 red laser scanner [28]. BJUT-
Before 2004, there were few public 3D face 3D [33] is one of the largest Chinese 3D face databases
databases. Some representatives include 3DRMA and includes 1200 Chinese 3D face images [33]. The
[21], FSU [22], and GavabDB [23]. The GavabDB two smallest databases are ZJU-3DFED [25] and
database contains 61 individuals, aged between 18 SHREC08 [32]. ZJU-3DFED consists of 40 identities
and 40. Each has 3 frontal images with different and 9 scans with four different expressions for each
expressions and 4 rotated images with a neutral identity [25]. SHREC08 consists of 61 people with 7
expression [23]. In 2005, the FRGC v2 database was scans for each [32].
designed to improve the results of face recognition Between 2010 and 2015, there were six noteworthy
algorithms. It had a huge impact on the development databases created. UMBDB [20] is an excellent
of 3D face recognition [9], and it is still used as a database for testing 3D face recognition algorithms
standard reference database for evaluating 3D face with varying occlusion. It contains 578 occluded
recognition algorithms. In the same year, another acquisitions [20]; some examples are shown in Fig. 3.
important database, the University of Notre Dame 3D-TEC [35] (3D Twins Expression Challenge) is a
UND database [24] was released, in which each person challenging dataset as it contains 107 pairs of twins
has only one 3D image and multiple 2D images [24]. with similar faces and different expressions [35]. Thus,
From 2006 to 2010, further databases were created. this database is helpful when 3D face recognition
The largest is ND-2006 [29] which is a superset of must work in the presence of varying expressions.
FRGC v2. It contains 13,450 images and 888 persons
with as many as 63 images per person [29]. The
second largest is UoY [31], which consists of more
than 5000 models (350 persons) and is owned by
the University of York (UK) [31]. CASIA [27] and
Bosphorus [30] are similar in size, with close to
5000 images. CASIA was collected in 2004 using
a non-contact 3D digitizer, a Minolta Vivid 910,
and contains 4059 images of 123 subjects [27]. It
not only considers separate variations in expression,
pose, and illumination, but also introduces combined
changes, with different expressions in different
poses. Bosphorus has 381 individuals and the
most expression and pose changes. It provides
manual marking of 24 facial landmarks for each Fig. 3 Examples from the UMB database. Row 1: neutral faces
under differing illumination. Rows 2, 3: faces partially occluded by
scanned image, including nose tip, chin centre, and objects such as scarves and hands. Reproduced with permission from
eye corners [30]. Another database with manual Ref. [20], c IEEE 2011.
3D face recognition: A comprehensive survey in 2022 661

SHREC11 [36] is built upon a new collection of 130 databases. Increasing the gallery size may degrade
masks with 6 3D face scans [36]. Like BJUT-3D, the performance of face recognition [48]. Although
Northwestern Polytechnic University 3D (NPU3D some algorithms have achieved good results on these
[37]) is another large-scale Chinese 3D face database, existing 3D face databases, they still cannot be used
composed of 10,500 3D face captures, corresponding in the real world due to less controlled conditions.
to 300 individuals [37]. BU4D-FE [38] is a 3D The establishment of large-scale 3D face databases to
video database that records spontaneous expressions simulate real-world situations is essential to facilitate
of various young people by completing 8 emotional research into 3D face recognition. In addition,
expression elicitation tasks [38]. KinectFaceDB [39] collecting 3D face data is a time-consuming and
was the first publicly available face database based on resource-demanding task. Research into large dataset
the Kinect sensor and contains four data modalities generation algorithms is one of our suggestions for
(2D, 2.5D, 3D, and video-based) [39]. future work (see also Section 3.2).
Recently, another large-scale 3D face database
Lock3DFace [40] was released. It is based on Kinect
3 Data preprocessing and augmentation
and contains several variations in expression, pose,
time-lapse, and occlusion [40]. F3D-FD [41] is a large 3.1 Data preprocessing
dataset, and has the most individuals: 2476. For each
In most situations, the acquired raw 3D face data
individual, it includes partial 3D scans from frontal
cannot be directly input to feature extraction systems
and two semi-lateral views, and a one-piece face
as they may contain redundant information [6]. For
with lateral parts (including ears and earless, with
example, presence of hair, neck, and background may
landmarks) [41]. LS3DFace [42] is the largest dataset
affect the accuracy of recognition. Thus, 3D data
so far, including 31,860 3D face scans of 1853 people.
are usually preprocessed before being passed into a
It is combines data from multiple challenging public
feature extraction model.
datasets, including FRGC v2, BU3D-FE, Bosphorus,
In general, the data preprocessing phase includes
GavabDB, Texas-3D, BU4D-FE, CASIA, UMBDB,
three main steps: facial landmark detection and
3D-TEC, and ND-2006 [42]. 4DFAB [43] is a large
orientation, data segmentation, and face registration.
dynamic high-resolution 3D face database; it contains
Facial landmarks are a set of keypoints defined by
4D videos of subjects showing spontaneous and posed
anthropometric studies [49] and can be used to
facial behavior.
automatically localize and register a face. Some
The large-scale Wax Figure Face Database (WFFD
databases already provide landmarks for each face
[44]) is designed to address vulnerabilities in existing
image. Data segmentation is the process of utilizing
3D facial spoofing databases and to promote the
facial landmarks, such as nose tip and eye corners, to
research of 3D facial presentation attack detection
segment the facial surface [49]. This process is always
[44]. This database includes photo-based and video-
used by local conventional methods, which determine
based data. We only detail the video information
identifiable facial parts like the nose and eyes for
in Table 1. SIAT-3DFE [45] is a 3D facial
feature extraction. As an essential step before feature
expression dataset in which every identity has 16
extraction and matching, face registration aligns the
facial expressions including natural, happy, sad,
target surface (the entire face or face part) with the
surprised, and several exaggerated expressions (open
training surface in the gallery.
mouth, frowning, etc.), as well as two occluded 3D
cases [45]. Another recent database is FaceScape 3.2 Data augmentation
[46], which consists of 18,760 textured 3D data with To improve the performance and robustness of face
pore-level facial geometry [46]. CAS-AIR-3D [47] is recognition systems, large-scale datasets are required,
a large low-quality 3D face database, including 3093 especially for deep learning-based methods, since their
individuals. Each records 8 videos with different poses networks need to be trained using a large amount of
and expressions, occlusion, and distance change. training data.
It is well known that the performance of 3D face Several augmentation methods can be used to
recognition algorithms may vary on different 3D face increase the size of the training and test datasets.
662 Y. Jing, X. Lu, S. Gao

The easiest is to rotate and crop existing face data. (pose, expression), which can be further used for
Another popular approach is to use 3D morphable face manipulation and enhancement.
facial models (3DMM) [50] to generate new shapes
and expressions to synthesize new facial data [51, 52].
4 Conventional methods
Randomly selecting sub-feature sets from different
samples of a person and combining them to generate a 4.1 Approaches
new face is also a reliable way to enrich the identities According to Ref. [60] and our review of the last
in datasets [42]. decade’s literature, conventional face recognition
Recently, generative adversarial networks (GANs) algorithms can be classified into three types based
have been used to generate realistic synthetic images on their feature extraction approaches, local, global,
[53–58]. GANs usually consist of a generator and and hybrid, as shown in Fig. 1. Local approaches
a discriminator, which are alternately trained in mainly focus on local facial features such as the nose
minimax games. The discriminator is trained to and eyes [60]. In contrast to local methods, global
discriminate generated samples from real samples, approaches use the entire face to generate feature
and the generator is trained to generate images vectors for feature classification. Hybrid methods use
resembling real ones to minimize the success of the both local and global facial features.
discriminator. In local methods, fusion schemes are used to
Ref. [53] proposed UV-GAN, which first generates a improve accuracy. There are five levels for fusion
completed u–v map from a single image, then attaches schemes: sensor level, feature level, rank level,
the completed u–v to a fitted 3D mesh, and generates decision level, and score level [1]. Sensor level
synthetic faces in arbitrary poses to increase pose fusion merges the original sensor data at the initial
variation. In 3D-PIM [54], a 3DMM cooperates with stage of recognition; feature level fusion involves the
a GAN to automatically recover natural frontal face combination of features extracted from different facial
images from arbitrary poses. The 3DMM is used as a representations of one single object. For rank level
simulator to generate synthetic faces with normalized fusion, ranks are assigned to gallery images based on
poses, and the GAN is used to refine the realism of a descending sequence of confidence, while score level
the output detail. fusion combines matching scores of each classifier
based on a weighting scheme. Decision level fusion
FaceID-GAN [55] formulates a three-player GAN by
combines the decision of each classifier [1].
introducing an identity classifier that works with the
Details of the three conventional face recognition
discriminator to compete with the generator. Ref. [56]
approaches are discussed below.
also proposed a method to generate frontal faces
from profile faces by employing a GAN with dual- 4.2 Local methods
discriminator structure. The 3D GAN generator in 4.2.1 Classification
Ref. [57] is augmented with an integrated 3DMM, In the last decade, many local approaches have been
including two CNNs for facial texture and background built, where local feature descriptors are used to
generation, to ensure that identity is preserved in the describe 3D local facial information. Table 2 lists
synthetic images after manipulating pose. noteworthy 3D local methods and summarizes their
In FA-GAN [58], a graph-based two-stage important details.
architecture was proposed, consisting of two parts: a Following Ref. [13], these methods can be classified into
geometry preservation module (GPM) and a face three different types based on the descriptors: keypoint-
disentanglement module (FDM). For the GPM, based, curve-based, and region-based. Keypoint-based
a graph convolutional network (GCN) [59] is methods detect a set of 3D keypoints based on face
introduced to explore the relationship between geometry and build feature descriptors by calculating
different face regions and better preservation of relationships between these keypoints. Curve-based
geometric information. FDM disentangles the methods use a set of curves on each face surface as
encoded facial feature embeddings into identity feature vectors. Region-based methods extract features
representations and deformation attribute codes from certain regions of the face surface [13].
3D face recognition: A comprehensive survey in 2022 663

Table 2 Local techniques. RR1 = rank-1 recognition rate. Advantages: (i) changes to which the method is relatively robust, or (ii) other
benefits such as speed, etc. Limitations note circumstances under which the corresponding method under-performs
Author/year Category Method Advantage Limitation Database RR1 (%)

89.2
Berretti et al. (2011) Covariance matrix, Keypoints
SIFT keypoint Partial facial FRGC v2 (partial
[61] X 2 dist redundancy
faces)

Mesh-based Histograms, cosine


Li et al. (2011) [62] Expression Pose Bosphorus 94.1
keypoint dist

Creusot et al. (2011) Landmark Computationally


Linear combination Expression FRGC v2 —
[63] keypoint expensive

SVM-based fusion, Simple


Zhang and Wang (2011) Landmarks
six similarity preprocessing, Occlusion FRGC v2 96.2
[64] keypoint
measures noise, resolution

Inan and Halici (2012) Neutral


SIFT keypoint Cosine dist Noise FRGC v2 97.5
[65] expression

FRGC v2 95.6
Berretti et al. (2013) Large pose,
Curve Sparse Missing parts GavabDB 97.13
[66] expression
UND 75

Expression, hair Exaggerated


Li and Da (2012) [67] Curve PCA FRGC v2 97.80
occlusion expressions

Euclidean dist, Efficient, data


Ballihi et al. (2012) [68] Curve Occlusion FRGC v2 98
AdaBoost storage

Berretti et al. (2013) Mesh-based


X 2 dist Missing parts Low accuracy UND 77.1
[69] keypoint

Mesh-based Expression, Bosphorus 93.7


Smeets et al. (2013) [70] Angles comparison Noise
keypoint partial data FRGC v2 89.6

Linear (LDA),
Creusot et al. (2013) Mesh-based Complexity, FRGC v2 —
non-linear Expression
[71] landmark keypoint occlusion Bosphorus —
(AdaBoost)

LBP,
Region Occlusion,
Tang et al. (2013) [72] Nearest-neighbor Expression FRGC v2 94.89
(LBP-based) missing data
(NN)

Region (geometric FRGC v2 95.6


Lei et al. (2013) [73] SVM Expression Occlusion
feature) BU-3DFE 97.7

Elaiwat et al. (2014) Illumination,


Region Curvelet transform Occlusion FRGC v2 —
[74] expression

Extreme
Riemannian Pose, missing
Drira et al. (2013) [75] Curve expression, FRGC v2 97.7
framework data
complexity

Region
Li et al. (2014) [76] ICP, sparse-based Expression, fast Pose, occlusion FRGC v2 96.3
(LBP-based)

Noise,
Berretti et al. (2014) Mesh-based Occlusion,
Classifier low-resolution Bosphorus 94.5
[77] keypoint missing parts
images

Efficient, FRGC v2 —
Lei et al. (2014) [78] Curve KPCA, SVM Occlusion
expression SHREC08 —

Region (Geometric
Tabia et al. (2014) [79] Riemannian metric Expression Occlusion GavabDB 94.91
features)

Vezzetti et al. (2014) Landmark Expression,


Euclidean distance Low accuracy Bosphorus —
[80] keypoint occlusion

Gaussian filters, Expression,


Mesh-based
Li et al. (2015) [81] fine-grained occlusion, Cost Bosphorus 96.56
keypoint
matcher registration-free
664 Y. Jing, X. Lu, S. Gao

Table 2 Local techniques (continued)


Author/year Category Method Advantage Limitation Database RR1 (%)

Elaiwat et al. (2015) Mesh-based Curvelet transform, Illumination,


Occlusion FRGC v2 97.1
[82] keypoint cosine dist expressions

Al-Osaimi (2016) [83] Curve Euclidean dist Fast, expression Occlusion FRGC 97.78

FRGC v2 —
Regional, global Large pose, Patches
Ming (2015) [84] Region CASIA —
regression efficient detection
BU-3DFE —

Rotational
Occlusion,
Guo et al. (2016) [85] Keypoint Projection Cost FRGC v2 97
expression and
Statistics (RoPS),
pose
average dist

Soltanpour and Wu Histogram


SIFT keypoint Expression Pose FRGC v2 96.9
(2017) [86] matching

Missing parts,
Two-Phase Extreme pose,
Lei et al. (2016) [87] SIFT keypoint occlusion, data FRGC v2 96.3
Weighted expression
corruptions

Emambakhsh and Mahalanobis, cosine Expression,


Curve Occlusion FRGC v2 97.9
Evans (2017) [88] dist single sample

Region Expression, BU-3DFE —


Werghi et al. (2016) [89] Cosine, X 2 dist pose
(LBP-based) missing data Bosphorus —

Region (geometric
Hariri et al. (2016) [90] Geodesic dist Expression, pose Partial occlusion FRGC v2 99.2
features)

Soltanpour and Wu Extreme pose, FRGC v2 98.1


Region (LDP) ICP Expression
(2017) [91] missing data Bosphorus 97.3

Riemannian kernel Expression,


Deng et al. (2017) [92] SIFT keypoints Low-complex FRGC v2 97.3
sparse coding occlusion

Expression, time Occlusion,


Abbad et al. (2018) [93] Curve Angles comparison GavabDB 99.18
consumption missing data

Soltanpour and Wu Computational


Region (LDP) ICP Pose FRGC v2 99.3
(2019) [94] cost

Region Low
Shi et al. (2020) [95] LBP, SVM Pose, occlusion Texas-3D 96.83
(LBP-based) consumption

4.2.2 Keypoint-based methods Ref. [86] used SIFT keypoint detection on


As keypoint-based methods use a set of keypoints pyramidal shape maps to obtain 3D geometric
and their geometric relationships to represent facial information and combine it with 2D keypoints.
features, two important steps are involved: keypoint However, this method is sensitive to pose changes.
detection and feature descriptor construction [13]. Ref. [85] used a 3D point cloud registration algorithm
One of the most commonly used keypoint detectors combined with local features to achieve both pose
is the scale invariant feature transformation (SIFT) and expression invariance. Later, a keypoint-based
[98]. For example, Ref. [61] used SIFT to detect multiple triangle statistics (KMTS) [87] method was
relevant keypoints of a 3D depth image (see Fig. 1 proposed to address partial facial data, pose change,
in Ref. [61]), where local shape descriptors are and large facial expression variation. Recently, SIFT
adopted to measure the changes of face depth in has also been used to detect keypoints in Ref. [92];
each keypoint’s neighborhood. In Ref. [65], to obtain it uses local covariance descriptors and Riemann
feature vectors, SIFT descriptors are applied to 2D kernel sparse coding to improve the accuracy of 3D
matrices including shape index, curvedness, and face recognition. Accuracy was further improved in
Gaussian and mean curvature values generated from Ref. [99].
3D face data. In order to improve the robustness to large
3D face recognition: A comprehensive survey in 2022 665

occlusion or pose variation, SIFT keypoint detection based 3D face recognition method was proposed and
is directly used on 3D mesh data. The extension 17 landmarks were automatically extracted based on
of SIFT to 3D meshes is called MeshSIFT [70]. In facial geometrical characteristics; this was further
Ref. [70], salient points on a 3D face surface were first extended in Ref. [100].
detected as extreme values in scale space, and then 4.2.3 Curve-based methods
an orientation was assigned to these points. A feature A curve-based method uses a set of curves to construct
vector was used to describe them by concatenating feature descriptors. It is difficult to decide whether
the histograms of slant angles and shape indices. such methods are local or global, because these
Before this approach was applied, Ref. [62] also curves usually cover the entire face, and capture
used minimum and maximum curvatures within a geometric information from different face regions to
3D Gaussian scale space to detect salient points represent the 3D face. The curves can be grouped
and used the histograms of multiple order surface into level curves and radial curves according to their
differential quantities to characterize the local facial distribution. Level curves are non-intersecting closed
surface. The descriptors of detected local regions curves of different lengths; radial curves are open
were further used in 3D face local matching. Ref. [81] curves, usually starting from the nose tip.
also described an extension to this work, in which Level curves can be further divided into iso-depth
a fine-grained matching of 3D keypoint descriptors and iso-geodesic curves [13] (see Fig. 4 in Ref. [96]
was proposed to enlarge intra-subject similarity and Fig. 5 in Ref. [101]). Iso-depth curves can be
and reduce inter-subject similarity. However, a obtained by translating a plane across the facial
large number of keypoints were detected by these surface in one direction and were first introduced by
methods. Samir et al. [97]. Ref. [96] expanded this work and
A meshDOG keypoint detector was proposed by proposed iso-geodesic curves which are level curves
Berretti et al. [69, 77]. They first used the meshDOG of surface distance from the nose tip. However, both
keypoint detector and local geometric histogram (GH) kinds are sensitive to occlusion, missing parts, and
descriptor to extract features, and then selected the large facial expressions. Thus, radial curves were
most effective feature based on an analysis of the introduced in Ref. [101] and extended in Ref. [75].
optimal scale, distribution and clustering of keypoints, These curves can better handle occlusion and missing
and the features of local descriptors. Recently, parts as it is uncommon to lose a full radial curve
Ref. [82] exploited a curvelet-based multimodal and at least some parts of a radial curve can be
keypoint detector and local surface descriptor that used. Also, they can be associated with different
extracts both texture and 3D local features. It facial expressions as the radial curves pass through
reduces the computational cost of keypoint detection different facial regions.
and feature building, as the curvelet transform is In Ref. [67], facial curves in the nose region of a
based on the FFT. target face were first extracted to form a rejection
Additionally, a set of facial landmarks is used classifier, which was used to quickly and effectively
for creating feature vectors in some methods, and eliminate different faces in the gallery. Then the
a shape index is widely used to detect landmarks. face was segmented into six facial regions. A facial
In Ref. [63], keypoints were extracted from a deformation mapping was produced by using curves
shape dictionary, which was learned on a set of 14 in these regions. Finally, adaptive regions were
manually placed landmarks on a human face. As selected to match the two identities. In Ref. [68],
an extension, Ref. [71] used a dictionary of learned geometric curves from the level sets (circular curves)
local shapes to detect keypoints, and evaluated them and streamlines (radial curves) through the Euclidean
through linear (LDA) and nonlinear (AdaBoost) distance functions of 3D faces were combined for high-
methods. Ref. [64] detected resolution invariant accuracy face recognition.
keypoints and scale-space extremes on shape index A highly compact signature of a 3D face can be
images based on scale-space analysis, and used six characterized by a small set of features selected by
scale-invariant similarity measures to calculate the the Adaboost algorithm [102], a well-known machine
matching score. In Ref. [80], an entirely geometry- learning feature selection method. By using the
666 Y. Jing, X. Lu, S. Gao

selected curves, face recognition time was reduced was proposed to reduce the influence of local facial
from 2.64 to 0.68 s, showing that feature selection can distortion.
effectively improve system performance. To provide Recently, Ref. [95] used the LBP algorithm to
high discriminative feature vectors and improve extract features of 3D depth images and the SVM
computational efficiency, angular radial signatures algorithm to classify them. The feature extraction
(ARSs) were proposed by Lei et al. [78]. An ASR is time of each depth map in Texas-3D was reduced
a set of curves from the nose tip (the origin of facial to 0.19 s while Ref. [70] required 23.54 s. Inspired
range images) at intervals of θ radians. by LBP, Ref. [76] proposed the multi-scale and
Another type of facial curve was introduced by multi-component local normal patterns (MSMC-
Berretti et al. [66]. SIFT was utilized to detect LNP) descriptor, which can describe normal facial
keypoints of 3D depth images and which were information more compactly. The Mesh-LBP method
connected to form the facial curves. A 3D face can was used in Ref. [89], where LBP descriptors were
be represented by a set of facial curves built from directly computed on the 3D face mesh surface, fusing
matched keypoints. Ref. [83] provided some extended both shape and texture information.
applications of facial curves. 3D curves were formed Another type of local method is based on
by intersecting three spheres with the 3D surface geometric features. Ref. [73] proposed a low-level
and used to compute adjustable integral kernels geometric feature approach, which extracted region-
(RAIKs) in Ref. [83]. A sequence of RAIKs generated based histogram descriptors from a facial scan.
from the surface patch around each keypoint can Feature regions include the nose and the eyes-and-
be represented by 2D images such that certain forehead, which are comparatively less affected by
characteristics of the represented 2D images have the deformation caused by facial expressions. A
a positive impact on matching accuracy, speed, and support vector machine (SVM) and fusion of these
robustness. descriptors at both feature and score level were
Ref. [88] introduced nasal patches and curves. First, applied to improve accuracy. In Ref. [79], a covariance
seven landmarks in the nasal region were detected. matrix of features was used as the descriptor for 3D
A set of planes was created using pairs of landmarks. shape analysis, rather than the features themselves.
A set of spherical patches and curves were yielded by Compared to feature-based vectors, covariance-based
the intersection of these planes with the nasal surface descriptors can fuse and encode all types of features
to create the feature descriptor. Then the feature into a compact representation [90]. Their work was
vectors were taken by concatenating histograms of x, expanded in Ref. [90].
y, and z components of the surface normal vectors
There are other local methods. In Ref. [74],
of Gabor-wavelet filtered depth maps. Features were
local surface descriptors were constructed around
selected by a genetic algorithm for stability under
keypoints, which were defined by checking the curvelet
changes in facial expression. Compared to previous
coefficient in each sub-band. Each keypoint is
methods, this method shows excellent separability.
represented by multiple attributes, such as curvelet
Recently, Ref. [93] presented a geometry and local
position, direction, spatial position, scale, and size.
shape descriptor based on the wave kernel signature
A set of rotation-invariant local features can be
(WKS) [103], to overcome distortions caused by facial
obtained by rearranging the descriptors according
expressions.
to the orientation of the key points. The method in
4.2.4 Region-based methods Ref. [84] used the regional boundary sphere descriptor
A representative local descriptor is the local binary (RBSR) to reduce the computational cost and improve
pattern (LBP) [104]. It was initially used for 2D the classification accuracy.
images. Local geometric features extracted from Ref. [91] proposed a local derivative pattern (LDP)
certain regions of the face surface can be robust to descriptor based on local derivative changes. It
face expression variations [13]. LBPs were used to can capture more detailed information than LBP.
represent the facial depth and normal information of Recently, Yu et al. [105] recommended utilizing the
each face region in Ref. [72], where a feature-based iterative closest point (ICP) with resampling and
3D face division pattern (see Fig. 11 in Ref. [72]) denoising (RDICP) method to register each face patch
3D face recognition: A comprehensive survey in 2022 667

to achieve high registration accuracy. With rigid is based on a vertical symmetry plane determined by
registration, all face patches can be used to recognize the nose tip and nose orientation. A 3D point cloud
the face, significantly improving accuracy as they are surface is transformed into a face coordinate system
less sensitive to expression or occlusion. and PCA-LDA is used to extract features from the
4.2.5 Summary range image obtained from the newly transformed
Most local methods can better handle facial data. Ref. [107] presented a method named UR3D-C,
expression and occlusion changes as they use salient which used LDA to train the dataset and compress
points and rigid feature regions, such as nose and the biometric signature to only 57 coefficients. It
eyes, to recognize a face. The main objective of local still shows high discrimination with these compact
methods is to extract distinctive compact features feature vectors. Bounding sphere representation
[13]. We summarize local methods as follows: (BSR), introduced in Ref. [108], was used to represent
• Keypoint-based methods can process partial face both depth and 3D geometric shape information by
images with missing parts or occlusion, since the projecting preprocessed 3D point clouds onto their
feature representations are generated from a set bounding spheres.
of keypoints and their geometric relationships. Shape-based spherical harmonic features (SHF)
However, if the number of keypoints is excessive, were proposed in Ref. [109], where SHFs were
the computational cost increases; if the keypoints calculated based on the spherical depth map (SDM).
are too few, some key features will be lost and SHFs can capture the gross shape and fine surface
recognition performance is affected. In addition, details of a 3D face as the strengths of spherical
algorithms for measuring the neighborhoods of harmonics at different frequencies. Ref. [110] used
keypoints play an important role as the geometric 2DPCA to extract features and employed Euclidean
relationships of keypoints are used to build feature distances for matching. Ref. [111] proposed a
vectors. computationally efficient and simple nose detection
• Most curved-based methods use radial curves algorithm. It constructs a low-resolution wide-nose
since level curves are sensitive to occlusion and eigenface space using a set of training nose regions.
missing parts. Generally, a reference point is A pixel in an input scan is determined to be the nose
required in a curve-based method. The nose tip if the mean square error between the candidate
region is rigid and has more distinctive shape feature vector and its projection on the Eigenface
features than other regions, so the nose tip is space is less than a predefined threshold.
used as the reference point in most curve-based Ref. [112] introduced a rigid-area orthogonal
methods [13]. Therefore, its detection is a crucial spectral regression (ROSR) method, where curvature
step. Inaccurate positioning of the nose tip can information was used to segment rigid facial areas and
affect the extraction of curves and compromise OSR was utilized to extract discriminative features.
results of the face recognition system. In Ref. [113], a 3D point cloud was registered in the
• Most region-based methods are robust to changes inherent coordinate system with the nose tip as the
in facial expression and pose as the feature vectors origin, and a two-layer ensemble classifier was used for
are extracted from rigid regions of the face surface. face recognition. A local facial surface descriptor was
Some also need highly accurate nose tip detection proposed by Ref. [114]. This descriptor is constructed
as the nose tip is used for face segmentation. based on three principal curvatures estimated by
asymptotic cones. The asymptotic cone is an essential
4.3 Global methods extension of an asymptotic direction to a mesh model.
Unlike local methods, global methods extract features It allows the generation of three principal curvatures
from the entire 3D face surface. They are very effective representing the geometric characteristics of each
and can perform well given complete, frontal, fixed- vertex.
expression 3D faces. Table 3 summarizes noteworthy Ref. [115] proposed a region-based 3D deformable
endeavors in this area. model (R3DM), which was formed from densely
An intrinsic coordinate system for 3D face corresponding faces. Recently, kernel PCA was used
registration was proposed by Ref. [106]. This system for 3D face recognition. As faces exhibiting non-linear
668 Y. Jing, X. Lu, S. Gao

Table 3 Global techniques


Author/year Method Advantage Limitation Database RR1 (%)

Spreeuwers (2011) [106] PCA-LDA Less registration time Expression, occlusion FRGC v2 99

Ll norm, ICP, LDA,


Ocegued et al. (2011) [107] Simulated Annealing Speed efficient Expression, occlusion FRGC v2 99.7
algorithm

Robust group sparse


Ming and Ruan (2012) FRGC v2 —
regression model Expression, pose Distorted images
[108] CASIA —
(RGSRM)

SHREC2007 97.86
Liu et al. (2012) [109] — Faster, cost-effective Expression, occlusion FRGC v2 96.94
Bosphorus 95.63

Taghizadegan et al. (2012)


PCA, Euclidean distance Expression Occlusion CASIA 98
[110]

Mohammadzade and Computation,


PCA Occlusion, pose FRGC —
Hatzinakos (2012) [111] expression

PCA, Spectral Expression,


Ming (2014) [112] Regression, the computational cost, Occlusion FRGC v2 95.24
orthogonal constraint storage space

PCA, Mahalanobis Occlusion, missing GavabDB 100


Ratyal et al. (2015) [113] Pose, expression
Cosine (MahCos) part FRGC v2 98.93

Tang et al. (2015) [114] Principal curvatures Computational cost Expression, occlusion FRGC v2 93.16

PCA, use CNN for Faster, expressions,


Gilani et al. (2017) [115] Occlusion Bosphorus 98.1
landmark detection poses

Peter et al. (2019) [116] Kernel-based PCA Higher accuracy rate — FRGC v2 —

Table 4 Hybrid techniques


Author/year Method Advantage Limitation Database RR1 (%)

Pose, occlusion, Expression, low


Passalis et al. (2011) [117] PCA UND —
missing data accuracy

SIFT-based, extended Registration-free Large pose


Huang et al. (2012) [118] FRGC v2 97.6
LBP (frontal) (alignment required)

Alyüz et al. (2012) [119] ICP, PCA, LDA Occlusion Expression Bosphorus 83.99

Fadaifard et al. (2013) Noise, computational


L1-norm Occlusion, expression GavabDB 86.89
[120] efficiency

Occlusion, missing Bosphorus —


Alyüz et al. (2013) [121] ICP, PCA, LDA Expression
data UMBDB —

Bagchi et al. (2014) [122] ICP, PCA Pose, occlusion Pose Bosphorus 91.3

GavabDB 96.92
Bagchi et al. (2015) [123] ICP, KPCA Pose Expression Bosphorus 96.25
FRAV3D 92.25

Liang et al. (2017) [124] HK classification Pose Expression Bosphorus 94.79

shapes, non-linear PCA was used in Ref. [116] to pose and scale may affect recognition accuracy
extract 3D face features, as it has notable benefits when using global features, as global algorithms
for data representation in high-dimensional space. create discriminating features based on all visible
To sum up, most global methods have faster facial shape information. This requires accurate
speed and lower computational demands, but they normalization for pose and scale. However, it is not
are unsuitable for handling occluded faces or faces easy to perform accurate pose normalization given
with missing parts. In addition, variations in noisy or low-resolution 3D scanning.
3D face recognition: A comprehensive survey in 2022 669

4.4 Hybrid methods become one of the most popular approaches for face
Hybrid face recognition systems use both local recognition. Compared to conventional approaches,
features and global features. A comparison of recent deep learning-based methods have great advantages
hybrid methods is provided in Table 4. over image processing [125]. For conventional
Ref. [117] used an automatic landmark detector methods, the key step is to find robust feature
to estimate poses and detect occluded areas, and points and descriptors based on geometric information
utilized facial symmetry to deal with missing data. in 3D face data [51]. Compared to end-to-
Ref. [118] proposed a hybrid matching scheme using end deep learning models, these methods have
multiscale extended LBP and SIFT-based strategies. good recognition performance, but involve relatively
In Ref. [119], the problem of external occlusion was complex algorithmic operations to detect key features
addressed and a two-step registration framework was [51]. For deep learning-based methods, robust face
proposed. First, a non-occluded model is selected representations can be learned by training a deep
for each face with the occluded parts removed. neural network on large datasets [51], which can
Then a set of non-occluded distinct regions is used hugely improve face recognition speed.
to compute the masked projection. This method There are a variety of deep neural networks for
relies on accurate nose tip detection, and results facial recognition. Convolutional neural networks
are adversely affected if the nose area is occluded. (CNN) are the most popular. The robust and
Ref. [121] extended this work in 2013. Ref. [120] discriminative feature representations learned via
proposed a scale-space-based representation for 3D CNN can significantly improve the accuracy of
shape matching which is stable in the presence of face recognition, as demonstrated by Refs. [42, 51].
surface noise. Recently, graph convolutional networks (GCNs) have
In Ref. [122], Bagchi et al. used ICP to register a also been considered in the face recognition field to
3D range image and PCA to restore the occluded solve the problem of large face deformation in real life.
region. This method is robust to the noise and GCNs utilize filters to identify high-level similarities
occlusion. Later, they improved the registration between nodes by extracting high-dimensional features
method and proposed an across-pose method in of nodes and their neighborhoods in a graph [58].
Ref. [123]. Ref. [124] also proposed a 3D face Figure 4 depicts a common face recognition process
recognition method with pose-invariant and a coarse- based on a deep-CNN (DCNN). In the training
to-fine approach to detect landmarks under large yaw phase, the training dataset is preprocessed (see
variations. At the coarse search step, HK curvature Section 3) to generate a unified feature map. The
analysis is used to detect candidate landmarks feature map is resized to fit the input tensor to the
and subdivide them according to the classification DCNN architecture (in terms of the height, width,
strategy based on facial geometry. At the fine and number of channels of the input network layer,
search step, the candidate landmarks are identified and the number of images). Then, the DCNN is
and marked by comparison with the face landmark trained with the preprocessed maps. In the testing
model. phase, a 3D face scan is selected for each identity
Hybrid 3D face recognition methods may use more from the test dataset as the identity dataset. The
complex structures than local or global methods, and feature representations of the identity dataset are
as a result, may achieve better recognition accuracy obtained through the trained network as the feature
at a higher computational cost. As in global methods, library. Then, the feature vector of the probes
face registration is an important step for hybrid 3D is obtained from the trained DCNN and used to
methods, especially for overcoming pose variation match the features in the given gallery. In the
and occlusion. matching process, the feature gallery is scanned and
the distance between each feature representation and
5 Deep learning-based 3D face recognition the feature vector of the probed surface is calculated.
The identity with the closest matching distance is
5.1 Overview
then returned.
In the last decade, deep neural networks have With the application of CNN, the accuracy of 2D
670 Y. Jing, X. Lu, S. Gao

Fig. 4 Overview of 3D deep learning-based face recognition methods.

face recognition systems (DeepFace [126], DeepID Table 5 Network architectures based on 3D deep learning techniques
series [127–130], VGG-Face [16], FaceNet [131]) Category Backbone/architecture Reference
has significantly improved. In these systems, face
VGG-Face [51, 134]
representations are learned directly from 2D facial
ResNet [5, 135, 136]
images by training deep neural networks on large 2D-input
MobileNet [137]
datasets. Accuracy is close to 100% on some specific
Others [42, 138–144]
databases (such as LFW). The high recognition
rate for 2D face recognition shows that CNN- 3D-input PointNet++, PointFace [145–148]
based methods are superior to the conventional
Graph-input GCN [149]
feature extraction methods. Based on the intrinsic
advantages of 3D faces relative to 2D faces in
handling uncontrolled conditions such as changes in 5.2 2D-input networks
pose, illumination, and expression, researchers are Kim et al. [51] proposed the first 3D face recognition
attracted to applying DCNNs to 3D face recognition. model with DCNN. They adopted VGG-Face [16]
Indeed, some of these 2D face recognition networks pre-trained on 2D face images as their network, and
are still being used by some 3D methods. In such 3D then fine-tuned the network with augmented 2D depth
face recognition methods, 3D faces are converted into maps. The last FC layer of VGG-Face is replaced
2D maps as input of the network. Other networks by a new last FC layer and a softmax layer. In the
directly accept 3D data as input, such as PointNet new last layer, weights are randomly initialized using
[132], PointNet++ [133]. Based on the input formats a Gaussian distribution with a mean of zero and a
of the network, we classify deep learning-based 3D standard deviation of 0.01. The size of the dataset
face recognition methods into three categories: 2D- is expanded by augmenting the 3D point cloud of
input, 3D-input, and graph-input networks. Table 5 face scans with expression and pose variations during
lists these approaches; details of these methods are the training phase. A multi-linear 3DMM is used
discussed below. to generate more data, including variations in both
3D face recognition: A comprehensive survey in 2022 671

Table 6 Deep learning-based methods


Author/year Network Data augmentation Loss function Matching Database RR1 (%)

Fine-tuning
Kim et al. (2017) [51] 3DMM — Cosine distance Bosphorus 99.2
VGG-Face

Zulqarnain Gilani and Synthesized from


FR3DNet — Cosine distance Texas-3D 100.0
Mian (2018) [42] original 3D face pairs

Fine-tuning CurtinFace
Ding et al. (2019) [134] — — SVM 93.41
VGG-Face [150]

Feng et al. (2019) [139] ANN — — — CASIA 98.44

Euclidean
Xu et al. (2019) [138] LeNet5 — — CASIA —
distance

Randomly occlude
AMSoftmax
Tan et al. (2019) [136] ResNet-18 depth-maps with 1-6 Cosine distance CASIA 99.7
[151]
patches

Pose augmentation,
Cross entropy
Mu et al. (2019) [140] MSFF shape jittering, shape Cosine distance Lock3DFace 84.22
loss
scaling

Binary Similarity tensor


Feature tensor based
Lin et al. (2019) [135] ResNet-18 cross-entropy calculated from 2 Bosphorus 99.71
augmentation
loss feature tensors

Olivetti et al. (2020) Rotate the original


MobileNetV2 — — Bosphorus 97.56
[137] depth map

Linear SVM
Dutta et al. (2020) [141] SpPCANet — — Frav3D 96.93
[152]

Generate depth
Lin et al. (2021) [143] MQFNet image by pix2pix Weighted loss — Lock3DFace 86.55
[153]

Pre-ResNet-34,
Resolution Multi-scale Euclidean
Cai et al. (2019) [5] Pre-ResNet-24, FRGC v2 100
augmentation triplet loss distance
Pre-ResNet-14

Cao et al. (2020) [142] ANN — — — H3D [142] —

Mask-guided Generate depth Attribute-aware


Chiu et al. (2021) [144] Cosine distance Lock3DFace 96.43
RGB-D Net image by DepthNet loss [154]

Bhople and Prakash P-level data Squared


Triplet Net Triplet loss Bosphorus 97.55
(2021) [147] augmentation distances

Feature
Jiang et al. (2021) [148] PointFace Random crop Cosine distance Lock3DFace 87.18
similarity loss

Zhang et al. (2021) [145] PointNet++ GPMM-based Triplet loss Cosine similarity Bosphorus 99.68

Papadopoulos et al.
Face-GCN — — — BU4DFE 88.45
(2021) [149]

shape (α) and expression (β). A 3D point cloud can cloud to produce pose variations. During data
be represented by preprocessing, a nose tip is first found in the 3D
X = X + Ps α + Pe β (1) point cloud, and then the 3D point cloud is cropped
where X is the average facial point cloud, Ps is within a 100 mm radius. Classical rigid-ICP [157]
the shape information provided by the Basel Face between the cropped 3D data and the reference face
Model [155], and Pe is an expression provided by model is used to align the 3D data. In order to fit
FaceWarehouse [156]. Expression variations are the input size of the CNN architecture, the aligned
created by randomly changing the expression β 3D data are orthogonally projected to 2D images to
parameter in the 3DMM. Randomly generated rigid generate a 224×224×3 depth map. In addition, eight
transformations are applied to the input 3D point 18 × 18 patches are randomly placed on the depth
672 Y. Jing, X. Lu, S. Gao

map to simulate occlusion and prevent overfitting to maps built from 3D raw data. The outputs of
specific regions of the face. The model was evaluated the two feature layers are fused as the final input
on three public 3D databases: Bosphorus [30], BU3D- to an artificial neural network (ANN) recognition
FE [26], and 3D-TEC [35], yielding recognition rates system. It was tested on CASIA (V1) and compared
of 99.2%, 95.0%, and 94.8%, respectively. recognition rates using the 2D feature layer, 3D
A deep 3D face recognition network (FR3DNet) [42] feature layer, and the fusion of both layers. A higher
was trained on 3.1 million 3D faces; it is specifically RR1 (98.44%) was obtained with fused features.
designed for 3D face recognition. It is also based on Xu et al. [138] also designed a dual neural network
VGG-Face [16]. A rectifier layer is added for every to reduce the number of training samples needed. The
convolutional layer. Compared to Kim et al.’s work network consists of a dual-channel input layer that
[51], a much larger dataset is generated and expanded can fuse a 2D texture image and a 3D depth map into
by new identities. A new face F̂ is generated from one channel, and two parallel LeNet5-based CNNs.
a pair of faces (Fi , Fj ) with the maximum non-rigid Each CNN processes the fused image separately to
shape difference: obtain its feature maps, which are used to calculate
F̂ = (Fi + Fj )/2 (2) similarity. The gray-scale depth map obtained from
The synthetic faces generated by this method have the point cloud, combined with the corresponding
richer shape changes and details than statistical face 2D texture, is used as the dual-channel input. The
models [155]. However, the computational cost is very most important preprocessing step is face hole filling,
high as they are all generated from high-dimensional
to provide a more intact face. The basic idea is to
raw 3D faces. In addition, 15 synthetic cameras are
first extract the 3D hole edge points, then project the
deployed in the frontal hemisphere of the 3D face to
hole edge points onto the 2D mesh plane to fill the
simulate pose variations and occlusions in each 3D
hole points, and map them back to the original 3D
scan. To fit the input of FR3DNet, the 3D point
point cloud. Experiments were conducted to show
cloud data are preprocessed to a 160 × 160 × 3 image
the influence of depth map features and the size of
[155]. Before aligning and cropping the face, the
training set on the accuracy of recognition rate.
point cloud is converted into a three-channel image.
Tan et al. [136] designed a framework to specifically
These three channels indicate three surfaces generated
process the low-quality 3D data captured by portable
by using the gridfit algorithm [158] to give depth
3D acquisition hardware such as mobile phones. The
map z(x, y), azimuth map θ(x, y), and elevation map
framework includes two parts: face registration and
φ(x, y), where θ and φ are azimuth and elevation
angles of the normal vectors of the 3D point cloud face recognition. At the face registration stage, a
surface, respectively. Experiments were conducted PointNet-like deep registration network (DRNet) is
on most public databases; the highest recognition used to reconstruct the dense 3D point cloud from low-
accuracy was achieved on the Texas-3D database, quality sequences. The DRNet is based on ResNet-18
reaching 100%. and takes a pair of 256 × 256 × 3 coordinate-maps
Ding et al. [134] proposed an SVM based RGB- as input. To obtain the desired sparse samples from
D face recognition algorithm combining 2D color the raw datasets, noise and random pose variation
and 3D depth features. A fine-tuned VGG-face are added to the face scan. Then the new point
network is used to determine 2D features from color cloud is projected onto a 2D plane with 1000 grid
images. 3D geometric features are obtained by cells of the same size. A sparse face of 1000 points is
computing expression-invariant geodesic distances obtained by randomly selecting a point from each cell.
between facial landmarks on a 3D face mesh. The 2D Six sparse faces are generated from each face scan
and 3D features are then used as RGB-D classifiers and passed to DRNet to generate a new dense point
to train the SVM. Experiments were performed on cloud. Then the fused data are used as the input
the CurtinFace [150] database, achieving good results to a face recognition network (FRNet) also based on
with pose variations (93.41%) and neutral expression ResNet-18. Compared to FR3DNet, its facial RR1
variations (100%). on UMBDB is higher, reaching 99.2%.
Feng et al. [139] adopted a two-DCNN module Mu et al. [140] proposed a lightweight CNN for
to extract features from color images and depth 3D face recognition, especially for low-quality data.
3D face recognition: A comprehensive survey in 2022 673

This network contains 4 blocks with 32, 64, 128, and input data instead of pure facial depth maps. The
256 convolution filters. The feature maps from these selection of geometric feature descriptors is based on
four convolutional blocks are captured by different the GH-EXIN network. The reliability of geometric
receptive fields, downsampled to fixed size by max- descriptors based on curvature is demonstrated in
pooling and integrated to form another conversion Ref. [159]. The input is a three-channel image
block. This process is completed by a multi-scale including the 3D facial depth map, the shape index,
feature fusion module. The aim is to efficiently and the curvedness, which enhances the accuracy of
improve the representation of low-quality face data. the network. A 97.56% RR1 was achieved on the
A spatial attention vectorization (SAV) module is Bosphorus database.
used to replace the global average pooling layer (also Dutta et al. [141] also proposed a lightweight sparse
used by ResNet) to vectorize feature maps. The SAV principal component analysis network (SpPCANet).
highlights important spatial facial clues and conveys It includes three parts: a convolutional layer, a
more discriminative cues by adding an attention nonlinear processing layer, and a feature merging
weight map to each feature map. In addition, layer. For data preprocessing, usual methods are
three methods are used to augment the training used to detect and crop the face area. First, an ICP-
data: pose generation (by adjusting virtual camera based registration technology is used to register a 3D
parameters), shape jittering (by adding Gaussian point cloud, and then the 3D point cloud is converted
noise to simulate rough surface changes), and shape into a depth image. Finally all faces are cropped to
scaling (by zooming in 1.1× to the depth face image). rectangles based on the position of the nose tip. The
system obtained a 98.54% RR1 on Bosphorus.
During data preprocessing, as in the above methods,
Lin et al. [135] adopted ResNet-18 [17] as the
a 10 × 10 patch surface is first cropped around the
backbone of their network. The big difference
given nose tip with outlier removal. Then the cropped
from other work is their data augmentation method.
3D point cloud is projected onto a 2D space (depth
Instead of generating 3D face samples from 2D face
surface) to generate a normal map image.
images, they generated feature tensors directly based
Lin et al. [143] also designed a multi-quality fusion
on Voronoi diagram subdivision. The salient points
network (MQFNet) to improve the performance of
are detected from a 3D face point cloud with its
low-quality 3D face recognition. First, the pix2pix
corresponding 2D face image and divided into 13
network [153] is used to generate high-quality depth
subdivisions based on the Voronoi diagram. The face
maps from low-quality faces. To avoid the effect
can be expressed as F = [f1 , · · ·, f13 ] and the sub-
of loss of identity features in images generated by
feature is SubFi . The feature tensor is extracted
pix2pix, MQFNet contains two pipelines to extract from a 3D mesh by detecting the salient points and
and fuse features from low-quality and high-quality integrating features of all the salient points, which
images and to generate more discriminative features. can be represented as
This work was also tested on the Lock3DFace
database. Compared to Ref. [140], the average F k = ∪13 k
i=1 SubFi , k = 1, · · ·, K (3)
accuracy was improved by 8.11%. where K is the number of 3D face samples of the
Olivetti et al. [137] proposed a method based on same person. A new feature set can be synthesized
MobileNetV2. MobileNet is a comparatively new by randomly choosing the ith sub-feature set from the
neural network specifically designed for mobile phones. K samples. Excellent results were achieved on both
It is easy to train and requires only a few parameters Bosphorus and BU3D-FE databases with accuracies
to be tuned. This work was based on the Bosphorus of 99.71% and 96.2%, respectively.
database, which only contains 105 identities with Cai et al. [5] designed three deep residual networks
4666 images. To obtain sufficient training samples, with different layers based on ResNet: Pre-ResNet-
they augmented the data by rotating the original 14, Pre-ResNet-24, and Pre-ResNet-34. Multi-scale
depth map (clockwise 25circ , counterclockwise 40circ ) triplet loss supervision is constructed by combining a
and horizontally minoring each depth map. The softmax loss and the two triplet losses as supervision
most important part of their work is the input on the last fully connected layer and the last feature
data for DCNN. Geometric descriptors are used as layer. To enlarge the size of the training set, the data
674 Y. Jing, X. Lu, S. Gao

are augmented in three ways: pose augmentation guided face recognition module. DepthNet is used
based on 3D scans, resolution augmentation, and to convert 2D face datasets to RGB-D face datasets
transformational augmentation based on the range and to address the a lack of identities in 3D face
images. For the preprocessing algorithm, raw 3D datasets for network training. The augmented 3D
data are converted into a 96 × 96 range image and database is used to train a mask-guided RGB-D
only the center of the two pupils and nose tip are face recognition network, which has a two-stream
used for alignment. For the preprocessing algorithm, multi-head architecture with three branches: an
three overlapping face components (the upper half RGB recognition branch, a depth map recognition
face, the small upper half face, and the nose tip) branch, and an auxiliary segmentation mask branch
and the entire facial region are generated from the with a spatial attention module. The latter shares
raw 3D data. The most important part of this weights between the two recognition branches and
method is detecting the nose tip and two pupils. provides auxiliary information from the segmentation
Three landmarks are detected from the 2D textured branch to help the recognition branches focus on
image of the corresponding 3D face data and are informative parts such as eyes, nose, eyebrows, and
mapped to the 3D model. Then, a new nose tip is lips. This module achieved good results on multiple
calculated by taking the highest point of the nose databases and had a higher average accuracy (96.43%)
region (centered on the tip of the nose with a radius of on Lock3DFace than Refs. [140, 143].
25 mm). The nose tip is re-detected on the 3D model 5.3 3D-input networks
as the 2D domain detection may reduce detection Bhople et al. [146] utilized a Siamese network with
accuracy due to pose variations. Another reason for PointNet-CNN (PointNet implementation with CNN)
detecting the nose tip by this means is that the lower to determine the similarity and dissimilarity of point
dimensional feature vectors generated can be used to cloud data. In Ref. [147], they continued their work
detect the new nose tip to reduce computational cost. and proposed a triplet network with triplet loss, in a
Finally, the feature vectors of the four patches can be variant of the Siamese network. The triplet network
used alone or in combination for matching. It obtained is a shared network consisting of three parallel
high accuracy on four public 3D face databases: FRGC symmetric CPNs using the PointNet architecture
v2, Bosphorus, BU-3DFE, and 3D-TEC, with 100%, and CNN. The input to the network is a triplet of
99.75%, 99.88%, and 99.07%, respectively. three 3D face scans: positive, anchor, and negative.
Cao et al. [142] believed that the key to a reliable Since the input is in the form of a triplet, more
face recognition system is rich data sources and information is captured during training. The Siamese
paid more attention to data acquisition. Therefore, network and the triplet network were compared in
a holoscopic 3D (H3D) face image database was terms of recognition rates on the Bosphorus and IIT
created, which contains 154 raw H3D images. H3D Indore databases. The triplet network achieved better
imaging was recorded by using a regularly closely accuracy on the IIT Indore database, but not on
packed array of small lenses connected to a recording Bosphorus. They also performed point cloud-level
device. It can display 3D images with continuous data augmentation by rotating the point cloud data
parallax and full-color images can be viewed from a in a fixed orientation by randomly perturbing the
wider viewing area. A wavelet transform is used for points by a small rotation and jittering the position
feature extraction, as it performs well in the presence of each point slightly.
of illumination change and face orientation change, PointFace [148] consists of two weight-shared
reducing image information redundancy and retaining encoders that extract discriminative features from a
the most important facial features. While this is a pair of point cloud faces. In the training phase, each
new direction for 3D face recognition, the accuracy encoder learns identity information from each sample
of this method is quite low, only reaching just over itself, while using feature similarity loss to evaluate
80% on the H3D database. the embedding similarity of two samples. The feature
Chiu et al. [144] applied an attention mechanism similarity loss function can be represented by
M
in a face recognition system that has two main parts:
[D(fia , fip ) + m − D(fia , fin )]
X
a depth estimation module (DepthNet) and a mask- Lsim = (4)
i=1
3D face recognition: A comprehensive survey in 2022 675

where fia , fip , and fin , i = 1, · · ·, M , are the L2 recognition accuracy and run quickly. For example,
normalized feature vectors of the anchor, positive, and Ref. [42] gets 100% RR1 on Texas-3D and Ref. [5] only
negative samples, respectively. D(., .) is the distance requires 0.84 s to identify a target face from a gallery
between two vectors. The encoder can distinguish of 466 faces. There are three important parts in a
3D faces from different individuals and compactly deep learning-based system: data preprocessing, data
cluster features coming from faces of the same person. augmentation, and network architecture. Usually, the
Compared to Refs. [140, 143], the model has higher input data need to be preprocessed (face registration)
average accuracy (87.18%) on Lock3DFace. to find correspondences between all vertices of the
Zhang et al. [145] proposed a 3D face point cloud face mesh, since CNNs are usually intolerant of
recognition framework based on PointNet++. It pose changes. Deep learning-based methods always
consists of three modules: training data generator, require a large amount of data to train the network,
face point cloud network, and transfer learning. The especially if training the network from scratch. To
most important part of this work is the training avoid this, some works [51, 163] transfer learning
data generator. All training sets are unreal data, from a pre-trained model and fine-tune the network
synthesized by sampling from a statistical 3DMM of on a small dataset, which also takes less training time.
face shape and expression based on a GPMM [160]. But lacking large-scale 3D face datasets is still an
This method addresses the problem of lack of a large open problem for DCNN-based 3D face recognition
training dataset. After classification training, triplet research. Data augmentation is an important method
loss is used to fine-tune the network with real faces to enlarge 3D face databases by generating new faces
to give better results. from existing ones. In addition, adopting a suitable
network is important. Most of the above-reviewed
5.4 Graph-input networks
works use a single CNN but a few use dual CNNs, such
Papadopoulos et al. [149] introduced a registration- as Ref. [138]. As more networks are adopted in this
free method for dynamic 3D face recognition based field, reorganization of existing network architectures
on spatiotemporal graph convolutional networks (ST- may also be a topic of future research.
GCN). First, facial landmarks are estimated from a
3D mesh. Landmarks alone are insufficient for facial
6 Discussion
recognition because crucial geometric and texture
information are left out. To describe local facial In the past decade, 3D face recognition has achieved
patterns around landmarks, new points are first significant advances in 3D face databases, recognition
interpolated between the estimated landmarks, and rates, and robustness to face data variations, such as
then a kD-tree search is used to find the closest low-resolution, expression, pose, and occlusion. In
points to each landmark. For each frame, a facial this paper, conventional methods and deep learning-
landmark corresponds to a vertex (vi ∈ V ) in a graph based methods have been thoroughly reviewed in
(G = V, E), and the landmarks are connected as Sections 4 and 5, respectively. Based on the feature
spatial edges according to a defined relationship. For extraction algorithms, conventional methods are
the 3D sequences of meshes, identical landmarks are divided into three types: local, global, and hybrid
connected in consecutive frames as temporal edges. methods.
This work was tested on BU4DFE with an average • Local feature descriptors extract features from
accuracy of 88.45%. The performance is not as good small regions of a 3D facial surface. In some cases,
as other state-of-the-art methods [42, 161, 162], but it the region can be reduced to small patches around
demonstrates the feasibility of using GCN for dynamic detected keypoints. The number of extracted
3D face recognition. local descriptors is related to the content of the
input face (entire or partial). It is commonly
5.5 Summary assumed that only a small number of facial
This section has reviewed deep learning-based 3D regions are affected by occlusion, missing data,
face recognition techniques and classified them into or distortion caused by data corruption, while
three categories based on their network input formats. most other regions persist unchanged. Face
Most deep learning-based 3D methods achieve high representation is derived from a combination of
676 Y. Jing, X. Lu, S. Gao

many local descriptors. Therefore, local facial face recognition systems, the following (future)
descriptors are not compromised when dealing directions are suggested, concerning new face data
with changes to a few parts caused by facial generation, data preprocessing, network design, and
expressions or occlusion [87]. loss functions.
• A global representation is extracted from an entire • Large-scale 3D face databases. Current 3D
3D face, which usually makes global methods face databases are often smaller than their
compact and therefore computationally efficient. counterparts in 2D color face recognition; nearly
While these methods can achieve great accuracy all the deep learning-based 3D face recognition
in the presence of complete neutral faces, they methods fine-tune pre-trained networks on
rely on the availability of full face scans and are converted data from 3D faces. Larger-scale 3D
sensitive to face alignment, occlusion, and data face databases could enable training from scratch
corruption. and improve recognition difficulty, closing the gap
• Hybrid methods can handle more conditions, such to real-world applications.
as pose and occlusion variations. • Augmenting face data. As Section 5 notes, almost
Since 2016, much research on deep learning-based every proposed method provides a strategy for
3D face recognition has been carried out. Table 7 augmenting face training data, as a large amount
summarizes the RR1 of our surveyed methods tested of training data are required to train networks.
on different databases. Compared to conventional A network trained with sufficient data can better
face recognition algorithms, deep learning-based distinguish features, while a small number of
methods have the advantages of simpler pipelines samples may result in overfitting. We can increase
and greater accuracy. the size of the 3D database by generating more
To improve the accuracy and performance of images for existing identities or synthesizing new

Table 7 RR1 (%) of deep learning-based methods on various databases. H=high-quality image. L=low-quality image. F=fine-tuning
FRGC BU3D- BU4D- Texas- 3D- ND-
Reference Bosphorus CASIA GavabDB UMBDB Lock3DFace
v2 FE FE 3D TEC 2006

[51] — 95.00 — 99.20 — — — 94.80 — — —

[42] 97.06 98.64 95.53 96.18 98.37 96.39 100 97.90 91.17 95.62 —

[42] (F) 99.88 99.96 98.04 100 99.74 99.70 100 99.12 97.20 99.13 —

[139] — — — — 85.93 — — — — — —

[5] 100 99.88 — 99.75 — — — 99.07 — — —

[135] — 96.20 — 99.71 — — — — — — —

[137] — — — 97.56 — — — — — — —

[136] — — — 99.20 99.70 — — — 99.2 — —

[52] 92.74 — — 93.38 — — — — — — —

[52] (F) 98.73 — — 97.50 — — — — — — —

[140] (H) — — — 91.27 — — — — — — —

[140] (L) — — — 90.70 — — — — — — —

[141] — — — 98.54 88.80 — — — — — —

[143] — — — — — — — — — — 86.55

[144] 99.27 100 — 100 100 — — — — — 96.43

[147] — — — 97.55 — — — — — — —

[145] 99.60 — — 99.68 — — — — — — —

[148] — — — — — — — — — — 87.18

[149] — — 88.45 — — — — — — — —
3D face recognition: A comprehensive survey in 2022 677

identities. Common ways to generate new images recognition, highly discriminative features are
are: rotating and cropping existing 3D data, or required because the difference between two faces
using 3DMM to slightly change the expression. may be small, such as in twins. Therefore,
To generate new identities, some model is applying loss functions to supervise the network
designed to synthesize new faces from existing layers has become one active research topic.
identities [42, 155, 164, 165]. Recently, generative For example, Ref. [5] adopted multi-scale loss
adversarial networks (GANs) have been used supervision to improve extraction efficiency by
for face augmentation where a face simulator combining one softmax loss and two triplet losses.
is trained to generate realistic synthetic images. In addition to the above issues, researchers
Recent works are summarized in Section 3. can consider combining conventional methods with
• Data preprocessing. This is also key to improving CNNs. For example, keypoint detection techniques
face recognition accuracy. Besides removing in conventional 3D face recognition methods could be
redundant information, another goal of data incorporated into the deep learning-based methods to
preprocessing is to perform registration. A well- better pay attention to the area of interest. 3D face
known problem of rigid-ICP registration is that recognition methods for low-quality (low-resolution)
it cannot guarantee optimal convergence [51]: data also need more work.
it may not be possible to accurately register To apply 3D face technology to real-world
all 3D faces in different poses to the reference applications, several things need to be considered:
face. For 3D input networks, all 3D faces are recognition time, quality of the input data, and pose
taken as point-to-point correspondences for non- and expression variations of the subject. Lightweight
rigid registration. As an alternative, Ref. [149] networks [140, 141] can reduce recognition time
proposed a registration-free method based on and improve efficiency. In Refs. [136, 140, 143],
GCN to avoid this step. However, further work is representations of low-quality face data are improved
needed to improve face recognition performance. by fusing features from high-quality images. To handle
• Data conversion. As Section 5 explains, some pose and expression variations, the network can be
works are based on 2D-input networks. To use trained using face datasets with rich expressions and
them, better conversion techniques (e.g., from 3D pose changes to improve its robustness. Furthermore,
faces to 2D maps) would improve face recognition dynamic 3D face recognition using 3D face sequences
performance. as input should be considered in the future.
• Network architecture. Many networks are
available for 3D face recognition (see Table 5). 7 Conclusions
Some researchers directly adopt pre-trained
networks and then fine-tune them using training 3D face recognition has become an active and popular
data generated from 3D faces, which can greatly research topic in the field of image processing and
improve the training speed. Also, dual or multiple computer vision in recent years. In this paper, a
networks can be used to handle different tasks, summary of public 3D face databases is first provided,
as did Refs. [5, 138]. followed by a comprehensive survey on 3D face
• Appropriate loss functions. Using effective recognition methods proposed in the past decade.
loss functions can reduce the complexity of They are divided into two categories based on their
training and improve feature learning capabilities. feature extraction methods: conventional and deep
Most loss functions share a similar basic idea learning-based.
and aim to facilitate the training process by Conventional techniques are further classified into
amplifying discriminative features from different local, global, and hybrid methods. We have reviewed
individuals and compacting clustering features these methods by comparing their performance
from the same individual. A commonly used loss on different databases, computational cost, and
function is the softmax loss, which encourages robustness to expression change, occlusion, and pose
separability between classes but is incapable of variation. Local methods can better handle face
supporting compactness within classes. For face expressions and occluded images at the cost of greater
678 Y. Jing, X. Lu, S. Gao

computation than global methods. Hybrid methods on deeply learned face representation. Neurocomputing
can achieve better results and address challenges such Vol. 363, 375–397, 2019.
as pose variation, illumination change, and facial [6] Zhou, S.; Xiao, S. 3D face recognition: A survey.
expressions. Human-Centric Computing and Information Sciences
We have reviewed recent advances in 3D face Vol. 8, No. 1, 35, 2018.
recognition based on deep learning, mainly focusing [7] Blackburn, D. M.; Bone, M.; Phillips, P. J. Face
on face augmentation, data preprocessing, network recognition vendor test 2000: Evaluation report.
architecture, and loss functions. According to the Technical report. Defense Advanced Research Projects
input formats of the network adopted, the deep Agency Arlington VA, 2001.
learning-based 3D face recognition methods may be [8] Phillips, P. J.; Grother, P.; Micheals, R.; Blackburn,
broadly divided into 2D-input, 3D-input, and graph- D. M.; Tabassi, E.; Bone, M. Face recognition vendor
test 2002. In: Proceedings of the IEEE International
input networks. With these powerful networks, the
SOI Conference, 44, 2003.
performance of 3D face recognition has been greatly
[9] Phillips, P. J.; Flynn, P. J.; Scruggs, T.; Bowyer, K.
improved.
W.; Chang, J.; Hoffman, K.; Marques, J.; Min, J.;
We have also discussed the characteristics and
Worek, W. Overview of the face recognition grand
challenges involved, and have provided potential
challenge. In: Proceedings of the IEEE Computer
future directions for 3D face recognition. For instance,
Society Conference on Computer Vision and Pattern
large-scale 3D face databases are greatly needed to Recognition, 947–954, 2005.
advance 3D face recognition in the future. We believe [10] Phillips, P. J.; Scruggs, W. T.; O’Toole, A. J.; Flynn,
our survey will provide valuable information and P. J.; Bowyer, K. W.; Schott, C. L.; Sharpe, M. FRVT
insight to readers and the community. 2006 and ICE 2006 large-scale experimental results.
IEEE Transactions on Pattern Analysis and Machine
Declaration of competing interest Intelligence Vol. 32, No. 5, 831–846, 2010.
The authors have no competing interests to declare [11] Abate, A. F.; Nappi, M.; Riccio, D.; Sabatino,
that are relevant to the content of this article. G. 2D and 3D face recognition: A survey. Pattern
Recognition Letters Vol. 28, No. 14, 1885–1906, 2007.
References [12] Smeets, D.; Claes, P.; Hermans, J.; Vandermeulen,
D.; Suetens, P. A comparative study of 3-D
[1] Patil, H.; Kothari, A.; Bhurchandi, K. 3-D face face recognition under expression variations. IEEE
recognition: Features, databases, algorithms and Transactions on Systems, Man, and Cybernetics, Part
challenges. Artificial Intelligence Review Vol. 44, No. C (Applications and Reviews) Vol. 42, No. 5, 710–727,
3, 393–441, 2015. 2012.
[2] Zhou, H. L.; Mian, A.; Wei, L.; Creighton, D.; Hossny, [13] Soltanpour, S.; Boufama, B.; Jonathan Wu, Q. M. A
M.; Nahavandi, S. Recent advances on singlemodal survey of local feature methods for 3D face recognition.
and multimodal face recognition: A survey. IEEE Pattern Recognition Vol. 72, 391–406, 2017.
Transactions on Human-Machine Systems Vol. 44, No. [14] Guo, G. D.; Zhang, N. A survey on deep learning
6, 701–716, 2014. based face recognition. Computer Vision and Image
[3] Bowyer, K. W.; Chang, K.; Flynn, P. A survey of Understanding Vol. 189, 102805, 2019.
approaches and challenges in 3D and multi-modal 3D [15] Masi, I.; Wu, Y.; Hassner, T.; Natarajan, P. Deep
+ 2D face recognition. Computer Vision and Image face recognition: A survey. In: Proceedings of the
Understanding Vol. 101, No. 1, 1–15, 2006. 31st SIBGRAPI Conference on Graphics, Patterns
[4] Huang, G. B.; Mattar, M.; Berg, T.; Learned-Miller, and Images, 471–478, 2018.
E. Labeled faces in the wild: A database for studying [16] Parkhi, O. M.; Vedaldi, A.; Zisserman, A. Deep face
face recognition in unconstrained environments. In: recognition. In: Proceedings of the British Machine
Proceedings of the Workshop on Faces in ‘Real- Vision Conference, 1–12, 2015.
Life’ Images: Detection, Alignment, and Recognition, [17] He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J.
2008. Deep residual learning for image recognition. In:
[5] Cai, Y.; Lei, Y. J.; Yang, M. L.; You, Z. S.; Shan, S. G. Proceedings of the IEEE Conference on Computer
A fast and robust 3D face recognition approach based Vision and Pattern Recognition, 770–778, 2016.
3D face recognition: A comprehensive survey in 2022 679

[18] Lawrence, S.; Giles, C. L.; Tsoi, A. C.; Back, A. [29] Faltemier, T. C.; Bowyer, K. W.; Flynn, P. J.
D. Face recognition: A convolutional neural-network Using a multi-instance enrollment representation to
approach. IEEE Transactions on Neural Networks Vol. improve 3D face recognition. In: Proceedings of the
8, No. 1, 98–113, 1997. IEEE International Conference on Biometrics: Theory,
[19] Howard, A.; Zhmoginov, A.; Chen, L. C.; Sandler, M.; Applications, and Systems, 1–6, 2007.
Zhu, M. Inverted residuals and linear bottlenecks: [30] Savran, A.; Alyüz, N.; Dibeklioğlu, H.; Çeliktutan,
Mobile networks for classification, detection and O.; Gökberk, B.; Sankur, B.; Akarun, L. Bosphorus
segmentation. arXiv preprint arXiv:1801.04381, database for 3D face analysis. In: Biometrics and
2018. Identity Management. Lecture Notes in Computer
[20] Colombo, A.; Cusano, C.; Schettini, R. UMB- Science, Vol. 5372. Schouten, B.; Juul, N. C.;
DB: A database of partially occluded 3D faces. In: Drygajlo, A.; Tistarelli, M. Eds. Springer Berlin
Proceedings of the IEEE International Conference on Heidelberg, 47–56, 2008.
Computer Vision Workshops, 2113–2119, 2011. [31] Heseltine, T.; Pears, N.; Austin, J. Three-dimensional
[21] Beumier, C.; Acheroy, M. Automatic 3D face face recognition using combinations of surface
authentication. Image and Vision Computing Vol. 18, feature map subspace components. Image and Vision
No. 4, 315–321, 2000. Computing Vol. 26, No. 3, 382–396, 2008.
[32] Ter Haar, F. B.; Daoudi, M.; Veltkamp, R. C.
[22] Hesher, C.; Srivastava, A.; Erlebacher, G. A novel
SHape REtrieval contest 2008: 3D face scans. In:
technique for face recognition using range imaging.
Proceedings of the IEEE International Conference on
In: Proceedings of the 7th International Symposium
Shape Modeling and Applications, 225–226, 2008.
on Signal Processing and Its Applications, 201–204,
[33] Yin, B. C.; Sun, Y. F.; Wang, C. Z.; Gai,
2003.
Y. BJUT-3D large scale 3D face database and
[23] Moreno, A. GavabDB: A 3D face database. In:
information processing. Journal of Computer Research
Proceedings of the 2nd COST275 Workshop on
and Development Vol. 46, No. 6, 1009–1018, 2009. (in
Biometrics on the Internet, 75–80, 2004.
Chinese)
[24] Chang, K. I.; Bowyer, K. W.; Flynn, P. J. An
[34] Gupta, S.; Castleman, K. R.; Markey, M. K.;
evaluation of multimodal 2D 3D face biometrics.
Bovik, A. C. Texas 3D face recognition database. In:
IEEE Transactions on Pattern Analysis and Machine
Proceedings of the IEEE Southwest Symposium on
Intelligence Vol. 27, No. 4, 619–624, 2005.
Image Analysis & Interpretation, 97–100, 2010.
[25] Wang, Y. M.; Pan, G.; Wu, Z. H.; Wang, Y. [35] Vijayan, V.; Bowyer, K. W.; Flynn, P. J.; Huang, D.;
G. Exploring facial expression effects in 3D face Chen, L. M.; Hansen, M.; Ocegueda, O.; Shah, S. K.;
recognition using partial ICP. In: Computer Vision – Kakadiaris, I. A. Twins 3D face recognition challenge.
ACCV 2006. Lecture Notes in Computer Science, Vol. In: Proceedings of the International Joint Conference
3851. Narayanan, P. J.; Nayar, S. K.; Shum, H. Y. on Biometrics, 1–7, 2011.
Eds. Springer Berlin Heidelberg, 581–590, 2006. [36] Veltkamp, R.; van Jole, S.; Drira, H.; Amor, B.;
[26] Yin, L. J.; Wei, X. Z.; Sun, Y.; Wang, J.; Daoudi, M.; Li, H. B.; Chen, L. M.; Claes, P.;
Rosato, M. J. A 3D facial expression database for Smeets, D.; Hermans, J.; et al. SHREC’11 track:
facial behavior research. In: Proceedings of the 7th 3D face models retrieval. In: Proceedings of the 4th
International Conference on Automatic Face and Eurographics Conference on 3D Object Retrieval, 89–
Gesture Recognition, 211–216, 2006. 95, 2011.
[27] Xu, C. H.; Tan, T. N.; Li, S.; Wang, Y. H.; Zhong, [37] Zhang, Y.; Guo, Z.; Lin, Z.; Zhang, H.; Zhang, C.
C. Learning effective intrinsic features to boost 3D- The NPU multi-case Chinese 3D face database and
based face recognition. In: Computer Vision – ECCV information processing. Chinese Journal of Electronics
2006. Lecture Notes in Computer Science, Vol. 3952. Vol. 21, No. 2, 283–286, 2012.
Leonardis, A.; Bischof, H.; Pinz, A. Eds. Springer [38] Zhang, X.; Yin, L. J.; Cohn, J. F.; Canavan, S.;
Berlin Heidelberg, 416–427, 2006. Reale, M.; Horowitz, A.; Peng, L. A high-resolution
[28] Conde, C.; Serrano, A.; Cabello, E. Multimodal spontaneous 3D dynamic facial expression database.
2D, 2.5D & 3D face verification. In: Proceedings In: Proceedings of the 10th IEEE International
of the International Conference on Image Processing, Conference and Workshops on Automatic Face and
2061–2064, 2006. Gesture Recognition, 1–6, 2013.
680 Y. Jing, X. Lu, S. Gao

[39] Min, R.; Kose, N.; Dugelay, J. L. KinectFaceDB: [51] Kim, D.; Hernandez, M.; Choi, J.; Medioni, G. Deep
A kinect database for face recognition. IEEE 3D face identification. In: Proceedings of the IEEE
Transactions on Systems, Man, and Cybernetics: International Joint Conference on Biometrics, 133–
Systems Vol. 44, No. 11, 1534–1548, 2014. 142, 2017.
[40] Zhang, J. J.; Huang, D.; Wang, Y. H.; Sun, J. [52] Zhang, Z. Y.; Da, F. P.; Yu, Y. Data-free point
Lock3DFace: A large-scale database of low-cost cloud network for 3D face recognition. arXiv preprint
Kinect 3D faces. In: Proceedings of the International arXiv:1911.04731, 2019.
Conference on Biometrics, 1–8, 2016. [53] Deng, J. K.; Cheng, S. Y.; Xue, N. N.; Zhou, Y.
[41] Urbanová, P.; Ferková, Z.; Jandová, M.; Jurda, M.; X.; Zafeiriou, S. UV-GAN: Adversarial facial UV
Černý, D.; Sochor, J. Introducing the FIDENTIS 3D map completion for pose-invariant face recognition.
face database. Anthropological Review Vol. 81, No. 2, In: Proceedings of the IEEE/CVF Conference on
202–223, 2018. Computer Vision and Pattern Recognition, 7093–7102,
[42] Zulqarnain Gilani, S.; Mian, A. Learning from millions 2018.
of 3D scans for large-scale 3D face recognition. [54] Zhao, J.; Xiong, L.; Cheng, Y.; Cheng, Y.; Li, J.;
In: Proceedings of the IEEE/CVF Conference on Zhou, L.; Xu, Y.; Karlekar, J.; Pranata, S.; Shen, S.;
Computer Vision and Pattern Recognition, 1896–1905, et al. 3D-aided deep pose-invariant face recognition. In:
2018. Proceedings of the 27th International Joint Conference
[43] Cheng, S. Y.; Kotsia, I.; Pantic, M.; Zafeiriou, S. on Artificial Intelligence, 1184–1190, 2018.
4DFAB: A large scale 4D database for facial expression [55] Shen, Y. J.; Luo, P.; Yan, J. J.; Wang, X. G.;
analysis and biometric applications. In: Proceedings Tang, X. O. FaceID-GAN: Learning a symmetry three-
of the IEEE/CVF Conference on Computer Vision player GAN for identity-preserving face synthesis.
and Pattern Recognition, 5117–5126, 2018. In: Proceedings of the IEEE/CVF Conference on
[44] Jia, S.; Li, X.; Hu, C. B.; Guo, G. D.; Xu, Z. Q. Computer Vision and Pattern Recognition, 821–830,
3D face anti-spoofing with factorized bilinear coding. 2018.
arXiv preprint arXiv:2005.06514, 2020. [56] Zhang, X. Y.; Zhao, Y.; Zhang, H. Dual-discriminator
[45] Ye, Y. P.; Song, Z.; Guo, J. G.; Qiao, Y. SIAT-3DFE: GAN: A GAN way of profile face recognition. In:
A high-resolution 3D facial expression dataset. IEEE Proceedings of the IEEE International Conference
Access Vol. 8, 48205–48211, 2020. on Artificial Intelligence and Computer Applications,
[46] Yang, H. T.; Zhu, H.; Wang, Y. R.; Huang, M. 162–166, 2020.
K.; Shen, Q.; Yang, R. G.; Cao, X. FaceScape: A [57] Marriott, R. T.; Romdhani, S.; Chen, L. M. A
large-scale high quality 3D face dataset and detailed 3D GAN for improved large-pose facial recognition.
riggable 3D face prediction. In: Proceedings of the In: Proceedings of the IEEE/CVF Conference on
IEEE/CVF Conference on Computer Vision and Computer Vision and Pattern Recognition, 13440–
Pattern Recognition, 598–607, 2020. 13450, 2021.
[47] Li, Q.; Dong, X. X.; Wang, W. N.; Shan, C. F. CAS- [58] Luo, M. D.; Cao, J.; Ma, X.; Zhang, X. Y.; He, R.
AIR-3D face: A low-quality, multi-modal and multi- FA-GAN: Face augmentation GAN for deformation-
pose 3D face database. In: Proceedings of the IEEE invariant face recognition. IEEE Transactions on
International Joint Conference on Biometrics, 1–8, Information Forensics and Security Vol. 16, 2341–
2021. 2355, 2021.
[48] Gilani, S. Z.; Mian, A. Towards large-scale 3D face [59] Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y.
recognition. In: Proceedings of the International Spectral networks and locally connected networks
Conference on Digital Image Computing: Techniques on graphs. arXiv preprint arXiv:1312.6203, 2013.
and Applications, 1–8, 2016. [60] Zhao, W.; Chellappa, R.; Phillips, P. J.; Rosenfeld, A.
[49] Farkas, L. G. Anthropometry of the Head and Face. Face recognition. ACM Computing Surveys Vol. 35,
Raven Press, 1994. No. 4, 399–458, 2003.
[50] Blanz, V.; Vetter, T. A morphable model for [61] Berretti, S.; del Bimbo, A.; Pala, P. 3D partial
the synthesis of 3D faces. In: Proceedings of the face matching using local shape descriptors. In:
26th Annual Conference on Computer Graphics and Proceedings of the Joint ACM Workshop on Human
Interactive Techniques, 187–194, 1999. Gesture and Behavior Understanding, 65–71, 2011.
3D face recognition: A comprehensive survey in 2022 681

[62] Li, H. B.; Huang, D.; Lemaire, P.; Morvan, J. M.; [74] Elaiwat, S.; Bennamoun, M.; Boussaid, F.; El-Sallam,
Chen, L. M. Expression robust 3D face recognition A. 3-D face recognition using curvelet local features.
via mesh-based histograms of multiple order surface IEEE Signal Processing Letters Vol. 21, No. 2, 172–
differential quantities. In: Proceedings of the 18th 175, 2014.
IEEE International Conference on Image Processing, [75] Drira, H.; Ben Amor, B.; Srivastava, A.; Daoudi,
3053–3056, 2011. M.; Slama, R. 3D face recognition under expressions,
[63] Creusot, C.; Pears, N.; Austin, J. Automatic occlusions, and pose variations. IEEE Transactions
keypoint detection on 3D faces using a dictionary on Pattern Analysis and Machine Intelligence Vol. 35,
of local shapes. In: Proceedings of the International No. 9, 2270–2283, 2013.
Conference on 3D Imaging, Modeling, Processing, [76] Li, H. B.; Huang, D.; Morvan, J. M.; Chen,
Visualization and Transmission, 204–211, 2011. L. M.; Wang, Y. H. Expression-robust 3D face
[64] Zhang, G. P.; Wang, Y. H. Robust 3D face recognition recognition via weighted sparse representation
based on resolution invariant features. Pattern of multi-scale and multi-component local normal
Recognition Letters Vol. 32, No. 7, 1009–1019, 2011. patterns. Neurocomputing Vol. 133, 179–193, 2014.
[65] Inan, T.; Halici, U. 3-D face recognition with local [77] Berretti, S.; Werghi, N.; Bimbo, A.; Pala, P.
shape descriptors. IEEE Transactions on Information Selecting stable keypoints and local descriptors for
Forensics and Security Vol. 7, No. 2, 577–587, 2012. person identification using 3D face scans. The Visual
[66] Berretti, S.; del Bimbo, A.; Pala, P. Sparse matching Computer Vol. 30, No. 11, 1275–1292, 2014.
of salient facial curves for recognition of 3-D faces [78] Lei, Y. J.; Bennamoun, M.; Hayat, M.; Guo, Y. L.
with missing parts. IEEE Transactions on Information An efficient 3D face recognition approach using local
Forensics and Security Vol. 8, No. 2, 374–389, 2013. geometrical signatures. Pattern Recognition Vol. 47,
[67] Li, X. L.; Da, F. P. Efficient 3D face recognition No. 2, 509–524, 2014.
handling facial expression and hair occlusion. Image [79] Tabia, H.; Laga, H.; Picard, D.; Gosselin, P. H.
and Vision Computing Vol. 30, No. 9, 668–679, 2012. Covariance descriptors for 3D shape matching and
[68] Ballihi, L.; Ben Amor, B.; Daoudi, M.; Srivastava, retrieval. In: Proceedings of the IEEE Conference on
A.; Aboutajdine, D. Boosting 3-D-geometric features Computer Vision and Pattern Recognition, 4185–4192,
for efficient face recognition and gender classification. 2014.
IEEE Transactions on Information Forensics and [80] Vezzetti, E.; Marcolin, F.; Fracastoro, G. 3D
Security Vol. 7, No. 6, 1766–1779, 2012. face recognition: An automatic strategy based on
[69] Berretti, S.; Werghi, N.; del Bimbo, A.; Pala, P. geometrical descriptors and landmarks. Robotics and
Matching 3D face scans using interest points and Autonomous Systems Vol. 62, No. 12, 1768–1776, 2014.
local histogram descriptors. Computers & Graphics [81] Li, H. B.; Huang, D.; Morvan, J. M.; Wang, Y. H.;
Vol. 37, No. 5, 509–525, 2013. Chen, L. M. Towards 3D face recognition in the real: A
[70] Smeets, D.; Keustermans, J.; Vandermeulen, D.; registration-free approach using fine-grained matching
Suetens, P. meshSIFT: Local surface features for of 3D keypoint descriptors. International Journal of
3D face recognition under expression variations Computer Vision Vol. 113, No. 2, 128–142, 2015.
and partial data. Computer Vision and Image [82] Elaiwat, S.; Bennamoun, M.; Boussaid, F.; El-Sallam,
Understanding Vol. 117, No. 2, 158–169, 2013. A. A curvelet-based approach for textured 3D face
[71] Creusot, C.; Pears, N.; Austin, J. A machine-learning recognition. Pattern Recognition Vol. 48, No. 4, 1235–
approach to keypoint detection and landmarking on 1246, 2015.
3D meshes. International Journal of Computer Vision [83] Al-Osaimi, F. R. A novel multi-purpose matching
Vol. 102, Nos. 1–3, 146–179, 2013. representation of local 3D surfaces: A rotationally
[72] Tang, H. L.; Yin, B. C.; Sun, Y. F.; Hu, Y. L. 3D invariant, efficient, and highly discriminative approach
face recognition using local binary patterns. Signal with an adjustable sensitivity. IEEE Transactions on
Processing Vol. 93, No. 8, 2190–2198, 2013. Image Processing Vol. 25, No. 2, 658–672, 2016.
[73] Lei, Y. J.; Bennamoun, M.; El-Sallam, A. A. An [84] Ming, Y. Robust regional bounding spherical
efficient 3D face recognition approach based on descriptor for 3D face recognition and emotion
the fusion of novel local low-level features. Pattern analysis. Image and Vision Computing Vol. 35, 14–22,
Recognition Vol. 46, No. 1, 24–37, 2013. 2015.
682 Y. Jing, X. Lu, S. Gao

[85] Guo, Y. L.; Lei, Y. J.; Liu, L.; Wang, Y.; Bennamoun, [97] Samir, C.; Srivastava, A.; Daoudi, M. Three-
M.; Sohel, F. EI3D: Expression-invariant 3D face dimensional face recognition using shapes of facial
recognition based on feature and shape matching. curves. IEEE Transactions on Pattern Analysis and
Pattern Recognition Letters Vol. 83, 403–412, 2016. Machine Intelligence Vol. 28, No. 11, 1858–1863, 2006.
[86] Soltanpour, S.; Wu, Q. J. Multimodal 2D–3D face [98] Lowe, D. G. Distinctive image features from
recognition using local descriptors: Pyramidal shape scale-invariant keypoints. International Journal of
map and structural context. IET Biometrics Vol. 6, Computer Vision Vol. 60, No. 2, 91–110, 2004.
No. 1, 27–35, 2017. [99] Deng, X.; Da, F.; Shao, H. J.; Jiang, Y. T. A multi-
[87] Lei, Y. J.; Guo, Y. L.; Hayat, M.; Bennamoun, M.; scale three-dimensional face recognition approach with
Zhou, X. Z. A Two-Phase Weighted Collaborative sparse representation-based classifier and fusion of
Representation for 3D partial face recognition with local covariance descriptors. Computers & Electrical
single sample. Pattern Recognition Vol. 52, 218–237, Engineering Vol. 85, 106700, 2020.
2016. [100] Vezzetti, E.; Marcolin, F.; Tornincasa, S.; Ulrich,
[88] Emambakhsh, M.; Evans, A. Nasal patches and L.; Dagnes, N. 3D geometry-based automatic
curves for expression-robust 3D face recognition. landmark localization in presence of facial occlusions.
IEEE Transactions on Pattern Analysis and Machine Multimedia Tools and Applications Vol. 77, No. 11,
Intelligence Vol. 39, No. 5, 995–1007, 2017. 14177–14205, 2018.
[89] Werghi, N.; Tortorici, C.; Berretti, S.; Del Bimbo, A. [101] Drira, H.; Benamor, B.; Daoudi, M.; Srivastava, A.
Boosting 3D LBP-based face recognition by fusing Pose and expression-invariant 3D face recognition
shape and texture descriptors on the mesh. IEEE using elastic radial curves. In: Proceedings of the
Transactions on Information Forensics and Security British Machine Vision Conference, 1–11, 2010.
Vol. 11, No. 5, 964–979, 2016. [102] Freund, Y.; Schapire, R. E. A short introduction to
[90] Hariri, W.; Tabia, H.; Farah, N.; Benouareth, A.; boosting. Journal of Japanese Society for Artificial
Declercq, D. 3D face recognition using covariance Intelligence Vol. 14, No. 5, 771–780, 1999.
based descriptors. Pattern Recognition Letters Vol. 78, [103] Aubry, M.; Schlickewei, U.; Cremers, D. The
1–7, 2016. wave kernel signature: A quantum mechanical
[91] Soltanpour, S.; Jonathan Wu, Q. M. High-order approach to shape analysis. In: Proceedings of the
local normal derivative pattern (LNDP) for 3D face IEEE International Conference on Computer Vision
recognition. In: Proceedings of the IEEE International Workshops, 1626–1633, 2011.
Conference on Image Processing, 2811–2815, 2017. [104] Ojala, T.; Pietikainen, M.; Maenpaa, T.
[92] Deng, X.; Da, F. P.; Shao, H. J. Efficient 3D Multiresolution gray-scale and rotation invariant
face recognition using local covariance descriptor texture classification with local binary patterns.
and Riemannian kernel sparse coding. Computers & IEEE Transactions on Pattern Analysis and Machine
Electrical Engineering Vol. 62, 81–91, 2017. Intelligence Vol. 24, No. 7, 971–987, 2002.
[93] Abbad, A.; Abbad, K.; Tairi, H. 3D face recognition: [105] Yu, Y.; Da, F. P.; Guo, Y. F. Sparse ICP with
Multi-scale strategy based on geometric and local resampling and denoising for 3D face verification.
descriptors. Computers & Electrical Engineering Vol. IEEE Transactions on Information Forensics and
70, 525–537, 2018. Security Vol. 14, No. 7, 1917–1927, 2019.
[94] Soltanpour, S.; Wu, Q. M. J. Weighted extreme sparse [106] Spreeuwers, L. Fast and accurate 3D face recognition.
classifier and local derivative pattern for 3D face International Journal of Computer Vision Vol. 93, No.
recognition. IEEE Transactions on Image Processing 3, 389–414, 2011.
Vol. 28, No. 6, 3020–3033, 2019. [107] Ocegueda, O.; Passalis, G.; Theoharis, T.; Shah, S.
[95] Shi, L. L.; Wang, X.; Shen, Y. L. Research on 3D face K.; Kakadiaris, I. A. UR3D-C: Linear dimensionality
recognition method based on LBP and SVM. Optik reduction for efficient 3D face recognition. In:
Vol. 220, 165157, 2020. Proceedings of the International Joint Conference
[96] Samir, C.; Srivastava, A.; Daoudi, M.; Klassen, E. on Biometrics, 1–6, 2011.
An intrinsic framework for analysis of facial surfaces. [108] Ming, Y.; Ruan, Q. Q. Robust sparse bounding sphere
International Journal of Computer Vision Vol. 82, No. for 3D face recognition. Image and Vision Computing
1, 80–95, 2009. Vol. 30, No. 8, 524–534, 2012.
3D face recognition: A comprehensive survey in 2022 683

[109] Liu, P. J.; Wang, Y. H.; Huang, D.; Zhang, Z. X.; [120] Fadaifard, H.; Wolberg, G.; Haralick, R. Multiscale 3D
Chen, L. M. Learning the spherical harmonic features feature extraction and matching with an application
for 3-D face recognition. IEEE Transactions on Image to 3D face recognition. Graphical Models Vol. 75, No.
Processing Vol. 22, No. 3, 914–925, 2013. 4, 157–176, 2013.
[110] Taghizadegan, Y.; Ghassemian, H.; Naser- [121] Alyuz, N.; Gokberk, B.; Akarun, L. 3-D face
Moghaddasi, M. 3D face recognition method using recognition under occlusion using masked projection.
2DPCA-Euclidean distance classification. ACEEE IEEE Transactions on Information Forensics and
International Journal on Control System and Security Vol. 8, No. 5, 789–802, 2013.
Instrumentation Vol. 3, No. 1, 1–5, 2012. [122] Bagchi, P.; Bhattacharjee, D.; Nasipuri, M. Robust
[111] Mohammadzade, H.; Hatzinakos, D. Iterative 3D face recognition in presence of pose and
closest normal point for 3D face recognition. IEEE partial occlusions or missing parts. arXiv preprint
Transactions on Pattern Analysis and Machine arXiv:1408.3709, 2014.
Intelligence Vol. 35, No. 2, 381–397, 2013. [123] Bagchi, P.; Bhattacharjee, D.; Nasipuri, M. 3D Face
[112] Ming, Y. Rigid-area orthogonal spectral regression Recognition using surface normals. In: Proceedings of
for efficient 3D face recognition. Neurocomputing Vol. the TENCON 2015 - 2015 IEEE Region 10 Conference,
129, 445–457, 2014. 1–6, 2015.
[124] Liang, Y.; Zhang, Y.; Zeng, X. X. Pose-invariant 3D
[113] Ratyal, N. I.; Ahmad Taj, I.; Bajwa, U. I.;
face recognition using half face. Signal Processing:
Sajid, M. 3D face recognition based on pose
Image Communication Vol. 57, 84–90, 2017.
and expression invariant alignment. Computers &
[125] LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning.
Electrical Engineering Vol. 46, 241–255, 2015.
Nature Vol. 521, No. 7553, 436–444, 2015.
[114] Tang, Y. H.; Sun, X.; Huang, D.; Morvan, J. M.;
[126] Taigman, Y.; Yang, M.; Ranzato, M.; Wolf,
Wang, Y. H.; Chen, L. M. 3D face recognition
L. DeepFace: Closing the gap to human-level
with asymptotic cones based principal curvatures.
performance in face verification. In: Proceedings of
In: Proceedings of the International Conference on
the IEEE Conference on Computer Vision and Pattern
Biometrics, 466–472, 2015.
Recognition, 1701–1708, 2014.
[115] Gilani, S. Z.; Mian, A.; Eastwood, P. Deep,
[127] Sun, Y.; Chen, Y. H.; Wang, X. G.; Tang, X. O. Deep
dense and accurate 3D face correspondence for
learning face representation by joint identification-
generating population specific deformable models.
verification. In: Proceedings of the 27th International
Pattern Recognition Vol. 69, 238–250, 2017.
Conference on Neural Information Processing Systems,
[116] Peter, M.; Minoi, J. L.; Hipiny, I. H. M. 3D face Vol. 2, 1988–1996, 2014.
recognition using kernel-based PCA approach. In:
[128] Sun, Y.; Wang, X. G.; Tang, X. O. Deep learning
Computational Science and Technology. Lecture Notes face representation from predicting 10,000 classes. In:
in Electrical Engineering, Vol. 481. Alfred, R.; Lim, Proceedings of the IEEE Conference on Computer
Y.; Ibrahim, A.; Anthony, P. Eds. Springer Singapore, Vision and Pattern Recognition, 1891–1898, 2014.
77–86, 2019. [129] Sun, Y.; Wang, X. G.; Tang, X. O. Deeply learned
[117] Passalis, G.; Perakis, P.; Theoharis, T.; Kakadiaris, I. face representations are sparse, selective, and robust.
A. Using facial symmetry to handle pose variations In: Proceedings of the IEEE Conference on Computer
in real-world 3D face recognition. IEEE Transactions Vision and Pattern Recognition, 2892–2900, 2015.
on Pattern Analysis and Machine Intelligence Vol. 33, [130] Sun, Y.; Liang, D.; Wang, X. G.; Tang, X. O.
No. 10, 1938–1951, 2011. DeepID3: Face recognition with very deep neural
[118] Huang, D.; Ardabilian, M.; Wang, Y. H.; Chen, networks. arXiv preprint arXiv:1502.00873, 2015.
L. M. 3-D face recognition using eLBP-based facial [131] Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A
description and local feature hybrid matching. IEEE unified embedding for face recognition and clustering.
Transactions on Information Forensics and Security In: Proceedings of the IEEE Conference on Computer
Vol. 7, No. 5, 1551–1565, 2012. Vision and Pattern Recognition, 815–823, 2015.
[119] Alyüz, N.; Gökberk, B.; Spreeuwers, L.; Veldhuis, [132] Charles, R. Q.; Hao, S.; Mo, K. C.; Guibas, L.
R.; Akarun, L. Robust 3D face recognition in the J. PointNet: Deep learning on point sets for 3D
presence of realistic occlusions. In: Proceedings of classification and segmentation. In: Proceedings of the
the 5th IAPR International Conference on Biometrics, IEEE Conference on Computer Vision and Pattern
111–118, 2012. Recognition, 77–85, 2017.
684 Y. Jing, X. Lu, S. Gao

[133] Qi, C. R.; Yi, L.; Su, H.; Guibas, L. J. PointNet++: the 7th International Conference on Signal Processing
Deep hierarchical feature learning on point sets in a and Integrated Networks, 696–701, 2020.
metric space. In: Proceedings of the 31st International [143] Lin, S. S.; Jiang, C. Y.; Liu, F.; Shen, L. L. High
Conference on Neural Information Processing System, quality facial data synthesis and fusion for 3D low-
5105–5114, 2017. quality face recognition. In: Proceedings of the IEEE
[134] Ding, Y. Q.; Li, N. Y.; Young, S. S.; Ye, J. W. Efficient International Joint Conference on Biometrics, 1–8,
3D face recognition in uncontrolled environment. In: 2021.
Advances in Visual Computing. Lecture Notes in [144] Chiu, M. T.; Cheng, H. Y.; Wang, C. Y.; Lai,
Computer Science, Vol. 11844. Springer Cham, 430– S. H. High-accuracy RGB-D face recognition via
443, 2019. segmentation-aware face depth estimation and mask-
[135] Lin, S. S.; Liu, F.; Liu, Y. H.; Shen, L. L. Local feature guided attention network. In: Proceedings of the 16th
tensor based deep learning for 3D face recognition. IEEE International Conference on Automatic Face
In: Proceedings of the 14th IEEE International and Gesture Recognition, 1–8, 2021.
Conference on Automatic Face & Gesture Recognition, [145] Zhang, Z. Y.; Da, F. P.; Yu, Y. Learning directly
1–5, 2019. from synthetic point clouds for “in-the-wild” 3D face
[136] Tan, Y.; Lin, H. X.; Xiao, Z. L.; Ding, S. Y.; Chao, H. recognition. Pattern Recognition Vol. 123, 108394, 2022.
Y. Face recognition from sequential sparse 3D data via [146] Bhople, A. R.; Shrivastava, A. M.; Prakash, S. Point
deep registration. In: Proceedings of the International cloud based deep convolutional neural network for 3D
Conference on Biometrics, 1–8, 2019. face recognition. Multimedia Tools and Applications
Vol. 80, No. 20, 30237–30259, 2021.
[137] Olivetti, E. C.; Ferretti, J.; Cirrincione, G.; Nonis,
[147] Bhople, A. R.; Prakash, S. Learning similarity
F.; Tornincasa, S.; Marcolin, F. Deep CNN for 3D
and dissimilarity in 3D faces with triplet network.
face recognition. In: Design Tools and Methods in
Multimedia Tools and Applications Vol. 80, Nos. 28–
Industrial Engineering. Lecture Notes in Mechanical
29, 35973–35991, 2021.
Engineering. Rizzi, C.; Andrisano, A. O.; Leali, F.;
[148] Jiang, C. Y.; Lin, S. S.; Chen, W.; Liu, F.; Shen,
Gherardini, F.; Pini, F.; Vergnano, A. Eds. Springer
L. L. PointFace: Point set based feature learning for
Cham, 665–674, 2020.
3D face recognition. In: Proceedings of the IEEE
[138] Xu, K. M.; Wang, X. M.; Hu, Z. H.; Zhang, Z. H.
International Joint Conference on Biometrics, 1–8,
3D face recognition based on twin neural network
2021.
combining deep map and texture. In: Proceedings
[149] Papadopoulos, K.; Kacem, A.; Shabayek, A.; Aouada,
of the IEEE 19th International Conference on
D. Face-GCN: A graph convolutional network for
Communication Technology, 1665–1668, 2019.
3D dynamic face identification/recognition. arXiv
[139] Feng, J. Y.; Guo, Q.; Guan, Y. D.; Wu, M.
preprint arXiv:2104.09145, 2021.
D.; Zhang, X. R.; Ti, C. L. 3D face recognition
[150] Li, B. Y. L.; Mian, A. S.; Liu, W. Q.; Krishna,
method based on deep convolutional neural network
A. Using Kinect for face recognition under varying
In: Smart Innovations in Communication and
poses, expressions, illumination and disguise. In:
Computational Sciences. Advances in Intelligent
Proceedings of the IEEE Workshop on Applications
Systems and Computing, Vol. 670. Panigrahi, B.;
of Computer Vision, 186–192, 2013.
Trivedi, M.; Mishra, K.; Tiwari, S.; Singh, P. Eds.
[151] Wang, F.; Cheng, J.; Liu, W. Y.; Liu, H. J. Additive
Springer Singapore, 123–130, 2019.
margin softmax for face verification. IEEE Signal
[140] Mu, G. D.; Huang, D.; Hu, G. S.; Sun, J.; Wang, Y. Processing Letters Vol. 25, No. 7, 926–930, 2018.
H. Led3D: A lightweight and efficient deep approach [152] Cortes, C.; Vapnik, V. Support-vector networks.
to recognizing low-quality 3D faces. In: Proceedings Machine Learning Vol. 20, No. 3, 273–297, 1995.
of the IEEE/CVF Conference on Computer Vision [153] Isola, P.; Zhu, J. Y.; Zhou, T. H.; Efros, A. A. Image-
and Pattern Recognition, 5766–5775, 2019. to-image translation with conditional adversarial
[141] Dutta, K.; Bhattacharjee, D.; Nasipuri, M. networks. In: Proceedings of the IEEE Conference on
SpPCANet: A simple deep learning-based feature Computer Vision and Pattern Recognition, 5967–5976,
extraction approach for 3D face recognition. 2017.
Multimedia Tools and Applications Vol. 79, Nos. 41–42, [154] Jiang, L.; Zhang, J. Y.; Deng, B. L. Robust
31329–31352, 2020. RGB-D face recognition using attribute-aware loss.
[142] Cao, C. Q.; Swash, M. R.; Meng, H. Y. Reliable IEEE Transactions on Pattern Analysis and Machine
holoscopic 3D face recognition. In: Proceedings of Intelligence Vol. 42, No. 10, 2552–2566, 2020.
3D face recognition: A comprehensive survey in 2022 685

[155] Paysan, P.; Knothe, R.; Amberg, B.; Romdhani, S.; Yaping Jing received her bachelor
Vetter, T. A 3D face model for pose and illumination degree (Hons) in information technology
invariant face recognition. In: Proceedings of the 6th from Deakin University, Australia in
IEEE International Conference on Advanced Video 2016. She is currently a Ph.D. candidate
and Signal Based Surveillance, 296–301, 2009. in the School of Information Technology
[156] Cao, C.; Weng, Y. L.; Zhou, S.; Tong, Y. Y.; Zhou, K. at Deakin University. Her research
FaceWarehouse: A 3D facial expression database for interests include 3D face recognition, 3D
visual computing. IEEE Transactions on Visualization data processing, and machine learning.
and Computer Graphics Vol. 20, No. 3, 413–425, 2014.
[157] Castellani, U.; Bartoli, A. 3D shape registration. In:
3D Imaging, Analysis and Applications. Liu, Y.; Pears, Xuequan Lu is a lecturer (assistant
N.; Rosin, P. L.; Huber, P. Eds. Springer Cham, 353– professor) in Deakin University,
411, 2020. Australia. He spent more than two years
working as a research fellow in Singapore.
[158] D’Errico, J. Surface fitting using gridfit. MATLAB
Prior to that, he received his Ph.D.
Central File Exchange. 2005. Available at https:// degree from Zhejiang University, China
www.mathworks.com/matlabcentral/fileexchange/8998- in 2016. His research interests mainly
surface-fitting-using-gridfit. fall into the category of visual data
[159] Ciravegna, G.; Cirrincione, G.; Marcolin, F.; Barbiero, computing, for example, geometry modeling, processing and
P.; Dagnes, N.; Piccolo, E. Assessing discriminating analysis, animation, simulation, 2D data processing and
capability of geometrical descriptors for 3D face analysis.
recognition by using the GH-EXIN neural network. In:
Neural Approaches to Dynamics of Signal Exchanges.
Smart Innovation, Systems and Technologies, Vol. Shang Gao received her Ph.D. degree
151. Esposito, A.; Faundez-Zanuy, M.; Morabito, F.; in computer science from Northeastern
Pasero, E. Eds. Springer Singapore, 223–233, 2020. University, China in 2000. She is
[160] Lüthi, M.; Gerig, T.; Jud, C.; Vetter, T. Gaussian currently a senior lecturer in the School
process morphable models. IEEE Transactions on of Information Technology, Deakin
Pattern Analysis and Machine Intelligence Vol. 40, University. Her current research interests
No. 8, 1860–1873, 2017. include cybersecurity, cloud computing,
and machine learning.
[161] Gilani, S. Z.; Mian, A.; Shafait, F.; Reid, I. Dense 3D
face correspondence. IEEE Transactions on Pattern
Analysis and Machine Intelligence Vol. 40, No. 7,
1584–1598, 2018. Open Access This article is licensed under a Creative
[162] El Rahman Shabayek, A.; Aouada, D.; Cherenkova, Commons Attribution 4.0 International License, which
K.; Gusev, G.; Ottersten, B. 3D deformation signature permits use, sharing, adaptation, distribution and reproduc-
for dynamic face recognition. In: Proceedings of the tion in any medium or format, as long as you give appropriate
IEEE International Conference on Acoustics, Speech credit to the original author(s) and the source, provide a link
and Signal Processing, 2138–2142, 2020. to the Creative Commons licence, and indicate if changes
were made.
[163] Smith, M.; Smith, L.; Huang, N.; Hansen, M.;
Smith, M. Deep 3D face recognition using 3D data The images or other third party material in this article are
augmentation and transfer learning. In: Proceedings included in the article’s Creative Commons licence, unless
of the 16th International Conference on Machine indicated otherwise in a credit line to the material. If material
is not included in the article’s Creative Commons licence and
Learning and Data Mining, 209–218, 2020.
your intended use is not permitted by statutory regulation or
[164] Dou, P. F.; Shah, S. K.; Kakadiaris, I. A. End-to-end exceeds the permitted use, you will need to obtain permission
3D face reconstruction with deep neural networks. In: directly from the copyright holder.
Proceedings of the IEEE Conference on Computer To view a copy of this licence, visit http://
Vision and Pattern Recognition, 1503–1512, 2017. creativecommons.org/licenses/by/4.0/.
[165] Richardson, E.; Sela, M. T.; Kimmel, R. 3D face Other papers from this open access journal are available
reconstruction by learning from synthetic data. In: free of charge from http://www.springer.com/journal/41095.
Proceedings of the 4th International Conference on To submit a manuscript, please go to https://www.
3D Vision, 460–469, 2016. editorialmanager.com/cvmj.

You might also like