Review Article
Abstract  In the past ten years, research on face recognition has shifted to using 3D facial surfaces, as 3D geometric information provides more discriminative features. This comprehensive survey reviews 3D face recognition techniques developed in the past decade, both conventional methods and deep learning methods. These methods are evaluated with detailed descriptions of selected representative works. Their advantages and disadvantages are summarized in terms of accuracy, complexity, and robustness to facial variations (expression, pose, occlusion, etc.). A review of 3D face databases is also provided, along with a discussion of future research challenges and directions.

Keywords  3D face recognition; 3D face databases; deep learning; local features; global features

1 The School of Information Technology, Deakin University, Waurn Ponds, VIC, Australia. E-mail: Y. Jing, jingyap@deakin.edu.au; X. Lu, xuequan.lu@deakin.edu.au (✉); S. Gao, shang.gao@deakin.edu.au.
Manuscript received: 2022-04-14; accepted: 2022-09-29

1  Introduction

Face recognition has become a commonly used biometric technology. It is widely applied in public surveillance, authentication, security, intelligence, and many other systems [1]. During recent decades, many 2D face recognition techniques have achieved strong results in controlled environments. The accuracy of 2D face recognition has been greatly enhanced especially by the emergence of deep learning. However, these techniques are still challenged by the intrinsic limitations of 2D images, due to variations in illumination, pose, and expression, occlusion, disguises, time delays, and image quality [2]. 3D face recognition can outperform 2D face recognition [3] with greater recognition accuracy and robustness, as it is less sensitive to pose, illumination, and expression [4]. Thus, 3D face recognition has become an active research topic in recent years.

Face recognition mainly involves extracting feature representations from the input face, matching the extracted features against existing databases, and predicting the personal identity of the input face. Therefore, using rich facial features is critical to the recognition result. In 3D face recognition, 3D face data are used for training and testing purposes. Compared to 2D images, 3D faces contain richer geometric information, which can provide more discriminative features and help face recognition systems overcome the inherent defects and drawbacks of 2D face recognition, such as facial expression, occlusion, and pose variation. Furthermore, 3D data are relatively unchanged after scaling, rotation, and illumination change [5]. Most 3D scanners can acquire both 3D meshes/point clouds and corresponding textures. This allows us to integrate advanced 2D face recognition algorithms into 3D face recognition systems for better results.

One of the main challenges to 3D face recognition is the acquisition of 3D training images: this cannot be accomplished by crawling the Web, unlike for 2D face images; it requires special hardware instead. According to the technologies used, collection systems can be broadly divided into active acquisition and passive acquisition [6]. An active collection system actively emits invisible light (e.g., an infrared laser beam) to illuminate the target face and obtains the shape features of the target by measuring reflectivity. A passive acquisition system consists of several cameras placed apart. It matches points observed by one camera to those from the other cameras and calculates the 3D position of each matched point. The 3D surface is formed from the set of matched points.

Since 2000, many researchers have begun to assess 3D face recognition algorithms on large-scale 3D face databases.
2  3D face databases

Table 1  3D face databases. Data type: M = mesh, P = point cloud, R = range image, V = video, 3DV = 3D video; Expression: S = smile/happiness, M = multiple expressions; Pose: LR = slight left/right turn, UD = slight up/down turn, M = multiple poses

| Database | Year | Data type | IDs | Scans | Texture | Expression | Pose | Occlusion | Scanner |
|---|---|---|---|---|---|---|---|---|---|
| FRGC v2 [9] | 2005 | R | 466 | 4007 | Yes | S | ±15° | — | Minolta Vivid 3D scanner |
| UND [24] | 2005 | R | 275 | 670 | Yes | — | ±45°, ±60° | — | Minolta Vivid 900 |
| ND-2006 [29] | 2007 | R | 888 | 13,450 | Yes | M | ±15° | — | Minolta Vivid 910 |
| Texas-3D [34] | 2010 | R | 118 | 1149 | Yes | M | ±10° | — | MU-2 stereo imaging system |
| UMBDB [20] | 2011 | R | 143 | 1473 | Yes | S, anger, bored | — | 7 | Minolta Vivid 900 laser scanner |
| KinectFaceDB [39] | 2014 | R | 52 | 936 | Yes | N, S, surprise | LR | Multiple | Kinect |
| Lock3DFace [40] | 2016 | R | 509 | 5711 | Yes | M | ±90° | Random cover | Kinect |
| F3D-FD [41] | 2018 | R | 2476 | — | Yes | — | Semi-lateral with ear | Half face | Vectra M1 scanner |
| CAS-AIR-3D Face [47] | 2021 | V | 3093 | 24,713 | No | S, surprise | ±90° | Glasses | Intel RealSense SR305 |
SHREC11 [36] is built upon a new collection of 130 masks with 6 3D face scans [36]. Like BJUT-3D, Northwestern Polytechnic University 3D (NPU3D [37]) is another large-scale Chinese 3D face database, composed of 10,500 3D face captures, corresponding to 300 individuals [37]. BU4D-FE [38] is a 3D video database that records spontaneous expressions of various young people completing 8 emotional expression elicitation tasks [38]. KinectFaceDB [39] was the first publicly available face database based on the Kinect sensor and contains four data modalities (2D, 2.5D, 3D, and video-based) [39].

Recently, another large-scale 3D face database, Lock3DFace [40], was released. It is based on Kinect and contains several variations in expression, pose, time-lapse, and occlusion [40]. F3D-FD [41] is a large dataset, and has the most individuals: 2476. For each individual, it includes partial 3D scans from frontal and two semi-lateral views, and a one-piece face with lateral parts (including ears and earless, with landmarks) [41]. LS3DFace [42] is the largest dataset so far, including 31,860 3D face scans of 1853 people. It combines data from multiple challenging public datasets, including FRGC v2, BU3D-FE, Bosphorus, GavabDB, Texas-3D, BU4D-FE, CASIA, UMBDB, 3D-TEC, and ND-2006 [42]. 4DFAB [43] is a large dynamic high-resolution 3D face database; it contains 4D videos of subjects showing spontaneous and posed facial behavior.

The large-scale Wax Figure Face Database (WFFD [44]) is designed to address vulnerabilities in existing 3D facial spoofing databases and to promote the research of 3D facial presentation attack detection [44]. This database includes photo-based and video-based data; we only detail the video information in Table 1. SIAT-3DFE [45] is a 3D facial expression dataset in which every identity has 16 facial expressions including natural, happy, sad, surprised, and several exaggerated expressions (open mouth, frowning, etc.), as well as two occluded 3D cases [45]. Another recent database is FaceScape [46], which consists of 18,760 textured 3D data with pore-level facial geometry [46]. CAS-AIR-3D [47] is a large low-quality 3D face database, including 3093 individuals, each recording 8 videos with different poses and expressions, occlusion, and distance change.

It is well known that the performance of 3D face recognition algorithms may vary on different 3D face databases. Increasing the gallery size may degrade the performance of face recognition [48]. Although some algorithms have achieved good results on these existing 3D face databases, they still cannot be used in the real world due to less controlled conditions. The establishment of large-scale 3D face databases to simulate real-world situations is essential to facilitate research into 3D face recognition. In addition, collecting 3D face data is a time-consuming and resource-demanding task. Research into large dataset generation algorithms is one of our suggestions for future work (see also Section 3.2).

3  Data preprocessing and augmentation

3.1  Data preprocessing

In most situations, the acquired raw 3D face data cannot be directly input to feature extraction systems, as they may contain redundant information [6]. For example, the presence of hair, neck, and background may affect the accuracy of recognition. Thus, 3D data are usually preprocessed before being passed into a feature extraction model.

In general, the data preprocessing phase includes three main steps: facial landmark detection and orientation, data segmentation, and face registration. Facial landmarks are a set of keypoints defined by anthropometric studies [49] and can be used to automatically localize and register a face. Some databases already provide landmarks for each face image. Data segmentation is the process of utilizing facial landmarks, such as the nose tip and eye corners, to segment the facial surface [49]. This process is always used by local conventional methods, which determine identifiable facial parts like the nose and eyes for feature extraction. As an essential step before feature extraction and matching, face registration aligns the target surface (the entire face or a face part) with the training surface in the gallery.
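As a concrete illustration of these steps, the sketch below crops a scan around a detected nose tip and rigidly registers it to a reference face. It is a minimal example rather than the pipeline of any particular paper: it assumes the Open3D library for ICP, the nose-tip landmark is taken as given (e.g., provided by the database), and the distance tolerances are placeholders.

```python
import numpy as np
import open3d as o3d  # assumed dependency; any rigid ICP implementation would do


def crop_face(points: np.ndarray, nose_tip: np.ndarray, radius: float = 100.0) -> np.ndarray:
    """Segment the facial region: keep points within `radius` mm of the nose tip."""
    dist = np.linalg.norm(points - nose_tip, axis=1)
    return points[dist < radius]


def register_to_reference(probe: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Rigidly align a cropped probe scan to a reference face with point-to-point ICP."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(probe))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(reference))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt,
        max_correspondence_distance=5.0,  # mm; tolerance is data-dependent
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    src.transform(result.transformation)  # apply the estimated rigid motion
    return np.asarray(src.points)
```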
3.2  Data augmentation

To improve the performance and robustness of face recognition systems, large-scale datasets are required, especially for deep learning-based methods, since their networks need to be trained on large amounts of training data.

Several augmentation methods can be used to increase the size of the training and test datasets. The easiest is to rotate and crop existing face data. Another popular approach is to use 3D morphable facial models (3DMM) [50] to generate new shapes and expressions to synthesize new facial data [51, 52]. Randomly selecting sub-feature sets from different samples of a person and combining them to generate a new face is also a reliable way to enrich the identities in datasets [42].

Recently, generative adversarial networks (GANs) have been used to generate realistic synthetic images [53–58]. A GAN usually consists of a generator and a discriminator, which are alternately trained in a minimax game. The discriminator is trained to distinguish generated samples from real samples, and the generator is trained to generate images resembling real ones so as to minimize the success of the discriminator.
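This alternating scheme can be summarized in a few lines of PyTorch. The following is a generic, minimal sketch of the minimax training just described; `G`, `D`, the data batches, and the optimizers are placeholders for any concrete generator and discriminator.

```python
import torch
import torch.nn.functional as F


def train_step(G, D, real, opt_g, opt_d, z_dim=128):
    """One alternating GAN update: discriminator step, then generator step."""
    z = torch.randn(real.size(0), z_dim, device=real.device)
    ones = torch.ones(real.size(0), 1, device=real.device)   # label for "real"
    zeros = torch.zeros_like(ones)                           # label for "generated"

    # 1) Discriminator: push D(real) toward 1 and D(G(z)) toward 0.
    opt_d.zero_grad()
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(G(z).detach()), zeros))
    d_loss.backward()
    opt_d.step()

    # 2) Generator: fool the discriminator into predicting 1 on generated samples.
    opt_g.zero_grad()
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```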
Ref. [53] proposed UV-GAN, which first generates a completed UV map from a single image, then attaches the completed UV map to a fitted 3D mesh, and generates synthetic faces in arbitrary poses to increase pose variation. In 3D-PIM [54], a 3DMM cooperates with a GAN to automatically recover natural frontal face images from arbitrary poses. The 3DMM is used as a simulator to generate synthetic faces with normalized poses, and the GAN is used to refine the realism of the output detail.

FaceID-GAN [55] formulates a three-player GAN by introducing an identity classifier that works with the discriminator to compete with the generator. Ref. [56] also proposed a method to generate frontal faces from profile faces by employing a GAN with a dual-discriminator structure. The 3D GAN generator in Ref. [57] is augmented with an integrated 3DMM, including two CNNs for facial texture and background generation, to ensure that identity is preserved in the synthetic images after manipulating pose.

In FA-GAN [58], a graph-based two-stage architecture was proposed, consisting of two parts: a geometry preservation module (GPM) and a face disentanglement module (FDM). In the GPM, a graph convolutional network (GCN) [59] is introduced to explore the relationships between different face regions and better preserve geometric information. The FDM disentangles the encoded facial feature embeddings into identity representations and deformation attribute codes (pose, expression), which can be further used for face manipulation and enhancement.

4  Conventional methods

4.1  Approaches

According to Ref. [60] and our review of the last decade's literature, conventional face recognition algorithms can be classified into three types based on their feature extraction approaches: local, global, and hybrid, as shown in Fig. 1. Local approaches mainly focus on local facial features such as the nose and eyes [60]. In contrast to local methods, global approaches use the entire face to generate feature vectors for classification. Hybrid methods use both local and global facial features.

In local methods, fusion schemes are used to improve accuracy. Fusion can occur at five levels: sensor level, feature level, rank level, decision level, and score level [1]. Sensor level fusion merges the original sensor data at the initial stage of recognition; feature level fusion combines features extracted from different facial representations of one single object. In rank level fusion, ranks are assigned to gallery images based on a descending sequence of confidence, while score level fusion combines the matching scores of the individual classifiers based on a weighting scheme. Decision level fusion combines the decisions of the individual classifiers [1].
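As a small worked example of the score-level fusion just described (the other levels follow the same spirit at different stages of the pipeline), the sketch below combines per-classifier matching scores with fixed weights; the weights and scores are purely illustrative.

```python
import numpy as np


def score_level_fusion(scores_per_matcher, weights):
    """Combine per-gallery matching scores from several classifiers with a
    weighting scheme; the best fused score gives the predicted identity."""
    scores = np.asarray(scores_per_matcher)   # shape: (n_matchers, n_gallery)
    weights = np.asarray(weights)[:, None]    # one weight per matcher
    fused = (weights * scores).sum(axis=0)    # weighted sum per gallery subject
    return int(np.argmax(fused))              # index of the best-matching identity


# e.g., two matchers scoring a 3-subject gallery (higher = more similar):
# score_level_fusion([[0.2, 0.7, 0.1], [0.3, 0.6, 0.2]], weights=[0.5, 0.5]) -> 1
```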
Details of the three conventional face recognition approaches are discussed below.

4.2  Local methods

4.2.1  Classification

In the last decade, many local approaches have been built, where local feature descriptors are used to describe 3D local facial information. Table 2 lists noteworthy 3D local methods and summarizes their important details.

Table 2  Local techniques. RR1 = rank-1 recognition rate. Advantages: (i) changes to which the method is relatively robust, or (ii) other benefits such as speed; limitations note circumstances under which the corresponding method under-performs

| Author/year | Category | Method | Advantage | Limitation | Database | RR1 (%) |
|---|---|---|---|---|---|---|
| Berretti et al. (2011) [61] | SIFT keypoint | Covariance matrix, χ² dist | Partial faces | Keypoint redundancy | FRGC v2 | 89.2 (partial faces) |
| Berretti et al. (2013) [66] | Curve | Sparse | Missing parts | Large pose, expression | FRGC v2 / GavabDB / UND | 95.6 / 97.13 / 75 |
| Creusot et al. (2013) [71] | Mesh-based landmark keypoint | Linear (LDA), non-linear (AdaBoost) | Expression | Complexity, occlusion | FRGC v2 / Bosphorus | — / — |
| Tang et al. (2013) [72] | Region (LBP-based) | LBP, nearest-neighbor (NN) | Expression | Occlusion, missing data | FRGC v2 | 94.89 |
| Drira et al. (2013) [75] | Curve | Riemannian framework | Pose, missing data | Extreme expression, complexity | FRGC v2 | 97.7 |
| Li et al. (2014) [76] | Region (LBP-based) | ICP, sparse-based | Expression, fast | Pose, occlusion | FRGC v2 | 96.3 |
| Berretti et al. (2014) [77] | Mesh-based keypoint | Classifier | Occlusion, missing parts | Noise, low-resolution images | Bosphorus | 94.5 |
| Lei et al. (2014) [78] | Curve | KPCA, SVM | Efficient, expression | Occlusion | FRGC v2 / SHREC08 | — / — |
| Tabia et al. (2014) [79] | Region (geometric features) | Riemannian metric | Expression | Occlusion | GavabDB | 94.91 |
| Al-Osaimi (2016) [83] | Curve | Euclidean dist | Fast, expression | Occlusion | FRGC | 97.78 |
| Ming (2015) [84] | Region | Regional and global regression | Large pose, efficient | Patch detection | FRGC v2 / CASIA / BU-3DFE | — / — / — |
| Guo et al. (2016) [85] | Keypoint | Rotational Projection Statistics (RoPS), average dist | Occlusion, expression, and pose | Cost | FRGC v2 | 97 |
| Lei et al. (2016) [87] | SIFT keypoint | Two-phase weighted | Missing parts, occlusion, data corruption | Extreme pose, expression | FRGC v2 | 96.3 |
| Hariri et al. (2016) [90] | Region (geometric features) | Geodesic dist | Expression, pose | Partial occlusion | FRGC v2 | 99.2 |
| Shi et al. (2020) [95] | Region (LBP-based) | LBP, SVM | Low consumption | Pose, occlusion | Texas-3D | 96.83 |

Following Ref. [13], these methods can be classified into three different types based on their descriptors: keypoint-based, curve-based, and region-based. Keypoint-based methods detect a set of 3D keypoints based on face geometry and build feature descriptors by calculating relationships between these keypoints. Curve-based methods use a set of curves on each face surface as feature vectors. Region-based methods extract features from certain regions of the face surface [13].
4.2.2  Keypoint-based methods

To handle occlusion or pose variation, SIFT keypoint detection is directly used on 3D mesh data. The extension of SIFT to 3D meshes is called meshSIFT [70]. In Ref. [70], salient points on a 3D face surface were first detected as extreme values in scale space, and then an orientation was assigned to these points. A feature vector was used to describe them by concatenating the histograms of slant angles and shape indices. Before this approach was applied, Ref. [62] also used minimum and maximum curvatures within a 3D Gaussian scale space to detect salient points, and used histograms of multiple-order surface differential quantities to characterize the local facial surface. The descriptors of detected local regions were further used in 3D face local matching. Ref. [81] also described an extension to this work, in which a fine-grained matching of 3D keypoint descriptors was proposed to enlarge intra-subject similarity and reduce inter-subject similarity. However, a large number of keypoints were detected by these methods.

A meshDOG keypoint detector was proposed by Berretti et al. [69, 77]. They first used the meshDOG keypoint detector and a local geometric histogram (GH) descriptor to extract features, and then selected the most effective features based on an analysis of the optimal scale, distribution, and clustering of keypoints, and the features of local descriptors. Recently, Ref. [82] exploited a curvelet-based multimodal keypoint detector and local surface descriptor that extracts both texture and 3D local features. It reduces the computational cost of keypoint detection and feature building, as the curvelet transform is based on the FFT.

Additionally, a set of facial landmarks is used for creating feature vectors in some methods, and the shape index is widely used to detect landmarks. In Ref. [63], keypoints were extracted from a shape dictionary, which was learned on a set of 14 manually placed landmarks on a human face. As an extension, Ref. [71] used a dictionary of learned local shapes to detect keypoints, and evaluated them through linear (LDA) and non-linear (AdaBoost) methods. Ref. [64] detected resolution-invariant keypoints and scale-space extremes on shape index images based on scale-space analysis, and used six scale-invariant similarity measures to calculate the matching score. In Ref. [80], an entirely geometry-based 3D face recognition method was proposed in which 17 landmarks were automatically extracted based on facial geometrical characteristics; this was further extended in Ref. [100].

4.2.3  Curve-based methods

A curve-based method uses a set of curves to construct feature descriptors. It is difficult to decide whether such methods are local or global, because these curves usually cover the entire face and capture geometric information from different face regions to represent the 3D face. The curves can be grouped into level curves and radial curves according to their distribution. Level curves are non-intersecting closed curves of different lengths; radial curves are open curves, usually starting from the nose tip.

Level curves can be further divided into iso-depth and iso-geodesic curves [13] (see Fig. 4 in Ref. [96] and Fig. 5 in Ref. [101]). Iso-depth curves can be obtained by translating a plane across the facial surface in one direction and were first introduced by Samir et al. [97]. Ref. [96] expanded this work and proposed iso-geodesic curves, which are level curves of surface distance from the nose tip. However, both kinds are sensitive to occlusion, missing parts, and large facial expressions. Thus, radial curves were introduced in Ref. [101] and extended in Ref. [75]. These curves can better handle occlusion and missing parts, as it is uncommon to lose a full radial curve, and at least some parts of a radial curve can be used. Also, they can be associated with different facial expressions, as the radial curves pass through different facial regions.

In Ref. [67], facial curves in the nose region of a target face were first extracted to form a rejection classifier, which was used to quickly and effectively eliminate different faces in the gallery. Then the face was segmented into six facial regions. A facial deformation mapping was produced by using curves in these regions. Finally, adaptive regions were selected to match the two identities. In Ref. [68], geometric curves from the level sets (circular curves) and streamlines (radial curves) of the Euclidean distance functions of 3D faces were combined for high-accuracy face recognition.
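The sketch below illustrates the basic idea of radial curves: starting from the nose tip, points lying in a thin angular strip around each ray are collected and ordered outward. It is a simplified, assumption-laden illustration (planar angular binning on a roughly frontal scan), not the extraction procedure of Refs. [75, 101].

```python
import numpy as np


def radial_curves(points, nose_tip, n_curves=40, width=2.0):
    """Group face points into open curves radiating from the nose tip: project to
    the x-y plane, bin by angle around the nose tip, keep points near each ray."""
    xy = points[:, :2] - nose_tip[:2]
    angles = np.arctan2(xy[:, 1], xy[:, 0])          # angle of each point
    radii = np.linalg.norm(xy, axis=1)
    curves = []
    for k in range(n_curves):
        theta = -np.pi + 2 * np.pi * k / n_curves    # direction of the k-th ray
        # perpendicular distance from each point to the ray; keep a thin strip
        perp = np.abs(radii * np.sin(angles - theta))
        on_ray = (perp < width) & (np.cos(angles - theta) > 0)
        curve = points[on_ray]
        curves.append(curve[np.argsort(radii[on_ray])])  # order outward from nose
    return curves
```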
A highly compact signature of a 3D face can be characterized by a small set of features selected by the AdaBoost algorithm [102], a well-known machine learning feature selection method. By using the selected curves, face recognition time was reduced from 2.64 to 0.68 s, showing that feature selection can effectively improve system performance. To provide highly discriminative feature vectors and improve computational efficiency, angular radial signatures (ARSs) were proposed by Lei et al. [78]. An ARS is a set of curves from the nose tip (the origin of facial range images) at intervals of θ radians.

Another type of facial curve was introduced by Berretti et al. [66]. SIFT was utilized to detect keypoints of 3D depth images, which were connected to form facial curves. A 3D face can then be represented by a set of facial curves built from matched keypoints. Ref. [83] provided some extended applications of facial curves. In Ref. [83], 3D curves were formed by intersecting three spheres with the 3D surface and used to compute adjustable integral kernels (RAIKs). A sequence of RAIKs generated from the surface patch around each keypoint can be represented by 2D images, such that certain characteristics of the represented 2D images have a positive impact on matching accuracy, speed, and robustness.

Ref. [88] introduced nasal patches and curves. First, seven landmarks in the nasal region were detected. A set of planes was created using pairs of landmarks. A set of spherical patches and curves was yielded by the intersection of these planes with the nasal surface to create the feature descriptor. Then the feature vectors were obtained by concatenating histograms of the x, y, and z components of the surface normal vectors of Gabor-wavelet filtered depth maps. Features were selected by a genetic algorithm for stability under changes in facial expression. Compared to previous methods, this method shows excellent separability. Recently, Ref. [93] presented a geometry and local shape descriptor based on the wave kernel signature (WKS) [103], to overcome distortions caused by facial expressions.

4.2.4  Region-based methods

A representative local descriptor is the local binary pattern (LBP) [104], which was initially used for 2D images. Local geometric features extracted from certain regions of the face surface can be robust to facial expression variations [13]. LBPs were used to represent the facial depth and normal information of each face region in Ref. [72], where a feature-based 3D face division pattern (see Fig. 11 in Ref. [72]) was proposed to reduce the influence of local facial distortion.
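For reference, a minimal 8-neighbour LBP on a depth image looks as follows; real systems use multi-scale variants and uniform-pattern histograms, but the thresholding idea is the same.

```python
import numpy as np


def lbp_codes(depth: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour LBP on a depth image: threshold each neighbour against
    the centre pixel and pack the resulting bits into one byte per pixel."""
    c = depth[1:-1, 1:-1]
    neighbours = [depth[0:-2, 0:-2], depth[0:-2, 1:-1], depth[0:-2, 2:],
                  depth[1:-1, 2:],   depth[2:,   2:],   depth[2:,   1:-1],
                  depth[2:,   0:-2], depth[1:-1, 0:-2]]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        code |= (n >= c).astype(np.uint8) << bit   # one bit per neighbour
    return code


# A face region is then described by the histogram of its codes:
# hist = np.bincount(lbp_codes(region).ravel(), minlength=256)
```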
Recently, Ref. [95] used the LBP algorithm to extract features of 3D depth images and an SVM to classify them. The feature extraction time per depth map in Texas-3D was reduced to 0.19 s, while Ref. [70] required 23.54 s. Inspired by LBP, Ref. [76] proposed the multi-scale and multi-component local normal patterns (MSMC-LNP) descriptor, which can describe normal facial information more compactly. The mesh-LBP method was used in Ref. [89], where LBP descriptors were directly computed on the 3D face mesh surface, fusing both shape and texture information.

Another type of local method is based on geometric features. Ref. [73] proposed a low-level geometric feature approach, which extracted region-based histogram descriptors from a facial scan. Feature regions include the nose and the eyes-and-forehead, which are comparatively less affected by the deformation caused by facial expressions. A support vector machine (SVM) and fusion of these descriptors at both feature and score level were applied to improve accuracy. In Ref. [79], a covariance matrix of features was used as the descriptor for 3D shape analysis, rather than the features themselves. Compared to feature-based vectors, covariance-based descriptors can fuse and encode all types of features into a compact representation [90]. This work was expanded in Ref. [90].

There are other local methods. In Ref. [74], local surface descriptors were constructed around keypoints, which were defined by checking the curvelet coefficients in each sub-band. Each keypoint is represented by multiple attributes, such as curvelet position, direction, spatial position, scale, and size. A set of rotation-invariant local features can be obtained by rearranging the descriptors according to the orientation of the keypoints. The method in Ref. [84] used the regional boundary sphere descriptor (RBSR) to reduce the computational cost and improve classification accuracy.

Ref. [91] proposed a local derivative pattern (LDP) descriptor based on local derivative changes. It can capture more detailed information than LBP. Recently, Yu et al. [105] recommended utilizing the iterative closest point algorithm with resampling and denoising (RDICP) to register each face patch, to achieve high registration accuracy. With rigid registration, all face patches can be used to recognize the face, significantly improving accuracy as they are less sensitive to expression or occlusion.

4.2.5  Summary

Most local methods can better handle facial expression and occlusion changes, as they use salient points and rigid feature regions, such as the nose and eyes, to recognize a face. The main objective of local methods is to extract distinctive compact features [13]. We summarize local methods as follows:
• Keypoint-based methods can process partial face images with missing parts or occlusion, since the feature representations are generated from a set of keypoints and their geometric relationships. However, if the number of keypoints is excessive, the computational cost increases; if the keypoints are too few, some key features will be lost and recognition performance is affected. In addition, algorithms for measuring the neighborhoods of keypoints play an important role, as the geometric relationships of keypoints are used to build feature vectors.
• Most curve-based methods use radial curves, since level curves are sensitive to occlusion and missing parts. Generally, a reference point is required in a curve-based method. The nose region is rigid and has more distinctive shape features than other regions, so the nose tip is used as the reference point in most curve-based methods [13]. Therefore, its detection is a crucial step: inaccurate positioning of the nose tip can affect the extraction of curves and compromise the results of the face recognition system.
• Most region-based methods are robust to changes in facial expression and pose, as the feature vectors are extracted from rigid regions of the face surface. Some also need highly accurate nose tip detection, as the nose tip is used for face segmentation.

4.3  Global methods

Unlike local methods, global methods extract features from the entire 3D face surface. They are very effective and can perform well given complete, frontal, fixed-expression 3D faces. Table 3 summarizes noteworthy endeavors in this area.

An intrinsic coordinate system for 3D face registration was proposed by Ref. [106]. This system is based on a vertical symmetry plane determined by the nose tip and nose orientation. A 3D point cloud surface is transformed into the face coordinate system, and PCA-LDA is used to extract features from the range image obtained from the newly transformed data. Ref. [107] presented a method named UR3D-C, which used LDA to train the dataset and compress the biometric signature to only 57 coefficients. It still shows high discrimination with these compact feature vectors. The bounding sphere representation (BSR), introduced in Ref. [108], was used to represent both depth and 3D geometric shape information by projecting preprocessed 3D point clouds onto their bounding spheres.

Shape-based spherical harmonic features (SHFs) were proposed in Ref. [109], where SHFs are calculated based on the spherical depth map (SDM). SHFs can capture the gross shape and fine surface details of a 3D face as the strengths of spherical harmonics at different frequencies. Ref. [110] used 2DPCA to extract features and employed Euclidean distances for matching. Ref. [111] proposed a computationally efficient and simple nose detection algorithm. It constructs a low-resolution wide-nose eigenface space using a set of training nose regions. A pixel in an input scan is determined to be the nose tip if the mean square error between the candidate feature vector and its projection on the eigenface space is less than a predefined threshold.

Ref. [112] introduced a rigid-area orthogonal spectral regression (ROSR) method, where curvature information was used to segment rigid facial areas and OSR was utilized to extract discriminative features. In Ref. [113], a 3D point cloud was registered in the inherent coordinate system with the nose tip as the origin, and a two-layer ensemble classifier was used for face recognition. A local facial surface descriptor was proposed by Ref. [114]. This descriptor is constructed based on three principal curvatures estimated by asymptotic cones. The asymptotic cone is an essential extension of an asymptotic direction to a mesh model. It allows the generation of three principal curvatures representing the geometric characteristics of each vertex.

Ref. [115] proposed a region-based 3D deformable model (R3DM), which was formed from densely corresponding faces. Recently, kernel PCA has also been used for 3D face recognition.
Table 3  Global techniques. Column conventions follow Table 2

| Author/year | Method | Advantage | Limitation | Database | RR1 (%) |
|---|---|---|---|---|---|
| Spreeuwers (2011) [106] | PCA-LDA | Less registration time | Expression, occlusion | FRGC v2 | 99 |
| Liu et al. (2012) [109] | — | Faster, cost-effective | Expression, occlusion | SHREC2007 / FRGC v2 / Bosphorus | 97.86 / 96.94 / 95.63 |
| Tang et al. (2015) [114] | Principal curvatures | Computational cost | Expression, occlusion | FRGC v2 | 93.16 |
| Peter et al. (2019) [116] | Kernel-based PCA | Higher accuracy rate | — | FRGC v2 | — |

Table 4  Hybrid techniques. Column conventions follow Table 2

| Author/year | Method | Advantage | Limitation | Database | RR1 (%) |
|---|---|---|---|---|---|
| Alyüz et al. (2012) [119] | ICP, PCA, LDA | Occlusion | Expression | Bosphorus | 83.99 |
| Bagchi et al. (2014) [122] | ICP, PCA | Pose, occlusion | Pose | Bosphorus | 91.3 |
| Bagchi et al. (2015) [123] | ICP, KPCA | Pose | Expression | GavabDB / Bosphorus / FRAV3D | 96.92 / 96.25 / 92.25 |
As faces exhibit non-linear shapes, non-linear PCA was used in Ref. [116] to extract 3D face features, as it has notable benefits for data representation in high-dimensional space.

To sum up, most global methods have faster speed and lower computational demands, but they are unsuitable for handling occluded faces or faces with missing parts. In addition, variations in pose and scale may affect recognition accuracy when using global features, as global algorithms create discriminating features based on all visible facial shape information. This requires accurate normalization for pose and scale. However, it is not easy to perform accurate pose normalization given noisy or low-resolution 3D scans.
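To make the global pipeline concrete, the following is a minimal Eigenface-style sketch: PCA is fitted to vectorized range images, and each scan is compressed to a short coefficient vector (the 57 components echo UR3D-C [107], but the number is arbitrary here). It stands in for the PCA/LDA family of methods above, not for any specific paper.

```python
import numpy as np


def fit_pca(range_images: np.ndarray, n_components: int = 57):
    """Eigenface-style PCA on vectorised range images (one training scan per row)."""
    mean = range_images.mean(axis=0)
    _, _, vt = np.linalg.svd(range_images - mean, full_matrices=False)
    return mean, vt[:n_components]          # mean face + principal axes


def project(scan: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Compress one vectorised scan to a compact global feature vector."""
    return axes @ (scan - mean)
```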
4.4  Hybrid methods

Hybrid face recognition systems use both local features and global features. A comparison of recent hybrid methods is provided in Table 4.

Ref. [117] used an automatic landmark detector to estimate poses and detect occluded areas, and utilized facial symmetry to deal with missing data. Ref. [118] proposed a hybrid matching scheme using multiscale extended LBP and SIFT-based strategies. In Ref. [119], the problem of external occlusion was addressed and a two-step registration framework was proposed. First, a non-occluded model is selected for each face with the occluded parts removed. Then a set of non-occluded distinct regions is used to compute the masked projection. This method relies on accurate nose tip detection, and results are adversely affected if the nose area is occluded. Ref. [121] extended this work in 2013. Ref. [120] proposed a scale-space-based representation for 3D shape matching which is stable in the presence of surface noise.

In Ref. [122], Bagchi et al. used ICP to register a 3D range image and PCA to restore the occluded region. This method is robust to noise and occlusion. Later, they improved the registration method and proposed an across-pose method in Ref. [123]. Ref. [124] also proposed a pose-invariant 3D face recognition method with a coarse-to-fine approach to detect landmarks under large yaw variations. At the coarse search step, HK curvature analysis is used to detect candidate landmarks and subdivide them according to a classification strategy based on facial geometry. At the fine search step, the candidate landmarks are identified and marked by comparison with the face landmark model.

Hybrid 3D face recognition methods may use more complex structures than local or global methods and, as a result, may achieve better recognition accuracy at a higher computational cost. As in global methods, face registration is an important step for hybrid 3D methods, especially for overcoming pose variation and occlusion.

5  Deep learning-based 3D face recognition

5.1  Overview

In the last decade, deep neural networks have become one of the most popular approaches for face recognition. Compared to conventional approaches, deep learning-based methods have great advantages in image processing [125]. For conventional methods, the key step is to find robust feature points and descriptors based on geometric information in 3D face data [51]. Compared to end-to-end deep learning models, these methods have good recognition performance, but involve relatively complex algorithmic operations to detect key features [51]. For deep learning-based methods, robust face representations can be learned by training a deep neural network on large datasets [51], which can hugely improve face recognition speed.

There are a variety of deep neural networks for facial recognition. Convolutional neural networks (CNNs) are the most popular. The robust and discriminative feature representations learned via CNNs can significantly improve the accuracy of face recognition, as demonstrated by Refs. [42, 51]. Recently, graph convolutional networks (GCNs) have also been considered in the face recognition field, to solve the problem of large face deformations in real life. GCNs utilize filters to identify high-level similarities between nodes by extracting high-dimensional features of nodes and their neighborhoods in a graph [58].

Figure 4 depicts a common face recognition process based on a deep CNN (DCNN). In the training phase, the training dataset is preprocessed (see Section 3) to generate a unified feature map. The feature map is resized to fit the input tensor of the DCNN architecture (in terms of the height, width, and number of channels of the input network layer, and the number of images). Then, the DCNN is trained with the preprocessed maps. In the testing phase, a 3D face scan is selected for each identity from the test dataset to form the identity dataset. The feature representations of the identity dataset are obtained through the trained network and serve as the feature library. Then, the feature vector of each probe is obtained from the trained DCNN and used to match the features in the given gallery. In the matching process, the feature gallery is scanned and the distance between each feature representation and the feature vector of the probed surface is calculated. The identity with the closest matching distance is then returned.
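The matching step at the end of this process reduces to a nearest-neighbour search in feature space. A minimal sketch, assuming cosine distance on the embeddings (some of the surveyed works use Euclidean distance instead):

```python
import numpy as np


def identify(probe: np.ndarray, gallery: np.ndarray, labels):
    """Return the identity whose gallery feature is closest to the probe feature
    under cosine distance (1 - cosine similarity)."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    p = probe / np.linalg.norm(probe)
    distances = 1.0 - g @ p              # one distance per enrolled identity
    return labels[int(np.argmin(distances))]
```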
With the application of CNNs, the accuracy of 2D face recognition systems (DeepFace [126], the DeepID series [127–130], VGG-Face [16], FaceNet [131]) has significantly improved. In these systems, face representations are learned directly from 2D facial images by training deep neural networks on large datasets. Accuracy is close to 100% on some specific databases (such as LFW). The high recognition rate for 2D face recognition shows that CNN-based methods are superior to conventional feature extraction methods. Given the intrinsic advantages of 3D faces over 2D faces in handling uncontrolled conditions such as changes in pose, illumination, and expression, researchers are attracted to applying DCNNs to 3D face recognition.

Indeed, some of these 2D face recognition networks are still used by some 3D methods. In such 3D face recognition methods, 3D faces are converted into 2D maps as input to the network. Other networks directly accept 3D data as input, such as PointNet [132] and PointNet++ [133]. Based on the input format of the network, we classify deep learning-based 3D face recognition methods into three categories: 2D-input, 3D-input, and graph-input networks. Table 5 lists these approaches; details of these methods are discussed below.

Table 5  Network architectures based on 3D deep learning techniques

| Category | Backbone/architecture | Reference |
|---|---|---|
| 2D-input | VGG-Face | [51, 134] |
| 2D-input | ResNet | [5, 135, 136] |
| 2D-input | MobileNet | [137] |
| 2D-input | Others | [42, 138–144] |
| 3D-input | PointNet++, PointFace | [145–148] |
| Graph-input | GCN | [149] |

5.2  2D-input networks

Kim et al. [51] proposed the first 3D face recognition model using a DCNN. They adopted VGG-Face [16], pre-trained on 2D face images, as their network, and then fine-tuned it with augmented 2D depth maps. The last FC layer of VGG-Face is replaced by a new FC layer and a softmax layer. In the new layer, weights are randomly initialized using a Gaussian distribution with a mean of zero and a standard deviation of 0.01. The size of the dataset is expanded by augmenting the 3D point cloud of face scans with expression and pose variations during the training phase.
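This surgery on the classifier head can be expressed in a few lines. The sketch below is a hedged PyTorch approximation of the modification described in Ref. [51]: loading the backbone is a placeholder, and the zero bias initialization is our assumption.

```python
import torch.nn as nn


def replace_classifier_head(backbone: nn.Module, feat_dim: int, n_identities: int) -> nn.Module:
    """Swap the last FC layer for a freshly initialised one; `backbone` is assumed
    to output `feat_dim` features per image."""
    new_fc = nn.Linear(feat_dim, n_identities)
    nn.init.normal_(new_fc.weight, mean=0.0, std=0.01)  # zero-mean Gaussian, std 0.01
    nn.init.zeros_(new_fc.bias)                         # bias handling is our assumption
    # Softmax over identities is applied by the cross-entropy loss during fine-tuning.
    return nn.Sequential(backbone, new_fc)
```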
Table 6  Representative deep learning-based 3D face recognition methods. RR1 = rank-1 recognition rate

| Author/year | Network | Augmentation | Loss | Matching | Database | RR1 (%) |
|---|---|---|---|---|---|---|
| Kim et al. (2017) [51] | Fine-tuned VGG-Face | 3DMM | — | Cosine distance | Bosphorus | 99.2 |
| Ding et al. (2019) [134] | Fine-tuned VGG-Face | — | — | SVM | CurtinFace [150] | 93.41 |
| Xu et al. (2019) [138] | LeNet5 | — | — | Euclidean distance | CASIA | — |
| Tan et al. (2019) [136] | ResNet-18 | Randomly occlude depth maps with 1–6 patches | AMSoftmax [151] | Cosine distance | CASIA | 99.7 |
| Mu et al. (2019) [140] | MSFF | Pose augmentation, shape jittering, shape scaling | Cross-entropy loss | Cosine distance | Lock3DFace | 84.22 |
| Dutta et al. (2020) [141] | SpPCANet | — | — | Linear SVM [152] | Frav3D | 96.93 |
| Lin et al. (2021) [143] | MQFNet | Depth images generated by pix2pix [153] | Weighted loss | — | Lock3DFace | 86.55 |
| Cai et al. (2019) [5] | Pre-ResNet-34, Pre-ResNet-24, Pre-ResNet-14 | Resolution augmentation | Multi-scale triplet loss | Euclidean distance | FRGC v2 | 100 |
| Jiang et al. (2021) [148] | PointFace | Random crop | Feature similarity loss | Cosine distance | Lock3DFace | 87.18 |
| Zhang et al. (2021) [145] | PointNet++ | GPMM-based | Triplet loss | Cosine similarity | Bosphorus | 99.68 |
| Papadopoulos et al. (2021) [149] | Face-GCN | — | — | — | BU4DFE | 88.45 |
A multi-linear 3DMM is used to generate more data, including variations in both shape (α) and expression (β). A 3D point cloud can be represented by

    X = X̄ + Ps α + Pe β        (1)

where X̄ is the average facial point cloud, Ps is the shape information provided by the Basel Face Model [155], and Pe is the expression information provided by FaceWarehouse [156]. Expression variations are created by randomly changing the expression parameter β in the 3DMM.
map to simulate occlusion and prevent overfitting to specific regions of the face. The model was evaluated on three public 3D databases: Bosphorus [30], BU3D-FE [26], and 3D-TEC [35], yielding recognition rates of 99.2%, 95.0%, and 94.8%, respectively.

A deep 3D face recognition network (FR3DNet) [42], trained on 3.1 million 3D faces, is specifically designed for 3D face recognition. It is also based on VGG-Face [16], with a rectifier layer added for every convolutional layer. Compared to Kim et al.'s work [51], a much larger dataset is generated and expanded with new identities. A new face F̂ is generated from a pair of faces (F_i, F_j) with the maximum non-rigid shape difference:

    F̂ = (F_i + F_j)/2                                            (2)

The synthetic faces generated by this method have richer shape changes and details than statistical face models [155]. However, the computational cost is very high, as the new faces are all generated from high-dimensional raw 3D scans. In addition, 15 synthetic cameras are deployed in the frontal hemisphere of each 3D face to simulate pose variations and occlusions in each 3D scan. To fit the input of FR3DNet, the 3D point cloud data are preprocessed into a 160 × 160 × 3 image [155]. Before the face is aligned and cropped, the point cloud is converted into a three-channel image. The three channels are three surfaces generated by the gridfit algorithm [158]: a depth map z(x, y), an azimuth map θ(x, y), and an elevation map φ(x, y), where θ and φ are the azimuth and elevation angles of the normal vectors of the 3D point cloud surface, respectively. Experiments were conducted on most public databases; the highest recognition accuracy was achieved on the Texas-3D database, reaching 100%.

Ding et al. [134] proposed an SVM-based RGB-D face recognition algorithm combining 2D color and 3D depth features. A fine-tuned VGG-Face network extracts 2D features from color images, while 3D geometric features are obtained by computing expression-invariant geodesic distances between facial landmarks on a 3D face mesh. The 2D and 3D features are then used to train the RGB-D SVM classifiers. Experiments on the CurtinFaces [150] database achieved good results under pose variations (93.41%) and neutral expression variations (100%).

Feng et al. [139] adopted a two-DCNN module to extract features from color images and depth maps built from 3D raw data. The outputs of the two feature layers are fused as the final input to an artificial neural network (ANN) recognition system. It was tested on CASIA (V1), comparing recognition rates using the 2D feature layer, the 3D feature layer, and the fusion of both layers; a higher RR1 (98.44%) was obtained with fused features.

Xu et al. [138] also designed a dual neural network to reduce the number of training samples needed. The network consists of a dual-channel input layer that fuses a 2D texture image and a 3D depth map into one channel, and two parallel LeNet5-based CNNs. Each CNN processes the fused image separately to obtain its feature maps, which are used to calculate similarity. The gray-scale depth map obtained from the point cloud, combined with the corresponding 2D texture, is used as the dual-channel input. The most important preprocessing step is face hole filling, which provides a more intact face: the basic idea is to first extract the 3D hole edge points, project them onto the 2D mesh plane to fill the hole points, and map them back to the original 3D point cloud. Experiments examined the influence of depth-map features and training-set size on recognition accuracy.

Tan et al. [136] designed a framework specifically to process the low-quality 3D data captured by portable 3D acquisition hardware such as mobile phones. The framework includes two parts: face registration and face recognition. At the face registration stage, a PointNet-like deep registration network (DRNet) is used to reconstruct a dense 3D point cloud from low-quality sequences. The DRNet is based on ResNet-18 and takes a pair of 256 × 256 × 3 coordinate maps as input. To obtain the desired sparse samples from the raw datasets, noise and random pose variations are added to each face scan. The new point cloud is then projected onto a 2D plane divided into 1000 grid cells of equal size, and a sparse face of 1000 points is obtained by randomly selecting one point from each cell. Six sparse faces are generated from each face scan and passed to DRNet to generate a new dense point cloud. The fused data are then used as the input to a face recognition network (FRNet), also based on ResNet-18. Compared to FR3DNet, its facial RR1 on UMBDB is higher, reaching 99.2%.
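To make the sparse sampling step concrete, here is a minimal sketch (our illustration, not the authors' code) of projecting a scan onto a 2D grid and keeping one random point per cell. The N × 3 input layout and the 25 × 40 split of the 1000 cells are assumptions:

```python
import numpy as np

def sparse_sample(points, rows=25, cols=40, rng=None):
    """Project a face scan onto the xy-plane, split its bounding box into
    rows x cols cells (1000 by default), and keep one randomly chosen
    point per non-empty cell, as in the sampling step of Ref. [136]."""
    rng = np.random.default_rng(rng)
    xy = points[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    # Map each point to a cell index in [0, rows) x [0, cols).
    ij = np.floor((xy - lo) / (hi - lo + 1e-9) * [rows, cols]).astype(int)
    ij = np.clip(ij, 0, [rows - 1, cols - 1])
    cell = ij[:, 0] * cols + ij[:, 1]
    keep = [rng.choice(np.flatnonzero(cell == c)) for c in np.unique(cell)]
    return points[np.array(keep)]  # roughly 1000 points

# Six sparse faces per scan, as described above:
# sparse_faces = [sparse_sample(scan) for _ in range(6)]
```

Drawing a fresh random point per cell on each call is what makes the six sparse faces from one scan differ from each other.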
Mu et al. [140] proposed a lightweight CNN for 3D face recognition, especially for low-quality data. The network contains four blocks with 32, 64, 128, and 256 convolution filters. The feature maps from these four convolutional blocks, captured with different receptive fields, are downsampled to a fixed size by max-pooling and integrated to form a further conversion block; this process is completed by a multi-scale feature fusion module, whose aim is to efficiently improve the representation of low-quality face data. A spatial attention vectorization (SAV) module replaces the global average pooling layer (as used by ResNet) to vectorize the feature maps: the SAV highlights important spatial facial clues and conveys more discriminative cues by adding an attention weight map to each feature map. In addition, three methods are used to augment the training data: pose generation (adjusting virtual camera parameters), shape jittering (adding Gaussian noise to simulate rough surface changes), and shape scaling (zooming the depth face image in by 1.1×). During data preprocessing, as in the above methods, a 10 × 10 patch surface is first cropped around the given nose tip, with outlier removal; the cropped 3D point cloud is then projected onto a 2D space (depth surface) to generate a normal map image.

Lin et al. [143] also designed a multi-quality fusion network (MQFNet) to improve the performance of low-quality 3D face recognition. First, the pix2pix network [153] is used to generate high-quality depth maps from low-quality faces. To avoid losing identity features in the images generated by pix2pix, MQFNet contains two pipelines that extract and fuse features from both the low-quality and high-quality images to generate more discriminative features. This work was also tested on the Lock3DFace database; compared to Ref. [140], the average accuracy was improved by 8.11%.

Olivetti et al. [137] proposed a method based on MobileNetV2, a comparatively new neural network specifically designed for mobile phones that is easy to train and requires only a few parameters to be tuned. This work was based on the Bosphorus database, which contains only 105 identities with 4666 images. To obtain sufficient training samples, they augmented the data by rotating the original depth maps (25° clockwise, 40° counterclockwise) and horizontally mirroring each depth map. The most important part of their work is the input data for the DCNN: geometric descriptors are used as input instead of pure facial depth maps, with the selection of geometric feature descriptors based on the GH-EXIN network; the reliability of curvature-based geometric descriptors is demonstrated in Ref. [159]. The input is a three-channel image comprising the 3D facial depth map, the shape index, and the curvedness, which enhances the accuracy of the network. A 97.56% RR1 was achieved on the Bosphorus database.

Dutta et al. [141] also proposed a lightweight sparse principal component analysis network (SpPCANet) comprising three parts: a convolutional layer, a nonlinear processing layer, and a feature merging layer. For data preprocessing, standard methods are used to detect and crop the face area: first, ICP-based registration is used to register the 3D point cloud; the point cloud is then converted into a depth image; finally, all faces are cropped to rectangles based on the position of the nose tip. The system obtained a 98.54% RR1 on Bosphorus.

Lin et al. [135] adopted ResNet-18 [17] as the backbone of their network. The big difference from other work is their data augmentation method: instead of generating 3D face samples from 2D face images, they generate feature tensors directly, based on Voronoi-diagram subdivision. Salient points are detected from a 3D face point cloud and its corresponding 2D face image, and are divided into 13 subdivisions based on the Voronoi diagram. The face can then be expressed as F = [f_1, · · ·, f_13], with sub-features SubF_i. The feature tensor is extracted from a 3D mesh by detecting the salient points and integrating the features of all salient points, which can be represented as

    F^k = ∪_{i=1}^{13} SubF_i^k,   k = 1, · · ·, K                (3)

where K is the number of 3D face samples of the same person. A new feature set can be synthesized by randomly choosing the ith sub-feature set from among the K samples. Excellent results were achieved on both the Bosphorus and BU3D-FE databases, with accuracies of 99.71% and 96.2%, respectively.
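The recombination in Eq. (3) is easy to sketch. Below is a minimal illustration (ours, not the authors' implementation) that builds a new feature tensor by drawing each of the 13 Voronoi sub-features from a randomly chosen scan of the same person; the (13, d) tensor layout and d = 64 are assumptions:

```python
import numpy as np

def synthesize_feature(samples, rng=None):
    """samples: list of K feature tensors, each of shape (13, d) --
    one d-dimensional sub-feature per Voronoi subdivision.
    Returns a new (13, d) tensor mixing sub-features across samples."""
    rng = np.random.default_rng(rng)
    K = len(samples)
    # For each subdivision i, copy SubF_i from a randomly chosen sample k.
    picks = rng.integers(0, K, size=13)
    return np.stack([samples[k][i] for i, k in enumerate(picks)])

# Example: K = 4 scans of one person, 64-dim sub-features (hypothetical).
person = [np.random.rand(13, 64) for _ in range(4)]
new_face_feature = synthesize_feature(person)
```

With K scans and 13 subdivisions, up to K^13 distinct feature tensors can be synthesized per person, which is what makes this augmentation effective on a small database.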
Cai et al. [5] designed three deep residual networks with different numbers of layers based on ResNet: Pre-ResNet-14, Pre-ResNet-24, and Pre-ResNet-34. Multi-scale triplet loss supervision is constructed by combining a softmax loss with two triplet losses that supervise the last fully connected layer and the last feature layer. To enlarge the training set, the data are augmented in three ways: pose augmentation based on the 3D scans, resolution augmentation, and transformational augmentation based on the range images. During preprocessing, raw 3D data are converted into a 96 × 96 range image, and only the centers of the two pupils and the nose tip are used for alignment; three overlapping face components (the upper half face, the small upper half face, and the nose tip) plus the entire facial region are then generated from the raw 3D data. The most important part of this method is detecting the nose tip and the two pupils. These three landmarks are detected on the 2D texture image of the corresponding 3D face data and mapped to the 3D model. A new nose tip is then calculated by taking the highest point of the nose region (centered on the nose tip, with a radius of 25 mm); the nose tip is re-detected on the 3D model because 2D-domain detection may lose accuracy under pose variations, and because the lower-dimensional feature vectors generated this way can be used to detect the new nose tip at reduced computational cost. Finally, the feature vectors of the four patches can be used alone or in combination for matching. The method obtained high accuracy on four public 3D face databases, FRGC v2, Bosphorus, BU-3DFE, and 3D-TEC, reaching 100%, 99.75%, 99.88%, and 99.07%, respectively.

Cao et al. [142] believed that the key to a reliable face recognition system is rich data sources, and accordingly paid particular attention to data acquisition. They created a holoscopic 3D (H3D) face image database containing 154 raw H3D images. H3D imaging is recorded using a regular, closely packed array of small lenses connected to a recording device; it can display 3D images with continuous parallax, and full-color images can be viewed over a wide viewing area. A wavelet transform is used for feature extraction, as it performs well under illumination and face orientation changes, reducing redundant image information while retaining the most important facial features. While this is a new direction for 3D face recognition, the accuracy of the method is quite low, reaching just over 80% on the H3D database.

Chiu et al. [144] applied an attention mechanism in a face recognition system that has two main parts: a depth estimation module (DepthNet) and a mask-guided face recognition module. DepthNet is used to convert 2D face datasets to RGB-D face datasets, addressing the lack of identities in 3D face datasets for network training. The augmented 3D database is used to train a mask-guided RGB-D face recognition network, which has a two-stream multi-head architecture with three branches: an RGB recognition branch, a depth-map recognition branch, and an auxiliary segmentation mask branch with a spatial attention module. The latter shares weights between the two recognition branches and provides auxiliary information from the segmentation branch that helps the recognition branches focus on informative parts such as the eyes, nose, eyebrows, and lips. This module achieved good results on multiple databases and a higher average accuracy (96.43%) on Lock3DFace than Refs. [140, 143].

5.3   3D-input networks

Bhople et al. [146] utilized a Siamese network with PointNet-CNN (a PointNet implementation with CNN) to determine the similarity and dissimilarity of point cloud data. In Ref. [147], they continued this work and proposed a triplet network with triplet loss, a variant of the Siamese network. The triplet network is a shared network consisting of three parallel symmetric CPNs using the PointNet architecture and CNN. The input to the network is a triplet of three 3D face scans: anchor, positive, and negative; since the input is in triplet form, more information is captured during training. The Siamese network and the triplet network were compared in terms of recognition rates on the Bosphorus and IIT Indore databases: the triplet network achieved better accuracy on IIT Indore, but not on Bosphorus. They also performed point-cloud-level data augmentation by rotating the point cloud data in a fixed orientation, randomly perturbing the points by a small rotation, and slightly jittering the position of each point.
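As an illustration of this point-cloud-level augmentation, the sketch below (our own, with an assumed rotation range and jitter scale) applies a small random rotation about the vertical axis and per-point Gaussian jitter:

```python
import numpy as np

def augment_cloud(points, max_angle_deg=5.0, jitter_std=0.002, rng=None):
    """Small random rotation about the y-axis plus per-point Gaussian
    jitter, as a stand-in for the augmentation described in Ref. [147].
    The angle range and jitter scale are illustrative assumptions."""
    rng = np.random.default_rng(rng)
    a = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    R = np.array([[ np.cos(a), 0.0, np.sin(a)],
                  [ 0.0,       1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])
    noise = rng.normal(scale=jitter_std, size=points.shape)
    return points @ R.T + noise
```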
PointFace [148] consists of two weight-shared encoders that extract discriminative features from a pair of point cloud faces. In the training phase, each encoder learns identity information from each sample itself, while a feature similarity loss evaluates the embedding similarity of the two samples. The feature similarity loss function can be represented by

    L_sim = Σ_{i=1}^{M} [D(f_i^a, f_i^p) + m − D(f_i^a, f_i^n)]    (4)

where f_i^a, f_i^p, and f_i^n, i = 1, · · ·, M, are the L2-normalized feature vectors of the anchor, positive, and negative samples, respectively, D(·, ·) is the distance between two vectors, and m is the margin. The encoder can distinguish 3D faces from different individuals and compactly cluster features coming from faces of the same person. Compared to Refs. [140, 143], the model has a higher average accuracy (87.18%) on Lock3DFace.
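A direct reading of Eq. (4) in code, for a batch of L2-normalized embeddings (a minimal sketch, not the released PointFace implementation; the margin value is an assumption):

```python
import numpy as np

def feature_similarity_loss(fa, fp, fn, m=0.4):
    """Eq. (4): sum over the batch of D(anchor, positive) + m
    - D(anchor, negative), with D the Euclidean distance between
    L2-normalized (M, d) feature arrays. m = 0.4 is an assumed margin."""
    d_ap = np.linalg.norm(fa - fp, axis=1)
    d_an = np.linalg.norm(fa - fn, axis=1)
    return np.sum(d_ap + m - d_an)
```

In practice the bracketed term is usually clipped at zero, hinge-style, so that well-separated triplets contribute no gradient; Eq. (4) is transcribed here as printed.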
Zhang et al. [145] proposed a 3D face point cloud recognition framework based on PointNet++. It consists of three modules: a training data generator, a face point cloud network, and transfer learning. The most important part of this work is the training data generator: all training sets are synthetic data, sampled from a statistical 3DMM of face shape and expression based on a GPMM [160]. This addresses the lack of a large training dataset. After classification training, triplet loss is used to fine-tune the network with real faces to give better results.
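Such a generator follows the same recipe as Eq. (1): sample shape and expression coefficients and add the corresponding linear combination of basis vectors to the mean face. A minimal sketch with hypothetical basis matrices (not the GPMM code of Ref. [160]):

```python
import numpy as np

def sample_face(mean, Ps, Pe, sigma_s=1.0, sigma_e=0.5, rng=None):
    """X = mean + Ps @ alpha + Pe @ beta  (cf. Eq. (1)).
    mean: (3N,) mean face; Ps: (3N, ks) shape basis; Pe: (3N, ke)
    expression basis. The coefficient scales are assumptions."""
    rng = np.random.default_rng(rng)
    alpha = rng.normal(scale=sigma_s, size=Ps.shape[1])
    beta = rng.normal(scale=sigma_e, size=Pe.shape[1])
    return (mean + Ps @ alpha + Pe @ beta).reshape(-1, 3)  # N x 3 points
```

Each call yields a new synthetic identity/expression pair, so arbitrarily many labeled training faces can be produced without scanning real subjects.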
5.4   Graph-input networks

Papadopoulos et al. [149] introduced a registration-free method for dynamic 3D face recognition based on spatiotemporal graph convolutional networks (ST-GCN). First, facial landmarks are estimated from a 3D mesh. Landmarks alone are insufficient for facial recognition because crucial geometric and texture information is left out, so to describe local facial patterns around the landmarks, new points are first interpolated between the estimated landmarks, and a kD-tree search is then used to find the closest points to each landmark. For each frame, a facial landmark corresponds to a vertex (v_i ∈ V) in a graph G = (V, E), and the landmarks are connected by spatial edges according to a defined relationship. For 3D mesh sequences, identical landmarks in consecutive frames are connected by temporal edges. This work was tested on BU4DFE with an average accuracy of 88.45%. The performance is not as good as other state-of-the-art methods [42, 161, 162], but it demonstrates the feasibility of using GCNs for dynamic 3D face recognition.
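To illustrate the graph construction, the sketch below (our simplification, not the Face-GCN code) connects landmarks with spatial edges inside each frame and links identical landmark indices across consecutive frames with temporal edges; the k-nearest-neighbor rule for spatial edges is a hypothetical stand-in for the paper's "defined relationship":

```python
import numpy as np

def build_st_edges(seq, k=3):
    """seq: (T, L, 3) array of L 3D landmarks over T frames.
    Returns spatial edges (t, i, j) within frames and temporal edges
    linking landmark i in frame t to the same i in frame t + 1."""
    T, L, _ = seq.shape
    spatial, temporal = [], []
    for t in range(T):
        # Pairwise distances between landmarks in frame t.
        d = np.linalg.norm(seq[t][:, None] - seq[t][None, :], axis=-1)
        for i in range(L):
            for j in np.argsort(d[i])[1:k + 1]:   # k nearest neighbors
                spatial.append((t, i, int(j)))
        if t + 1 < T:
            temporal.extend((t, i, i) for i in range(L))
    return spatial, temporal
```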
5.5   Summary

This section has reviewed deep learning-based 3D face recognition techniques and classified them into three categories based on their network input formats. Most deep learning-based 3D methods achieve high recognition accuracy and run quickly: for example, Ref. [42] reaches 100% RR1 on Texas-3D, and Ref. [5] requires only 0.84 s to identify a target face in a gallery of 466 faces. There are three important parts in a deep learning-based system: data preprocessing, data augmentation, and network architecture. Usually, the input data need to be preprocessed (face registration) to find correspondences between all vertices of the face mesh, since CNNs are generally intolerant of pose changes. Deep learning-based methods always require a large amount of data to train the network, especially when training from scratch. To avoid this, some works [51, 163] transfer learning from a pre-trained model and fine-tune the network on a small dataset, which also takes less training time; however, the lack of large-scale 3D face datasets remains an open problem for DCNN-based 3D face recognition research. Data augmentation is an important way to enlarge 3D face databases by generating new faces from existing ones. In addition, adopting a suitable network is important: most of the above-reviewed works use a single CNN, but a few use dual CNNs, such as Ref. [138]. As more networks are adopted in this field, reorganizing existing network architectures may also be a topic of future research.
6   Discussion

In the past decade, 3D face recognition has achieved significant advances in 3D face databases, recognition rates, and robustness to face data variations such as low resolution, expression, pose, and occlusion. In this paper, conventional methods and deep learning-based methods have been thoroughly reviewed in Sections 4 and 5, respectively. Based on their feature extraction algorithms, conventional methods are divided into three types: local, global, and hybrid methods.
• Local feature descriptors extract features from small regions of a 3D facial surface; in some cases, the region can be reduced to small patches around detected keypoints. The number of extracted local descriptors depends on the content of the input face (entire or partial). It is commonly assumed that only a small number of facial regions are affected by occlusion, missing data, or distortion caused by data corruption, while most other regions remain unchanged. The face representation is derived from a combination of many local descriptors; therefore, local facial descriptors are not compromised when a few parts change due to facial expressions or occlusion [87].
• A global representation is extracted from an entire 3D face, which usually makes global methods compact and therefore computationally efficient. While these methods can achieve high accuracy on complete neutral faces, they rely on the availability of full face scans and are sensitive to face alignment, occlusion, and data corruption.
• Hybrid methods can handle more conditions, such as pose and occlusion variations.
Since 2016, much research on deep learning-based 3D face recognition has been carried out. Table 7 summarizes the RR1 of the surveyed methods tested on different databases. Compared to conventional face recognition algorithms, deep learning-based methods have the advantages of simpler pipelines and greater accuracy.
Table 7   RR1 (%) of deep learning-based methods on various databases. H = high-quality image, L = low-quality image, F = fine-tuning
Reference | FRGC v2 | BU3D-FE | BU4D-FE | Bosphorus | CASIA | GavabDB | Texas-3D | 3D-TEC | UMBDB | ND-2006 | Lock3DFace
[42]      | 97.06   | 98.64   | 95.53   | 96.18     | 98.37 | 96.39   | 100      | 97.90  | 91.17 | 95.62   | —
[42] (F)  | 99.88   | 99.96   | 98.04   | 100       | 99.74 | 99.70   | 100      | 99.12  | 97.20 | 99.13   | —
[139]     | —       | —       | —       | —         | 85.93 | —       | —        | —      | —     | —       | —
[137]     | —       | —       | —       | 97.56     | —     | —       | —        | —      | —     | —       | —
[143]     | —       | —       | —       | —         | —     | —       | —        | —      | —     | —       | 86.55
[147]     | —       | —       | —       | 97.55     | —     | —       | —        | —      | —     | —       | —
[148]     | —       | —       | —       | —         | —     | —       | —        | —      | —     | —       | 87.18
[149]     | —       | —       | 88.45   | —         | —     | —       | —        | —      | —     | —       | —
To improve the accuracy and performance of face recognition systems, the following future directions are suggested, concerning new face data generation, data preprocessing, network design, and loss functions.
• Large-scale 3D face databases. Current 3D face databases are often smaller than their counterparts in 2D color face recognition; nearly all deep learning-based 3D face recognition methods fine-tune pre-trained networks on data converted from 3D faces. Larger-scale 3D face databases would enable training from scratch and raise the difficulty of the recognition task, closing the gap to real-world applications.
• Augmenting face data. As Section 5 notes, almost every proposed method provides a strategy for augmenting face training data, as a large amount of training data is required to train networks. A network trained with sufficient data can better distinguish features, while a small number of samples may result in overfitting. We can increase the size of a 3D database by generating more images for existing identities or by synthesizing new identities. Common ways to generate new images are rotating and cropping existing 3D data, or using a 3DMM to slightly change the expression. To generate new identities, models have been designed to synthesize new faces from existing ones [42, 155, 164, 165]. Recently, generative adversarial networks (GANs) have been used for face augmentation, where a face simulator is trained to generate realistic synthetic images. Recent works are summarized in Section 3.
• Data preprocessing. This is also key to improving face recognition accuracy. Besides removing redundant information, another goal of data preprocessing is to perform registration. A well-known problem of rigid-ICP registration is that it cannot guarantee optimal convergence [51]: it may not be possible to accurately register all 3D faces in different poses to the reference face. For 3D-input networks, all 3D faces are taken as point-to-point correspondences for non-rigid registration. As an alternative, Ref. [149] proposed a registration-free method based on GCN to avoid this step; however, further work is needed to improve its face recognition performance.
• Data conversion. As Section 5 explains, some works are based on 2D-input networks. To use them, better conversion techniques (e.g., from 3D faces to 2D maps) would improve face recognition performance.
• Network architecture. Many networks are available for 3D face recognition (see Table 5). Some researchers directly adopt pre-trained networks and then fine-tune them using training data generated from 3D faces, which can greatly improve training speed. Dual or multiple networks can also be used to handle different tasks, as in Refs. [5, 138].
• Appropriate loss functions. Effective loss functions can reduce the complexity of training and improve feature learning capabilities. Most loss functions share a similar basic idea: facilitate training by amplifying the discriminative features of different individuals and compactly clustering features of the same individual. A commonly used loss function is the softmax loss, which encourages separability between classes but cannot enforce compactness within classes. For face recognition, highly discriminative features are required because the difference between two faces may be small, as with twins. Therefore, applying loss functions to supervise the network layers has become an active research topic. For example, Ref. [5] adopted multi-scale loss supervision, combining one softmax loss and two triplet losses, to improve extraction efficiency (a schematic sketch follows this list).
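The combination in Ref. [5] can be sketched schematically as follows (our illustration, not the authors' code; the margin and weights are assumptions):

```python
import torch.nn as nn

ce = nn.CrossEntropyLoss()                # softmax loss on identity logits
tri = nn.TripletMarginLoss(margin=0.3)    # margin value is an assumption

def multiscale_loss(logits, labels, fc, ft, w1=1.0, w2=1.0):
    """Softmax loss plus two triplet losses, in the spirit of Ref. [5]:
    fc and ft are (anchor, positive, negative) embedding triples taken
    from the last fully connected layer and the last feature layer."""
    return (ce(logits, labels)
            + w1 * tri(*fc)     # triplet loss on fully connected layer
            + w2 * tri(*ft))    # triplet loss on last feature layer
```

Supervising two depths at once pushes both the classifier input and the intermediate features toward compact, well-separated identity clusters.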
In addition to the above issues, researchers could consider combining conventional methods with CNNs. For example, keypoint detection techniques from conventional 3D face recognition could be incorporated into deep learning-based methods to better attend to areas of interest. 3D face recognition methods for low-quality (low-resolution) data also need more work.

To apply 3D face technology in real-world applications, several things need to be considered: recognition time, the quality of the input data, and pose and expression variations of the subject. Lightweight networks [140, 141] can reduce recognition time and improve efficiency. In Refs. [136, 140, 143], representations of low-quality face data are improved by fusing features from high-quality images. To handle pose and expression variations, the network can be trained on face datasets with rich expression and pose changes to improve its robustness. Furthermore, dynamic 3D face recognition using 3D face sequences as input should be considered in the future.
7   Conclusions

3D face recognition has become an active and popular research topic in image processing and computer vision in recent years. In this paper, a summary of public 3D face databases is first provided, followed by a comprehensive survey of 3D face recognition methods proposed in the past decade. These are divided into two categories based on their feature extraction methods: conventional and deep learning-based.

Conventional techniques are further classified into local, global, and hybrid methods. We have reviewed these methods by comparing their performance on different databases, their computational cost, and their robustness to expression change, occlusion, and pose variation. Local methods can better handle facial expressions and occluded images, at the cost of greater computation than global methods. Hybrid methods can achieve better results and address challenges such as pose variation, illumination change, and facial expressions.

We have reviewed recent advances in 3D face recognition based on deep learning, mainly focusing on face augmentation, data preprocessing, network architecture, and loss functions. According to the input format of the network adopted, deep learning-based 3D face recognition methods may be broadly divided into 2D-input, 3D-input, and graph-input networks. With these powerful networks, the performance of 3D face recognition has been greatly improved.

We have also discussed the characteristics and challenges involved, and have provided potential future directions for 3D face recognition. For instance, large-scale 3D face databases are greatly needed to advance 3D face recognition. We believe this survey will provide valuable information and insight to readers and the community.

Declaration of competing interest

The authors have no competing interests to declare that are relevant to the content of this article.

References

[1] Patil, H.; Kothari, A.; Bhurchandi, K. 3-D face recognition: Features, databases, algorithms and challenges. Artificial Intelligence Review Vol. 44, No. 3, 393–441, 2015.
[2] Zhou, H. L.; Mian, A.; Wei, L.; Creighton, D.; Hossny, M.; Nahavandi, S. Recent advances on singlemodal and multimodal face recognition: A survey. IEEE Transactions on Human-Machine Systems Vol. 44, No. 6, 701–716, 2014.
[3] Bowyer, K. W.; Chang, K.; Flynn, P. A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition. Computer Vision and Image Understanding Vol. 101, No. 1, 1–15, 2006.
[4] Huang, G. B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In: Proceedings of the Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, 2008.
[5] Cai, Y.; Lei, Y. J.; Yang, M. L.; You, Z. S.; Shan, S. G. A fast and robust 3D face recognition approach based on deeply learned face representation. Neurocomputing Vol. 363, 375–397, 2019.
[6] Zhou, S.; Xiao, S. 3D face recognition: A survey. Human-Centric Computing and Information Sciences Vol. 8, No. 1, 35, 2018.
[7] Blackburn, D. M.; Bone, M.; Phillips, P. J. Face recognition vendor test 2000: Evaluation report. Technical report. Defense Advanced Research Projects Agency, Arlington, VA, 2001.
[8] Phillips, P. J.; Grother, P.; Micheals, R.; Blackburn, D. M.; Tabassi, E.; Bone, M. Face recognition vendor test 2002. In: Proceedings of the IEEE International SOI Conference, 44, 2003.
[9] Phillips, P. J.; Flynn, P. J.; Scruggs, T.; Bowyer, K. W.; Chang, J.; Hoffman, K.; Marques, J.; Min, J.; Worek, W. Overview of the face recognition grand challenge. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 947–954, 2005.
[10] Phillips, P. J.; Scruggs, W. T.; O'Toole, A. J.; Flynn, P. J.; Bowyer, K. W.; Schott, C. L.; Sharpe, M. FRVT 2006 and ICE 2006 large-scale experimental results. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 32, No. 5, 831–846, 2010.
[11] Abate, A. F.; Nappi, M.; Riccio, D.; Sabatino, G. 2D and 3D face recognition: A survey. Pattern Recognition Letters Vol. 28, No. 14, 1885–1906, 2007.
[12] Smeets, D.; Claes, P.; Hermans, J.; Vandermeulen, D.; Suetens, P. A comparative study of 3-D face recognition under expression variations. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) Vol. 42, No. 5, 710–727, 2012.
[13] Soltanpour, S.; Boufama, B.; Jonathan Wu, Q. M. A survey of local feature methods for 3D face recognition. Pattern Recognition Vol. 72, 391–406, 2017.
[14] Guo, G. D.; Zhang, N. A survey on deep learning based face recognition. Computer Vision and Image Understanding Vol. 189, 102805, 2019.
[15] Masi, I.; Wu, Y.; Hassner, T.; Natarajan, P. Deep face recognition: A survey. In: Proceedings of the 31st SIBGRAPI Conference on Graphics, Patterns and Images, 471–478, 2018.
[16] Parkhi, O. M.; Vedaldi, A.; Zisserman, A. Deep face recognition. In: Proceedings of the British Machine Vision Conference, 1–12, 2015.
[17] He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778, 2016.
[18] Lawrence, S.; Giles, C. L.; Tsoi, A. C.; Back, A. D. Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks Vol. 8, No. 1, 98–113, 1997.
[19] Howard, A.; Zhmoginov, A.; Chen, L. C.; Sandler, M.; Zhu, M. Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. arXiv preprint arXiv:1801.04381, 2018.
[20] Colombo, A.; Cusano, C.; Schettini, R. UMB-DB: A database of partially occluded 3D faces. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, 2113–2119, 2011.
[21] Beumier, C.; Acheroy, M. Automatic 3D face authentication. Image and Vision Computing Vol. 18, No. 4, 315–321, 2000.
[22] Hesher, C.; Srivastava, A.; Erlebacher, G. A novel technique for face recognition using range imaging. In: Proceedings of the 7th International Symposium on Signal Processing and Its Applications, 201–204, 2003.
[23] Moreno, A. GavabDB: A 3D face database. In: Proceedings of the 2nd COST275 Workshop on Biometrics on the Internet, 75–80, 2004.
[24] Chang, K. I.; Bowyer, K. W.; Flynn, P. J. An evaluation of multimodal 2D+3D face biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 27, No. 4, 619–624, 2005.
[25] Wang, Y. M.; Pan, G.; Wu, Z. H.; Wang, Y. G. Exploring facial expression effects in 3D face recognition using partial ICP. In: Computer Vision – ACCV 2006. Lecture Notes in Computer Science, Vol. 3851. Narayanan, P. J.; Nayar, S. K.; Shum, H. Y. Eds. Springer Berlin Heidelberg, 581–590, 2006.
[26] Yin, L. J.; Wei, X. Z.; Sun, Y.; Wang, J.; Rosato, M. J. A 3D facial expression database for facial behavior research. In: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition, 211–216, 2006.
[27] Xu, C. H.; Tan, T. N.; Li, S.; Wang, Y. H.; Zhong, C. Learning effective intrinsic features to boost 3D-based face recognition. In: Computer Vision – ECCV 2006. Lecture Notes in Computer Science, Vol. 3952. Leonardis, A.; Bischof, H.; Pinz, A. Eds. Springer Berlin Heidelberg, 416–427, 2006.
[28] Conde, C.; Serrano, A.; Cabello, E. Multimodal 2D, 2.5D & 3D face verification. In: Proceedings of the International Conference on Image Processing, 2061–2064, 2006.
[29] Faltemier, T. C.; Bowyer, K. W.; Flynn, P. J. Using a multi-instance enrollment representation to improve 3D face recognition. In: Proceedings of the IEEE International Conference on Biometrics: Theory, Applications, and Systems, 1–6, 2007.
[30] Savran, A.; Alyüz, N.; Dibeklioğlu, H.; Çeliktutan, O.; Gökberk, B.; Sankur, B.; Akarun, L. Bosphorus database for 3D face analysis. In: Biometrics and Identity Management. Lecture Notes in Computer Science, Vol. 5372. Schouten, B.; Juul, N. C.; Drygajlo, A.; Tistarelli, M. Eds. Springer Berlin Heidelberg, 47–56, 2008.
[31] Heseltine, T.; Pears, N.; Austin, J. Three-dimensional face recognition using combinations of surface feature map subspace components. Image and Vision Computing Vol. 26, No. 3, 382–396, 2008.
[32] Ter Haar, F. B.; Daoudi, M.; Veltkamp, R. C. SHape REtrieval contest 2008: 3D face scans. In: Proceedings of the IEEE International Conference on Shape Modeling and Applications, 225–226, 2008.
[33] Yin, B. C.; Sun, Y. F.; Wang, C. Z.; Gai, Y. BJUT-3D large scale 3D face database and information processing. Journal of Computer Research and Development Vol. 46, No. 6, 1009–1018, 2009. (in Chinese)
[34] Gupta, S.; Castleman, K. R.; Markey, M. K.; Bovik, A. C. Texas 3D face recognition database. In: Proceedings of the IEEE Southwest Symposium on Image Analysis & Interpretation, 97–100, 2010.
[35] Vijayan, V.; Bowyer, K. W.; Flynn, P. J.; Huang, D.; Chen, L. M.; Hansen, M.; Ocegueda, O.; Shah, S. K.; Kakadiaris, I. A. Twins 3D face recognition challenge. In: Proceedings of the International Joint Conference on Biometrics, 1–7, 2011.
[36] Veltkamp, R.; van Jole, S.; Drira, H.; Amor, B.; Daoudi, M.; Li, H. B.; Chen, L. M.; Claes, P.; Smeets, D.; Hermans, J.; et al. SHREC'11 track: 3D face models retrieval. In: Proceedings of the 4th Eurographics Conference on 3D Object Retrieval, 89–95, 2011.
[37] Zhang, Y.; Guo, Z.; Lin, Z.; Zhang, H.; Zhang, C. The NPU multi-case Chinese 3D face database and information processing. Chinese Journal of Electronics Vol. 21, No. 2, 283–286, 2012.
[38] Zhang, X.; Yin, L. J.; Cohn, J. F.; Canavan, S.; Reale, M.; Horowitz, A.; Peng, L. A high-resolution spontaneous 3D dynamic facial expression database. In: Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 1–6, 2013.
[39] Min, R.; Kose, N.; Dugelay, J. L. KinectFaceDB: A Kinect database for face recognition. IEEE Transactions on Systems, Man, and Cybernetics: Systems Vol. 44, No. 11, 1534–1548, 2014.
[40] Zhang, J. J.; Huang, D.; Wang, Y. H.; Sun, J. Lock3DFace: A large-scale database of low-cost Kinect 3D faces. In: Proceedings of the International Conference on Biometrics, 1–8, 2016.
[41] Urbanová, P.; Ferková, Z.; Jandová, M.; Jurda, M.; Černý, D.; Sochor, J. Introducing the FIDENTIS 3D face database. Anthropological Review Vol. 81, No. 2, 202–223, 2018.
[42] Zulqarnain Gilani, S.; Mian, A. Learning from millions of 3D scans for large-scale 3D face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1896–1905, 2018.
[43] Cheng, S. Y.; Kotsia, I.; Pantic, M.; Zafeiriou, S. 4DFAB: A large scale 4D database for facial expression analysis and biometric applications. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5117–5126, 2018.
[44] Jia, S.; Li, X.; Hu, C. B.; Guo, G. D.; Xu, Z. Q. 3D face anti-spoofing with factorized bilinear coding. arXiv preprint arXiv:2005.06514, 2020.
[45] Ye, Y. P.; Song, Z.; Guo, J. G.; Qiao, Y. SIAT-3DFE: A high-resolution 3D facial expression dataset. IEEE Access Vol. 8, 48205–48211, 2020.
[46] Yang, H. T.; Zhu, H.; Wang, Y. R.; Huang, M. K.; Shen, Q.; Yang, R. G.; Cao, X. FaceScape: A large-scale high quality 3D face dataset and detailed riggable 3D face prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 598–607, 2020.
[47] Li, Q.; Dong, X. X.; Wang, W. N.; Shan, C. F. CAS-AIR-3D face: A low-quality, multi-modal and multi-pose 3D face database. In: Proceedings of the IEEE International Joint Conference on Biometrics, 1–8, 2021.
[48] Gilani, S. Z.; Mian, A. Towards large-scale 3D face recognition. In: Proceedings of the International Conference on Digital Image Computing: Techniques and Applications, 1–8, 2016.
[49] Farkas, L. G. Anthropometry of the Head and Face. Raven Press, 1994.
[50] Blanz, V.; Vetter, T. A morphable model for the synthesis of 3D faces. In: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, 187–194, 1999.
[51] Kim, D.; Hernandez, M.; Choi, J.; Medioni, G. Deep 3D face identification. In: Proceedings of the IEEE International Joint Conference on Biometrics, 133–142, 2017.
[52] Zhang, Z. Y.; Da, F. P.; Yu, Y. Data-free point cloud network for 3D face recognition. arXiv preprint arXiv:1911.04731, 2019.
[53] Deng, J. K.; Cheng, S. Y.; Xue, N. N.; Zhou, Y. X.; Zafeiriou, S. UV-GAN: Adversarial facial UV map completion for pose-invariant face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7093–7102, 2018.
[54] Zhao, J.; Xiong, L.; Cheng, Y.; Cheng, Y.; Li, J.; Zhou, L.; Xu, Y.; Karlekar, J.; Pranata, S.; Shen, S.; et al. 3D-aided deep pose-invariant face recognition. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 1184–1190, 2018.
[55] Shen, Y. J.; Luo, P.; Yan, J. J.; Wang, X. G.; Tang, X. O. FaceID-GAN: Learning a symmetry three-player GAN for identity-preserving face synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 821–830, 2018.
[56] Zhang, X. Y.; Zhao, Y.; Zhang, H. Dual-discriminator GAN: A GAN way of profile face recognition. In: Proceedings of the IEEE International Conference on Artificial Intelligence and Computer Applications, 162–166, 2020.
[57] Marriott, R. T.; Romdhani, S.; Chen, L. M. A 3D GAN for improved large-pose facial recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13440–13450, 2021.
[58] Luo, M. D.; Cao, J.; Ma, X.; Zhang, X. Y.; He, R. FA-GAN: Face augmentation GAN for deformation-invariant face recognition. IEEE Transactions on Information Forensics and Security Vol. 16, 2341–2355, 2021.
[59] Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
[60] Zhao, W.; Chellappa, R.; Phillips, P. J.; Rosenfeld, A. Face recognition. ACM Computing Surveys Vol. 35, No. 4, 399–458, 2003.
[61] Berretti, S.; del Bimbo, A.; Pala, P. 3D partial face matching using local shape descriptors. In: Proceedings of the Joint ACM Workshop on Human Gesture and Behavior Understanding, 65–71, 2011.
[62] Li, H. B.; Huang, D.; Lemaire, P.; Morvan, J. M.; Chen, L. M. Expression robust 3D face recognition via mesh-based histograms of multiple order surface differential quantities. In: Proceedings of the 18th IEEE International Conference on Image Processing, 3053–3056, 2011.
[63] Creusot, C.; Pears, N.; Austin, J. Automatic keypoint detection on 3D faces using a dictionary of local shapes. In: Proceedings of the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, 204–211, 2011.
[64] Zhang, G. P.; Wang, Y. H. Robust 3D face recognition based on resolution invariant features. Pattern Recognition Letters Vol. 32, No. 7, 1009–1019, 2011.
[65] Inan, T.; Halici, U. 3-D face recognition with local shape descriptors. IEEE Transactions on Information Forensics and Security Vol. 7, No. 2, 577–587, 2012.
[66] Berretti, S.; del Bimbo, A.; Pala, P. Sparse matching of salient facial curves for recognition of 3-D faces with missing parts. IEEE Transactions on Information Forensics and Security Vol. 8, No. 2, 374–389, 2013.
[67] Li, X. L.; Da, F. P. Efficient 3D face recognition handling facial expression and hair occlusion. Image and Vision Computing Vol. 30, No. 9, 668–679, 2012.
[68] Ballihi, L.; Ben Amor, B.; Daoudi, M.; Srivastava, A.; Aboutajdine, D. Boosting 3-D-geometric features for efficient face recognition and gender classification. IEEE Transactions on Information Forensics and Security Vol. 7, No. 6, 1766–1779, 2012.
[69] Berretti, S.; Werghi, N.; del Bimbo, A.; Pala, P. Matching 3D face scans using interest points and local histogram descriptors. Computers & Graphics Vol. 37, No. 5, 509–525, 2013.
[70] Smeets, D.; Keustermans, J.; Vandermeulen, D.; Suetens, P. meshSIFT: Local surface features for 3D face recognition under expression variations and partial data. Computer Vision and Image Understanding Vol. 117, No. 2, 158–169, 2013.
[71] Creusot, C.; Pears, N.; Austin, J. A machine-learning approach to keypoint detection and landmarking on 3D meshes. International Journal of Computer Vision Vol. 102, Nos. 1–3, 146–179, 2013.
[72] Tang, H. L.; Yin, B. C.; Sun, Y. F.; Hu, Y. L. 3D face recognition using local binary patterns. Signal Processing Vol. 93, No. 8, 2190–2198, 2013.
[73] Lei, Y. J.; Bennamoun, M.; El-Sallam, A. A. An efficient 3D face recognition approach based on the fusion of novel local low-level features. Pattern Recognition Vol. 46, No. 1, 24–37, 2013.
[74] Elaiwat, S.; Bennamoun, M.; Boussaid, F.; El-Sallam, A. 3-D face recognition using curvelet local features. IEEE Signal Processing Letters Vol. 21, No. 2, 172–175, 2014.
[75] Drira, H.; Ben Amor, B.; Srivastava, A.; Daoudi, M.; Slama, R. 3D face recognition under expressions, occlusions, and pose variations. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 35, No. 9, 2270–2283, 2013.
[76] Li, H. B.; Huang, D.; Morvan, J. M.; Chen, L. M.; Wang, Y. H. Expression-robust 3D face recognition via weighted sparse representation of multi-scale and multi-component local normal patterns. Neurocomputing Vol. 133, 179–193, 2014.
[77] Berretti, S.; Werghi, N.; Bimbo, A.; Pala, P. Selecting stable keypoints and local descriptors for person identification using 3D face scans. The Visual Computer Vol. 30, No. 11, 1275–1292, 2014.
[78] Lei, Y. J.; Bennamoun, M.; Hayat, M.; Guo, Y. L. An efficient 3D face recognition approach using local geometrical signatures. Pattern Recognition Vol. 47, No. 2, 509–524, 2014.
[79] Tabia, H.; Laga, H.; Picard, D.; Gosselin, P. H. Covariance descriptors for 3D shape matching and retrieval. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4185–4192, 2014.
[80] Vezzetti, E.; Marcolin, F.; Fracastoro, G. 3D face recognition: An automatic strategy based on geometrical descriptors and landmarks. Robotics and Autonomous Systems Vol. 62, No. 12, 1768–1776, 2014.
[81] Li, H. B.; Huang, D.; Morvan, J. M.; Wang, Y. H.; Chen, L. M. Towards 3D face recognition in the real: A registration-free approach using fine-grained matching of 3D keypoint descriptors. International Journal of Computer Vision Vol. 113, No. 2, 128–142, 2015.
[82] Elaiwat, S.; Bennamoun, M.; Boussaid, F.; El-Sallam, A. A curvelet-based approach for textured 3D face recognition. Pattern Recognition Vol. 48, No. 4, 1235–1246, 2015.
[83] Al-Osaimi, F. R. A novel multi-purpose matching representation of local 3D surfaces: A rotationally invariant, efficient, and highly discriminative approach with an adjustable sensitivity. IEEE Transactions on Image Processing Vol. 25, No. 2, 658–672, 2016.
[84] Ming, Y. Robust regional bounding spherical descriptor for 3D face recognition and emotion analysis. Image and Vision Computing Vol. 35, 14–22, 2015.
[85] Guo, Y. L.; Lei, Y. J.; Liu, L.; Wang, Y.; Bennamoun, M.; Sohel, F. EI3D: Expression-invariant 3D face recognition based on feature and shape matching. Pattern Recognition Letters Vol. 83, 403–412, 2016.
[86] Soltanpour, S.; Wu, Q. J. Multimodal 2D–3D face recognition using local descriptors: Pyramidal shape map and structural context. IET Biometrics Vol. 6, No. 1, 27–35, 2017.
[87] Lei, Y. J.; Guo, Y. L.; Hayat, M.; Bennamoun, M.; Zhou, X. Z. A two-phase weighted collaborative representation for 3D partial face recognition with single sample. Pattern Recognition Vol. 52, 218–237, 2016.
[88] Emambakhsh, M.; Evans, A. Nasal patches and curves for expression-robust 3D face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 5, 995–1007, 2017.
[89] Werghi, N.; Tortorici, C.; Berretti, S.; Del Bimbo, A. Boosting 3D LBP-based face recognition by fusing shape and texture descriptors on the mesh. IEEE Transactions on Information Forensics and Security Vol. 11, No. 5, 964–979, 2016.
[90] Hariri, W.; Tabia, H.; Farah, N.; Benouareth, A.; Declercq, D. 3D face recognition using covariance based descriptors. Pattern Recognition Letters Vol. 78, 1–7, 2016.
[91] Soltanpour, S.; Wu, Q. M. J. High-order local normal derivative pattern (LNDP) for 3D face recognition. In: Proceedings of the IEEE International Conference on Image Processing, 2811–2815, 2017.
[92] Deng, X.; Da, F. P.; Shao, H. J. Efficient 3D face recognition using local covariance descriptor and Riemannian kernel sparse coding. Computers & Electrical Engineering Vol. 62, 81–91, 2017.
[93] Abbad, A.; Abbad, K.; Tairi, H. 3D face recognition: Multi-scale strategy based on geometric and local descriptors. Computers & Electrical Engineering Vol. 70, 525–537, 2018.
[94] Soltanpour, S.; Wu, Q. M. J. Weighted extreme sparse classifier and local derivative pattern for 3D face recognition. IEEE Transactions on Image Processing Vol. 28, No. 6, 3020–3033, 2019.
[95] Shi, L. L.; Wang, X.; Shen, Y. L. Research on 3D face recognition method based on LBP and SVM. Optik Vol. 220, 165157, 2020.
[96] Samir, C.; Srivastava, A.; Daoudi, M.; Klassen, E. An intrinsic framework for analysis of facial surfaces. International Journal of Computer Vision Vol. 82, No. 1, 80–95, 2009.
[97] Samir, C.; Srivastava, A.; Daoudi, M. Three-dimensional face recognition using shapes of facial curves. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 28, No. 11, 1858–1863, 2006.
[98] Lowe, D. G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision Vol. 60, No. 2, 91–110, 2004.
[99] Deng, X.; Da, F.; Shao, H. J.; Jiang, Y. T. A multi-scale three-dimensional face recognition approach with sparse representation-based classifier and fusion of local covariance descriptors. Computers & Electrical Engineering Vol. 85, 106700, 2020.
[100] Vezzetti, E.; Marcolin, F.; Tornincasa, S.; Ulrich, L.; Dagnes, N. 3D geometry-based automatic landmark localization in presence of facial occlusions. Multimedia Tools and Applications Vol. 77, No. 11, 14177–14205, 2018.
[101] Drira, H.; Benamor, B.; Daoudi, M.; Srivastava, A. Pose and expression-invariant 3D face recognition using elastic radial curves. In: Proceedings of the British Machine Vision Conference, 1–11, 2010.
[102] Freund, Y.; Schapire, R. E. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence Vol. 14, No. 5, 771–780, 1999.
[103] Aubry, M.; Schlickewei, U.; Cremers, D. The wave kernel signature: A quantum mechanical approach to shape analysis. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, 1626–1633, 2011.
[104] Ojala, T.; Pietikäinen, M.; Mäenpää, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 24, No. 7, 971–987, 2002.
[105] Yu, Y.; Da, F. P.; Guo, Y. F. Sparse ICP with resampling and denoising for 3D face verification. IEEE Transactions on Information Forensics and Security Vol. 14, No. 7, 1917–1927, 2019.
[106] Spreeuwers, L. Fast and accurate 3D face recognition. International Journal of Computer Vision Vol. 93, No. 3, 389–414, 2011.
[107] Ocegueda, O.; Passalis, G.; Theoharis, T.; Shah, S. K.; Kakadiaris, I. A. UR3D-C: Linear dimensionality reduction for efficient 3D face recognition. In: Proceedings of the International Joint Conference on Biometrics, 1–6, 2011.
[108] Ming, Y.; Ruan, Q. Q. Robust sparse bounding sphere for 3D face recognition. Image and Vision Computing Vol. 30, No. 8, 524–534, 2012.
[109] Liu, P. J.; Wang, Y. H.; Huang, D.; Zhang, Z. X.; Chen, L. M. Learning the spherical harmonic features for 3-D face recognition. IEEE Transactions on Image Processing Vol. 22, No. 3, 914–925, 2013.
[110] Taghizadegan, Y.; Ghassemian, H.; Naser-Moghaddasi, M. 3D face recognition method using 2DPCA-Euclidean distance classification. ACEEE International Journal on Control System and Instrumentation Vol. 3, No. 1, 1–5, 2012.
[111] Mohammadzade, H.; Hatzinakos, D. Iterative closest normal point for 3D face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 35, No. 2, 381–397, 2013.
[112] Ming, Y. Rigid-area orthogonal spectral regression for efficient 3D face recognition. Neurocomputing Vol. 129, 445–457, 2014.
[113] Ratyal, N. I.; Ahmad Taj, I.; Bajwa, U. I.; Sajid, M. 3D face recognition based on pose and expression invariant alignment. Computers & Electrical Engineering Vol. 46, 241–255, 2015.
[114] Tang, Y. H.; Sun, X.; Huang, D.; Morvan, J. M.; Wang, Y. H.; Chen, L. M. 3D face recognition with asymptotic cones based principal curvatures. In: Proceedings of the International Conference on Biometrics, 466–472, 2015.
[115] Gilani, S. Z.; Mian, A.; Eastwood, P. Deep, dense and accurate 3D face correspondence for generating population specific deformable models. Pattern Recognition Vol. 69, 238–250, 2017.
[116] Peter, M.; Minoi, J. L.; Hipiny, I. H. M. 3D face recognition using kernel-based PCA approach. In: Computational Science and Technology. Lecture Notes in Electrical Engineering, Vol. 481. Alfred, R.; Lim, Y.; Ibrahim, A.; Anthony, P. Eds. Springer Singapore, 77–86, 2019.
[117] Passalis, G.; Perakis, P.; Theoharis, T.; Kakadiaris, I. A. Using facial symmetry to handle pose variations in real-world 3D face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 33, No. 10, 1938–1951, 2011.
[118] Huang, D.; Ardabilian, M.; Wang, Y. H.; Chen, L. M. 3-D face recognition using eLBP-based facial description and local feature hybrid matching. IEEE Transactions on Information Forensics and Security Vol. 7, No. 5, 1551–1565, 2012.
[119] Alyüz, N.; Gökberk, B.; Spreeuwers, L.; Veldhuis, R.; Akarun, L. Robust 3D face recognition in the presence of realistic occlusions. In: Proceedings of the 5th IAPR International Conference on Biometrics, 111–118, 2012.
[120] Fadaifard, H.; Wolberg, G.; Haralick, R. Multiscale 3D feature extraction and matching with an application to 3D face recognition. Graphical Models Vol. 75, No. 4, 157–176, 2013.
[121] Alyüz, N.; Gökberk, B.; Akarun, L. 3-D face recognition under occlusion using masked projection. IEEE Transactions on Information Forensics and Security Vol. 8, No. 5, 789–802, 2013.
[122] Bagchi, P.; Bhattacharjee, D.; Nasipuri, M. Robust 3D face recognition in presence of pose and partial occlusions or missing parts. arXiv preprint arXiv:1408.3709, 2014.
[123] Bagchi, P.; Bhattacharjee, D.; Nasipuri, M. 3D face recognition using surface normals. In: Proceedings of the TENCON 2015 IEEE Region 10 Conference, 1–6, 2015.
[124] Liang, Y.; Zhang, Y.; Zeng, X. X. Pose-invariant 3D face recognition using half face. Signal Processing: Image Communication Vol. 57, 84–90, 2017.
[125] LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature Vol. 521, No. 7553, 436–444, 2015.
[126] Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. DeepFace: Closing the gap to human-level performance in face verification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1701–1708, 2014.
[127] Sun, Y.; Chen, Y. H.; Wang, X. G.; Tang, X. O. Deep learning face representation by joint identification-verification. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, Vol. 2, 1988–1996, 2014.
[128] Sun, Y.; Wang, X. G.; Tang, X. O. Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1891–1898, 2014.
[129] Sun, Y.; Wang, X. G.; Tang, X. O. Deeply learned face representations are sparse, selective, and robust. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2892–2900, 2015.
[130] Sun, Y.; Liang, D.; Wang, X. G.; Tang, X. O. DeepID3: Face recognition with very deep neural networks. arXiv preprint arXiv:1502.00873, 2015.
[131] Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 815–823, 2015.
[132] Charles, R. Q.; Hao, S.; Mo, K. C.; Guibas, L. J. PointNet: Deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 77–85, 2017.
[133] Qi, C. R.; Yi, L.; Su, H.; Guibas, L. J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, 5105–5114, 2017.
[134] Ding, Y. Q.; Li, N. Y.; Young, S. S.; Ye, J. W. Efficient 3D face recognition in uncontrolled environment. In: Advances in Visual Computing. Lecture Notes in Computer Science, Vol. 11844. Springer Cham, 430–443, 2019.
[135] Lin, S. S.; Liu, F.; Liu, Y. H.; Shen, L. L. Local feature tensor based deep learning for 3D face recognition. In: Proceedings of the 14th IEEE International Conference on Automatic Face & Gesture Recognition, 1–5, 2019.
[136] Tan, Y.; Lin, H. X.; Xiao, Z. L.; Ding, S. Y.; Chao, H. Y. Face recognition from sequential sparse 3D data via deep registration. In: Proceedings of the International Conference on Biometrics, 1–8, 2019.
[137] Olivetti, E. C.; Ferretti, J.; Cirrincione, G.; Nonis, F.; Tornincasa, S.; Marcolin, F. Deep CNN for 3D face recognition. In: Design Tools and Methods in Industrial Engineering. Lecture Notes in Mechanical Engineering. Rizzi, C.; Andrisano, A. O.; Leali, F.; Gherardini, F.; Pini, F.; Vergnano, A. Eds. Springer Cham, 665–674, 2020.
[138] Xu, K. M.; Wang, X. M.; Hu, Z. H.; Zhang, Z. H. 3D face recognition based on twin neural network combining deep map and texture. In: Proceedings of the IEEE 19th International Conference on Communication Technology, 1665–1668, 2019.
[139] Feng, J. Y.; Guo, Q.; Guan, Y. D.; Wu, M. D.; Zhang, X. R.; Ti, C. L. 3D face recognition method based on deep convolutional neural network. In: Smart Innovations in Communication and Computational Sciences. Advances in Intelligent Systems and Computing, Vol. 670. Panigrahi, B.; Trivedi, M.; Mishra, K.; Tiwari, S.; Singh, P. Eds. Springer Singapore, 123–130, 2019.
[140] Mu, G. D.; Huang, D.; Hu, G. S.; Sun, J.; Wang, Y. H. Led3D: A lightweight and efficient deep approach to recognizing low-quality 3D faces. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5766–5775, 2019.
[141] Dutta, K.; Bhattacharjee, D.; Nasipuri, M. SpPCANet: A simple deep learning-based feature extraction approach for 3D face recognition. Multimedia Tools and Applications Vol. 79, Nos. 41–42, 31329–31352, 2020.
[142] Cao, C. Q.; Swash, M. R.; Meng, H. Y. Reliable holoscopic 3D face recognition. In: Proceedings of the 7th International Conference on Signal Processing and Integrated Networks, 696–701, 2020.
[143] Lin, S. S.; Jiang, C. Y.; Liu, F.; Shen, L. L. High quality facial data synthesis and fusion for 3D low-quality face recognition. In: Proceedings of the IEEE International Joint Conference on Biometrics, 1–8, 2021.
[144] Chiu, M. T.; Cheng, H. Y.; Wang, C. Y.; Lai, S. H. High-accuracy RGB-D face recognition via segmentation-aware face depth estimation and mask-guided attention network. In: Proceedings of the 16th IEEE International Conference on Automatic Face and Gesture Recognition, 1–8, 2021.
[145] Zhang, Z. Y.; Da, F. P.; Yu, Y. Learning directly from synthetic point clouds for “in-the-wild” 3D face recognition. Pattern Recognition Vol. 123, 108394, 2022.
[146] Bhople, A. R.; Shrivastava, A. M.; Prakash, S. Point cloud based deep convolutional neural network for 3D face recognition. Multimedia Tools and Applications Vol. 80, No. 20, 30237–30259, 2021.
[147] Bhople, A. R.; Prakash, S. Learning similarity and dissimilarity in 3D faces with triplet network. Multimedia Tools and Applications Vol. 80, Nos. 28–29, 35973–35991, 2021.
[148] Jiang, C. Y.; Lin, S. S.; Chen, W.; Liu, F.; Shen, L. L. PointFace: Point set based feature learning for 3D face recognition. In: Proceedings of the IEEE International Joint Conference on Biometrics, 1–8, 2021.
[149] Papadopoulos, K.; Kacem, A.; Shabayek, A.; Aouada, D. Face-GCN: A graph convolutional network for 3D dynamic face identification/recognition. arXiv preprint arXiv:2104.09145, 2021.
[150] Li, B. Y. L.; Mian, A. S.; Liu, W. Q.; Krishna, A. Using Kinect for face recognition under varying poses, expressions, illumination and disguise. In: Proceedings of the IEEE Workshop on Applications of Computer Vision, 186–192, 2013.
[151] Wang, F.; Cheng, J.; Liu, W. Y.; Liu, H. J. Additive margin softmax for face verification. IEEE Signal Processing Letters Vol. 25, No. 7, 926–930, 2018.
[152] Cortes, C.; Vapnik, V. Support-vector networks. Machine Learning Vol. 20, No. 3, 273–297, 1995.
[153] Isola, P.; Zhu, J. Y.; Zhou, T. H.; Efros, A. A. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5967–5976, 2017.
[154] Jiang, L.; Zhang, J. Y.; Deng, B. L. Robust RGB-D face recognition using attribute-aware loss. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 42, No. 10, 2552–2566, 2020.
[155] Paysan, P.; Knothe, R.; Amberg, B.; Romdhani, S.; Vetter, T. A 3D face model for pose and illumination invariant face recognition. In: Proceedings of the 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, 296–301, 2009.
[156] Cao, C.; Weng, Y. L.; Zhou, S.; Tong, Y. Y.; Zhou, K. FaceWarehouse: A 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics Vol. 20, No. 3, 413–425, 2014.
[157] Castellani, U.; Bartoli, A. 3D shape registration. In: 3D Imaging, Analysis and Applications. Liu, Y.; Pears, N.; Rosin, P. L.; Huber, P. Eds. Springer Cham, 353–411, 2020.
[158] D’Errico, J. Surface fitting using gridfit. MATLAB Central File Exchange. 2005. Available at https://www.mathworks.com/matlabcentral/fileexchange/8998-surface-fitting-using-gridfit.
[159] Ciravegna, G.; Cirrincione, G.; Marcolin, F.; Barbiero, P.; Dagnes, N.; Piccolo, E. Assessing discriminating capability of geometrical descriptors for 3D face recognition by using the GH-EXIN neural network. In: Neural Approaches to Dynamics of Signal Exchanges. Smart Innovation, Systems and Technologies, Vol. 151. Esposito, A.; Faundez-Zanuy, M.; Morabito, F.; Pasero, E. Eds. Springer Singapore, 223–233, 2020.
[160] Lüthi, M.; Gerig, T.; Jud, C.; Vetter, T. Gaussian process morphable models. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 40, No. 8, 1860–1873, 2017.
[161] Gilani, S. Z.; Mian, A.; Shafait, F.; Reid, I. Dense 3D face correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 40, No. 7, 1584–1598, 2018.
[162] El Rahman Shabayek, A.; Aouada, D.; Cherenkova, K.; Gusev, G.; Ottersten, B. 3D deformation signature for dynamic face recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2138–2142, 2020.
[163] Smith, M.; Smith, L.; Huang, N.; Hansen, M.; Smith, M. Deep 3D face recognition using 3D data augmentation and transfer learning. In: Proceedings of the 16th International Conference on Machine Learning and Data Mining, 209–218, 2020.
[164] Dou, P. F.; Shah, S. K.; Kakadiaris, I. A. End-to-end 3D face reconstruction with deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1503–1512, 2017.
[165] Richardson, E.; Sela, M. T.; Kimmel, R. 3D face reconstruction by learning from synthetic data. In: Proceedings of the 4th International Conference on 3D Vision, 460–469, 2016.

Yaping Jing received her bachelor’s degree (Hons) in information technology from Deakin University, Australia, in 2016. She is currently a Ph.D. candidate in the School of Information Technology at Deakin University. Her research interests include 3D face recognition, 3D data processing, and machine learning.

Xuequan Lu is a lecturer (assistant professor) at Deakin University, Australia. He spent more than two years working as a research fellow in Singapore. Prior to that, he received his Ph.D. degree from Zhejiang University, China, in 2016. His research interests mainly lie in visual data computing, for example, geometry modeling, processing and analysis, animation, simulation, and 2D data processing and analysis.

Shang Gao received her Ph.D. degree in computer science from Northeastern University, China, in 2000. She is currently a senior lecturer in the School of Information Technology, Deakin University. Her current research interests include cybersecurity, cloud computing, and machine learning.

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.