situations is a tough challenge. Additionally, cosmetics, facial hair, and accessories (such as
scarves or spectacles) may alter one’s image. The resemblance between people (e.g., twins,
relatives) presents another challenge to face recognition [4,5]. Face recognition is the most
favoured biometric for identity recognition for the following reasons: 1. high accuracy,
2. cross-platform operation, 3. reliability [6], and 4. consent-free capture, since unlike other
biometric systems, face recognition does not require consent from the subject [7].
In our scoping review [8], we consulted 266 CNN face recognition articles published
from 2013 to 2023 and found the following gaps in the literature: (1) Most researchers study
face recognition using images rather than videos. (2) Researchers prefer clean images over
occluded ones, which is a problem because it degrades model performance in real-world
settings where occlusion exists. (3) Far more research has been conducted using traditional
CNNs than other CNN architectures.
The objectives of this systematic review are as follows:
1. To determine which techniques have been applied in the face recognition domain.
2. To identify which databases of face recognition are most common.
3. To find out which areas have adopted face recognition.
4. To assess and identify suitable evaluation metrics to use when comparative studies
are carried out in the field of face recognition.
This study seeks to review CNN architectures, databases, metrics, and applications for
face recognition.
• 2001: An object identification framework for faces was developed, making it feasible to
recognize faces in real time from video material [16].
• 2011: Deep learning, a machine learning technology based on artificial neural net-
works [17], accelerated everything. The computer chooses which points to compare, and it
learns faster when more photos are provided. Studies aimed to enhance the performance
of existing approaches by exploring novel loss functions such as ArcFace.
• 2015: The Viola–Jones method was implemented on portable devices and embedded
systems employing tiny low-power detectors. As a result, the Viola–Jones method has
been utilized to enable new features in user interfaces and teleconferencing, thereby
broadening the practical use of face recognition systems [18].
• 2022: Ukraine is utilizing Clearview AI face recognition software from the United
States to identify deceased Russian servicemen. Ukraine has undertaken 8600 searches
and identified the families of 582 Russian troops who died in action [19].
3.3. Healthcare
Face recognition may be used in the healthcare industry to identify and monitor pa-
tients, guaranteeing that the appropriate person receives the proper therapy. Furthermore,
facial expressions and characteristics can be examined to track patient reactions to medicine,
identify early indicators of mental health conditions like depression, and monitor emotional
states [22].
alone cannot solve the problem of detection across scales [28]. Despite these challenges,
great progress has been achieved in the past decade, with several systems demonstrat-
ing excellent real-time performance. These algorithms’ recent advancements have also
made major contributions to recognizing other objects such as humans/pedestrians and
automobiles [29].
5. Methodology
Our systematic review’s primary objective is to identify the databases and approaches
used in face recognition and the challenges the field currently faces. Future research in this field will be based
on the knowledge gathered from this study. The goals and research topics covered in this
study are displayed in Table 2.
Table 2. Research questions and their purposes.

| Question | Purpose |
|---|---|
| Q1: What are the most prevalent methods used for face recognition? How do they compare in performance? | To determine which techniques have been applied in the face recognition domain. |
| Q2: What databases are used in face recognition? | To identify which databases of face recognition are most common. |
| Q3: In which areas in the real world have face recognition techniques been applied? | To find out which areas have adopted face recognition. |
| Q4: What are the most common evaluation metrics of face recognition systems? | To assess and identify suitable evaluation metrics to use when comparative studies are carried out in the field of face recognition. |
In the following subsection, we will cover the steps suggested to identify and screen
the studies in our systematic review.
5.1.3. Screening
In order to eliminate duplicate records, the first filter was applied during the screening
phase. Across the twelve databases, two records were duplicates; after their removal,
3620 records remained for screening.
Figure 2. PRISMA-ScR diagram showing all the steps taken to filter out articles [8].
6. Databases
Researchers studying face recognition have access to many face datasets. The databases
can be offered with free access to data or for sale. Figure 3 shows the databases used in face
recognition systems.
captured during 15 sessions in a semi-controlled setting. The dataset includes 1564 sets of
face pictures comprising 14,126 images, including 365 duplicate sets [5]. Figure 5 below shows a
preview picture of the database.
6.4. AR Database
Aleix Martinez and Robert Benavente created the AR Face database in 1998 at the
Computer Vision Center (CVC) of the Universitat Autònoma de Barcelona in Barcelona,
Spain [20,36,37]. This database focuses on face recognition but can also recognize facial
expressions. The AR database includes almost 4000 frontal pictures from 126 participants
(70 men and 56 women). Each subject had 26 samples recorded in two sessions on different
days, with photos measuring 576 × 768 pixels [20,36,37]. Figure 6 below
shows a preview picture of the database.
6.14. MegaFace
The MegaFace dataset is a massive facial recognition dataset intended to test and
enhance face recognition capabilities at scale [46]. It is one of the biggest datasets for
face recognition system training and benchmarking, with over 1 million tagged photos
representing 690,000 distinct identities [46]. MegaFace provides a strong platform for
testing face verification and identification algorithms, including a probe set for testing and
a gallery set containing the majority of identities [46]. Although its scale offers important
insights into real-world face recognition scenarios, the dataset’s web-based collection
introduces problems such as noise and poor image quality [46].
source). Face photos and videos of 1500 more people were gathered as an addition to the
IJB-A dataset [47].
spectacles, facial hair, beards, hats, turbans, veils, masquerades, and ball masks. The dataset
is difficult for face identification because of these changes in addition to those related to
stance, lighting, emotion, backdrop, ethnicity, age, gender, clothes, hairstyles, and camera
quality [55]. There are 1001 normal face images, 903 validation face photos, 4814 disguised
face images, and 4440 impersonator images in the DFW collection overall. Any individual
who, whether knowingly or unknowingly, assumes the identity of a subject is considered an
impersonator of that subject.
Table 6. Summary of the databases used for training and testing face recognition systems.
Table 7 provides an overview of the strengths and limitations of various face recogni-
tion databases used for training and testing face recognition systems. These databases vary
in terms of the diversity of subjects, quality of images, and environmental conditions, each
presenting unique advantages and challenges for researchers in the field.
The above table presents a comparison of various face recognition databases used for
training and testing systems, focusing on their strengths and limitations. Some datasets,
such as ORL, FERET, and Yale, are smaller and suitable for controlled experiments, but
they are limited in terms of subject diversity and environmental conditions, often with
low-resolution images. On the other hand, databases like FRGC, MegaFace, and MS-Celeb-1M
offer large-scale datasets with diverse subjects, poses, and lighting conditions, though
they can suffer from issues like high computational costs, data quality inconsistencies, or
limited diversity in real-world scenarios. Datasets like CASIA-WebFace and VGGFace
provide large collections of high-quality images, but some have imbalances in data dis-
tribution or limited pose variation. Certain specialized databases like AR, CASIA Mask,
and DMFD focus on specific challenges, such as occlusion or manipulation detection, but
may have limited generalizability to unconstrained environments. Overall, while larger
datasets provide better scalability, the limitations related to data quality, environmental
conditions, and pose variation must be considered when choosing a database for training
face recognition systems.
Table 7. Strengths and limitations of face recognition databases used for training and testing face
recognition systems.
The next section concerns face recognition methods and will inform us about tradi-
tional and deep learning architectures.
7.2.2. VGGNet
In the publication “Very Deep Convolutional Networks for Large-Scale Image Recogni-
tion”, K. Simonyan and A. Zisserman introduced the convolutional neural network model
known as the VGG-Network [69]. Their primary contribution was to employ the VGGNet
design, which uses small (3 × 3) convolution filters and doubles the number of feature
maps after each (2 × 2) pooling. To improve the deep architecture’s ability to learn
complex nonlinear mappings, the network’s depth was increased to 16–19 weight layers.
Figure 11 below shows an example of a VGG architecture.
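To make the design concrete, here is a minimal PyTorch sketch of the VGG principle (our illustrative reconstruction, not the authors’ code): stages of stacked 3 × 3 convolutions, each followed by 2 × 2 max-pooling, with the number of feature maps doubled from stage to stage.

```python
import torch.nn as nn

def vgg_stage(in_ch, out_ch, num_convs):
    """One VGG-style stage: stacked 3x3 convolutions, then 2x2 max-pooling."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Feature maps double across stages, as in VGG-16: 64 -> 128 -> 256 -> 512.
features = nn.Sequential(
    vgg_stage(3, 64, 2),
    vgg_stage(64, 128, 2),
    vgg_stage(128, 256, 3),
    vgg_stage(256, 512, 3),
    vgg_stage(512, 512, 3),
)
```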
7.2.3. ResNet
In 2016, He et al. designed the residual neural network (also known as a residual
network or ResNet) architecture largely to improve the performance of existing CNN
architectures such as VGGNet, GoogLeNet, and AlexNet [70]. A ResNet is a deep learning
model in which weight layers train residual functions based on the layer inputs. It functions
like a highway network, with gates opened using significantly positive bias weights [71].
ResNet uses “global average pooling” instead of “fully connected” layers, resulting in a
significantly reduced model size compared to the VGG network [70].
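As a minimal sketch of the residual idea (assuming the standard PyTorch API and simplified to the stride-1 case), the block below learns a residual function F(x) whose output is added back to the input through the skip connection:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: the weight layers learn F(x); the output is F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                       # skip connection carries the input forward
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)   # residual addition: F(x) + x
```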
7.2.4. FaceNet
Google researchers Florian Schroff, Dmitry Kalenichenko, and James Philbin created
the FaceNet face recognition technology. The technology was initially demonstrated at the
2015 IEEE Conference on Computer Vision and Pattern Recognition [72]. Using the Labeled
Faces in the Wild (LFW) and YouTube Faces Databases, FaceNet was able to recognize faces
with an accuracy of 99.63% and 95.12%, respectively [72].
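FaceNet verifies a pair of faces by embedding both images and thresholding the distance between the embeddings. The sketch below illustrates that step; embedding_net and the threshold value are illustrative placeholders rather than values taken from [72]:

```python
import torch
import torch.nn.functional as F

def verify(embedding_net, face_a, face_b, threshold=1.1):
    """FaceNet-style verification: embed both faces, L2-normalize onto the unit
    hypersphere, and compare squared Euclidean distance to a tuned threshold."""
    with torch.no_grad():
        emb_a = F.normalize(embedding_net(face_a), p=2, dim=1)
        emb_b = F.normalize(embedding_net(face_b), p=2, dim=1)
    dist = (emb_a - emb_b).pow(2).sum(dim=1)   # squared L2 distance per pair
    return dist < threshold                    # True means "same identity"
```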
7.2.5. LBPNet
LBPNet uses two filters that are based on Principal Component Analysis (PCA) and
LBP methodologies, respectively [73]. The two components of LBPNet’s architecture are
the (i) regular network for classification and (ii) deep network for feature extraction [73].
7.2.7. YOLO
A convolutional neural network architecture called YOLO was developed by Joseph
Redmon and his colleagues [75]. It predicts bounding-box positions and classifies multiple
candidates in a single pass. YOLO treats object recognition as a regression problem,
simplifying the pipeline from image input to category and position output [75].
7.2.8. MTCNN
The Multi-Task Cascaded Convolutional Neural Network (MTCNN) detects faces and
facial landmarks in images. It was published in 2016 by Zhang et al. [76]. The system has
also been applied with one-shot learning, using just one image of an offender for
identification. The goal is to recognize the criminal’s face, retrieve the information stored
in the database for that criminal, and notify the police of all the facts, including the
location where the criminal was observed by cameras [76].
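As a hedged illustration of such a pipeline, the snippet below uses the third-party facenet-pytorch library (not the implementation from [76]) to enrol an identity from a single photo and match probes against a gallery; the file names and distance threshold are hypothetical:

```python
# pip install facenet-pytorch
from facenet_pytorch import MTCNN, InceptionResnetV1
from PIL import Image

mtcnn = MTCNN(image_size=160)                        # detects and aligns the face
encoder = InceptionResnetV1(pretrained='vggface2').eval()

def embed(image_path):
    """One-shot enrolment: a single photo yields one embedding per identity."""
    face = mtcnn(Image.open(image_path))             # aligned face tensor (None if no face)
    return encoder(face.unsqueeze(0)).detach()

gallery = {"suspect_001": embed("suspect_001.jpg")}  # hypothetical watchlist entry

def identify(image_path, threshold=1.0):             # threshold is illustrative
    probe = embed(image_path)
    name, dist = min(((n, (probe - e).norm().item()) for n, e in gallery.items()),
                     key=lambda t: t[1])
    return name if dist < threshold else None        # closest match, if close enough
```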
7.2.9. DeepMaskNet
Ullah et al. presented DeepMaskNet in 2021 during COVID-19 [77]. It is a powerful
system that can distinguish between individuals who are wearing face masks and those
who are not. With an accuracy of 93.33%, DeepMaskNet outperformed various cutting-edge
CNN models, including VGG19, AlexNet, and Resnet18 [77].
7.2.10. DenseNet
DenseNet was suggested as a solution to the vanishing gradient problem and is
comparable to ResNet [78]. DenseNet solves the problem with ResNet by using feed-
forward connections between each previous layer and the subsequent layer, utilizing
cross-layer connectivity. ResNet explicitly retains information through additive identity
transformations, which adds to its complexity. DenseNet makes use of dense blocks; as a result, all
subsequent layers receive feature maps from all preceding layers [78].
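The sketch below contrasts this with ResNet: rather than adding the identity, each layer concatenates its new feature maps onto everything produced before it (a minimal PyTorch rendering; the growth rate and layer count are illustrative):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """Each layer receives the concatenated feature maps of all preceding layers."""
    def __init__(self, in_ch, growth_rate):
        super().__init__()
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, growth_rate, kernel_size=3, padding=1, bias=False))

    def forward(self, x):
        return torch.cat([x, self.conv(x)], dim=1)   # concatenate, not add

def dense_block(in_ch, growth_rate=32, num_layers=4):
    """Channel count grows by `growth_rate` with every layer in the block."""
    return nn.Sequential(*[DenseLayer(in_ch + i * growth_rate, growth_rate)
                           for i in range(num_layers)])
```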
7.2.11. MobileNetV2
A convolutional neural network design called MobileNetV2 aims to function well
on mobile devices [79]. Its foundation is an inverted residual structure, in which the
bottleneck layers are connected by residuals. Lightweight depthwise convolutions are used
by the intermediate expansion layer to filter features as a source of non-linearity.
MobileNetV2’s design begins with an initial full convolution layer with 32 filters,
followed by 19 residual bottleneck layers [79].
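A minimal PyTorch sketch of the stride-1 inverted residual block described above (with a typical expansion factor of 6; a simplified rendering, not the reference implementation):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2 block: 1x1 expansion -> 3x3 depthwise -> linear 1x1 projection,
    with the residual connecting the narrow bottleneck layers."""
    def __init__(self, ch, expansion=6):
        super().__init__()
        hidden = ch * expansion
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False),            # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),            # lightweight depthwise conv
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, ch, 1, bias=False),            # linear projection (no ReLU)
            nn.BatchNorm2d(ch))

    def forward(self, x):
        return x + self.block(x)   # residual between bottlenecks (stride-1 case)
```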
7.2.12. MobileFaceNets
MobileFaceNets are a type of very efficient CNN model designed for high-accuracy
real-time face verification on mobile and embedded devices [80]. These models have fewer
than one million parameters. The superiority of MobileFaceNets over MobileNetV2 has
been established. MobileFaceNets require smaller amounts of data while maintaining
superior accuracy when compared to other cutting-edge CNNs [80].
7.2.15. DeepFace
Facebook’s research group developed DeepFace, a deep learning face recognition
technology. It recognizes human faces in digital photos. The approach employs a deep
convolutional neural network (CNN) to develop robust face representations for the purpose
of face verification [85]. DeepFace achieved a human-level accuracy of 97.35% on the LFW
(Labeled Faces in the Wild) dataset, which was a substantial advance above existing
approaches at the time [85]. The network comprises nine layers and pre-processes photos
using 3D face alignment, which aligns them based on facial landmarks and helps with
position variations. It uses a softmax loss function during training to discriminate between
distinct identities [85].
attention employs many sets (or “heads”) to capture various relationships or aspects in the
data [88].
The next section covers performance measures; it will inform us about the performance
metrics used in face recognition systems.
8. Performance Measures
8.1. Accuracy
Accuracy quantifies the model’s overall correctness by comparing the number of correct
predictions to the total number of predictions. Below is the mathematical computation
of accuracy:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
8.2. Precision
Precision refers to the fraction of correctly identified positives. In face recognition
systems, it refers to the percentage of correct positive matches out of all those recognized
by the system. Below is the mathematical computation of precision:
$$\text{Precision} = \frac{TP}{TP + FP}$$
8.3. Recall
Recall is a statistic that indicates how often a machine learning model accurately
detects positive examples (true positives) among all of the actual positive samples in the
dataset. Below is the mathematical computation of recall:
$$\text{Recall} = \frac{TP}{TP + FN}$$
8.4. F1-Score
The F1-score is a critical assessment parameter for facial recognition algorithms, particularly
when dealing with unbalanced datasets. It offers a single measurement that balances
precision and recall. The F1-score is a more trustworthy estimate of a model’s performance
than accuracy since it takes into account both false positives and false negatives. The
F1-score represents the harmonic mean of precision and recall. The harmonic mean is
utilized instead of the standard arithmetic mean since it penalizes extreme values more.
Below is the mathematical computation of the F1-score:
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$$
A perfect F1-score of 1 indicates excellent precision and recall, whereas the poorest
possible F1-score is 0.
8.5. Sensitivity
This is the system’s capacity to accurately recognize positives. This metric measures
the system’s ability to identify subject persons. Below is the mathematical computation
of sensitivity:
$$\text{Sensitivity} = \text{Recall} = \frac{TP}{TP + FN}$$
8.6. Specificity
This is the system’s capacity to accurately recognize negatives. This metric measures
the system’s ability to identify non-subject persons. Below is the mathematical computation
of specificity:
$$\text{Specificity} = \frac{TN}{FP + TN}$$
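For concreteness, the sketch below computes all six measures above from raw confusion-matrix counts (the example counts are invented for illustration):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the six measures above from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)              # identical to sensitivity
    f1          = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "sensitivity": recall, "specificity": specificity}

# Example: 90 true matches, 80 true rejections, 10 false matches, 20 misses.
print(classification_metrics(tp=90, tn=80, fp=10, fn=20))
```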
Table 9. Summary of the strengths and weaknesses of the performance metrics for face recognition.
The above are the most commonly used performance measures for face recognition
systems. The next section is on Face Recognition Loss Functions; the section will inform us
about how faces are mapped in order to achieve the best accuracy.
$$L_{\text{softmax}} = -\sum_{i=1}^{N} y_i \log(\hat{y}_i)$$
where $L_{\text{softmax}}$ is the loss, $N$ is the number of classes, $y_i$ is the ground-truth
label (1 for the correct class, 0 for others), and $\hat{y}_i$ is the predicted probability for
class $i$, given by the softmax function. The goal is to minimize this loss function, which
means maximizing the likelihood of the true class being predicted with high probability.
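The following PyTorch sketch reproduces this computation on toy logits and shows that it matches the built-in cross-entropy (batch size and class count are arbitrary):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)          # 4 faces, 10 identity classes (toy sizes)
labels = torch.tensor([3, 1, 0, 7])  # ground-truth identities

probs = F.softmax(logits, dim=1)                         # \hat{y}_i per class
# With one-hot y_i, the sum reduces to -log of the true-class probability.
loss = -torch.log(probs[torch.arange(4), labels]).mean()
assert torch.allclose(loss, F.cross_entropy(logits, labels))
```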
In the Triplet Loss function, the variables represent key components for learning
discriminative embeddings. a is the anchor, which is the reference input, typically a face
image. p is the positive sample, which is another image of the same identity or class as the
anchor, while n is the negative sample, from a different class or identity. The term d(a, p)
measures the distance between the anchor and positive samples, aiming to keep them close,
while d(a, n) measures the distance between the anchor and the negative sample, aiming
to maximize the distance. The parameter α is a margin that ensures the anchor-negative
distance is sufficiently larger than the anchor-positive distance by a margin of α, preventing
trivial solutions. The max function ensures the loss is zero if the margin condition is met;
otherwise, it penalizes the model for not achieving sufficient separation between positive
and negative pairs.
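A minimal PyTorch rendering of this loss, using squared Euclidean distances as in FaceNet; the margin value is illustrative:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """L = max(d(a, p) - d(a, n) + margin, 0), averaged over the batch."""
    d_ap = (anchor - positive).pow(2).sum(dim=1)   # pull same identity together
    d_an = (anchor - negative).pow(2).sum(dim=1)   # push other identity away
    return F.relu(d_ap - d_an + margin).mean()     # zero once the margin is met
```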
In this equation, $L_{\text{center}}$ represents the center loss. The variable $N$ denotes the number
of samples in the batch, $x_i$ refers to the feature vector of the $i$-th sample, and $c_{y_i}$ is the center
of the class $y_i$ to which the $i$-th sample belongs. The notation $\|x_i - c_{y_i}\|_2^2$ represents the
squared Euclidean distance between the feature vector $x_i$ and the corresponding class center
$c_{y_i}$. This loss function minimizes the distance between the feature representations of the
same class by encouraging each sample to be closer to the center of its corresponding class.
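One common PyTorch formulation of this loss, with learnable class centers, is sketched below (an illustration of the idea, not the original implementation):

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """L_center = (1/2N) * sum_i ||x_i - c_{y_i}||^2 with learnable centers."""
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        diffs = features - self.centers[labels]      # x_i - c_{y_i} for each sample
        return 0.5 * diffs.pow(2).sum(dim=1).mean()  # mean squared distance to centers
```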
In this equation, $L_{\text{arcface}}$ represents the ArcFace loss. The term $\theta_y$ is the angle between
the feature vector and the weight vector of the true class $y$, and $m$ is a margin added to the
true class angle $\theta_y$ to enforce a larger angular distance for correct classification. The cosine
of this modified angle, $\cos(\theta_y + m)$, is used to increase the decision margin for the true
class. The denominator consists of the sum of the exponentials of the cosine values of the
angles for all classes, including the true class and all other classes $j \neq y$. This loss function
encourages the model to distinguish between classes by maximizing the angular margin
between the true class and other classes, thereby improving classification accuracy.
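The sketch below shows one way to realize the additive angular margin in PyTorch; the scale s and margin m are typical values from the literature, not values reported in this review:

```python
import torch
import torch.nn.functional as F

def arcface_loss(features, weights, labels, s=64.0, m=0.5):
    """Additive angular margin: cos(theta_y + m) for the true class, cos(theta_j)
    for all others, scaled by s and fed to softmax cross-entropy.
    `weights` is the (num_classes x feat_dim) classification weight matrix."""
    cos = F.linear(F.normalize(features), F.normalize(weights))  # cos(theta) per class
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    one_hot = torch.zeros_like(cos).scatter_(1, labels.unsqueeze(1), 1.0)
    logits = s * torch.where(one_hot.bool(), torch.cos(theta + m), cos)
    return F.cross_entropy(logits, labels)
```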
$$L_{\text{contrastive}} = \frac{1}{2}\left[\, y \cdot \|a - b\|_2^2 + (1 - y) \cdot \max\left(0,\, m - \|a - b\|_2\right)^2 \right]$$
In this equation, $L_{\text{contrastive}}$ represents the contrastive loss. The terms $a$ and $b$ are
feature vectors of two samples (e.g., two images), and $\|a - b\|_2$ is the Euclidean distance
between these vectors. The label $y$ is a binary indicator, where $y = 1$ indicates that the two
samples are from the same class (positive pair) and $y = 0$ indicates that they are from
different classes (negative pair).
For positive pairs, the loss is proportional to the squared Euclidean distance between
the feature vectors, encouraging the model to bring similar samples closer. For negative
pairs, the loss is based on a margin m, encouraging the model to push dissimilar samples
apart. If the distance between the negative pair is smaller than the margin m, the loss will
be proportional to the square of the difference from the margin. If the distance exceeds the
margin, no penalty is applied. This loss function promotes the model to learn embeddings
that minimize the distance for similar samples and maximize the distance for dissimilar
ones, thus improving the model’s discriminative ability.
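A direct PyTorch transcription of the contrastive loss equation above (the margin value is illustrative):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(a, b, y, margin=1.0):
    """L = 1/2 * [ y * d^2 + (1 - y) * max(0, margin - d)^2 ], d = ||a - b||_2.
    y is 1.0 for a same-identity pair and 0.0 for a different-identity pair."""
    d = F.pairwise_distance(a, b)              # Euclidean distance per pair
    pos = y * d.pow(2)                         # pull positive pairs together
    neg = (1 - y) * F.relu(margin - d).pow(2)  # push negatives beyond the margin
    return 0.5 * (pos + neg).mean()
```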
Al-Azzawi et al. [93]’s experimental findings in 2018 indicate that utilizing a Localized-
Based CNN structure for face recognition and identification improves performance by
leveraging the localized deep feature map. The model addresses issues with large expression
variations, pose, lighting, and poor resolution. The suggested model achieved 98.76% facial
recognition accuracy.
In 2018, Lu et al. [94] proposed Deep Coupled ResNet (DCR) for low-resolution face
recognition. The suggested DCR model consistently outperformed the state of the art,
according to a thorough examination on the LFW database. For a probe size of 8 × 8,
when using the LFW database, the DCR achieved an accuracy of 93.6%, which was better
compared to 67.7%, 72.7%, and 75.0% achieved by the LightCNN, ResNet, and VGGFace,
respectively [94].
In 2018, Qu et al. [95] proposed a real-time facial recognition based on a convolution
neural network (CNN) on Field Programmable Gate Array (FPGA), which enhances speed
and accuracy. FPGA technology allows parallel computing and unique logic circuit design,
resulting in faster processing speeds compared to Central Processing Unit (CPU), Graphics
Processing Unit (GPU), and Tensor Processing Unit (TPU) processors. Parallel processing
of FPGA speeds up network calculation, enabling real-time face recognition. The network
operates at a clock frequency of 50 Megahertz (MHz) and achieves recognition speeds of
up to 400 frames per second (FPS), exceeding previous achievements. The recognition rate
was 99.25%, exceeding that of the human eye.
In 2019, Talab et al. [96] developed an effective sub-pixel convolution neural network
for face recognition at low to high resolutions. This convolutional neural network is used
in image pre-processing to improve recognition of low-resolution pictures. The suggested
Efficient Sub-Pixel Convolutional Neural Network converts low-resolution pictures to high-
resolution for further recognition. Evaluations were conducted using the Yale face database
and the ORL dataset. The suggested technique achieved greater accuracy on the Yale and
ORL datasets (95.3% and 93.5%, respectively) compared to prior methods.
In 2020, Feng et al. [97] proposed a small sample face recognition approach using
ensemble deep learning. This approach improves network performance by increasing
the number of training samples, optimizing training parameters, and reducing the dis-
advantage of small sample sizes. Experiments indicate that this technique outperforms
convolutional neural networks in recognizing faces in the ORL database, even with few
training sets.
In 2020, Lin et al. [98] suggested LWCNN for small sample data and employed k-fold
cross-validation to ensure robustness. Lin et al.’s suggested LWCNN approach outper-
formed other methods in terms of recognition accuracy and avoidance of overfitting in
limited sample spaces.
In 2021, Yudita and colleagues [99] investigated the concept of face recognition utilizing
VGG16 based on incomplete or insufficient human face data. The study achieved low
accuracy; Yudita et al. suggested that the effectiveness of CNN models for human face
recognition may be affected by the different databases employed, each having varying
quantities and kinds of data.
In 2021, Szmurlo and Osowski [100] suggested a technique that combines feature
selection methods with three classifiers: support vector machine, random forest of decision
trees, and Softmax incorporated into a CNN. The system uses an AlexNet-based CNN.
Results indicate that SVM outperforms random forests. However, SVM is not as accurate
as the standard Softmax classifier (96.3% vs. 97.3%, respectively). Classical classifiers
perform poorly with little learning data: the CNN generates a huge number of descriptors,
which makes training traditional classifiers difficult and requires a large number of
learning samples.
In 2021, Wu and Zhang [101] used multi-task cascaded convolutional neural networks
(MTCNNs) for rapid face detection and alignment, while FaceNet with an improved loss
function provided high-accuracy face verification and recognition. The study compared
the performance of their MTCNN and FaceNet hybrid network for face identification and
recognition with other deep learning algorithms and approaches. The testing results show
that the upgraded FaceNet can handle real-time recognition needs with an accuracy of
99.85%, compared to 97.83% achieved by MTCNN alone. The authors suggested that face
detection and recognition functions can be efficiently integrated into an access control
system [101].
The studies conducted by Sarahi et al. [102] in 2021 show that YOLO-Face achieves
precision, recall, and accuracy scores of 99.8%, 72.9%, and 72.8%, respectively. The only
model that outperforms YOLO-Face is MTCNN [102], which achieves an accuracy of 81.5%
on the FDDB dataset. Furthermore, the CelebA dataset, which consists of 19,962 face
images, was used to test the same face detection models. The findings indicate that all of
the models work well when used with the CelebA dataset. However, Face-SSD, with 99.7%
accuracy, outperforms YOLO-Face, which achieves 99.6%. Lastly, it should be noted that YOLO-Face
achieves 95.8%, 94.2%, and 87.4% on the Easy, Medium, and Hard subsets of WIDER FACE
validation. It is evident that small-scale faces generate superior results for the YOLO-Face
detector [102].
In 2021, Malakar et al. [103] suggested a reconstructive approach to obtain partially
restored characteristics of the occluded area of the face, which was subsequently recognized
using an existing deep learning method. The proposed technique does not entirely recreate
the occluded section, but it does give enough characteristics to enhance recognition accuracy
by up to 15%.
In 2022, Marjan et al. [104] provided an improved VGG19 deep model to enhance the
accuracy of masked face recognition systems. The experimental findings show that the
suggested extended VGG19 method outperforms other techniques. The suggested model
accurately detects the frontal face with a mask (96%).
With respect to generative adversarial network (GAN)-based techniques, Li et al. [105]
provided an algorithm architecture including de-occlusion and distillation modules. The
de-occlusion module utilizes GAN for masked face completion, allowing for the recovery of
occluded features and eliminating appearance ambiguity. The distillation module employs
a pre-trained model for face categorization. The simulated LFW dataset had the greatest
recognition accuracy of 95.44%. GANs offer a way to grow datasets without requiring a lot
of real-world data collection, which enhances system performance while resolving privacy
and data availability issues [106–108]. However, it is difficult for GAN-based algorithms
to duplicate the characteristics of the face’s key features, especially when there is broad
occlusion, as in the case of a facemask [109].
Using a Convolutional Block Attention Module (CBAM) [110], Li et al. [111] presented
a novel technique that blends cropping-based and attention-based methodologies. In
order to maximize recognition accuracy, the cropping-based method eliminates the masked
face region while experimenting with different cropping proportions. The masked facial
features received lower weights in the attention-based procedure, but the characteristics
surrounding the eyes received larger weights. The accuracy of this method was 92.61% for
Masked Face Recognition (MFR). In a different work, Deng et al. [112] improved masked
facial recognition by applying cosine loss to create the MF-Cosface algorithm, which
outperformed the attention-based approach. In order to improve the model’s emphasis
on the uncovered face area, they also developed an Attention–Inception module that
combines CBAM with Inception–ResNet. Verification tasks were somewhat improved by
this approach. Wu [113] presented a local restricted dictionary learning approach for an
attention-based MFR algorithm that distinguishes between the face and the mask. It uses
the attention mechanism to lessen information loss and the dilated convolution to increase
picture resolution.
Table 10 summarizes various face recognition architectures, their corresponding train-
ing sets, and performance metrics, highlighting key approaches such as Convolutional
Neural Networks (CNNs), VGG16, FaceNet, and others across different datasets and years.
The table provides insights into the evolution of face recognition models, including their
verification metrics and accuracy rates, showcasing both traditional and contemporary
methods.
Table 10. Summary of CNN architectures used for face recognition systems.
| Architecture | Training Set | Year | Authors | Convolutional Layers | Verif. Metric | Accuracy |
|---|---|---|---|---|---|---|
| SVM | ORL | 2003 | Yanhun and Chongqing [114] | - | - | 96% |
| Gabor Wavelets | ORL | 2001 | Kepenekci [115] | - | - | 95.25% |
| ESP-CNN + CNN | ORL | 2019 | Talab et al. [96] | - | - | 93.5% |
| Ensemble CNN | ORL | 2020 | Feng et al. [97] | 4 | Softmax | 88.5% |
| VGG16 | ORL | 2020 | Lou and Shi [116] | 16 | Center Loss and Softmax | 99.02% |
| DeepFace | LFW | 2014 | Taigman et al. [48,85] | - | Softmax | 97.35% |
| FaceNet | LFW | 2015 | Parkhi et al. [48] | - | Triplet Loss | 98.87% |
| DCR | LFW | 2018 | Lu et al. [94] | - | CM Loss | 93.6% |
| ResNet | LFW | 2018 | Lu et al. [94] | - | CM Loss | 72.7% |
| Localized Deep-CNN | LFW | 2018 | Al-Azzawi et al. [93] | - | Softmax | 97.13% |
| FaceNet | LFW | 2021 | Malakar et al. [103] | - | - | 70–80% |
| MTCNN | LFW | 2021 | Wu and Zhang [101] | 9 | Triplet Loss and ArcFace Loss | 97.83% |
| MTCNN + FaceNet | LFW | 2021 | Wu and Zhang [101] | 9 | Triplet Loss and ArcFace Loss | 99.85% |
| VGG16 | VGGFace | 2015 | Parkhi et al. [85] | - | Triplet Loss | 98.95% |
| Light-CNN | MS-Celeb-1M | 2015 | Parkhi et al. [48] | - | Softmax | 98.8% |
| Traditional CNN | CMU-PIE | 2018 | Qu et al. [95] | 5 | Sigmoid | 99.25% |
| ESPCN + CNN | Yale | 2019 | Talab et al. [96] | - | - | 95.3% |
| VGG16 | Yale | 2020 | Lou and Shi [116] | 16 | Center Loss and Softmax | 97.62% |
| LWCNN | Yale Face Database | 2020 | Lin et al. [98] | 9 | Softmax | 96.19% |
| VGG16 | CASIA | 2020 | Lou and Shi [116] | 16 | Center Loss and Softmax | 98.65% |
| CASIA Mask | CASIA | 2021 | - | - | Occluded Faces | - |
| AlexNet | Own dataset | 2021 | Szmurlo and Osowski [100] | 9 | Softmax | 97.8% |
| AlexNet | Own dataset | 2022 | Mahesh and Ramkumar [117] | 8 | Softmax | 96% |
| Face Transformer | - | 2022 | Sun and Tzimiropoulos [84] | - | - | 99.83% |
| YOLO-Face | FDDB | 2021 | Sarahi et al. [102] | - | - | 72.8% |
| PCA + FaceNet | Yale Face Database B | 2021 | Malakar et al. [103] | - | - | 85–95% |
| Extended VGG19 | - | 2022 | Marjan et al. [104] | 19 | Softmax | 96% |
When comparing various convolutional neural network (CNN) architectures used for
face recognition, several differences in accuracy and training datasets become apparent.
For instance, FaceNet, as evaluated by Parkhi et al. [48] in 2015, demonstrated an
impressive accuracy of 98.87% on the LFW dataset, showcasing its strong performance
in large-scale face verification tasks. Similarly, VGG16, as trained by Parkhi et al. [48]
in the same year, achieved a slightly higher accuracy of 98.95% on the VGGFace dataset,
which indicates its effectiveness in handling facial recognition tasks involving variations
in face poses, lighting, and identities. The models trained on these benchmark datasets,
LFW and VGGFace, highlight the robustness of these architectures in large-scale real-
world conditions.
On the other hand, models like SVM (2003) and Gabor Wavelets (2001), while demon-
strating high accuracy on smaller datasets like ORL (96% and 95.25%, respectively), show
limitations in terms of scalability to larger, more complex datasets. Their reliance on simpler
algorithms and smaller training sets places them at a disadvantage when compared to more
recent architectures like FaceNet and VGG16, which leverage deep learning techniques
and larger, more diverse datasets. For example, FaceNet’s success on LFW (98.87%) and its
application to other datasets like Yale Face Database B (70–80%) emphasize its versatility
and high generalization capabilities.
Moreover, architectures like MTCNN combined with FaceNet, which achieved 99.85%
accuracy on LFW, demonstrate the advantage of combining multiple models, such as
face detection and face recognition, into a single framework. This further underscores the
importance of using multi-task learning and ensemble methods to achieve state-of-the-art
results. While older models like Gabor Wavelets performed reasonably well on smaller
datasets, the increasing accuracy and applicability of newer models on larger datasets
signal the rapid progress in the field of face recognition systems.
In conclusion, models like FaceNet, VGG16, and MTCNN, when trained on compre-
hensive datasets such as LFW and VGGFace, show superior performance, with accuracy
rates exceeding 98%. In contrast, earlier models such as SVM and Gabor Wavelets perform
well on smaller, less complex datasets but fall short in comparison to modern architectures
designed for scalability and robustness. The evolution from simpler models to advanced
deep learning approaches highlights the growing demand for sophisticated algorithms
capable of handling large-scale, real-world recognition challenges.
using ensemble deep learning. Lin et al. [98] suggested LWCNN for small sample data and
employed k-fold cross-validation to ensure robustness. Using small sample datasets raises
questions regarding the accuracy of these models in real-life applications, since CNNs
generally require large datasets to achieve the best accuracy.
Occlusion, pose, and light are also still a concern. Malakar et al. [103] suggested a
reconstructive approach to obtain partially restored characteristics of the occluded area
of the face, which was subsequently recognized using an existing deep learning method.
Marjan et al. [104] provided an improved VGG19 deep model to enhance the accuracy of
masked face recognition systems. The two techniques rely on existing unmasked data to
identify an individual. They appear to be ineffective if the unmasked data are not available,
and they are also unable to fully reconstruct the missing parts.
Most researchers use images to train and test facial recognition systems [8]. Image face
datasets help them achieve high accuracy provided that the following conditions are met:
1. the subject is, for the most part, directly facing the camera; and 2. the environment is
controlled. Using video for facial recognition presents many challenges but creates an
opportunity for research that explores gait recognition. Gait recognition has the advantages
of overcoming camera-angle problems and of being able to identify a masked person by
their movement.
13. Conclusions
In this study, we explored CNNs for face recognition, along with databases and performance
metrics. CNNs are highly versatile and can be adapted to different objectives, provided that
their architectures and layers are not constrained. Face recognition systems can identify a
person under various conditions; however, most of them struggle with occlusion, camera
angle, pose, and lighting.
13.1. Observations
• More sophisticated deep learning architectures like FaceNet, VGG16, and MTCNN
have clearly replaced previous, simpler models like SVM and Gabor Wavelets. Newer
models increase scalability and accuracy, demonstrating the field of face recognition’s
increasing complexity.
• There are several CNN designs, databases, and performance measures, as we have
shown. In the face recognition field, several CNN architectures are applied to different
tasks. Certain designs work well in specific scenarios, such as low-quality images or videos
and frontal face recognition. Most of the datasets used to train and evaluate face
recognition algorithms contained more images than videos. We also noticed that most
researchers employed Softmax verification measures.
• We observed that most of the databases used for training and testing face recognition
models had fewer Black people in comparison to other races. The lack of balance in the
datasets leads to bias. The issue of privacy surrounding cameras is still being debated
in the US [8,120]. One aspect of the discussion is the inability of facial recognition
software to distinguish between black and white faces, resulting in racial profiling
and erroneous arrests [8,120].
• Occlusion, camera angle, pose, and lighting are all issues that persist [8]. Researchers
are attempting to devise solutions to these challenges; however, the varying resolutions
provided by sources of images or videos, such as CCTV, create additional issues.
Intelligent and automated face recognition systems work well in controlled contexts,
but badly in uncontrolled ones [121]. The two main causes of this are facial alignment
and the use of high- or low-resolution face images taken in controlled settings by
researchers for training.
• Models trained on bigger and more diversified datasets, like LFW and VGGFace, often
perform better in real-world, large-scale recognition tasks. The comparison of simpler
datasets such as ORL and more complex datasets such as LFW indicates that dataset
size and variety are important variables in the performance of face recognition systems.
• Little research has been conducted using Transformers and Mamba for face recognition.
• Most researchers use accuracy as a performance metric. The use of accuracy as a
performance measurement without comparing it to other performance metrics raises
questions about the validity of results.
• The use of several models for face detection and recognition, as demonstrated in
MTCNN, highlights the potential advantages of hybrid and ensemble techniques.
Research into more advanced hybrid models might increase the robustness and adapt-
ability of face recognition systems.
Author Contributions: Conceptualization, A.N., C.C. and S.V.; methods, A.N. and C.C.; investigation,
A.N. and C.C.; formal analysis, A.N.; data curation, A.N.; writing—original draft preparation, A.N.;
writing—review and editing, A.N., C.C. and S.V.; supervision, S.V. and C.C.; project administration,
S.V. and C.C.; funding acquisition, S.V. and C.C. Authorship has been limited to those who have
contributed substantially to the work reported. All authors have read and agreed to the published
version of the manuscript.
Funding: This work was undertaken within the context of the Centre for Artificial Intelligence
Research, which is supported by the Centre for Scientific and Innovation Research (CSIR) under grant
number CSIR/BEI/HNP/CAIR/2020/10, supported by the Government of the Republic of South
Africa through its Department of Science and Innovation’s University Capacity Development grants.
Data Availability Statement: The datasets analysed in this study can be found in the University of
Johannesburg repository using the following link: https://figshare.com/s/dbfbe28773afeb71872f
(accessed on 12 June 2024).
Acknowledgments: We acknowledge both the moral and technical support given by the University
of Johannesburg, University of KwaZulu-Natal, and Sol Plaatje University.
Conflicts of Interest: The funders had no role in the design of the study; in the collection, analyses,
or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Junayed, M.S.; Sadeghzadeh, A.; Islam, M.B. Deep covariance feature and cnn-based end-to-end masked face recognition. In
Proceedings of the 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur,
India, 15–18 December 2021; IEEE: New York, NY, USA, 2021; pp. 1–8.
2. Chien, J.T.; Wu, C.C. Discriminant waveletfaces and nearest feature classifiers for face recognition. IEEE Trans. Pattern Anal.
Mach. Intell. 2002, 24, 1644–1649. [CrossRef]
3. Wan, L.; Liu, N.; Huo, H.; Fang, T. Face recognition with convolutional neural networks and subspace learning. In Proceedings
of the 2017 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2–4 June 2017; IEEE: New
York, NY, USA, 2017; pp. 228–233.
4. Jain, A.K.; Ross, A.A.; Nandakumar, K. Introduction to Biometrics; Springer: Berlin/Heidelberg, Germany, 2011.
5. Taskiran, M.; Kahraman, N.; Erdem, C.E. Face recognition: Past, present and future (a review). Digit. Signal Process. 2020,
106, 102809. [CrossRef]
6. Kortli, Y.; Jridi, M.; Al Falou, A.; Atri, M. Face Recognition Systems: A Survey. Sensors 2020, 20, 342. [CrossRef]
7. Saini, R.; Rana, N. Comparison of various biometric methods. Int. J. Adv. Sci. Technol. 2014, 2, 24–30.
8. Nemavhola, A.; Viriri, S.; Chibaya, C. A Scoping Review of Literature on Deep Learning Techniques for Face Recognition. Hum.
Behav. Emerg. Technol. 2025, 2025, 5979728. [CrossRef]
9. Adjabi, I.; Ouahabi, A.; Benzaoui, A.; Taleb-Ahmed, A. Past, present, and future of face recognition: A review. Electronics 2020,
9, 1188. [CrossRef]
10. Nilsson, N.J. The Quest for Artificial Intelligence; Cambridge University Press: Cambridge, UK, 2009.
11. de Leeuw, K.M.M.; Bergstra, J. The History of Information Security: A Comprehensive Handbook; Elsevier: Amsterdam,
The Netherlands, 2007.
12. Turk, M.; Pentland, A. Eigenfaces for recognition. J. Cogn. Neurosci. 1991, 3, 71–86. [CrossRef]
13. Gates, K.A. Our Biometric Future: Facial Recognition Technology and the Culture of Surveillance; NYU Press: New York, NY, USA,
2011; Volume 2.
14. King, I. Neural Information Processing: 13th International Conference, ICONIP 2006, Hong Kong, China, 3–6 October 2006: Proceedings;
Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006.
15. Xue-Fang, L.; Tao, P. Realization of face recognition system based on Gabor wavelet and elastic bunch graph matching. In
Proceedings of the 2013 25th Chinese Control and Decision Conference (CCDC), Guiyang, China, 25–27 May 2013; IEEE: New
York, NY, USA, 2013; pp. 3384–3386.
16. Kundu, M.K.; Mitra, S.; Mazumdar, D.; Pal, S.K. Perception and Machine Intelligence: First Indo-Japan Conference, PerMIn 2012, Kolkata,
India, 12–13 January 2011, Proceedings; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 7143.
17. Guo, G.; Zhang, N. A survey on deep learning based face recognition. Comput. Vis. Image Underst. 2019, 189, 102805. [CrossRef]
18. Datta, A.K.; Datta, M.; Banerjee, P.K. Face Detection and Recognition: Theory and Practice; CRC Press: Boca Raton, FL, USA, 2015.
19. Gofman, M.I.; Villa, M. Identity and War: The Role of Biometrics in the Russia-Ukraine Crisis. Int. J. Eng. Sci. Technol. (IJonEST)
2023, 5, 89. [CrossRef]
20. Barnouti, N.H.; Al-Dabbagh, S.S.M.; Matti, W.E. Face recognition: A literature review. Int. J. Appl. Inf. Syst. 2016, 11, 21–31.
[CrossRef]
21. Lal, M.; Kumar, K.; Arain, R.H.; Maitlo, A.; Ruk, S.A.; Shaikh, H. Study of face recognition techniques: A survey. Int. J. Adv.
Comput. Sci. Appl. 2018, 1–8. [CrossRef]
22. Shamova, U. Face Recognition in Healthcare: General Overview. In Language in the Sphere of Professional Communication; Yekaterinburg,
Russia, 2020; pp. 748–752. Available online: https://elar.urfu.ru/handle/10995/84113 (accessed on 17 March 2024).
23. Elngar, A.A.; Kayed, M. Vehicle security systems using face recognition based on internet of things. Open Comput. Sci. 2020,
10, 17–29. [CrossRef]
24. Xing, J.; Fang, G.; Zhong, J.; Li, J. Application of face recognition based on CNN in fatigue driving detection. In Proceedings of
the 2019 International Conference on Artificial Intelligence and Advanced Manufacturing, Dublin, Ireland, 17–19 October 2019;
pp. 1–5.
25. Pabiania, M.D.; Santos, K.A.P.; Villa-Real, M.M.; Villareal, J.A.N. Face recognition system for electronic medical record to access
out-patient information. J. Teknol. 2016, 78. [CrossRef]
26. Aswis, A.; Morsy, M.; Abo-Elsoud, M. Face Recognition Based on PCA and DCT Combination Technique. Int. J. Eng. Res. Technol.
2018, 4, 1295–1298.
27. Ranjan, R.; Sankaranarayanan, S.; Bansal, A.; Bodla, N.; Chen, J.C.; Patel, V.M.; Castillo, C.D.; Chellappa, R. Deep learning for
understanding faces: Machines may be just as good, or better, than humans. IEEE Signal Process. Mag. 2018, 35, 66–83. [CrossRef]
28. Loy, C.C. Face detection. In Computer Vision: A Reference Guide; Springer: Berlin/Heidelberg, Germany, 2021; pp. 429–434.
29. Yang, M.H. Face Detection. In Encyclopedia of Biometrics; Li, S.Z., Jain, A.K., Eds.; Springer: Boston, MA, USA, 2015; pp. 447–452.
[CrossRef]
30. Calvo, G.; Baruque, B.; Corchado, E. Study of the pre-processing impact in a facial recognition system. In Proceedings of
the Hybrid Artificial Intelligent Systems: 8th International Conference, HAIS 2013, Salamanca, Spain, 11–13 September 2013;
Proceedings 8; Springer: Berlin/Heidelberg, Germany, 2013; pp. 334–344.
31. Benedict, S.R.; Kumar, J.S. Geometric shaped facial feature extraction for face recognition. In Proceedings of the 2016 IEEE
International Conference on Advances in Computer Applications (ICACA), Coimbatore, India, 24 October 2016; pp. 275–278.
[CrossRef]
32. Napoléon, T.; Alfalou, A. Pose invariant face recognition: 3D model from single photo. Opt. Lasers Eng. 2017, 89, 150–161.
[CrossRef]
33. Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.D.; Horsley, T.; Weeks, L.;
et al. PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Ann. Intern. Med. 2018, 169, 467–473.
[CrossRef] [PubMed]
34. Phillips, P.; Moon, H.; Rizvi, S.; Rauss, P. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Pattern
Anal. Mach. Intell. 2000, 22, 1090–1104. [CrossRef]
35. Belhumeur, P.; Hespanha, J.; Kriegman, D. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE
Trans. Pattern Anal. Mach. Intell. 1997, 19, 711–720. [CrossRef]
36. Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database forstudying face recognition in
unconstrained environments. In Proceedings of the Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and
Recognition, Marseille, France, 17–20 October 2008.
37. Zou, J.; Ji, Q.; Nagy, G. A comparative study of local matching approach for face recognition. IEEE Trans. Image Process. 2007,
16, 2617–2628. [CrossRef]
38. Peer, P. CVL Face Database; University of Ljubljana: Ljubljana, Slovenia, 2010.
39. Fox, N.; Reilly, R.B. Audio-visual speaker identification based on the use of dynamic audio and visual features. In Proceedings
of the International Conference on Audio-and Video-Based Biometric Person Authentication, Guildford, UK, 9–11 June 2003;
Springer: Berlin/Heidelberg, Germany, 2003; pp. 743–751.
40. Bailly-Bailliére, E.; Bengio, S.; Bimbot, F.; Hamouz, M.; Kittler, J.; Mariéthoz, J.; Matas, J.; Messer, K.; Popovici, V.; Porée, F.; et al.
The BANCA database and evaluation protocol. In Proceedings of the Audio-and Video-Based Biometric Person Authentication:
4th International Conference, AVBPA 2003, Guildford, UK, 9–11 June 2003; Proceedings 4; Springer: Berlin/Heidelberg, Germany,
2003; pp. 625–638.
41. Phillips, P.; Flynn, P.; Scruggs, T.; Bowyer, K.; Chang, J.; Hoffman, K.; Marques, J.; Min, J.; Worek, W. Overview of the face
recognition grand challenge. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 947–954. [CrossRef]
42. Milborrow, S.; Morkel, J.; Nicolls, F. The MUCT Landmarked Face Database. In Pattern Recognition Association of South Africa;
2010. Available online: http://www.milbo.org/muct (accessed on 12 April 2024).
43. Gross, R.; Matthews, I.; Cohn, J.; Kanade, T.; Baker, S. Multi-PIE. Image Vis. Comput. 2013, 28, 807–813. [CrossRef]
44. Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Learning face representation from scratch. arXiv 2014, arXiv:1411.7923.
45. Klare, B.F.; Klein, B.; Taborsky, E.; Blanton, A.; Cheney, J.; Allen, K.; Grother, P.; Mah, A.; Jain, A.K. Pushing the frontiers of
unconstrained face detection and recognition: Iarpa janus benchmark a. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1931–1939.
46. Kemelmacher-Shlizerman, I.; Seitz, S.M.; Miller, D.; Brossard, E. The MegaFace Benchmark: 1 Million Faces for Recognition at
Scale. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA,
27–30 June 2016; pp. 4873–4882. [CrossRef]
47. Whitelam, C.; Taborsky, E.; Blanton, A.; Maze, B.; Adams, J.; Miller, T.; Kalka, N.; Jain, A.K.; Duncan, J.A.; Allen, K.; et al. Iarpa
janus benchmark-b face dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops,
Honolulu, HI, USA, 21–26 July 2017; pp. 90–98.
48. Parkhi, O.; Vedaldi, A.; Zisserman, A. Deep face recognition. In Proceedings of the BMVC 2015-Proceedings of the British
Machine Vision Conference, Swansea, UK, 7–10 September 2015; British Machine Vision Association: Durham, UK, 2015.
49. Sengupta, S.; Chen, J.C.; Castillo, C.; Patel, V.M.; Chellappa, R.; Jacobs, D.W. Frontal to profile face verification in the wild. In
Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10
March 2016; IEEE: New York, NY, USA, 2016; pp. 1–9.
50. Guo, Y.; Zhang, L.; Hu, Y.; He, X.; Gao, J. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In Proceedings
of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings,
Part III 14; Springer: Berlin/Heidelberg, Germany, 2016; pp. 87–102.
51. Al-ghanim, F.; Aljuboori, A. Face Recognition with Disguise and Makeup Variations Using Image Processing and Machine
Learning. In Proceedings of the Advances in Computing and Data Sciences: 5th International Conference, ICACDS 2021, Nashik,
India, 23–24 April 2021; pp. 386–400. [CrossRef]
52. Cao, Q.; Shen, L.; Xie, W.; Parkhi, O.M.; Zisserman, A. Vggface2: A dataset for recognising faces across pose and age. In
Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China,
15–19 May 2018; IEEE: New York, NY, USA, 2018; pp. 67–74.
53. Maze, B.; Adams, J.; Duncan, J.A.; Kalka, N.; Miller, T.; Otto, C.; Jain, A.K.; Niggel, W.T.; Anderson, J.; Cheney, J.; et al. Iarpa janus
benchmark-c: Face dataset and protocol. In Proceedings of the 2018 International Conference on Biometrics (ICB), Gold Coast,
Australia, 20–23 February 2018; IEEE: New York, NY, USA, 2018; pp. 158–165.
54. Nech, A.; Kemelmacher-Shlizerman, I. Level Playing Field for Million Scale Face Recognition. arXiv 2017, arXiv:1705.00393.
55. Kushwaha, V.; Singh, M.; Singh, R.; Vatsa, M.; Ratha, N.K.; Chellappa, R. Disguised Faces in the Wild. In Proceedings of the 2018
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June
2018; pp. 1–18.
56. Elharrouss, O.; Almaadeed, N.; Al-Maadeed, S. LFR face dataset:Left-Front-Right dataset for pose-invariant face recognition in
the wild. In Proceedings of the 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), Doha,
Qatar, 2–5 February 2020; pp. 124–130. [CrossRef]
57. Ayad, W.; Qays, S.; Al-Naji, A. Generating and Improving a Dataset of Masked Faces Using Data Augmentation. J. Tech. 2023,
5, 46–51. [CrossRef]
58. Gottumukkal, R.; Asari, V.K. An improved face recognition technique based on modular PCA approach. Pattern Recognit. Lett.
2004, 25, 429–436. [CrossRef]
59. Yang, J.; Liu, C.; Zhang, L. Color space normalization: Enhancing the discriminating power of color spaces for face recognition.
Pattern Recognit. 2010, 43, 1454–1466. [CrossRef]
60. Ye, J.; Janardan, R.; Li, Q. Two-dimensional linear discriminant analysis. Adv. Neural Inf. Process. Syst. 2004, 17, 1–8.
61. Rahman, M.T.; Bhuiyan, M.A. Face recognition using Gabor Filters. In Proceedings of the 2008 11th International Conference on
Computer and Information Technology, Khulna, Bangladesh, 24–27 December 2008; pp. 510–515. [CrossRef]
62. Olshausen, B.A.; Field, D.J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images.
Nature 1996, 381, 607–609. [CrossRef]
63. Hammouche, R.; Attia, A.; Akhrouf, S.; Akhtar, Z. Gabor filter bank with deep autoencoder based face recognition system. Expert
Syst. Appl. 2022, 197, 116743. [CrossRef]
64. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001; Volume 1,
pp. 1–8. [CrossRef]
65. Ruppert, D. The elements of statistical learning: Data mining, inference, and prediction. J. Am. Stat. Assoc. 2004, 99, 567.
[CrossRef]
66. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 886–893.
[CrossRef]
67. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in
Neural Information Processing Systems; Pereira, F., Burges, C., Bottou, L., Weinberger, K., Eds.; Curran Associates, Inc.: Nice, France,
2012; Volume 25.
68. Levi, G.; Hassner, T. Age and gender classification using convolutional neural networks. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015; pp. 34–42.
69. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
70. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
71. Srivastava, R.K.; Greff, K.; Schmidhuber, J. Highway networks. arXiv 2015, arXiv:1505.00387.
72. Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823.
73. Xi, M.; Chen, L.; Polajnar, D.; Tong, W. Local binary pattern network: A deep learning approach for face recognition. In
Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016;
IEEE: New York, NY, USA, 2016; pp. 3224–3228.
74. Wu, X.; He, R.; Sun, Z.; Tan, T. A light CNN for deep face representation with noisy labels. IEEE Trans. Inf. Forensics Secur. 2018,
13, 2884–2896. [CrossRef]
75. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
76. Kumar, K.K.; Kasiviswanadham, Y.; Indira, D.; Priyanka Palesetti, P.; Bhargavi, C. Criminal face identification system using
deep learning algorithm multi-task cascade neural network (MTCNN). Mater. Today Proc. 2023, 80, 2406–2410. [CrossRef]
77. Ullah, N.; Javed, A.; Ali Ghazanfar, M.; Alsufyani, A.; Bourouis, S. A novel DeepMaskNet model for face mask detection and
masked facial recognition. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 9905–9914. [CrossRef]
78. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
79. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv
2019, arXiv:1801.04381.
80. Chen, S.; Liu, Y.; Gao, X.; Han, Z. MobileFaceNets: Efficient CNNs for accurate real-time face verification on mobile devices. In
Proceedings of the Biometric Recognition: 13th Chinese Conference, CCBR 2018, Urumqi, China, 11–12 August 2018; Proceedings
13; Springer: Berlin/Heidelberg, Germany, 2018; pp. 428–438.
81. Dosovitskiy, A.; Fischer, P.; Springenberg, J.T.; Riedmiller, M.; Brox, T. Discriminative unsupervised feature learning with
exemplar convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1734–1747.
82. Zhou, D.; Kang, B.; Jin, X.; Yang, L.; Lian, X.; Jiang, Z.; Hou, Q.; Feng, J. DeepViT: Towards Deeper Vision Transformer. arXiv
2021, arXiv:2103.11886.
83. Zhong, Y.; Deng, W. Face transformer for recognition. arXiv 2021, arXiv:2103.14803.
84. Sun, Z.; Tzimiropoulos, G. Part-based face recognition with vision transformers. arXiv 2022, arXiv:2212.00057.
85. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. Deepface: Closing the gap to human-level performance in face verification. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014;
pp. 1701–1708.
86. Zhao, H.; Jia, J.; Koltun, V. Exploring self-attention for image recognition. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10076–10085.
87. Zhu, B.; Li, L.; Hu, X.; Wu, F.; Zhang, Z.; Zhu, S.; Wang, Y.; Wu, J.; Song, J.; Li, F.; et al. DEFOG: Deep Learning with Attention
Mechanism Enabled Cross-Age Face Recognition. Tsinghua Sci. Technol. 2024, 30, 1342–1358. [CrossRef]
88. Voita, E.; Talbot, D.; Moiseev, F.; Sennrich, R.; Titov, I. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy
Lifting, the Rest Can Be Pruned. arXiv 2019, arXiv:1905.09418.
89. Lin, K.; Li, L.; Lin, C.C.; Ahmed, F.; Gan, Z.; Liu, Z.; Lu, Y.; Wang, L. Swinbert: End-to-end transformers with sparse attention for
video captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA,
USA, 18–24 June 2022; pp. 17949–17958.
90. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation
through attention. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021;
pp. 10347–10357.
91. Sharma, S.; Shanmugasundaram, K.; Ramasamy, S.K. FAREC—CNN based efficient face recognition technique using Dlib.
In Proceedings of the 2016 International Conference on Advanced Communication Control and Computing Technologies
(ICACCCT), Ramanathapuram, India, 25–27 May 2016; pp. 192–195. [CrossRef]
92. Arsenovic, M.; Sladojevic, S.; Anderla, A.; Stefanovic, D. FaceTime—Deep learning based face recognition attendance system.
In Proceedings of the 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, Serbia,
14–16 September 2017; pp. 53–58. [CrossRef]
93. Al-Azzawi, A.; Hind, J.; Cheng, J. Localized Deep-CNN Structure for Face Recognition. In Proceedings of the 2018 11th
International Conference on Developments in eSystems Engineering (DeSE), Cambridge, UK, 2–5 September 2018; pp. 52–57.
[CrossRef]
94. Lu, Z.; Jiang, X.; Kot, A. Deep coupled resnet for low-resolution face recognition. IEEE Signal Process. Lett. 2018, 25, 526–530.
[CrossRef]
95. Qu, X.; Wei, T.; Peng, C.; Du, P. A Fast Face Recognition System Based on Deep Learning. In Proceedings of the 2018 11th
International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 8–9 December 2018; Volume 1,
pp. 289–292. [CrossRef]
96. Talab, M.A.; Awang, S.; Najim, S.A.d.M. Super-Low Resolution Face Recognition using Integrated Efficient Sub-Pixel
Convolutional Neural Network (ESPCN) and Convolutional Neural Network (CNN). In Proceedings of the 2019 IEEE International
Conference on Automatic Control and Intelligent Systems (I2CACIS), Selangor, Malaysia, 29 June 2019; pp. 331–335. [CrossRef]
97. Feng, Y.; Pang, T.; Li, M.; Guan, Y. Small sample face recognition based on ensemble deep learning. In Proceedings of the 2020
Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020; pp. 4402–4406. [CrossRef]
98. Lin, M.; Zhang, Z.; Zheng, W. A Small Sample Face Recognition Method Based on Deep Learning. In Proceedings of the 2020
IEEE 20th International Conference on Communication Technology (ICCT), Nanning, China, 28–31 October 2020; pp. 1394–1398.
[CrossRef]
99. Yudita, S.I.; Mantoro, T.; Ayu, M.A. Deep Face Recognition for Imperfect Human Face Images on Social Media using the CNN
Method. In Proceedings of the 2021 4th International Conference of Computer and Informatics Engineering (IC2IE), Depok,
Indonesia, 14–15 September 2021; pp. 412–417. [CrossRef]
100. Szmurlo, R.; Osowski, S. Deep CNN ensemble for recognition of face images. In Proceedings of the 2021 22nd International
Conference on Computational Problems of Electrical Engineering (CPEE), Hradek u Susice, Czech Republic, 15–17 September
2021; pp. 1–4. [CrossRef]
101. Wu, C.; Zhang, Y. MTCNN and FACENET based access control system for face detection and recognition. Autom. Control
Comput. Sci. 2021, 55, 102–112.
102. Sanchez-Moreno, A.S.; Olivares-Mercado, J.; Hernandez-Suarez, A.; Toscano-Medina, K.; Sanchez-Perez, G.; Benitez-Garcia, G.
Efficient face recognition system for operating in unconstrained environments. J. Imaging 2021, 7, 161. [CrossRef] [PubMed]
103. Malakar, S.; Chiracharit, W.; Chamnongthai, K.; Charoenpong, T. Masked Face Recognition Using Principal Component
Analysis and Deep Learning. In Proceedings of the 2021 18th International Conference on Electrical Engineering/Electronics,
Computer, Telecommunications and Information Technology (ECTI-CON), Online, 19–22 May 2021; pp. 785–788. [CrossRef]
104. Marjan, M.A.; Hasan, M.; Islam, M.Z.; Uddin, M.P.; Afjal, M.I. Masked Face Recognition System using Extended VGG-19. In
Proceedings of the 2022 4th International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE),
Rajshahi, Bangladesh, 29–31 December 2022; pp. 1–4. [CrossRef]
105. Li, Y.; Liu, S.; Yang, J.; Yang, M.H. Generative face completion. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3911–3919.
106. Pann, V.; Lee, H.J. Effective attention-based mechanism for masked face recognition. Appl. Sci. 2022, 12, 5590. [CrossRef]
107. Yuan, L.; Li, F. Face recognition with occlusion via support vector discrimination dictionary and occlusion dictionary based
sparse representation classification. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association
of Automation (YAC), Wuhan, China, 11–13 November 2016; IEEE: New York, NY, USA, 2016; pp. 110–115.
108. Deng, W.; Hu, J.; Guo, J. Extended SRC: Undersampled face recognition via intraclass variant dictionary. IEEE Trans. Pattern Anal.
Mach. Intell. 2012, 34, 1864–1870. [CrossRef]
109. Alzu’bi, A.; Albalas, F.; Al-Hadhrami, T.; Younis, L.B.; Bashayreh, A. Masked face recognition using deep learning: A review.
Electronics 2021, 10, 2666. [CrossRef]
110. Li, S.; Lee, H.J. Effective Attention-Based Feature Decomposition for Cross-Age Face Recognition. Appl. Sci. 2022, 12, 4816.
[CrossRef]
111. Boutros, F.; Damer, N.; Kirchbuchner, F.; Kuijper, A. Self-restrained triplet loss for accurate masked face recognition. Pattern
Recognit. 2022, 124, 108473. [CrossRef]
112. Deng, H.; Feng, Z.; Qian, G.; Lv, X.; Li, H.; Li, G. MFCosface: A masked-face recognition algorithm based on large margin cosine
loss. Appl. Sci. 2021, 11, 7310. [CrossRef]
113. Wu, G. Masked face recognition algorithm for a contactless distribution cabinet. Math. Probl. Eng. 2021, 2021, 5591020. [CrossRef]
114. Yankun, Z.; Chongqing, L. Face recognition based on support vector machine and nearest neighbor classifier. J. Syst. Eng. Electron.
2003, 14, 73–76.
115. Kepenekci, B. Face Recognition Using Gabor Wavelet Transform. Master’s Thesis, Middle East Technical University, Ankara,
Turkey, 2001.
116. Lou, G.; Shi, H. Face image recognition based on convolutional neural network. China Commun. 2020, 17, 117–124. [CrossRef]
117. Mahesh, S.; Ramkumar, G. Smart Face Detection and Recognition in Illumination Invariant Images using AlexNet CNN Compare
Accuracy with SVM. In Proceedings of the 2022 3rd International Conference on Intelligent Engineering and Management
(ICIEM), London, UK, 27–29 April 2022; pp. 572–575. [CrossRef]
118. Garvie, C.; Bedoya, A.; Frankle, J. The Perpetual Line-Up: Unregulated Police Face Recognition in America. 2016. Available
online: https://www.perpetuallineup.org/ (accessed on 17 March 2024).
119. Korshunov, P.; Marcel, S. DeepFakes: A New Threat to Face Recognition? Assessment and Detection. arXiv 2018, arXiv:1812.08685.
120. Sikhakhane, N. Joburg hostels and townships coming under surveillance by facial recognition cameras and drones. Daily
Maverick, 13 August 2023. Available online: https://www.dailymaverick.co.za/article/2023-08-13-joburg-hostels-and-townships-coming-under-surveillance-by-
facial-recognition-cameras-and-drones/ (accessed on 12 April 2024).
121. Masud, M.; Muhammad, G.; Alhumyani, H.; Alshamrani, S.S.; Cheikhrouhou, O.; Ibrahim, S.; Hossain, M.S. Deep learning-based
intelligent face recognition in IoT-cloud environment. Comput. Commun. 2020, 152, 215–222. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.