
3D Face Reconstruction: the Road to Forensics

SIMONE MAURIZIO LA CAVA, University of Cagliari, Italy


GIULIA ORRÙ, University of Cagliari, Italy
MARTIN DRAHANSKY, Touchless Biometric Systems (TBS) Holding AG, Switzerland
GIAN LUCA MARCIALIS, University of Cagliari, Italy
FABIO ROLI, University of Genova, Italy
3D face reconstruction algorithms from images and videos are applied to many fields, from plastic surgery to the entertainment sector,
thanks to their advantageous features. However, when looking at forensic applications, 3D face reconstruction must meet strict
requirements, which still make its possible role in bringing evidence to a lawsuit unclear. An extensive investigation of the constraints,
potential, and limits of its application in forensics is still missing. Shedding some light on this matter is the goal of the present survey,
which starts by clarifying the relation between forensic applications and biometrics, with a focus on face recognition. It then
provides an analysis of the achievements of 3D face reconstruction algorithms from surveillance videos and mugshot images and
discusses the current obstacles that separate 3D face reconstruction from an active role in forensic applications. Finally, it examines the
underlying data sets, with their advantages and limitations, while proposing alternatives that could substitute or complement them.

CCS Concepts: • Applied computing → Investigation techniques; • Social and professional topics → Surveillance; • Comput-
ing methodologies → Biometrics; 3D imaging; Reconstruction; Shape inference; Shape representations; Appearance and texture
representations.

Additional Key Words and Phrases: 3D face reconstruction, forensics, recognition

ACM Reference Format:


Simone Maurizio La Cava, Giulia Orrù, Martin Drahansky, Gian Luca Marcialis, and Fabio Roli. 2023. 3D Face Reconstruction: the
Road to Forensics. J. ACM 37, 4, Article 111 (August 2023), 35 pages. https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION
In the last few decades, much attention has been paid to the use of 3D data in facial image processing applications. This
technology has proven promising for robust facial feature extraction [10, 51, 189]. In uncontrolled environments,
it limits the effects of adverse factors such as unfavourable illumination conditions and the non-frontal poses of the
face with respect to the camera [51, 148, 176].
Among the various scenarios, developing personal recognition based on 3D data appears to be a "hot topic" due to
the accuracy and efficiency obtainable from comparing faces, thanks to the complementary information of shape and
texture [12, 16, 97]. However, acquiring such data requires expensive hardware; moreover, the enrolment process is
much more complex [143, 148, 184, 219, 225]. Thus, face recognition technology has mainly been developed in the 2D domain.
Authors’ addresses: Simone Maurizio La Cava, simonem.lac@unica.it, University of Cagliari, Cagliari, Italy; Giulia Orrù, giulia.orru@unica.it, University
of Cagliari, Cagliari, Italy; Martin Drahansky, martin.drahansky@tbs-biometrics.com, Touchless Biometric Systems (TBS) Holding AG, Switzerland; Gian
Luca Marcialis, marcialis@unica.it, University of Cagliari, Cagliari, Italy; Fabio Roli, fabio.roli@unige.it, University of Genova, Genova, Italy.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components
of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2023 Association for Computing Machinery.
Manuscript submitted to ACM


Fig. 1. Example of reconstruction of a 3D facial model from a single input 2D image (obtained through the framework proposed by [104]).

The acquisition of 2D images is more straightforward than that of 3D ones, as it does not require specific hardware, but
often makes the recognition task challenging due to the significant variability in facial appearance [35, 148]. 3D face
reconstruction (3DFR) from 2D images and videos may overcome these limits, combining the ease of acquiring 2D data
with the robustness of 3D ones (Fig. 1).
One of the possible fields that could benefit from these advantageous characteristics is that of forensics, which often
deals with probe images of unidentified people’s faces acquired in non-frontal views, in uncontrolled environments, and in an
uncooperative way, such as those captured by CCTV (Closed-Circuit Television) cameras. Although
some frameworks for the acquisition of 3D face models of suspects have been proposed (e.g., [126]), in such a context it
is still common to have 2D mugshots, that is, frontal and, usually, profile images of subjects routinely captured by law
enforcement agencies [131] for the recognition of people of interest, such as suspects or witnesses (Fig. 2).

Fig. 2. Example of forensic facial recognition from a mugshot reference gallery and a probe image (images from the SCface dataset [80]).

Unfortunately, a reference gallery composed of frontal and profile images is not able to provide effective coverage of
all possible conditions, such as in the case of a probe image in an arbitrary pose which is not at the same view angle
as in one of the available mugshot images [230]. Therefore, since the first attempt at face recognition from mugshots
[210], 3D reconstruction techniques have also been exploited to address some of the issues typical of such
forensic cases, in which the goal is to establish the identity of unknown individuals against a reference data set of known individuals,
either in verification mode (1-to-1) or identification mode (1-to-N). Hence, the research community proposed to employ
this approach in facial recognition from probe videos and images acquired in an unconstrained environment to provide
more information about the individual faces through the generation of multiple views or the "correction" of the pose in
probe data. This makes the comparison with reference data more robust to various appearance variations typical of
forensic cases.

In particular, to be suitable for real-world forensic applications, any system of this kind should satisfy strict constraints
leading to the legal validity of the conclusions during a lawsuit or in the investigation phase [27, 110]. For this reason,
it is necessary to analyze the methods which employ 3DFR to shed some light on their admissibility in the forensic
scenario. Although other authors investigated the state of the art of 3DFR from 2D images or videos [61, 73, 148, 234]
and its applications to face recognition [61, 148, 156], none of them considered the requirements they have to satisfy
to be potentially employed in such a context and how forensics can benefit from their adoption. Moreover, the validity
of the proposed face recognition systems in the considered application scenarios strongly depends on the data sets
on which they have been evaluated, since these provide a basis for measuring and comparing their performance with
the state of the art. In other words, data representativeness is fundamental, and the algorithms’ adoption is bounded by the
available data [40, 174].
A specific investigation highlighting the potential and limits of 3D facial reconstruction in forensics is still missing,
and, in our opinion, it would be necessary to direct research toward its real-world application. In order to pursue this
goal, this work analyzes the potential of employing 3D face reconstruction in forensics and the approaches
proposed by the research community for its integration into a common face recognition casework, while considering
the core challenges of legal admissibility of automated systems that include it. The central premise of this work is to
shed some light on the requirements that should be satisfied to fill the gap between biometric recognition and forensic
comparison when reconstructing a facial image into 3D space for the recognition of an individual from 2D videos or
images, and thus to investigate the potential benefit of this technique to forensics.
This paper is a follow-up to Ref. [123], which is a first step toward the objectives listed above. To our knowledge, it
represents the first investigation focused on the state of the art in applications and potentialities of 3D face reconstruction
in forensics and the novelties introduced to date (Fig. 3), as well as the requirements that any of the related systems
must satisfy to be considered admissible in criminal investigations or judicial cases. With respect to [123], this paper
extends that analysis, especially in relation to the comparison among the proposed methods and the admissibility
constraints which have to be satisfied for effective integration into the reference scenario. Moreover, this
survey also provides an analysis of the data sets employed in the reviewed studies, which could further highlight their
strengths and limits, suggesting their uses in the design and evaluation of forensic facial recognition algorithms and the
potential issues. Finally, some state-of-the-art data sets which could be alternative or complementary to those already
used are proposed and analyzed as well to provide suitable ground truth for future studies, with the main focus on the
types of data so far considered, namely facial images, videos and 3D scans of the face.
The paper’s structure is as follows. Section 2 analyzes the relationship between forensics and biometrics, mainly
focusing on facial traits and the integration of 3DFR. The state-of-the-art assessment of 3DFR methods for face
recognition from mugshot images is reported in Section 3. A review of other proposed forensic-related applications of
3DFR from facial images and videos is carried out in Section 4. Section 5 explores the underlying data sets of facial
images, videos, and 3D scans, proposing others which could be suitable as well for future research on the analyzed
topic. Finally, Section 6 discusses how all the aspects above converge in a unified view.

2 FACE RECOGNITION AND FORENSICS


The face represents a valuable clue in many criminal investigations due to its advantageous characteristics with respect
to other biometrics [109, 164] and the growing number of surveillance cameras in both private and public places
[52, 102, 140]. Over the years, various methods have been proposed to check whether the individual’s identity in a
probe image or video matches that of a person of interest, namely an individual related to the event under investigation,
such as a suspect, a victim, or a witness. In particular, these represent a subset of the approaches widely explored in
traditional biometric recognition and implemented in the related automated face recognition systems [109, 120, 185].
These methods can be summarized into various qualitative or quantitative examination approaches, which can be
employed or are preferred under different conditions [60, 62].

Fig. 3. Milestones of forensic identification based on 3D face reconstruction.

A first approach processes the face globally in a holistic form. However, it is only recommended where other, more
effective approaches are not suitable, and it is highly inaccurate when faces belong to unfamiliar people, in the case of partially
occluded faces [32, 62, 64, 216, 226] or severely distorted CCTV footage [34].
A second approach is based on a set of facial fiducial points named landmarks [28, 49], employed to derive
the distances and proportions between facial features. This choice is also not generally recommended, owing to the
subjectivity of their manual estimation in uncontrolled images affected by adverse factors such as large head poses,
the distance from the camera, facial expressions, and lighting conditions [62, 118, 150, 151, 208]. Some of these issues
could be mitigated by means of preprocessing techniques (e.g., super-resolution methods [101]).
A third approach is that of superimposition. It allows handling the discrepancies arising from differences in the
position of the face with respect to the camera in two different aligned images or videos. To achieve this goal, it
combines them through various methods, such as a reduced opacity overlay or blinking quickly between them. This
approach has proven unreliable when comparing data acquired in uncontrolled scenarios, as emerged even in previous judicial cases
[5, 24, 62, 137, 150, 192, 193, 226].
A fourth approach is that of morphological comparison, in which facial regions, generally from a predefined list, and
features extracted from them related to shape, appearance, presence, and/or location, such as the relative width of the
mouth with respect to the distance between the eyes or the asymmetry of the mouth [83], are compared to determine
differences and similarities between the probe and reference data [226].
In particular, the latter approach can improve examiners’ identification accuracy, partly thanks to the
higher physical stability of its features over time with respect to many photoanthropometric and holistic features [86, 151].
However, the stability of the evaluated features could also be affected by extrinsic factors, such as lighting and the
position of the subject’s face with respect to the camera, which can introduce different levels of variability, contributing
to the unreliability of certain features [119, 151, 224].
Despite the differences in reliability and acceptance, these approaches are not alternatives to each other. The choice
among them generally depends on the probe image or video, and they can even be used jointly in the identification
task to carry out a more exhaustive analysis [15, 108, 151]. Furthermore, even if these approaches cannot be used as
evidence in a confirmatory identification due to the acquisition conditions of the probe image or video, they could still
be employed in an attempt to exclude possible suspects or provide limited - but not worthless - support for reaching a
conclusion through other evidence [84, 119, 137, 151].
Although both biometric recognition and forensic identification seek to link evidence to a particular individual [112],
research in these fields has been pursued independently for many years due to their different goals and requirements,
as well as the difficulties in achieving significant scientific contributions in this cross-domain research field [123]. Thus,
despite the two fields sharing common approaches, the underlying methods and the automated
systems integrating them must satisfy strict constraints in order to be considered suitable for forensic casework.

2.1 Automated Forensic Facial Recognition: The Italian Case


Due to the stringent requirements of the analyzed field, automatic recognition systems are only recently being introduced.
For example, in 2017, the Italian police bodies introduced the ordinary use of an automatic image recognition system,
S.A.R.I. (from the Italian "Sistema Automatico di Riconoscimento delle Immagini"), as an innovative tool aimed at
supporting investigative activities [17, 173]. This system allows automatically comparing a facial probe image with
millions of mugshots to reduce the number of candidates, which are then ordered by their degree of similarity. Furthermore,
the system is also able to work in real time on a gallery on the order of hundreds of thousands of individuals to support
security and territorial control. SARI’s outcome is a set of potential candidates that must be examined by the
specialized experts of the scientific police in charge of verifying the process [22, 173]. Despite the effectiveness and
the extreme speed of this automatic system, it cannot yet be used in the criminal field, as it does not allow access and
repetition of recognition by the defense, thus precluding cross-examination of the specific functioning of the software
in question [38, 70, 173, 179]. Moreover, its functioning lacks the transparency required for any criminal case, thus
precluding its compatibility with the constitutional procedural guarantees granted to the suspect [173, 179].
If this is the state of things in face recognition, what about 3D face reconstruction? 3D reconstruction is already
employed for enhancing the views of crime scenes (e.g., [142]) as computer-generated evidence [22]. Thanks to the
3D representation of the scene, obtained from one or more reference photographs, it is possible to recreate the scenario
of interest, for example, by inserting moving objects and simulating people’s behaviour while respecting physical laws.
However, depending on the task for which it has to be employed, this technology could be considered inadmissible
due to the still experimental nature of the underlying method [22]. Furthermore, the reconstruction accuracy for
the human body is low, and that of the face is strongly influenced by the resolution of the reference images as well as by the
subjectivity of the operator in positioning the characterizing points for the reconstruction [141].
Therefore, a fully automated 3D reconstruction such as the one integrated into biometric systems could reduce the
errors caused by the operator, standardize the process, and speed up the analysis, provided that sufficient quality of
the resulting 3D model can be guaranteed. These advantages led the research community to propose methods and
approaches strictly focused on reconstructing the body or even single parts, such as the face. In particular, the 3D
reconstruction of the face could be crucial for some forensic recognition tasks, strongly enhancing the recognition
accuracy with respect to the recognition from raw images, especially on faces represented in non-frontal poses. In
particular, this technology could even be integrated into the previously cited scene reconstruction technology to reinforce
the reliability of the related computer-generated evidence and make it employable for real recognition tasks.
These factors could be crucial for the introduction of this technology into the forensic recognition task. However, it
must comply with the technical and admissibility requirements, summarized in Figure 4 and discussed in the following
subsections, which any system must satisfy to be considered suitable for employment in such a field.

Fig. 4. Forensic admissibility evaluation of an automatic biometric system in a casework.

2.2 Biometric Systems and Forensic Admissibility


Techniques and systems designed for biometrics, especially the automated ones, are appealing for their potential to
address some of the forensic domain’s problems concerning crime prevention, crime investigation, and judicial trials in a more
efficient, "scientifically objective", and standardized way [15, 112, 149, 162, 176, 190, 220]. In the case of face recognition,
the related recognition technology has a role in many forensic and security applications, such as in identifying people
of interest (e.g., terrorists) and searching for missing people, even in real-time [71, 75, 99]. In particular, concepts behind
biometric facial recognition could be beneficial in various tasks underlying forensic applications. For example, person
re-identification and face identification could aid the search task of forensic practitioners, that is, the collection of evidence
from crime scene images acquired by surveillance cameras [197], and the investigation task, that is, linking traces between
crime scenes by generating and testing likely explanations [197]. Face recognition could also represent an aid in the
individualization (or forensic evaluation) step, in which the evidential value is computed and assigned to the collected
traces [197], with noticeable parallelism with the similarity scores assigned by most automated face recognition systems
in biometric recognition tasks.
However, despite several groups, such as the FISWG (Facial Identification Scientific Working Group) [9] and the
ENFSI (European Network of Forensic Science Institutes) [4], currently working in this direction, there is no
standardized and validated method in forensics [15, 146, 149]. For example, in the United States, the admissibility of
scientific evidence obtained through face recognition is generally evaluated through two guidelines:

• the “Frye rule” gives judges the task of assessing whether the technique or technology is accepted in a
relevant scientific community [1];
• the “Daubert rule” adds to the previous one the constraints that the technique has been tested, that a description of its error
rate is available, and that it is maintained and adheres to standards [2, 6, 7, 15, 63].
In many other judicial systems beyond the U.S.A., no specific admissibility rule regarding the evaluation of the
scientific evidence is given, such as the case of the European judicial system, where the judges are generally responsible
for its assessment in single cases [15]. Another issue is the general acceptance of the biometric trait itself, especially
the face, to the point that some governments have banned or limited its usage even by law enforcement agencies (e.g.,
[48, 76, 136, 145, 218]). The concern is particularly related to positive identification due to the huge consequences of a
false match in forensic cases combined with previous failures of face recognition systems in that direction [31, 100].
Therefore, a robust and transparent methodology must be provided for forensic recognition, whose effectiveness has
to be quantitatively assessable in statistical and probabilistic terms. The goal is to provide guidelines for quantifying
biometric evidence value and its strength based on assumptions, operating conditions, and the casework’s implicit
uncertainty [72, 136, 197]. Besides, a set of interpretation methods must be defined independently of the baseline
biometric system and integrated into the considered algorithm [153, 197]. This allows reaching conclusions in court
trials in agreement with three constraints (Fig. 5): performance evaluation, understandability, and forensic evaluation
[27, 54, 110]. Closely related to these constraints, the quality of the probe and reference data should also be considered
in the admissibility assessment [27, 176, 220].

Fig. 5. Taxonomy of forensic recognition methods based on 3DFR with respect to the evaluation levels for forensic purposes.

2.2.1 Performance Evaluation. Performance evaluation concerns the basic trust level of the system and its performance
for a specific purpose; therefore, it supports the forensic practitioner’s decision when using such a system to perform
a given task. For instance, a biometric system could be considered suitable for a specific task whenever it is tested
and achieves a performance acknowledged as “good” on data representative of the working system’s context. E.g., a
face recognition system designed to perform well on high-resolution frontal images is not required to achieve the
same performance on images acquired by CCTV cameras with random head poses [27]. In a statistical evaluation, the
definition of “good performance” depends on the context, the data and the end users’ requirements set in the design
process. The performance parameters are different according to the system itself and the specific task for which it
should be employed. For example, the accuracy, namely the percentage of correctly classified samples [113], could
be considered in evaluating the performance of classification problems, such as in the case of the face recognition
task. Distance-based metrics can instead be used for evaluating the error between the predicted values and the real
ones in regression problems such as the 3D reconstruction tasks. An example of the latter is the Root Mean Square
Error (RMSE), which considers the distance between a reconstructed facial part and the corresponding ground truth
in terms of pixels (e.g., [229]). As previously mentioned, understanding the metrics employed requires basic statistics
knowledge, which legal decision-makers often do not have. This makes it difficult to justify the use of a particular
system by such metrics in a law court [27]. Thus, a certain level of confidence in the underlying technical aspects is
necessary to interpret the performance parameters adopted.
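For reference, a common formulation of the RMSE mentioned above, assuming $N$ corresponding points (or pixels) $\mathbf{p}_i$ in the reconstruction and $\mathbf{g}_i$ in the ground truth, is

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{p}_i - \mathbf{g}_i \rVert^2},$$

where lower values indicate a more faithful reconstruction.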
Another issue for the trust of biometric recognition systems in forensics is that of biased performance against certain
demographic groups, meaning that the performance parameters may depend, on average, on the demographic groups
present in the system’s data set [110, 212]. For example, biased performance on age, gender, and ethnic groups was
recently reported [33, 110, 166]. In face recognition, bias is a severe problem since facial regions contain rich information
strictly correlated to many demographic attributes, which could lead to biased performance [117]. This issue has often
been overlooked when face recognition systems were employed by law enforcement agencies [82]. Thus, failing to
analyze this aspect, or the demographic group representative of the casework, could lead to the inadmissibility of
the biometric system in judicial trials or, simply, to unreliable support for the human expert’s decision. In other words, the
choice of the data sets employed for training the system and evaluating it is one of the factors that must be considered
in the performance evaluation [108]. Furthermore, fairness, interpretability, and even performance could benefit from
the ability of a system to provide information about how biased its decision could be [72, 157] (see also 2.2.2).
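To make this disaggregated view concrete, the following minimal sketch (our own illustration; the function name and the use of plain accuracy instead of false match/non-match rates are assumptions) computes the recognition accuracy separately for each demographic group in an evaluation set:

```python
import numpy as np

def per_group_accuracy(y_true, y_pred, groups):
    """Disaggregate recognition accuracy by demographic group to expose
    possible performance bias; `groups` holds one group label per sample."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {g: float((y_pred[groups == g] == y_true[groups == g]).mean())
            for g in np.unique(groups)}
```

Large gaps between the returned values indicate that the system's error rates are not uniform across the population it would face in casework, which should be reported alongside the aggregate performance.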

2.2.2 Understandability. Understandability (also known as interpretability [27]) is the ability of a human to understand
the functioning of a system, its purpose, its features, as well as its outcome and the (computational) steps that led to
such a result. In particular, the understandability evaluation supports the decision of whether the outcome of the system
is suitable. This is particularly relevant for legal decision-makers (e.g., judges) who are typically not experts in those
topics [11, 19, 27, 88, 110, 125].
A first step for making a system understandable is to design it as “explainable” in the decision-making process. This
facilitates its traceability, which, in turn, could help prevent or deal with erroneous decisions by revealing the possible
points of failure, the most appropriate data and architecture [11, 47, 72]. The main difference between understandability
and explainability is that the latter focuses on the system’s design [27], whilst the former focuses on the end-user
experience. Therefore, the system’s understandability requires an explainable design process.
A factor that can improve the system’s understandability is its transparency, meaning the ability of the forensic
practitioner to have access to the details related to the functioning of such a system [27]. For example, a fully open-source
system is entirely transparent. However, even full transparency does not imply understandability, as in the
case of image processing algorithms whose effects cannot be reversed. In other words, they cause a loss of detail or an
irreversible/random addition which could even impair the reproducibility required for any automated system to be
employable by forensic practitioners for reaching conclusions [139]. Moreover, even details about the algorithms and
the implementation of very complex systems like neural networks could be insufficient for their understanding [27].
Therefore, for both complex and black-box systems, such as those based on Artificial Intelligence, it should be
necessary to add sufficient local and/or global interpretations through metrics and mechanisms [11, 27, 72, 87, 125, 127].
For example, the forensic practitioner must be able to determine whether the system is using the face area instead of
the background when computing the related outcome. Moreover, understandability is an aid for legal decision-makers
Manuscript submitted to ACM
3D Face Reconstruction: the Road to Forensics 9

in cases where both the prosecution and the defence of the suspect present contradictory results based on their own
black-box systems [110]. Some examples of approaches for enhancing the explainability and, in particular, spatial
understandability in the context of face recognition are the extraction of features in different areas of the face [222] and
the use of model-agnostic methods (i.e., not tied to a particular type of system [11, 87]) that visualize the salient areas
that contribute to the similarity between pairs of faces [194]. Other approaches are the estimation of the uncertainty
of features through the analysis of the distributional representation in the feature space of each input facial image,
therefore assessing the uncertainty through the variance of such distributions [186], and the analysis of the effect of
features in the resulting outcomes such as facial angle and non-facial elements [55, 196]. However, black-box systems
such as deep neural networks still lack the interpretability needed to be effectively employed in forensic processing.
In particular, understanding what information is being encoded from the input image into deep face representations
would also help address eventual biases of the system (e.g., toward a demographic group) [110, 156].
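As a concrete illustration of the model-agnostic, saliency-oriented methods mentioned above, the sketch below (our own minimal example, not taken from the cited works; `similarity_fn` is an assumed black-box face comparator returning a scalar score) estimates which regions of a probe image drive its similarity to a reference image by occluding patches and measuring the score drop:

```python
import numpy as np

def occlusion_saliency(img_a, img_b, similarity_fn, patch=16, stride=8, fill=0.0):
    """Model-agnostic saliency map: how much the similarity between two face
    images drops when a square patch of img_a is occluded."""
    base = similarity_fn(img_a, img_b)
    h, w = img_a.shape[:2]
    saliency = np.zeros((h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = img_a.copy()
            occluded[y:y + patch, x:x + patch] = fill   # mask out one patch
            drop = base - similarity_fn(occluded, img_b)
            saliency[y:y + patch, x:x + patch] += drop
            counts[y:y + patch, x:x + patch] += 1
    return saliency / np.maximum(counts, 1)             # average score drop per pixel
```

Regions whose occlusion causes the largest score drop are the ones the comparator relies upon, allowing a forensic practitioner to verify, for instance, that the decision is driven by the face region rather than by the background.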

2.2.3 Forensic Evaluation. Forensic evaluation is the assignment of a relative plausibility of information over a set of
competing hypotheses (or "propositions") [27]. It supports the forensic practitioner’s opinion regarding the level of
confidence and the weight (i.e., the strength) of evidence when the system makes a decision according to its outcome
[27, 108, 125]. The system’s performance and understandability are taken into account in forensic evaluation, together
with contextual information (e.g., additional cues or supporting evidence from other sources) and general knowledge;
thus, additional information which could be either included in the decision process or formalized into the automated
system itself [27, 58, 111, 171]. Therefore, forensic evaluation includes the above elements to drive forensic practitioners
toward an appropriate decision (e.g., identification) that could be either conclusive or inconclusive according to the
assessed level of confidence [27, 46, 56–58].
From a technical perspective, forensic evaluation is quantitatively given by a statistical approach based on the
likelihood ratio values (LR) [146, 167, 176, 197, 209, 217]. In particular, it is acknowledged that the LR allows for a
transparent, testable, and quantitative assessment of the probability assigned to the evidence of a face match by forensic
practitioners, based on personal experience, experiments, and academic research, against the probability of a non-match
[27, 171]. A semi-quantitative scale could also be employed, in which values are aligned with ranges of likelihood
ratios (e.g., weak/medium/strong) or which expresses the relative strength of forensic observations in light of each proposition
[27, 39, 42]. Therefore, thanks to its transparency, testability, and formal correctness, LR allows the clear separation
of responsibilities between the forensic examiner and the court. This makes it compliant with the requirements of
evidence-based forensic science when quantifying the value of the evidence to the law court [168, 175]. However, it
must be remarked that calibrating a biometric score to become an LR requires a substantial amount of case-relevant
data, that is, data representative of the analyzed scenario in terms of quality (see 2.2.4) and demographic group (see 2.2.1).
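In its most common formulation, the likelihood ratio contrasts the probability of the observed evidence $E$ (e.g., the similarity score between probe and reference faces) under the prosecution proposition $H_p$ (same source) and under the defense proposition $H_d$ (different sources):

$$\mathrm{LR} = \frac{P(E \mid H_p)}{P(E \mid H_d)},$$

with LR > 1 supporting $H_p$, LR < 1 supporting $H_d$, and its magnitude expressing the strength of that support.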

2.2.4 Quality Evaluation. As previously pointed out, the characteristics of acquired data are also relevant. Firstly, they
should meet minimal requirements in terms of quality [176, 220]. Although not defined in a rigorous way, this term refers
to factors that lead to blurriness, distortion, and artifacts in images. They may be caused by (1) the camera employed,
whose sensor, optics, and analog-to-digital converter affect the image resolution, the dynamic range of gray levels, and the
ability to focus on the target [85, 108, 128, 137, 170, 187], (2) environmental conditions such as the illumination and
the background of the scene, as well as the weather conditions (rainy/cloudy) [105, 128, 170], (3) the subject’s distance
from the camera, which adds scaling and out-of-focus problems, his/her camouflage to evade recognition (sunglasses,
beard/moustache, hat/cap, makeup, jewelry), the speed and direction at which the subject is moving, and the position of
the face with respect to the camera, which can lead to non-frontal views and incomplete data [85, 102, 124, 137, 152, 213],
(4) the image processing embedded into the camera or applied after the raw data acquisition, such as compression and
re-sizing [85, 108, 128, 170]. Therefore, quality must be evaluated for both probe and reference facial data in order to
assess whether the proposed face recognition system is compliant with data of the kind [10, 27, 105, 159]. Secondly, the
data amount is crucial from the viewpoint of a new classification system to be trained and fine-tuned [105, 220] and
the calibration of LR frameworks for the evaluation of those and already existing systems (see 2.2.3), leading to the
creation of large-scale data sets for the evaluation of face recognition algorithms (e.g., [103]).
While the acquisition of mugshot images by law enforcement agencies is usually subject to strict control to ensure
the truthful representation of appearance, this is often not the case with the acquisition of probe images and videos.
Therefore, concerning the available data, it is necessary to assess their quality to determine whether they fulfil the intended
biometric function, including the 3D reconstruction task and the following recognition. The final goal is employing the system’s
outcome in the forensic investigation and the following judicial conclusion. In between, quality
evaluation would allow assessing the confidence level of decisions based on such data, or ranking and selecting
the samples with the best quality (e.g., single frames from a surveillance video) [98, 181, 198, 200]. To the best of our
knowledge, a global standard for quality assessment is currently missing [98, 181], probably also due to the human
subjectivity factor in the task, and international standards are still under development (e.g., [106, 107]). However, a
score based on the Mean Opinion Scores (MOS) was proposed [182] to justify the legal acceptance or the rejection of a
potential probe image, video, or part of them. Unfortunately, the MOS method is often impractical since it is considered
slow, expensive, and, in general, inconvenient. Although other quality assessment methods have been proposed, most
of them are not representative of human perception [92, 214]. In our opinion, specific expertise in agreement with the
lawcourt process should be included (e.g., [182, 198]). Furthermore, quality measures about the "partial results" of the
system should be integrated as well. For example, in the case of forensic recognition based on 3D face reconstruction
from 2D images or videos, the 3D model reconstructed either from reference or probe data could be corrupted due
to inaccurate localization of facial landmarks [23], thus requiring the localization process to be repeated or even the
sample to be discarded because it turns out to be unusable. Therefore, the quality measures could be integrated into forensic
recognition, considering them as complementary features [74, 98, 163]. This means that quality assessment would
pass through the previously described requirements (Subsections 2.2.1-2.2.2) in order to be admissible in the analyzed
context [181] according to the forensic evaluation process (Subsection 2.2.3).
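As a minimal illustration of such ranking and selection (our own sketch, using a sharpness-only proxy that is far from a complete face-quality measure; the function name and the OpenCV-based implementation are assumptions), the frames of a surveillance video can be scored by the variance of the Laplacian and the sharpest ones retained:

```python
import cv2

def rank_frames_by_sharpness(video_path, top_k=5):
    """Score every frame of a video with a simple sharpness proxy (variance of
    the Laplacian) and return the indices of the top_k sharpest frames."""
    cap = cv2.VideoCapture(video_path)
    scores = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        scores.append((cv2.Laplacian(gray, cv2.CV_64F).var(), idx))  # higher = sharper
        idx += 1
    cap.release()
    return [i for _, i in sorted(scores, reverse=True)[:top_k]]
```

In a forensic pipeline, such a ranking would only complement, not replace, the broader quality factors listed above (pose, illumination, compression, and so on) and the admissibility requirements of Subsections 2.2.1-2.2.3.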

2.3 3D Face Reconstruction in Forensics


During the investigation phase, the subject’s identity is unknown, and the possible identities within a suspect reference
set need to be rendered and sorted [220] in terms of likelihood with respect to the evidence (e.g. a frame captured from
a CCTV camera) [14]. In addition to the classic challenges related to facial recognition in uncontrolled environments,
such as low resolution, large poses, and occlusions [89], forensic recognition faces even more challenges. Examples
are cheaply set up acquisition systems and subjects who actively try not to be captured by cameras,
which exacerbate the previously cited issues and introduce novel problems such as heavy compression, distortions,
and aberrations [226]. Thanks to its greater representational power with respect to 2D facial data, 3DFR can alleviate some of
these problems. In fact, 3D data provides a representation of the facial geometry that reduces the adverse impact of
non-optimal pose and illumination. Depending on the characteristics of the probe image and the reference set narrowed
down by police and forensic investigation, whenever the investigator is required to compare these images and it is
necessary or advantageous to use an automatic face recognition system, 3DFR can be employed following two
different approaches, namely view-based and model-based (Fig. 6), to improve the performance of facial
recognition systems and, therefore, enhance their admissibility in legal trials.

Fig. 6. Taxonomy of performance enhancement methods through 3DFR for forensic recognition.

In a view-based approach, the set of images containing frontal faces is adapted to non-frontal ones; thus, it
is typically applied to the reference set to adapt the faces within mugshots to the probe image, so that they match
the pose of the represented face [160] (Fig. 7). Although it allows comparing facial images under similar poses, this
approach requires a reference set containing images of suspects captured in such a pose or synthesizing such a view
through the 3D model of each suspect. In the latter case, each 3D model can be adapted after applying a pose estimation
algorithm on the probe image before employing the actual recognition system [59, 138, 228, 229]. Another proposed
strategy is to introduce a gallery enlargement phase instead, which consists of projecting the 3D model in various
predefined poses in the 2D domain to enhance the representation capability of each subject and then employing the
synthesized images in the recognition task [93, 131, 132, 230]. However, the view-based approach represents a suitable
choice whenever multi-view face images of suspects are captured during enrollment for the purpose of highly accurate
authentication, such as in the case of the verification task in face recognition [93], although it usually involves higher
computational cost in terms of both time and memory with respect to the model-based counterpart.
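As a simplified sketch of the view-synthesis step shared by these strategies (our own illustration under an assumed orthographic camera, ignoring texture rendering and hidden-surface removal), a reconstructed 3D face can be rotated to a set of predefined yaw angles and projected back onto the image plane:

```python
import numpy as np

def yaw_matrix(deg):
    """Rotation matrix around the vertical (y) axis."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0,       1.0, 0.0      ],
                     [-np.sin(a), 0.0, np.cos(a)]])

def synthesize_views(vertices, yaw_angles=(-60, -40, -20, 0, 20, 40, 60)):
    """Rotate a reconstructed 3D face (N x 3 vertex array) to predefined yaw
    angles and project it orthographically, yielding the 2D point sets that a
    renderer would turn into synthetic gallery images."""
    views = []
    for deg in yaw_angles:
        rotated = vertices @ yaw_matrix(deg).T   # rotate the head around the y axis
        views.append(rotated[:, :2])             # drop depth: orthographic projection
    return views
```

A complete pipeline would render textured images at these poses (and possibly at different pitch angles and illumination conditions) before adding them to the reference gallery.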
In a model-based approach, the adaptation phase is performed on non-frontal faces to synthesize a face in frontal
view through the reconstructed 3D face [93] (Fig. 8). The normalized (or “frontalized”) face is then compared to the
frontal faces within the gallery set to determine the subject’s identity in the probe image [69, 95]. This approach is
suitable for real-world scenarios in which it is necessary to seek the identity of an unknown person within a probe
image or video in a large-scale mugshot data set [93], as in the so-called face identification task in biometric recognition,
for maximizing the likelihood of returning the potential candidates. Despite the generally lower computational cost, this
approach is only applicable when it is possible to synthesize good-quality frontal view images with the original texture
since it could provide complementary information for recognition with respect to the shape [16, 134]. According to
what we discussed in 2.2.4, the minimum quality requirements for the probe images must be met, which is not often the
case in real forensic scenarios. Furthermore, it could be necessary to handle possible textural artifacts in the resulting
frontal image [36, 95, 233].

Fig. 7. Example of face recognition through a view-based approach (the 3D model was obtained through [104]).

Fig. 8. Example of face recognition through a model-based approach (based on [95]).

Hence, the application of a view-based approach would allow changing the scenario from a more traditional 2D-to-2D
recognition to a 3D-to-2D recognition, in which the reconstructed 3D face representation is typically used to generate
synthetic facial views matched with the probe image [35]. This can be achieved by rotating the 3D model in such a way
that the pose matches the one in the compared image, possibly after applying similar lighting conditions to the
model to ease the comparison (e.g., [211]). Similarly, a model-based approach could be exploited both for aiding the
2D-to-2D face recognition task, through the synthesis of frontal views of non-frontal faces [93], as is typically
the case for probe images, and for the 2D-to-3D recognition scenario, where several synthetic views can provide a set
of potential probe images [206], in agreement with the reference ones. Coherently, these approaches would jointly
allow a 3D-to-3D recognition scenario: the 3D representation of the face reconstructed from the reference images is
compared with the one reconstructed from a probe video sequence [35]. The view-based approach typically involves the
reconstruction from mugshots and the model-based approach from probe images, mainly due to the typical qualitative
characteristics of data. Nonetheless, it is still possible to employ these approaches on both sets of data, according to
the specific task (e.g., it could be possible and convenient to apply a view-based approach on a surveillance video to
ease the comparison). However, the potential bias towards the average geometry must be taken into account when
reconstructing the 3D faces [205], especially when the reconstruction is performed from single images.

3 3D FACE RECONSTRUCTION FOR MUGSHOT-BASED RECOGNITION


Although many attempts have been made in the past years to reconstruct faces in the 3D domain, either from a
single image or multiple images of the same subject [148], only a few were evaluated for their potential applications in
forensics. Among them, we focus on those exploiting mugshot images captured by law enforcement agencies. The
reason is that methods inspired by this approach are closer than others to satisfying the previously seen criteria for
their potential admissibility in forensic cases.
To our knowledge, the earliest study on 3DFR from mugshot images for forensic recognition was proposed in 2008
by Zhang et al. [230], who employed a view-based gallery enlargement approach to recognize probe face images
in arbitrary views with the aid of a 3D face model for each subject reconstructed from mugshot images (Fig. 9). To
reconstruct the shape of such a model, they proposed a multilevel variation minimization approach that requires a set of
landmarks specified on a pair of frontal-side views to be used as constraining points (i.e., eyes, eyebrows, nose profiles,
lips, ears, and points interpolated between them [232]). Finally, they recovered the corresponding facial texture through
a photometric method. They evaluated their approach on the CMU PIE data set [188], using a holistic face comparator
(or matcher) [202] and a local one typically employed in biometrics for textural classification [13], restricting the
rotation angles of the probe images to ±70°. This analysis revealed a significant improvement in average recognition
accuracy with respect to the original mugshot gallery, especially when the rotation angle of the face in the probe image
is larger than 30°. However, the limit of the rotation angle of faces in probe images and the use of traditional face
comparators rather than state-of-the-art ones do not allow for assessing the actual improvement in the effectiveness of
3DFR from mugshot images in terms of forensic recognition [93, 131]. Other drawbacks of the proposed method are
the possible artifacts caused by the assumed model [228] and the poorly explored image texture. Furthermore, they
performed the analysis on a small-scale data set containing only 68 subjects. Finally, despite the improved performance and
the usage of a local face comparator that enhances understandability [222] by expressing the similarity between single
facial parts rather than providing a global similarity, and by allowing the assessment of the salient areas that led to the
outcome of the system, the authors did not utilize any strategy for facilitating the forensic evaluation. Moreover, the
analysis of local patterns could also help address the presence of occlusions. Another aspect that could be considered
is the computational time required for the gallery enlargement, which appears to make the method unsuitable for
applications having strict time constraints, even taking into account the age of the hardware on which it was tested
(Table 1). We further discuss this factor in Section 6.

Fig. 9. Representation of the gallery enlargement method (the 3D model was obtained through [104]).

Four years later, Han and Jain [93] proposed to employ the frontalization approach in the considered scenario, as it
had already shown its effectiveness in the biometric recognition from non-frontal faces [25]. They proposed a 3DFR
method from a pair of frontal-profile views based on a 3D Morphable Model (3DMM) [26], a generative model for
realistic face shape and appearance, to aid the reconstruction process. They reconstructed the 3D face shape through
the correspondence between landmarks within the frontal image and those on the profile one and extracted the texture
by mapping the facial image to the 3D shape. A view-based gallery enlargement approach and model-based probe
frontalization approach (Fig. 10) were employed to enhance the performance through the proposed reconstruction
approach. They evaluated them on subsets of PCSO [3] and FERET [161] data sets through a local face comparator and
a commercial one, revealing an improved recognition accuracy in both cases. One of the most evident limits of the
reconstruction approach in a forensic context is that the involved 3DMM is a global statistical model which is limited in
recovering facial details [148], as it could be dominated by the mean 3D face model, which potentially introduces a
bias of the outcome towards the underlying model [206]. This aspect could be further exacerbated by the relatively low
quality of the employed images. Furthermore, the involved 3DMM could cause evident distortion when the model is
largely rotated [132, 229]. Other limits of this work are that the authors did not fully explore the texture and did not use
state-of-the-art face comparators [131, 228]. Therefore, as in the previous case, despite the improvement in performance
and the enhanced understandability thanks to local features, the authors did not employ any framework for easing the
forensic evaluation of their method. Finally, no information about the computational time was reported.

Fig. 10. Representation of the probe normalization (or "frontalization") method (based on [95]).
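To make the 3DMM-based reconstruction step more concrete, the sketch below is a generic, simplified illustration rather than the actual procedure of [93]: it fits the shape coefficients of a linear morphable model to observed 2D landmarks by ridge-regularized least squares, assuming a known scaled-orthographic pose and known landmark-to-vertex correspondences (all function and variable names are our own):

```python
import numpy as np

def fit_3dmm_shape(lm2d, vidx, mean_shape, shape_basis, R, s, t, lam=1e-3):
    """Fit 3DMM shape coefficients to 2D landmarks by ridge-regularized least
    squares, assuming a known scaled-orthographic pose (s, R, t).

    lm2d: (M, 2) landmark positions; vidx: (M,) corresponding model vertex ids;
    mean_shape: (3N,); shape_basis: (3N, K); R: (3, 3); s: scale; t: (2,) shift.
    """
    n_coeff = shape_basis.shape[1]
    mean3 = mean_shape.reshape(-1, 3)                        # (N, 3) mean vertices
    basis3 = shape_basis.reshape(-1, 3, n_coeff)             # (N, 3, K) per-vertex basis
    P = s * R[:2, :]                                         # (2, 3) orthographic projection

    A = np.concatenate([P @ basis3[v] for v in vidx], axis=0)                      # (2M, K)
    b = np.concatenate([lm2d[i] - t - P @ mean3[v] for i, v in enumerate(vidx)])   # (2M,)

    alpha = np.linalg.solve(A.T @ A + lam * np.eye(n_coeff), A.T @ b)              # (K,) coefficients
    return mean_shape + shape_basis @ alpha                  # reconstructed 3D shape (3N,)
```

In practice, pose, shape, and possibly expression and texture coefficients are estimated jointly or in alternation, and the frontal-profile landmark correspondences exploited in [93] provide additional constraints.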

In the same year, Dutta et al. [59] proposed a method based on 3DFR for improving face recognition from non-frontal
view images through a view-based gallery adaptation approach (Fig. 11). They applied existing recognition systems to
the 16 common subjects in the CMU PIE [188] and Multi-PIE [81] data sets, containing frontal and surveillance images,
respectively. The adaptation of the reconstructed model to the pose estimated from a probe image could be particularly
advantageous whenever poor-quality probe data were acquired, while it is possible to obtain the 3D model from images
having a higher quality, such as in the case of mugshot images (Fig. 11). However, this approach requires an accurate
estimate of the pose of the face in the probe image. Furthermore, the number of subjects involved in the study is small
and should be enlarged to simulate a forensic case and to quantify the improvement, thus assessing the applicability of the method in
real-case scenarios. Despite the advantages in some application contexts in terms of performance, the authors did not
take into account understandability or forensic evaluation. The required computational time was not assessed either.
Similarly, Zeng et al. [228, 229] reconstructed 3D faces from 2D forensic mugshot images, employing frontal, left
profile, and right profile reference images together with multiple reference models to obtain more accurate outcomes
and enhance recognition performance through a view-based gallery adaptation approach. To this aim, they used a
coarse-to-fine 3D shape reconstruction approach based on the three views, a photometric method, and multiple
reference 3D face models. The use of multiple reference models is an attempt to limit the homogeneity of reconstructed
3D face shape models and increase the probability of finding the most similar candidate for the single parts of the
input face. The reconstructed 3D face shapes were then used in the recognition task to establish correspondences
between the local semantic patches around seven landmarks on the arbitrary view probe image and those on the gallery
of mugshot face images, assuming that patches will deform according to the head pose angles. The authors [228]
tested their approach on the CMU PIE [188] and Color FERET [161] data sets. They showed that deforming semantic
patches is effective [13] and compared the performance with a commercial face recognition system [154] and the
previously described method proposed by Zhang et al. [230].

Fig. 11. Representation of the gallery adaptation method (the pose was estimated through [90, 91], and the 3D model was obtained through [104]).

The authors [229] also evaluated the enhancement using a
machine learning (ML) classifier on different poses within the Bosphorus [178] and Color FERET [161] data sets. As the
authors suggested, the improvement in recognition capability from arbitrary-pose face images is due to the greater
robustness of semantic patches to pose variation and the higher inter-class variation introduced by the subject-specific
3D face model. A limitation of this work is the use of out-of-date face comparators [131]. Furthermore, although the
method employs multiple reference models, the outcome could still be biased toward them [206]. Finally, despite the
fact that the proposed method enhances the performance of an understandable recognition approach, thanks to the
employed local recognition approach, the authors did not perform any forensic evaluation. Moreover, despite assessing
the test time on a single probe image, the authors did not report the computational time required for the reconstruction
of the models in the reference gallery nor for the training of the recognition system (Table 1).
In 2018, Liang et al. [131] proposed an approach for arbitrary-view face recognition based on 3DFR from mugshot images
which fully explores image texture. The proposed shape reconstruction approach is based on cascaded linear regression
from 2D facial landmarks estimated in frontal and profile images. After reconstructing the 3D shape, they approached
the texture recovery through a coarse-to-fine approach. Therefore, they employed the proposed method in a recognition
task on a subset of images from each subject of the Multi-PIE data set [81] through a view-based gallery enlargement
approach on state-of-the-art comparators based on deep learning (DL). Furthermore, they compared the performance
before and after the gallery enlargement and by fine-tuning the comparators with the generated multi-view images.
The results highlighted improved recognition accuracy in large-pose images, especially with fine-tuned comparators. In
particular, this method provides better results than the one proposed by Han and Jain [93], probably because of the greater
focus on reconstructing texture information [131]. Hence, the most significant novelties introduced by this work are the
textured full 3D faces reconstructed from the mugshot images and the analysis on DL-based comparators, inherently
more robust to pose variations than traditional ones [131]. Furthermore, they fine-tuned those comparators with the
enlarged gallery, revealing even better performance than the previous gallery enlargement approaches. The authors
also assessed the computational time required for the reconstruction of the 3D models, revealing a huge improvement
with respect to the earlier study reporting it, even accounting for the different capabilities of the hardware on
which it was tested (Table 1). Despite the reconstruction method appearing suitable for real-time applications [131],
the authors did not report the computational time required for training and testing the recognition system. A limit of
the proposed method is that it does not consistently work across all pose directions, revealing worse performance for
some poses than in the original gallery (e.g., in frontal pose). Furthermore, the evaluated performance could suffer from
demographic bias due to the unbalanced demographic distribution related to the data set employed in the experiments
[81]. Finally, the authors did not take into account any understandability or forensic evaluation.
In 2020, the same authors published an extension of this work [132], in which they also proposed a DL-based shape
reconstruction. In this work, the authors extended the evaluation of the face recognition capability of the proposed
method based on linear shape reconstruction by employing a subset of the Color FERET data set [161], obtaining a
higher recognition accuracy on average, as in the case of the Multi-PIE data set [81]. Furthermore, they tried to solve the
drawback of their previous work, related to worse recognition performance for some poses with respect to the
original gallery, through a fusion of the similarity scores obtained with the original mugshot images and with the
synthesized ones. The improvements previously observed by combining 2D images and 3D face models in multi-modal
approaches [10, 29, 30, 43, 44] were therefore confirmed. This approach, evaluated on the Multi-PIE data set [81],
revealed consistently better performance on all the pose angles. With respect to their previous study, the authors also
reported the computational time required for training the recognition system (Table 1). Despite the proposed novelties,
the authors did not assess whether the proposed DL-based shape reconstruction approach is able to enhance recognition
capability. Finally, the study did not consider understandability or forensic evaluation.
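The fusion idea can be sketched as follows (our own minimal example; the exact rule and weights used in [132] are not reproduced here, and `matcher` is an assumed face comparator returning normalized similarity scores):

```python
def fused_similarity(probe, original_gallery, synthesized_gallery, matcher, w=0.5):
    """Score-level fusion between the similarity computed against the original
    mugshots and the one computed against views synthesized from the 3D model."""
    s_orig = max(matcher(probe, ref) for ref in original_gallery)
    s_synth = max(matcher(probe, ref) for ref in synthesized_gallery)
    return w * s_orig + (1.0 - w) * s_synth   # weighted-sum fusion (weights are an assumption)
```

Keeping the original mugshot scores in the fusion prevents the synthesized views from degrading performance in the poses (e.g., frontal) where the original gallery already performs well.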

Fig. 12. Taxonomy of 3DFR approaches in forensic scenarios.

A quantitative comparison among the previously reviewed methods would require the usage of the same face
comparators and their evaluation on the same ground truth data through the same performance metrics, and this is often unfeasible due to many factors, such as the data sets that were state of the art when each work was proposed. Similarly, a comparison in terms of computational time is not feasible, both because time complexity is often unreported and because of differences in the physical systems on which the proposed methods have been tested.
However, a qualitative comparison is provided in the following table (Table 1) and then discussed in Section 6.
Table 1. Methods based on 3DFR for face recognition from mugshots (FMR is False Match Rate).

| Method | Input data | Data sets | System | Time complexity (s) | Recognition enhancement approach | Performance enhancement | Understandability enhancement |
|---|---|---|---|---|---|---|---|
| Zhang et al. [230] | Frontal and profile | CMU PIE [188] | Intel Pentium IV 2.8 GHz with 1 GB of memory | Reconstruction: 985.8 (shape), 225.6 (texture); training: 1,048.64 (68 subjects); single test: 0.635 | Gallery enlargement | From 74.27% to 93.45% face recognition rate by introducing synthesized virtual views | Local binary patterns |
| Han and Jain [93] | Frontal and profile | FERET [161] and PCSO [3] | N.A. | N.A. | Probe frontalization or gallery enlargement | From 72.5% to 82.1% rank-1 accuracy with probe frontalization; from 0.1% to 65.5% verification rate at FMR=0.1% with gallery enlargement | Local binary patterns |
| Dutta et al. [59] | Frontal | CMU PIE [188] and Multi-PIE [81] | N.A. | N.A. | Gallery adaptation | N.A. | N.A. |
| Zeng et al. [228, 229] | Frontal, left profile, and right profile | Color FERET [161], Bosphorus [178], and CMU PIE [188] | Intel Core i5 2.60 GHz with 4 GB of memory | Reconstruction: N.A.; training: N.A. (68 subjects); single test: 9 | Gallery adaptation | Mean accuracy of 97.8%, 2.3% higher than Zhang et al. [230] | Local binary patterns on landmark-based patches |
| Liang et al. [131, 132] | Frontal, left profile, and right profile | Multi-PIE [81] and Color FERET [161] | Intel Core i7-4710 with 16 GB of memory | Reconstruction: 0.04 (shape), 1.1 (texture); training: 133 (1000 samples); single test: N.A. | Gallery enlargement | From rank-1 identification rate in the range 70.35-88.22% to 87.94-94.88% with DL comparators on Multi-PIE [81] (86.30-94.41% with Han and Jain [93]) | N.A. |

4 OTHER APPLICATIONS OF 3D FACE RECONSTRUCTION IN FORENSICS


In addition to recognition from mugshot images, 3DFR could represent a valuable aid in other forensic contexts to
facilitate the recognition of a subject. An example is the search for missing persons. Taking into account such a
scenario, Ferková et al. [68] proposed a method that includes demographic information to improve the outcome of
the reconstruction from a single frontal image and, at the same time, speed up the related computation. In particular,
the method estimates the 3D shape of the missing person’s face by taking into account age, gender and the similarity
between the landmarks of the reference depth images and those previously annotated in the input image. Then, planar
meshes are generated by triangulating between the input image and the depth image. The authors reported that their
reconstruction method requires a computational time lower than 3 seconds and strongly depends on the underlying landmark estimation algorithm. Despite the good geometrical results, the width of the reconstructed face is usually overstretched, and the generated 3D face model does not include the forehead. Furthermore, the authors did not quantitatively evaluate the contribution of their method to recognition capability or its potential admissibility in forensic scenarios.
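A rough sketch of the general idea of lifting 2D landmarks to 3D with a demographically matched reference depth image is given below; it is only loosely inspired by the method described above (the actual pipeline differs), and the depth-sampling strategy and the use of scipy's Delaunay triangulation are our assumptions.

```python
# Lift 2D landmarks to 3D by borrowing depth from a matched reference face.
import numpy as np
from scipy.spatial import Delaunay

def lift_landmarks(lmk2d, ref_lmk2d, ref_depth):
    """lmk2d: (L, 2) input-image landmarks; ref_lmk2d: (L, 2) landmarks on the
    reference depth map; ref_depth: (H, W) depth image of the reference face."""
    # Sample depth at the reference landmarks (row = y, column = x).
    z = np.array([ref_depth[int(v), int(u)] for u, v in ref_lmk2d])
    pts3d = np.column_stack([lmk2d, z])   # keep the input x, y, borrow depth
    mesh = Delaunay(lmk2d)                # planar triangulation of the landmarks
    return pts3d, mesh.simplices
```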
Similarly to some of the previous studies, Rahman et al. [165] highlighted how 3D face models could enhance forensic
recognition from CCTV camera footage. In particular, they reconstructed the 3D face models from single frames by
optimizing an Active Appearance Model (AAM), an algorithm that matches a statistical model of shape and appearance
to an image [115]. They then evaluated the improvement in the recognition capability of different ML models with
respect to 2D AAMs. However, this study on the possible application of 3DFR to forensic recognition from surveillance
videos is limited to a data set of a few subjects, which is not publicly available. Finally, the authors did not assess the
recognition performance and did not investigate its admissibility in terms of understandability and forensic evaluation.
With a similar purpose, van Dam et al. [204] proposed a method based on a projective reconstruction of facial
landmarks. An auto-calibration step is added to obtain the 3D face model from CCTV camera footage. The authors
considered the specific case of fraud at an Automated Teller Machine (ATM) with an uncalibrated camera under very short-distance acquisitions with a distorted perspective [201]. They analyzed how the quality of the resulting 3D face model is affected by the number of frames and the number of landmarks, assessing the minimum values for a precise perspective shape reconstruction, which could, however, be affected by possible errors in the estimated landmark coordinates introduced by noise. However, the authors did not quantitatively assess the method’s improvement
with respect to its 2D counterpart in face recognition. Neither understandability nor forensic evaluation was addressed.
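As a much simplified illustration of recovering 3D landmark structure from the same landmarks tracked over several frames, the sketch below uses an orthographic factorization (Tomasi-Kanade style); this is a weaker camera model than the projective reconstruction with auto-calibration used by the authors, and the result is only defined up to an affine ambiguity.

```python
# Simplified orthographic structure-from-motion over tracked facial landmarks.
import numpy as np

def factorize_landmarks(tracks):
    """tracks: array of shape (n_frames, n_landmarks, 2) with 2D positions."""
    F, L, _ = tracks.shape
    # Stack x rows then y rows into a 2F x L measurement matrix, centered per frame.
    W = np.concatenate([tracks[:, :, 0], tracks[:, :, 1]], axis=0)
    W = W - W.mean(axis=1, keepdims=True)
    # Rank-3 factorization: W ~ M (2F x 3) @ S (3 x L).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])
    S = (np.sqrt(s[:3])[:, None]) * Vt[:3, :]
    return M, S   # S holds the 3D landmark coordinates up to an affine ambiguity
```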
In 2016, the same authors proposed another method to reconstruct a 3D face from multiple frame images for an
application in the forensic context [206]. Such a method employs a photometric algorithm to estimate both the texture
and the 3D shape of the face. The goal is to avoid generating an outcome biased towards any facial model, thus enhancing
the suitability in a forensic comparison process. The proposed method is a coarse-to-fine shape estimation process:
it first provides a coarse 3D shape [205] and other pose parameters from landmarks in multiple frames, and then a
refined shape is computed by assessing the photometric parameters for every point in the 3D model. The last step also
allows estimating the texture information, thus providing the dense 3D model. The authors evaluated their method
in a recognition task on a self-collected data set of single-camera video recordings of 48 people containing frames with different facial views. The reconstructed textures were compared with the ground truth images through FaceVACS [79] while increasing the number of considered frames across iterations, revealing improved recognition results in most cases.
Furthermore, using the likelihood ratio framework, they highlighted that in more than 60% of the cases, data initially
unsuitable for forensic cases became meaningful in the same context through the proposed method. As the authors
suggested, the outcomes can be used to generate faces under different poses, while they are not suitable for shape-based
3D face recognition. Despite the enhanced suitability in forensic scenarios, one of the most significant drawbacks is
that the model-free reconstruction approach is computationally more burdensome than a model-based one and requires
multiple images. Furthermore, the authors did not quantitatively evaluate their method on publicly available data sets.
Although the authors did not assess understandability, they introduced a forensic evaluation of their method based on
3DFR; thus, in our opinion, this is the most significant work on 3DFR applied to forensics.
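The kind of likelihood-ratio framework referred to above can be sketched as follows. Modelling the same-source and different-source score distributions with kernel density estimates is our simplifying assumption for illustration; actual casework requires calibrated, validated models.

```python
# Sketch of a score-based likelihood ratio for forensic evaluation.
import numpy as np
from scipy.stats import gaussian_kde

def likelihood_ratio(score, same_source_scores, diff_source_scores):
    """LR = p(score | same source) / p(score | different sources)."""
    p_same = gaussian_kde(same_source_scores)(score)[0]
    p_diff = gaussian_kde(diff_source_scores)(score)[0]
    return p_same / max(p_diff, 1e-12)

# Example: an LR well above 1 supports the same-source hypothesis,
# e.g. lr = likelihood_ratio(0.82, mated_scores, non_mated_scores)
```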
Unlike all previous approaches, Loohuis [135] proposed to employ 3DFR to address the lack of facial images, which
could be used in training ML and DL models for face recognition tasks, for example, in a surveillance scenario. The
author combined a method for generating face images with rendering techniques to simulate such adverse conditions
and assessed the impact of the resulting synthetic images on existing face recognition systems. In particular, the method
proposed by Deng et al. [53] for reconstructing the 3D model of the face, based on a DL model [96] and a 3DMM
[158], has been applied to the single images of a subset of the ForenFace data set [227] to generate images simulating
different levels of image degradation. Unfortunately, the proposed method does not perform well on very low-quality
images. However, a level of degradation reasonable for many forensic scenarios can still be mimicked, as the generated images show a high degree of similarity to the reference ones. Moreover, a similar approach employing
3DFR for generating degraded synthetic views has already been demonstrated to enhance the recognition performance
of automatic face recognition systems from low-quality videos, such as those acquired by surveillance cameras, with
holistic, local, and DL approaches [102]. Furthermore, despite the human subjectivity in perceiving the quality of an
image, such an approach could even be employed in the development of quality assessment algorithms for facial images
since it would allow comparing the degraded image against a known reference version thereof, thus aiding the selection
of potentially suitable samples either for the reconstruction or the recognition tasks [181].
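In the spirit of the approaches above, the following sketch applies a chain of surveillance-style degradations (downsampling, blur, sensor noise, compression) to a rendered or original face image; the parameter values are illustrative assumptions, not those used in the reviewed works.

```python
# Simulate surveillance-style degradation on a face image with OpenCV.
import cv2
import numpy as np

def degrade(img, scale=0.25, blur_sigma=1.5, noise_std=5.0, jpeg_quality=30):
    h, w = img.shape[:2]
    small = cv2.resize(img, (int(w * scale), int(h * scale)),
                       interpolation=cv2.INTER_AREA)           # low resolution
    small = cv2.GaussianBlur(small, (0, 0), blur_sigma)         # optical blur
    noisy = small.astype(np.float32) + np.random.normal(0, noise_std, small.shape)
    noisy = np.clip(noisy, 0, 255).astype(np.uint8)             # sensor noise
    _, buf = cv2.imencode(".jpg", noisy,
                          [int(cv2.IMWRITE_JPEG_QUALITY), jpeg_quality])
    return cv2.imdecode(buf, cv2.IMREAD_UNCHANGED)              # compression artifacts
```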
5 DATA SETS FOR FACE RECOGNITION BASED ON 3D FACE RECONSTRUCTION


Public data sets provide a way to test and compare the performance of face recognition systems through a common
evaluation framework. Therefore, in this Section, we focus on the characteristics of the available data sets from the
perspective of an application of forensic facial recognition based on 3D face reconstruction. Some of them have already
been introduced in Sections 3 and 4. Furthermore, we provide some proposals about not yet employed data sets, in our
opinion, suitable for the forensic task.
We subdivided the available data sets into two categories. The first one (Subsection 5.1) is related to sets of 2D images
containing mugshot-like facial images and, in some cases, in-the-wild images (i.e., images acquired in an uncontrolled
environment). These include data to test reconstruction algorithms and recognition methods in realistic forensic
scenarios. The second category (Subsection 5.2) includes sets of 3D facial scans and videos, which could be employed to
evaluate the accuracy of the 3D reconstruction algorithms and possible 3D-to-2D or 3D-to-3D face recognition systems,
thus extending the application scenarios to realistic surveillance videos.

5.1 Image data sets


Five different data sets containing 2D images were employed in the previous studies. These data sets contain either RGB
images (i.e., color images) or grayscale images acquired in controlled or semi-controlled scenarios. Consequently, most
of them could be considered mugshot-like data sets (Table 2) and employed in studies related to mugshot-based face
recognition (Section 3). However, some of them also contain facial images in different poses and expressions, suitable
for evaluating the robustness of the proposed algorithms to such factors. We describe them and suggest some of their possible uses in studies related to forensic face recognition based on 3D face reconstruction. Besides these, we indicate data sets not employed in the previous studies but, in our view, of great potential, as they address some shortcomings of the other data sets or are even explicitly designed for realistic forensic scenarios.
Table 2. Mugshot-like data sets

| Data set | Image types | Subjects | Forensic features | Acquisition context | Used by |
|---|---|---|---|---|---|
| Color FERET [161] | RGB | 994 | None | Semi-controlled | [228][229][132] |
| FERET [161] | Grayscale | 1199 | None | Semi-controlled | [93] |
| CMU PIE [188] | RGB | 68 | None | Controlled | [230][59][228] |
| Multi-PIE [81] | RGB | 337 | Landmarks | Controlled | [59][132] |
| PCSO [3] | RGB | 28557 | None | Controlled | [93] |
| NIST MID [215] | Grayscale | 1573 | None | Controlled | |
| Morph (Academic) [169] | RGB | 13618 | Eye coordinates | Controlled | |
| SCface [80] | RGB & NIR | 130 | Landmarks | Controlled & uncontrolled | |
| ATVS Forensic DB [207] | RGB | 50 | Landmarks | Controlled | |
| LFW [103] | RGB | 5749 | None | Uncontrolled | |

The Color FERET data set [161] contains multi-pose, multi-expression, and multi-session facial images captured
in a semi-controlled environment during 15 sessions across nearly three years, intended to aid in the development of
the forensic field. It contains RGB images of size 512×768 pixels. The face of each individual was captured in up to
13 different poses and sometimes on different dates, with an average of about 11 samples per subject. These images
represent the frontal pose with different facial expressions, the right and left profiles at different angles with respect to
the frontal one, and extra irregular positions. Furthermore, some images were captured while individuals were wearing
eyeglasses or pulling their hair back, adding further intra-subject variability to the samples. This data set has been
analyzed by some of the previously reviewed studies [132, 228, 229] for its application to biometric recognition based on
3D reconstruction from multi-view facial images, considering them as mugshots, while using other samples of the same
subjects for evaluating the recognition performance and the robustness of the system to pose and facial expression.
Despite the absence of entirely uncontrolled acquisition, the variations in scale, pose, expression, and illumination,
together with the relatively low quality of the images, make the data set potentially suitable for studies related to 3D
face reconstruction for surveillance-related tasks, such as the mugshot-based recognition of a suspect captured by a
CCTV camera. Furthermore, its multi-session characteristic makes the data set even suitable for studies related to aging,
in order to make the system more robust to changes in the appearance of a person’s face over time [40].
The data set referenced as FERET is the grayscale version of the Color FERET data set. It has been used for evaluating
the recognition performance of 3D face reconstruction based on a pair of frontal-profile facial images [93]. Since the
images are grayscale, the potential applications of this data set appear more limited than those of its color version, as much information about the appearance of the subjects is lost; its main advantage is the lower memory required, since a single channel is stored.
The CMU-PIE data set [188] is made up of multi-pose, multi-expression and multi-illumination face images. It
contains 41368 RGB images of size 640×486 pixels, collected through a common setup composed of 13 fixed cameras and
21 flashes. Therefore, the faces of individuals were acquired in up to 13 different poses, under 43 different illumination
conditions, and with four different expressions. Furthermore, a background image from each of the 13 cameras was
acquired at each recording session to ease face localization. Subsets of CMU-PIE were involved in some of the previously
described studies, aiming to evaluate how the recognition performance could benefit from the 3D face reconstruction
obtained from a single frontal image [59], a pair of frontal-profile images [230], or the frontal image and both left
and right profile images [228], evaluating the robustness of the system to large pose variations. Furthermore, due to its nature,
CMU-PIE can also be used for evaluating the robustness of such systems to illumination conditions and facial expressions.
The controlled environment could limit the usage of a system based on this data set, making it unsuitable for in-the-wild
applications. However, its multi-camera setting makes it possible to obtain more images of the same subjects with the
same environmental conditions as in a registration-like scenario, which can make the 3D reconstruction from multiple images more straightforward and aid detailed geometric and photometric modeling of the faces [78, 188]. Other
limits of this data set are the relatively low number of subjects (Table 2), which represents a shortcoming for evaluating
the inter-subject discriminability, and the limited intra-subject variability due to the single-session scenario and the
small range of expressions [81].
The CMU Multi-PIE data set was collected to address such issues [81]. In particular, it provides 755370 facial RGB
images of size 3072×2048 pixels collected by 15 cameras using 18 different flashes in a controlled environment, with
settings similar to those used for collecting the CMU PIE data set [188]. Hence, it represents a multi-pose, multi-expression,
multi-illumination and multi-session face data set, which introduces more variability thanks to up to 6 facial expressions
and four sessions, increasing the quality of the images as well. This data set has been used to evaluate a recognition
system’s performance based on 3D face reconstruction from frontal and profile images [132]. Furthermore, it has been
used to evaluate the biometric performance on non-frontal images of 16 subjects common with the CMU-PIE [188],
from which the 3D face was reconstructed to synthesize a non-frontal view of the subject, which can be compared with the tested image [59]. Hence, in addition to appearing, on the whole, suitable for the tasks already described while discussing the CMU-PIE [188], CMU Multi-PIE offers the possibility to evaluate robustness to aging [40], especially jointly with its predecessor, which was acquired about four years earlier, although only to a limited extent due to the small number of common subjects. The two data sets would also allow evaluating performance under different acquisition
parameters. Another interesting feature of this data set is the presence of 68 annotated facial landmark points for images
in the range 0° to 45° both left and right and 39 points for profile images, which could be exploited in both reconstruction
and recognition algorithms. The most evident disadvantage of this data set is still the collection in a strictly controlled
environment, which makes it unsuitable for in-the-wild applications. Finally, its demographic distributions could lead
to a bias since the subjects were predominantly men (69.7%) and European-Americans (60%) [81].
The PCSO data set contains mugshot images collected as part of the booking process. One or more RGB images per
subject are completed with metadata such as age, sex, and ethnicity. Despite some variations in lighting conditions
and head positions, photographic parameters are relatively consistent, and the quality of the images is quite good,
with good contrast between the background and the individual, who is photographed with a frontal face and neutral
expression [122]. The most significant advantage of this data set with respect to the previously described ones is its
large number of subjects (Table 2), making it suitable for longitudinal research. However, there is no intuitive way to
relate multiple arrest records from the same individual [122], making it challenging to perform multi-session analyses.
Moreover, it appears to be unsuitable for studies on robustness in the wild due to its semi-controlled nature. Finally, the
data set does not appear to be currently available to other researchers except those already enabled in the past [121].
Due to this availability issue, a possible alternative to the PCSO is the NIST-MID [215], containing frontal and profile
facial views. While some subjects were not acquired in the profile view, others were acquired more than once in both frontal and profile views. However, these mugshot images are in 8-bit grayscale and, therefore, do
not allow fully exploiting the information provided by the face’s texture. Furthermore, the ratio between male and
female subjects (i.e., about 19 males for each female) could lead to a demographic bias.
Another alternative is the academic version of the MORPH data set [169], which contains scanned frontal and profile
mugshot images related to 13618 subjects [8]. The images were acquired in different periods of time, up to 1681 days,
with an average longitudinal time between photos of 164 days and an average of 4 acquisitions per subject [221], thus
allowing the evaluation of the robustness of a recognition system to time progression. MORPH also allows analyzing
the system’s robustness to the age variation across different subjects since ages range from 16 to 77. Moreover, it also
contains annotations related to the location of the eyes [227], which could be required by some recognition algorithms
or employed for evaluating their automatic detection. However, the algorithms tested on this data set could also suffer
from a demographic bias due to the imbalance between male and female individuals and in terms of ethnicity.
A data set simulating realistic forensic scenarios is the SCface [80], providing both mugshot and surveillance images
acquired through a high-quality photo camera in controlled conditions and five different commercial surveillance cameras placed at the same height, respectively (e.g., Fig. 2). NIR (near-infrared) mugshot images are included as well. The probe images were
acquired indoors using the outdoor light coming through a window on one side as the only illumination source. The
observed head poses are the ones typically found in footage acquired by a regular commercial surveillance system,
with the camera placed slightly above the subject’s head [41]. In total, 21 images of each subject were taken at three
different distances from each surveillance camera, between 1 and 4.2 meters. The RGB mugshot images were also
cropped following the ANSI INCITS 385-2004 standard recommendations [18]. Furthermore, SCface provides 21 manually
annotated facial landmarks [199], which could be employed in both reconstruction and recognition algorithms and
metadata on demographic information and the presence of glasses and moustache [41]. Therefore, this data set could be
suitable for studies on photoanthropometry. To summarize, SCface could be employed to analyze the effect of different
quality and resolution cameras on face recognition performance and the robustness to different illumination conditions,
distances, and head poses. SCface also allows studies on recognition from NIR images, which are inherently more
robust to illumination changes than images acquired in the visible spectrum [174]. However, this aspect could find
limited application in real-world scenarios due to the specific hardware system required to acquire NIR images [129].
From a 3D reconstruction perspective, the information provided by the nine different poses makes this data set also
suitable for evaluating the performance of the related algorithms in a realistic scenario. Although its characteristics
make it suitable for low-resolution face recognition in forensic research, traces in the SCface data set only consist of
frontal surveillance camera images [227]. Furthermore, the difference in the distribution between male and female
individuals (i.e., 114 and 16, respectively) and the absence of non-Caucasian people could lead to a demographic bias.
Another set of images containing forensic annotations (i.e., 21 landmarks on frontal faces) is the ATVS-Forensic
[207]. Despite the relatively low number of subjects (i.e., 32 men and 18 women) and the limited application scenarios, since the data set only consists of high-quality mugshot images, it would allow evaluating the robustness of the recognition system to distance, thanks to the acquisition at three different distances between 1 and 3 meters from the camera. Furthermore, it provides a lateral view of the full body and the face.
All the images were acquired in each of the two sessions held for each subject, therefore simulating forensic scenarios
in which the mugshot images of a suspect have been acquired on a different day with respect to the probe image.
The LFW data set [103] also deserves a mention. Although not explicitly designed for forensic
applications, it has been employed in many face recognition algorithms that can cope with uncontrolled settings.
It contains images acquired in unconstrained scenarios, including variations of pose, expression, hairstyles, camera
parameters, background, lighting, and demographic aspects. Due to the variability in terms of the number of
images for each subject, up to 530, LFW is suitable for both identification and verification scenarios.
The considerable differences among the reviewed data sets make them suitable for different purposes. Therefore,
future studies should consider these differences in order to assess whether a specific data set is suitable for the
performance evaluation of the proposed system, starting from the representativeness of the images contained with
respect to the aimed scenario.

5.2 3D scan and video data sets


Despite the smaller number of available data sets, videos and 3D face scans could be effectively employed to evaluate the proposed 3D face reconstruction algorithms. Their characteristics are summarized in Table 3. In particular, the acquisition context of the analyzed data sets could make them suitable for different scenarios that are characteristic of the forensic field, either in terms of reference images or probe data. Moreover, most of them contain annotations traditionally employed in forensic cases (e.g., landmarks). These features motivate their potential in simulating face comparison from surveillance footage.
Table 3. 3D scan and video data sets

| Data set | Data types | Subjects | Forensic features | Acquisition context | Used by |
|---|---|---|---|---|---|
| Bosphorus [178] | 3D scans & images | 105 | Landmarks | Controlled | [132, 229] |
| ForenFace [227] | 3D scans, videos & images | 97 | Annotated facial parts | Controlled & uncontrolled | [135] |
| Quis-Campi [155] | 3D scans, videos, images & gait | 320 | Eye coordinates | Controlled & uncontrolled | |
| Wits Face Database [20] | Videos & images | 622 | None | Controlled & uncontrolled | |
| IJB-C [144] | Videos & images | 3531 | None | Uncontrolled | |
| FIDENTIS (Licensed) [203] | 3D scans | 200 | Landmarks | Controlled | |
| NoW benchmark [177] | 3D scans & images | 100 | None | Controlled & uncontrolled | |
| Florence 2D/3D [21] | 3D scans & videos | 53 | None | Controlled & uncontrolled | |

The Bosphorus data set [178] contains both multi-pose and multi-expression 3D data representing the shape of the
face and the correspondent RGB texture images of size 1600x1200 pixels. It comprises 4666 face scans related to 60
men and 45 women, mainly Caucasian and aged between 25 and 35. The scans were acquired in a single view using a
structured-light-based 3D system while the subjects were sitting at a distance of about 1.5 meters. Several face scans are available per subject, covering up to 13 head poses with different yaw and pitch angles, up to 4 deliberate occlusions of the eyes or mouth through beard, moustache, hair, hand or eyeglasses, and 34 different facial expressions. Furthermore,
it provides up to 24 facial landmarks manually annotated on 2D and 3D images, making it suitable for studies based on
photoanthropometry and the estimation of landmarks. Bosphorus was involved in the evaluation of the performance of
a recognition system based on the 3D face reconstruction from multi-view facial images [229] and the assessment of
the accuracy of the reconstruction from frontal and profile images [132, 229]. This data set is suitable for studies on
robustness to occlusions and adverse conditions such as different poses and expressions, thanks to its large intra-subject
variability. One of its main disadvantages is the low ethnic diversity [50]. Moreover, the acquisitions under uniform
illumination do not allow investigation of the effects of the light variations on reconstruction and recognition. Finally,
it contains corrupted data due to subject movements during acquisitions and self-occlusion.
The ForenFace data set [227] contains 3D scans, videos, and high-quality mugshot images and has been specifically
designed to represent realistic forensic scenarios. In particular, ForenFace contains images of five views per subject,
photos of an identity document (i.e., employee cards taken months or even years before), and the related frontal and
semi-profile 3D scan as reference material. The CCTV videos and stills from visible and partially occluded subjects
were instead acquired indoors through six different models of surveillance cameras in various locations, positions,
and distances from the subject. ForenFace also includes a large set of anthropomorphic features that forensic facial
practitioners employ during forensic work, such as those proposed by FISWG [83]; this makes it suitable for studies on
morphological comparison and valuable due to the lack of data sets of facial features allowing quantitative, statistical
evaluation of face comparison evidence [150]. This data set is very flexible in its usage and suitable for studies related
to various application scenarios. For example, it is suitable for evaluating errors with different models of surveillance
cameras. Another potential use is in evaluating the robustness to age differences through passport-style images. The
acquired videos allow for assessing the robustness to partial occlusions of the face, thanks to eyeglasses, beard, and
baseball caps, and evaluating the reconstruction from probe videos and frontal/profile images. It could also be employed
to evaluate methods for extracting facial features and comparing them with the annotated ones. Finally, ForenFace
allows the recognition task across different types of facial data (e.g., probe video vs mugshot image or 3D scan). Despite
being particularly useful for forensic research, the size of ForenFace is relatively small from a biometric perspective
[227]. Furthermore, the predominance of the Caucasian ethnicity could lead to demographic bias.
The Quis-Campi [155] data set is made up of videos and images taken from modern surveillance systems that
typically have a higher resolution than traditional ones. Compared to the previous data set, Quis-Campi contains
data related to more subjects (Table 3) captured in the outdoor environment in unconstrained conditions through a
camera about 50 meters from the subject. It also contains 3D scans of the face and reference images acquired indoors.
Furthermore, it provides gait recordings as full-body video sequences, which could be employed in a multimodal
recognition system. Annotations about the locations of the eyes in each frame were also added, which can be useful
for evaluating the performance of eye detection or head-pose estimation algorithms. In summary, Quis-Campi can be
adopted to assess the robustness to the key adverse factors of forensic face recognition in the wild, namely expression,
occlusion, illumination, pose, motion-blur, and out-of-focus, in a realistic outdoor scenario through an automated image
acquisition of a non-cooperative subject on-the-move and at-a-distance [155]. On the other hand, it lacks a good set of
reference images [227] and could lead to demographic bias due to the predominance of the Caucasian ethnicity.
In order to perform a CCTV-based recognition, including both the identification and the verification scenarios, it
is possible to employ the Wits-Face data set [20]. It includes African male individuals aged between 18 and 35, each
acquired in ten photos, in five different frontal and profile views with a neutral expression and facing straight ahead,
both under natural outdoor lighting and artificial indoor fluorescent lighting conditions. CCTV video recordings were
acquired from 334 subjects in indoor or outdoor environments, allowing the evaluation of the difference in a face recognition system’s performance between the two settings. Furthermore, some of the recordings are related to subjects wearing caps or sunglasses, thus allowing the evaluation of the robustness to such partial occlusions. One critical issue of
Wits-Face is related to demographic bias since only images and videos related to male subjects were acquired.
In the context of face recognition from videos, it is worth mentioning the IARPA Janus Benchmark (IJB) data sets,
which contain facial images and videos varying in pose, illumination, expression, resolution, and occlusion, mainly
acquired in an uncontrolled scenario. The most recent of these data sets is the IJB-C [144]. In particular, it provides
21294 facial images and 11779 face videos of 3531 subjects. Furthermore, all media have manually annotated facial
bounding boxes, and the data set includes 10040 non-facial images, allowing studies related to face detection as well.
Finally, attribute metadata related to age, gender, occlusion, capture environment, skin tone, facial hair, and face yaw is
provided as well, allowing further examinations like occlusion detection and analysis of the demographic bias.
Considering the reconstruction task’s evaluation, data sets containing 3D facial scans acquired in controlled conditions
may be of some use. An example is the FIDENTIS data set [203], whose licensed version provides one or more textured 3D scans for each of 83 males and 117 females. In particular, it contains raw frontal, profile, and merged models,
the latter with and without ears. Furthermore, the models with the ears are provided with 42 associated landmarks,
making them suitable for studies on photoanthropometry and the estimation of such landmarks (e.g., [67]). Moreover, it
is also suitable for the analysis of multi-session differences. However, a system evaluated on this data set could suffer
from biased performance since, despite ages ranging between 18 and 67, 75% of the subjects are aged between 21 and 29.
In order to evaluate the reconstruction methods under variations in lighting, occlusions, facial expression, acquisition
environment, and viewing angle, it is possible to employ the NoW benchmark [177]. This data set contains 2054 2D
images of 100 subjects (45 males and 55 females), captured with an iPhone X, and a 3D head scan for each subject as
ground truth, captured through an active stereo system with the individual in a neutral expression. However, further
demographic information about the subjects is not provided.
A data set which allows evaluating the reconstruction from videos is the Florence 2D/3D [21], providing 3D scans
of 53 subjects (39 males and 14 females) and indoor and outdoor videos acquired in controlled and uncontrolled settings.
In particular, it is composed of four 3D models for each subject (i.e., two frontal, a right-side, and a left-side) and a
further model with glasses if the subject wears them. The HD videos (1280×720) were acquired indoors, at 25 FPS and four levels of zoom, while asking the subject to perform specific head rotations. The uncontrolled videos were acquired indoors at 25 FPS (704×576) and outdoors at 5-7 FPS (736×544), both at three levels of zoom and with the subject asked to behave spontaneously. Hence, this data set could be employed in studies on reconstruction and recognition in realistic surveillance conditions, although it lacks the occlusions typical of such a scenario. Moreover, the demographic
bias could represent an issue since all the subjects are Caucasian and mostly aged between 20 and 30.
All the data sets described in this subsection consider one or more of the issues addressed in realistic forensic
scenarios, in terms of the environment (light conditions, indoor/outdoor), the subject (presence of occlusions,
adverse poses, facial expressions), and technological factors (lower probe resolution, motion-blur, out-of-focus). Some
of them also provide annotations related to forensic features (eye coordinates, facial parts), which could be useful in
actual law courts [83, 217]. What is currently missing in the state-of-the-art data sets is the presence of occlusions of the
lower face, such as in the case of facial masks, which could aid the research on the robustness to non-facial occlusions.

6 DISCUSSIONS
In this paper, we reviewed the state of the art of 3D face reconstruction (3DFR) from 2D images and videos for forensic
recognition, evaluating the proposed approaches with respect to the requirements of a potential forensics-related system.
Furthermore, the proposed approaches for enhancing forensic recognition in terms of performance were analyzed
together with their potential application scenarios (Fig. 6).

The previously described studies mainly focus on enhancing the performance of recognition tasks in different
contexts, such as the identification or verification of suspects within a gallery of mugshot images or the search for
missing persons. They revealed the potential advantages of the fusion of the reconstructed model and the original
images, which would allow taking advantage of the characteristics of a 3D facial model while limiting the possible loss
of information in the reconstruction [132]. So far, researchers have proposed employing 3DFR either on the reference
data or the probe material by re-projecting the 3D model into 2D images to aid a 2D-to-2D recognition. In particular,
the first approach could find application in adapting the pose of the model to the face in the probe material, easing both the visual comparison for investigative purposes and the comparison of the so-obtained image with the probe face through an automatic system, as preliminarily proposed for forensic scenarios by Dutta et al. [59] and then further investigated by Zeng et al. [228, 229]. Similarly, the projection of reference 3D faces in the 2D domain in various poses has been demonstrated to improve the recognition performance of such systems, especially
concerning their robustness to pose variation, introducing it as an augmentation step for training feature-based [93, 230]
and DL systems [131, 132]. Moreover, Loohuis [135] suggested that 3DFR could be successfully employed for mimicking
the degraded quality of the probe data when coupled with rendering techniques for simulating such adverse conditions.
The 3DFR from probe material finds applications in many scenarios as well, easing the comparison from a single probe
image by rendering the face in order to match the pose with a reference image [93, 165] or by reconstructing it from
multiple frames of a surveillance video [204, 206].
Despite the promising results, especially concerning the robustness to pose variations in various probe and reference
data types, most of the previously described studies did not evaluate their methods considering other requirements of
an automated system supporting forensic analysis (Fig. 5) related to understandability and forensic evaluation [27, 110],
as summarized in Fig. 4. Moreover, the proposed methods do not assess their robustness to some typical issues of
forensic cases, such as the presence of occlusions [94, 114], making them inherently unsuitable for recognition scenarios
involving them (Subsection 2.2). However, some of them implicitly used a face recognition algorithm based on local
descriptors [93, 228–230], which supports the understandability of the output [190, 222]. Furthermore, a single study
[206] employed a framework for easing forensic evaluation.
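The kind of local descriptor referred to above can be sketched as LBP histograms computed on patches around facial landmarks, so that a match score can be traced back to specific facial regions. Patch size, LBP settings, and the omission of image-border handling are illustrative assumptions.

```python
# Landmark-anchored LBP descriptor for more interpretable face comparison.
import numpy as np
from skimage.feature import local_binary_pattern

def landmark_lbp_descriptor(gray_img, landmarks, half=16, P=8, R=1.0):
    hists = []
    for (x, y) in landmarks.astype(int):
        # Square patch around the landmark (boundary handling omitted here).
        patch = gray_img[y - half:y + half, x - half:x + half]
        codes = local_binary_pattern(patch, P, R, method="uniform")
        hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
        hists.append(hist)
    return np.concatenate(hists)   # one comparable block per landmark region
```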
Although most of the proposed methods aim to enhance face recognition performance, they are not comparable
quantitatively due to the variability in the considered settings. One of the most relevant differences is related to the
involved data sets, which differ in acquisition environment, size, and availability (Section 5). The differences in terms
of data type and quality represent another factor that makes them suitable for different tasks. Thus, it is necessary
to address and compare these data sets separately (Subsection 2.2) in terms of the recognition approach (Fig. 12) and
application scenarios (Sections 3 and 4). Of course, some differences are due to the time of publication, but the recent withdrawal of some data sets, owing to stricter privacy rules on biometric data, has made comparisons even more complex. For example, the General Data Protection Regulation (GDPR) rules in the European Union strongly differ
from those of other countries [116, 235]. In particular, future studies should be based on data sets suitable for forensic
research. The model, in our opinion, is the ForenFace data set [227] because it takes realistic circumstances into account
and also provides a set of anthropomorphic features proposed by the FISWG [83]. Furthermore, they should evaluate
the face reconstruction accuracy on large-scale 3D face data sets, such as the FIDENTIS one [203]. Some forensic use
cases are not yet included in any benchmark data set; for example, the special case of CCTV-based recognition from
images recorded at ATMs with a very short distance from the subject and a distorted perspective [108].
For both reconstruction and recognition tasks, a demographic analysis should be conducted on the performance to
assess the bias against some demographic groups, an undesired issue in forensics that is sometimes overlooked even
in current research [110]. To this aim, explicit demographic information about subjects represented in the data sets
could aid in addressing such an issue [180]. However, these useful data may be difficult to assemble and recover due to the privacy rules mentioned above. Moreover, the source of this bias could be related to the imbalance of the underlying data. This issue could be mitigated by employing synthetic data sets, like the FAIR benchmark [66]. However,
the employment of synthetic data still requires more investigation to be fully validated [45] and then accepted in the
forensic context.
The eventual underlying 3D reference model could be affected by bias problems as well [110], which may affect
the face recognition system, making it unsuitable in forensic cases [206]. Therefore, a model-free reconstruction
approach should be employed whenever possible. An example of this reconstruction approach is stereophotogrammetry,
which allows capturing craniofacial morphology in high quality [97] to a level of detail that is often less important in
generic recognition applications but which becomes crucial in the forensic context. Although it may not be suitable for 3D-to-3D recognition scenarios, especially those based on shape comparison, such a reconstruction
approach could be exploited in the generation of synthetic views for the comparison with the reference material [206]
and, therefore, employed in a 2D-to-3D scenario. However, a drawback is the requirement of multiple images of the suspect [206], which cannot always be acquired in forensic cases. Another disadvantage of a model-free reconstruction approach is the significantly higher computational time required, making it unsuitable for real-time applications.
Nonetheless, this represents a minor issue for many forensic applications, such as the ones related to lawsuits.
Thus, when a photometric reconstruction approach is unsuitable, a choice between approaches based on 3DMM and
DL must be made [148], even those not strictly proposed for forensic applications, taking into account their suitability,
advantages and drawbacks. For example, the methods based on 3DMM allow generating an arbitrary number of facial
expressions, while those based on DL provide high-quality face texture synthesis [77, 148]. Therefore, the morphable model could be employed either for adapting the expression of the reference model or for imposing a neutral expression
on the normalised face in the probe image, while a detailed reconstruction could be obtained through a DL network
[77]. However, it must be pointed out that heavy manipulation, such as expression modification, may not be allowed in most evaluative cases, although it remains a valid aid for investigative purposes. These approaches have technical limits, namely the morphable model’s focus on global characteristics rather than fine details and the requirement of a large number of 3D scans for training DL networks [148]. The lack of understandability is another issue for the
DL approach as well [27]. However, combining two or more reconstruction approaches could help limit some of the
drawbacks of the single approach. For example, previous studies highlighted that it could be possible to reconstruct 3D
faces that are highly detailed even with a single image by combining the prior knowledge of the global facial shape
encoded in the 3DMM and refining it through a photometric approach [37, 130, 172]. Similarly, the combination between
a morphable model and one or more DL networks has been proposed as well [65]. State-of-the-art methods not
explicitly proposed for forensic applications should be further investigated in terms of potential and suitability as well, especially those based on DL, which have proven promising in addressing some of the typical issues in
forensics like occlusion removal (e.g., [147, 183]), 3DFR from one or multiple in-the-wild images (e.g., [133, 231]), and
face frontalization (e.g., [223]), thus potentially representing an aid in many investigative scenarios.
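As a generic illustration of the 3DMM-based route (not any specific method reviewed here), the sketch below fits shape coefficients to detected 2D landmarks by regularized linear least squares under a known scaled-orthographic pose; the basis and landmark arrays, the fixed pose, and the regularization weight are assumptions, while real pipelines typically alternate pose and shape estimation and add expression terms.

```python
# Fit 3DMM shape coefficients to 2D landmarks (fixed pose, ridge regression).
import numpy as np

def fit_shape_to_landmarks(lmk2d, mean_lmk3d, basis, R, t, s, reg=1e-3):
    """lmk2d: (L, 2); mean_lmk3d: (L, 3) mean-face landmarks; basis: (L, 3, K)
    per-landmark shape basis; R, t, s: rotation, 2D translation, scale."""
    L, _, K = basis.shape
    P = s * R[:2, :]                                           # 3D -> 2D projection
    A = np.einsum('ij,ljk->lik', P, basis).reshape(2 * L, K)   # design matrix
    b = (lmk2d - (mean_lmk3d @ P.T + t)).reshape(2 * L)        # landmark residuals
    # Ridge-regularized solution keeps the face close to the model mean.
    coeffs = np.linalg.solve(A.T @ A + reg * np.eye(K), A.T @ b)
    return coeffs
```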
The computational time represents one of the main reasons why automated systems should be employed in forensics.
It is an important feature in some specific applications, such as real-time identification through surveillance cameras
(e.g., [71]). In this regard, the online computational time must be assessed, representing the time required to test a single
probe image and, thus, to recognize the captured individual. Specifically, it depends on both the recognition algorithm
and any strategy that must be applied to the probe to enhance the recognition task (e.g., mapping to some "canonical"
representation). In these terms, a reasonable computational time for some applications related to surveillance and
lawsuits was reported by Zhang et al. [230] and Zeng et al. [228] (Table 1). Zhang et al. [230] and Liang et al. [131, 132]
also evaluated the offline computational time, representing the time required for applying the proposed enhancement
approach based on 3D face reconstruction (e.g., gallery enlargement) and for the training of the recognition system. In
particular, reported values suggest a notable improvement with respect to earlier proposals. However, despite these
representing the most time-consuming processes, the offline computational time is generally of less concern since it
does not impact real-time operations.
It is important to remark that the most important feature in forensics is generally the reconstruction accuracy
[15, 68, 195] since it represents a requirement which is often more strict than in generic recognition tasks. In the
literature, 3D model quality is typically evaluated in terms of shape error, estimated as the distance between the model and the corresponding ground truth. However, the extracted texture’s quality should be assessed as well due
to its role in the recognition task [12, 16, 97]. For example, the texture could allow exploiting facial marks, such as
scars and tattoos. Their exploitation would enhance both performance and understandability in forensic comparison
[15, 83, 157, 190, 191]. Furthermore, these facial marks are becoming even more valuable thanks to the availability of higher-resolution sensors and the growing size of face image databases, as well as to their capability to improve the speed and performance of recognition systems [111]. Hence, future research should take into account these additional features to
assess their permanence in the generated 3D models. This also holds for other morphological features, which forensic
examiners evaluate to justify the outcome of the facial comparison (e.g., the decision whether the suspect is likely to be
the one represented in a probe image) [9]. In addition to holistic ones (e.g., the overall shape), local characteristics are
related to the proportions and the position of facial features, such as the relative size of the ears with respect to the eyes,
nose, and mouth [83]. The asymmetry between facial components should also be considered [83], thanks to its higher
physical stability over time compared to other features. For example, the overall shape of the face could change because of weight gain [86]; however, the asymmetry between facial components is less affected. Therefore, these features
could be an effective aid for forensic examiners even to justify their conclusion on the comparison in law courts.
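A sketch of the shape-error evaluation mentioned above follows: rigidly align the reconstructed vertices to the ground-truth scan and report the mean and maximum point-to-point distance. Assuming vertex correspondence and omitting scale is a simplification; a point-to-surface distance is the more common choice when the meshes are not in correspondence.

```python
# Mean/max shape error after rigid (Procrustes, no scale) alignment.
import numpy as np

def shape_error(recon_pts, gt_pts):
    """recon_pts, gt_pts: (N, 3) arrays of corresponding points."""
    rc = recon_pts - recon_pts.mean(axis=0)
    gc = gt_pts - gt_pts.mean(axis=0)
    U, _, Vt = np.linalg.svd(rc.T @ gc)        # optimal rotation (Kabsch)
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:                   # avoid reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    aligned = rc @ R.T + gt_pts.mean(axis=0)
    d = np.linalg.norm(aligned - gt_pts, axis=1)
    return d.mean(), d.max()
```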
To sum up, we expect that great attention will be paid to the improvement of the recognition capability in forensic
scenarios by 3DFR. Extremely unfavourable conditions, typically encountered in criminal cases, could be better handled by appropriately modelling both shape and texture. To this goal, data representative of forensic trace and
reference material are necessary, also considering the robustness to other common factors altering the appearance,
such as facial hair and the presence of occlusions. Bias toward any demographic group should be avoided in the data sets, favouring the system’s fairness. In our opinion, the understandability of the proposed algorithms should go hand in hand with data availability. Data and algorithms will play a central role in effectively integrating 3D face reconstruction from 2D
images and videos in the forensic field. Similarly, the employment of frameworks for easing forensic evaluation by
non-expert professionals should become standard practice for supporting the admissibility of the proposed methods in real cases.
To this aim, an interdisciplinary approach involving computer science and law experts would speed up this process.
Therefore, we believe that its future involvement in real-world forensic applications is not far and that this survey
contributes as a step toward this scenario.

ACKNOWLEDGEMENTS
This work has been partially supported by the Italian Ministry of University and Research (MUR) within the PRIN2017-
BullyBuster—A framework for bullying and cyberbullying action detection by computer vision and artificial intelligence

methods and algorithms (CUP: F74I19000370001), and by SERICS (PE00000014) under the MUR National Recovery and
Resilience Plan funded by the European Union - NextGenerationEU.

REFERENCES
[1] 1923. Frye v. United States.
[2] 1993. Daubert v. Merrell Dow Pharmaceuticals, Inc. , 579 pages.
[3] 1994. Pinellas County Florida Sheriff’s Office.
[4] 1995. European network of forensic science institutes. http://www.enfsi.eu/
[5] 1995. R v. Clarke. , 425 pages.
[6] 1997. General Electric Co. v. Joiner. , 136 pages.
[7] 1999. Kumho Tire Co. v. Carmichael. , 137 pages.
[8] 2003. MORPH Non-Commercial Release Whitepaper. http://people.uncw.edu/vetterr/MORPH-NonCommercial-Stats.pdf
[9] 2008. Facial identification scientific working group. http://www.fiswg.org/
[10] AF Abate, M Nappi, D Riccio, and G Sabatino. 2007. 2D and 3D face recognition: A survey. Pattern Recognition Letters 28, 14 (2007).
[11] Amina Adadi and Mohammed Berrada. 2018. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE access 6
(2018), 52138–52160.
[12] HM Rehan Afzal, Suhuai Luo, M Kamran Afzal, Gopal Chaudhary, Manju Khari, and Sathish AP Kumar. 2020. 3D face reconstruction from single
2D image using distinctive features. IEEE Access 8 (2020), 180681–180689.
[13] Timo Ahonen, Abdenour Hadid, and Matti Pietikainen. 2006. Face description with local binary patterns: Application to face recognition. IEEE
transactions on pattern analysis and machine intelligence 28, 12 (2006), 2037–2041.
[14] Tauseef Ali. 2014. Biometric score calibration for forensic face recognition. (2014).
[15] Tauseef Ali, Raymond Veldhuis, and Luuk Spreeuwers. 2010. Forensic face recognition: A survey. Centre for Telematics and Information Technology,
University of Twente, Tech. Rep. TR-CTIT-10-40 1 (2010).
[16] B Ben Amor, Karima Ouji, Mohsen Ardabilian, and Liming Chen. 2005. 3D Face recognition by ICP-based shape matching. LIRIS Lab, Lyon Research
Center for Images and Intelligent Information Systems, UMR 5205 (2005).
[17] ANSA. 2018. Ladri individuati grazie al nuovo sistema di riconoscimento facciale. ANSA (2018). https://www.ansa.it/lombardia/notizie/2018/09/07/
ladri-individuati-grazie-a-software-ps_cd3a5272-5a52-4999-9138-4b976d4e5738.html
[18] ANSI. 2004. ANSI INCITS 385-2004, Face Recognition Format for Data Interchange. ANSI (2004).
[19] Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García, Sergio
Gil-López, Daniel Molina, Richard Benjamins, et al. 2020. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and
challenges toward responsible AI. Information fusion 58 (2020), 82–115.
[20] Nicholas Bacci, Joshua Davimes, Maryna Steyn, and Nanette Briers. 2021. Development of the Wits Face Database: an African database of
high-resolution facial photographs and multimodal closed-circuit television (CCTV) recordings. F1000Research 10 (2021).
[21] Andrew D Bagdanov, Alberto Del Bimbo, and Iacopo Masi. 2011. The florence 2d/3d hybrid face dataset. In Proceedings of the 2011 joint ACM
workshop on Human gesture and behavior understanding. 79–80.
[22] Giovanni Barroccu. 2013. La prova scientifica nel processo penale. Diritto@ storia. Nuova serie 11 (2013).
[23] Lacey Best-Rowden, Hu Han, Charles Otto, Brendan F Klare, and Anil K Jain. 2014. Unconstrained face recognition: Identifying a person of interest
from a media collection. IEEE Transactions on Information Forensics and Security 9, 12 (2014), 2144–2157.
[24] Hitoshi Biwasaka, Takuya Tokuta, Yoshitoshi Sasaki, Kei Sato, Takashi Takagi, Toyohisa Tanijiri, Sachio Miyasaka, Masataka Takamiya, and
Yasuhiro Aoki. 2010. Application of computerised correction method for optical distortion of two-dimensional facial image in superimposition
between three-dimensional and two-dimensional facial images. Forensic science international 197, 1-3 (2010), 97–104.
[25] Volker Blanz, Patrick Grother, P Jonathon Phillips, and Thomas Vetter. 2005. Face recognition based on frontal views generated from non-frontal
images. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2. IEEE, 454–461.
[26] Volker Blanz and Thomas Vetter. 2003. Face recognition based on fitting a 3D morphable model. IEEE Transactions on pattern analysis and machine
intelligence 25, 9 (2003), 1063–1074.
[27] Timothy Bollé, Eoghan Casey, and Maëlig Jacquet. 2020. The role of evaluations in reaching decisions using automated systems supporting forensic
analysis. Forensic Science International: Digital Investigation 34 (2020), 301016.
[28] Fred L. Bookstein. 1989. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on pattern analysis and
machine intelligence 11, 6 (1989), 567–585.
[29] Kevin W Bowyer, Kyong Chang, and Patrick Flynn. 2004. A survey of 3D and multi-modal 3D+ 2D face recognition. (2004).
[30] Kevin W Bowyer, Kyong Chang, and Patrick Flynn. 2006. A survey of approaches and challenges in 3D and multi-modal 3D+ 2D face recognition.
Computer vision and image understanding 101, 1 (2006), 1–15.
[31] Ali Breland. 2017. How white engineers built racist code–and why it’s dangerous for black people. The Guardian 4 (2017).

[32] Vicki Bruce, Zoë Henderson, Craig Newman, and A Mike Burton. 2001. Matching identities of familiar and unfamiliar faces caught on CCTV
images. Journal of Experimental Psychology: Applied 7, 3 (2001), 207.
[33] Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on
fairness, accountability and transparency. PMLR, 77–91.
[34] A Mike Burton, Stephen Wilson, Michelle Cowan, and Vicki Bruce. 1999. Face recognition in poor-quality video: Evidence from security surveillance.
Psychological Science 10, 3 (1999), 243–248.
[35] Marinella Cadoni, Andrea Lagorio, Enrico Grosso, and Massimo Tistarelli. 2010. Exploiting 3d faces in biometric forensic recognition. In 2010 18th
European Signal Processing Conference. IEEE, 1670–1674.
[36] Jie Cao, Yibo Hu, Hongwen Zhang, Ran He, and Zhenan Sun. 2020. Towards high fidelity face frontalization in the wild. International Journal of
Computer Vision 128, 5 (2020), 1485–1504.
[37] Xuan Cao, Zhang Chen, Anpei Chen, Xin Chen, Shiying Li, and Jingyi Yu. 2018. Sparse photometric 3D face reconstruction guided by morphable
models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4635–4644.
[38] Gaetano Carlizzi and Giovanni Tuzet. 2018. La prova scientifica nel processo penale. G Giappichelli Editore.
[39] Eoghan Casey. 2020. Standardization of forming and expressing preliminary evaluative opinions on digital evidence. Forensic Science International:
Digital Investigation 32 (2020), 200888.
[40] Gabriel Castaneda and Taghi M Khoshgoftaar. 2015. A survey of 2D face databases. In 2015 IEEE International Conference on Information Reuse and
Integration. IEEE, 219–224.
[41] Helder F Castro, Jaime S Cardoso, and Maria T Andrade. 2021. A Systematic Survey of ML Datasets for Prime CV Research Areas—Media and
Metadata. Data 6, 2 (2021), 12.
[42] Christophe Champod, Alex Biedermann, Joëlle Vuille, Sheila Willis, and Jan De Kinder. 2016. ENFSI guideline for evaluative reporting in forensic
science: A primer for legal practitioners. Criminal Law and Justice Weekly 180, 10 (2016), 189–193.
[43] Kyong I Chang, Kevin W Bowyer, and Patrick J Flynn. 2003. Face recognition using 2D and 3D facial data. In Workshop on Multimodal User Authentication. Citeseer, 25–32.
[44] Kyong I Chang, Kevin W Bowyer, and Patrick J Flynn. 2005. An evaluation of multimodal 2D+ 3D face biometrics. IEEE transactions on pattern
analysis and machine intelligence 27, 4 (2005), 619–624.
[45] Laurent Colbois, Tiago de Freitas Pereira, and Sébastien Marcel. 2021. On the use of automatically generated synthetic image datasets for
benchmarking face recognition. In 2021 IEEE International Joint Conference on Biometrics (IJCB). IEEE, 1–8.
[46] Simon A Cole and Barry C Scheck. 2017. Fingerprints and Miscarriages of Justice: Other Types of Error and a Post-Conviction Right to Database
Searching. Alb. L. Rev. 81 (2017), 807.
[47] European Commission. 2019. Ethics Guidelines for Trustworthy AI. https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai
[48] Kate Conger, Richard Fausset, and Serge F Kovaleski. 2019. San Francisco bans facial recognition technology. The New York Times 14 (2019), 1.
[49] Timothy F Cootes, Christopher J Taylor, David H Cooper, and Jim Graham. 1995. Active shape models-their training and application. Computer
vision and image understanding 61, 1 (1995), 38–59.
[50] Ciprian Adrian Corneanu, Marc Oliu Simón, Jeffrey F Cohn, and Sergio Escalera Guerrero. 2016. Survey on rgb, 3d, thermal, and multimodal
approaches for facial expression recognition: History, trends, and affect-related applications. IEEE transactions on pattern analysis and machine
intelligence 38, 8 (2016), 1548–1568.
[51] Clement Creusot, Nick Pears, and Jim Austin. 2013. A machine-learning approach to keypoint detection and landmarking on 3D meshes.
International journal of computer vision 102, 1 (2013), 146–179.
[52] Miguel De-la Torre, Eric Granger, Paulo VW Radtke, Robert Sabourin, and Dmitry O Gorodnichy. 2015. Partially-supervised learning from facial
trajectories for face recognition in video surveillance. Information fusion 24 (2015), 31–53.
[53] Yu Deng, Jiaolong Yang, Sicheng Xu, Dong Chen, Yunde Jia, and Xin Tong. 2019. Accurate 3d face reconstruction with weakly-supervised learning:
From single image to image set. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 0–0.
[54] Damien Dessimoz and Christophe Champod. 2008. Linkages between biometrics and forensic science. In Handbook of biometrics. Springer, 425–459.
[55] Prithviraj Dhar, Ankan Bansal, Carlos D Castillo, Joshua Gleason, P Jonathon Phillips, and Rama Chellappa. 2020. How are attributes expressed in
face DCNNs?. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020). IEEE, 85–92.
[56] Itiel E Dror. 2020. The error in “error rate”: Why error rates are so needed, yet so elusive. Journal of forensic sciences 65, 4 (2020), 1034–1039.
[57] Itiel E Dror and Glenn Langenburg. 2019. “Cannot decide”: the fine line between appropriate inconclusive determinations versus unjustifiably
deciding not to decide. Journal of forensic sciences 64, 1 (2019), 10–15.
[58] Itiel E Dror and Nicholas Scurich. 2020. (Mis) use of scientific measurements in forensic science. Forensic Science International: Synergy 2 (2020),
333–338.
[59] Abhishek Dutta, Raymond NJ Veldhuis, and Lieuwe Jan Spreeuwers. 2012. Non-frontal Model Based Approach to Forensic Face Recognition. In
ICT: The Innovation Highway 2012.
[60] Gary Edmond, Katherine Biber, Richard Kemp, and Glenn Porter. 2009. Law’s looking glass: expert identification evidence derived from photographic
and video images. Current Issues in Criminal Justice 20, 3 (2009), 337–377.
[61] Bernhard Egger, William AP Smith, Ayush Tewari, Stefanie Wuhrer, Michael Zollhoefer, Thabo Beeler, Florian Bernard, Timo Bolkart, Adam
Kortylewski, Sami Romdhani, et al. 2020. 3d morphable face models—past, present, and future. ACM Transactions on Graphics (TOG) 39, 5 (2020),
1–38.
[62] ENFSI. 2018. ENFSI-BPM-DI-01 Version 01 - January 2018. Best Practice Manual for Facial Image Comparison. https://enfsi.eu/wp-content/
uploads/2017/06/ENFSI-BPM-DI-01.pdf
[63] Rosemary J Erickson and Rita James Simon. 1998. The use of social science data in Supreme Court decisions. University of Illinois Press.
[64] Alejandro J Estudillo, Peter Hills, and Hoo Keat Wong. 2021. The effect of face masks on forensic face matching: An individual differences study.
Journal of Applied Research in Memory and Cognition 10, 4 (2021), 554–563.
[65] Xin Fan, Shichao Cheng, Kang Huyan, Minjun Hou, Risheng Liu, and Zhongxuan Luo. 2020. Dual neural networks coupling data regression with
explicit priors for monocular 3D face reconstruction. IEEE Transactions on Multimedia 23 (2020), 1252–1263.
[66] Haiwen Feng, Timo Bolkart, Joachim Tesch, Michael J Black, and Victoria Abrevaya. 2022. Towards racially unbiased skin tone estimation via scene
disambiguation. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XIII. Springer,
72–90.
[67] Zuzana Ferková and Petr Matula. 2019. Multimodal Point Distribution Model for Anthropological Landmark Detection. In 2019 IEEE International
Conference on Image Processing (ICIP). IEEE, 2986–2990.
[68] Zuzana Ferková, Petra Urbanová, Dominik Černý, Marek Žuži, and Petr Matula. 2020. Age and gender-based human face reconstruction from single frontal image. Multimedia Tools and Applications 79, 5 (2020), 3217–3242.
[69] Claudio Ferrari, Giuseppe Lisanti, Stefano Berretti, and Alberto Del Bimbo. 2016. Effective 3D based frontalization for unconstrained face recognition.
In 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 1047–1052.
[70] Paolo Ferrua. 2008. Metodo scientifico e processo penale. (2008).
[71] Agence France-Presse. 2017. From ale to jail: facial recognition catches criminals at China beer festival. The Guardian 1 (2017).
[72] Danilo Franco, Luca Oneto, Nicolò Navarin, and Davide Anguita. 2021. Toward learning trustworthily from data combining privacy, fairness, and
explainability: an application to face recognition. Entropy 23, 8 (2021), 1047.
[73] Haibin Fu, Shaojun Bian, Ehtzaz Chaudhry, Andres Iglesias, Lihua You, and Jian Jun Zhang. 2021. State-of-the-art in 3D face reconstruction from a
single RGB image. In International Conference on Computational Science. Springer, 31–44.
[74] Giorgio Fumera, Gian Luca Marcialis, Battista Biggio, Fabio Roli, and Stephanie Caswell Schuckers. 2014. Multimodal anti-spoofing in biometric
recognition systems. In Handbook of Biometric Anti-Spoofing. Springer, 165–184.
[75] Mary Grace Galterio, Simi Angelic Shavit, and Thaier Hayajneh. 2018. A review of facial biometrics security for smart devices. Computers 7, 3
(2018), 37.
[76] Clare Garvie. 2016. The perpetual line-up: Unregulated police face recognition in America. Georgetown Law, Center on Privacy & Technology.
[77] Zhenglin Geng, Chen Cao, and Sergey Tulyakov. 2020. Towards photo-realistic facial expression manipulation. International Journal of Computer
Vision 128, 10 (2020), 2744–2761.
[78] Athinodoros S Georghiades, Peter N Belhumeur, and David J Kriegman. 2000. From few to many: Generative models for recognition under variable
pose and illumination. In Proceedings fourth ieee international conference on automatic face and gesture recognition (cat. no. pr00580). IEEE, 277–284.
[79] Cognitec Systems GmbH. 2013. FaceVACS SDK 8.8.0. http://www.cognitec-systems
[80] Mislav Grgic, Kresimir Delac, and Sonja Grgic. 2011. SCface–surveillance cameras face database. Multimedia tools and applications 51, 3 (2011),
863–879.
[81] Ralph Gross, Iain Matthews, Jeffrey Cohn, Takeo Kanade, and Simon Baker. 2010. Multi-pie. Image and vision computing 28, 5 (2010), 807–813.
[82] Patrick J Grother, P Jonathon Phillips, and George W Quinn. 2011. Report on the evaluation of 2D still-image face recognition algorithms. US Department of Commerce, National Institute of Standards and Technology.
[83] Facial Identification Scientific Working Group. 2018. Facial image comparison feature list for morphological analysis. Retrieved January 20, 2020.
[84] Facial Identification Scientific Working Group. 2019. Facial comparison overview and methodology guidelines. Retrieved January 20, 2020.
[85] Facial Identification Scientific Working Group. 2021. Image Factors to Consider in Facial Image Comparison. Retrieved January 2021.
[86] Facial Identification Scientific Working Group. 2021. Physical Stability of Facial Features of Adults. Retrieved January 2021.
[87] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods for explaining
black box models. ACM computing surveys (CSUR) 51, 5 (2018), 1–42.
[88] David Gunning, Mark Stefik, Jaesik Choi, Timothy Miller, Simone Stumpf, and Guang-Zhong Yang. 2019. XAI—Explainable artificial intelligence.
Science robotics 4, 37 (2019), eaay7120.
[89] Guodong Guo and Na Zhang. 2019. A survey on deep learning based face recognition. Computer vision and image understanding 189 (2019), 102805.
[90] Jianzhu Guo, Xiangyu Zhu, and Zhen Lei. 2018. 3DDFA. https://github.com/cleardusk/3DDFA.
[91] Jianzhu Guo, Xiangyu Zhu, Yang Yang, Fan Yang, Zhen Lei, and Stan Z Li. 2020. Towards Fast, Accurate and Stable 3D Dense Face Alignment. In
Proceedings of the European Conference on Computer Vision (ECCV).
[92] Linfeng Guo and Yan Meng. 2006. What is wrong and right with MSE?. In Eighth IASTED International Conference on Signal and Image Processing.
212–215.
[93] Hu Han and Anil K Jain. 2012. 3D face texture modeling from uncalibrated frontal and profile images. In 2012 IEEE Fifth International Conference on
Biometrics: Theory, Applications and Systems (BTAS). IEEE, 223–230.
[94] M Hassaballah and Saleh Aly. 2015. Face recognition: challenges, achievements and future directions. IET Computer Vision 9, 4 (2015), 614–626.
[95] Tal Hassner, Shai Harel, Eran Paz, and Roee Enbar. 2015. Effective face frontalization in unconstrained images. In Proceedings of the IEEE conference
on computer vision and pattern recognition. 4295–4304.
[96] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference
on computer vision and pattern recognition. 770–778.
[97] Carrie L Heike, Kristen Upson, Erik Stuhaug, and Seth M Weinberg. 2010. 3D digital stereophotogrammetry: a practical guide to facial image
acquisition. Head & face medicine 6, 1 (2010), 1–11.
[98] Javier Hernandez-Ortega, Javier Galbally, Julian Fiérrez, and Laurent Beslay. 2020. Biometric quality: Review and application to face recognition
with faceqnet. arXiv preprint arXiv:2006.03298 (2020).
[99] Dallas Hill, Christopher D O’Connor, and Andrea Slane. 2022. Police use of facial recognition technology: The potential for engaging the public
through co-constructed policy-making. International Journal of Police Science & Management (2022), 14613557221089558.
[100] Kashmir Hill. 2020. Wrongfully accused by an algorithm. In Ethics of Data and Analytics. Auerbach Publications, 138–142.
[101] Yu-Jin Hong. 2022. Facial Identity Verification Robust to Pose Variations and Low Image Resolution: Image Comparison Based on Anatomical
Facial Landmarks. Electronics 11, 7 (2022), 1067.
[102] Xiao Hu, Shaohu Peng, Li Wang, Zhao Yang, and Zhaowen Li. 2017. Surveillance video face recognition with single sample per person based on 3D
modeling and blurring. Neurocomputing 235 (2017), 46–58.
[103] Gary B Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. 2008. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in ’Real-Life’ Images: Detection, Alignment, and Recognition.
[104] Patrik Huber, Guosheng Hu, Rafael Tena, Pouria Mortazavian, P Koppen, William J Christmas, Matthias Ratsch, and Josef Kittler. 2016. A
multiresolution 3d morphable face model and fitting framework. In Proceedings of the 11th international joint conference on computer vision, imaging
and computer graphics theory and applications.
[105] Lucas Introna and Helen Nissenbaum. 2010. Facial recognition technology: A survey of policy and implementation issues. (2010).
[106] ISO/IEC JTC 1/SC 37 Biometrics. ISO/IEC DIS 29794-1 Information technology — Biometric sample quality — Part 1: Framework. https://www.iso.org/standard/79519.html.
[107] ISO/IEC JTC 1/SC 37 Biometrics. ISO/IEC WD TS 24358 Face-aware capture subsystem specifications. https://www.iso.org/standard/78489.html.
[108] Maëlig Jacquet and Christophe Champod. 2020. Automated face recognition in forensic science: Review and perspectives. Forensic science
international 307 (2020), 110124.
[109] Rabia Jafri and Hamid R Arabnia. 2009. A survey of face recognition techniques. journal of information processing systems 5, 2 (2009), 41–68.
[110] Anil K Jain, Debayan Deb, and Joshua J Engelsma. 2021. Biometrics: Trust, but Verify. IEEE Transactions on Biometrics, Behavior, and Identity Science
(2021).
[111] Anil K Jain, Brendan Klare, and Unsang Park. 2011. Face recognition: Some challenges in forensics. In 2011 IEEE International Conference on
Automatic Face & Gesture Recognition (FG). IEEE, 726–733.
[112] Anil K Jain and Arun Ross. 2015. Bridging the gap: from biometrics to forensics. Philosophical Transactions of the Royal Society B: Biological Sciences
370, 1674 (2015), 20140254.
[113] László A Jeni, Jeffrey F Cohn, and Fernando De La Torre. 2013. Facing imbalanced data–recommendations for the use of performance metrics. In
2013 Humaine association conference on affective computing and intelligent interaction. IEEE, 245–251.
[114] Felix Juefei-Xu, Dipan K Pal, Karanhaar Singh, and Marios Savvides. 2015. A preliminary investigation on the sensitivity of COTS face recognition
systems to forensic analyst-style face processing for occlusions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops. 25–33.
[115] Dervis Karaboga. 2005. An idea based on honey bee swarm for numerical optimization. Technical Report TR06, Erciyes University, Engineering Faculty, Computer Engineering Department.
[116] Jane Kaye, Linda Briceño Moraia, Liam Curren, Jessica Bell, Colin Mitchell, Sirpa Soini, Nils Hoppe, Morten Øien, and Emmanuelle Rial-Sebbag.
2016. Consent for biobanking: the legal frameworks of countries in the BioSHaRE-EU project. Biopreservation and biobanking 14, 3 (2016), 195–200.
[117] Brendan F Klare, Mark J Burge, Joshua C Klontz, Richard W Vorder Bruegge, and Anil K Jain. 2012. Face recognition performance: Role of
demographic information. IEEE Transactions on Information Forensics and Security 7, 6 (2012), 1789–1801.
[118] Krista F Kleinberg and Peter Vanezis. 2007. Variation in proportion indices and angles between selected facial landmarks with rotation in the
Frankfort plane. Medicine, science and the law 47, 2 (2007), 107–116.
[119] Krista F Kleinberg, Peter Vanezis, and A Mike Burton. 2007. Failure of anthropometry as a facial identification technique using high-quality
photographs. Journal of forensic sciences 52, 4 (2007), 779–783.
[120] Yassin Kortli, Maher Jridi, Ayman Al Falou, and Mohamed Atri. 2020. Face recognition systems: A survey. Sensors 20, 2 (2020), 342.
[121] KS Krishnapriya, Vítor Albiero, Kushal Vangara, Michael C King, and Kevin W Bowyer. 2020. Issues related to face recognition accuracy varying
based on race and skin tone. IEEE Transactions on Technology and Society 1, 1 (2020), 8–20.
[122] Kelsey M Kyllonen and Keith L Monson. 2020. Depiction of ethnic facial aging by forensic artists and preliminary assessment of the applicability of
facial averages. Forensic Science International 313 (2020), 110353.
[123] Simone Maurizio La Cava, Giulia Orrù, Tomáš Goldmann, Martin Drahansky, and Gian Luca Marcialis. 2022. 3D Face Reconstruction for Forensic
Recognition-A Survey. In 2022 26th International Conference on Pattern Recognition (ICPR). IEEE, 930–937.
[124] Napa Lakshmi and Megha P Arakeri. 2018. Face recognition in surveillance video for criminal investigations: a review. In International Conference
on Communication, Networks and Computing. Springer, 351–364.
[125] Timothy Lau and Alex Biedermann. 2019. Assessing AI Output in Legal Decision-Making with Nearest Neighbors. Penn St. L. Rev. 124 (2019), 609.
[126] Anja Leipner, Zuzana Obertová, Martin Wermuth, Michael Thali, Thomas Ottiker, and Till Sieberth. 2019. 3D mug shot—3D head models from
photogrammetry for forensic identification. Forensic science international 300 (2019), 6–12.
[127] Jiawei Li, Yiming Li, Xingchun Xiang, Shu-Tao Xia, Siyi Dong, and Yun Cai. 2020. Tnt: An interpretable tree-network-tree learning framework
using knowledge distillation. Entropy 22, 11 (2020), 1203.
[128] Pei Li, Patrick J Flynn, Loreto Prieto, and Domingo Mery. 2019. Face Recognition in Low Quality Images: A Survey. ACM Computing Surveys 1, 1 (2019).
[129] Stan Z Li, RuFeng Chu, ShengCai Liao, and Lun Zhang. 2007. Illumination invariant face recognition using near-infrared images. IEEE Transactions
on pattern analysis and machine intelligence 29, 4 (2007), 627–639.
[130] Yue Li, Liqian Ma, Haoqiang Fan, and Kenny Mitchell. 2018. Feature-preserving detailed 3d face reconstruction from a single image. In Proceedings
of the 15th ACM SIGGRAPH European Conference on Visual Media Production. 1–9.
[131] Jie Liang, Feng Liu, Huan Tu, Qijun Zhao, and Anil K Jain. 2018. On mugshot-based arbitrary view face recognition. In 2018 24th International
Conference on Pattern Recognition (ICPR). IEEE, 3126–3131.
[132] Jie Liang, Huan Tu, Feng Liu, Qijun Zhao, and Anil K Jain. 2020. 3D face reconstruction from mugshots: Application to arbitrary view face
recognition. Neurocomputing 410 (2020), 12–27.
[133] Jiangke Lin, Yi Yuan, Tianjia Shao, and Kun Zhou. 2020. Towards high-fidelity 3d face reconstruction from in-the-wild images using graph
convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5891–5900.
[134] Feng Liu, Ronghang Zhu, Dan Zeng, Qijun Zhao, and Xiaoming Liu. 2018. Disentangling features in 3D face shapes for joint face reconstruction
and recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 5216–5225.
[135] JE Loohuis. 2021. Synthesising Security Camera Images for Face Recognition. B.S. thesis. University of Twente.
[136] Nessa Lynch, Liz Campbell, Joe Purshouse, and Marcin Betkier. 2020. Facial Recognition Technology in New Zealand: Towards a Legal and Ethical
Framework. (2020).
[137] Xanthé Mallett and Martin P Evison. 2013. Forensic facial comparison: issues of admissibility in the development of novel analytical technique.
Journal of forensic sciences 58, 4 (2013), 859–865.
[138] Gian Luca Marcialis, Fabio Roli, and Gianluca Fadda. 2014. A novel method for head pose estimation based on the “Vitruvian Man”. International
Journal of Machine Learning and Cybernetics 5, 1 (2014), 111–124.
[139] Giulia Margagliotti and Timothy Bollé. 2019. Machine learning & forensic science. Forensic Science International 298 (2019), 138–139. https:
//doi.org/10.1016/j.forsciint.2019.02.045
[140] MarketsandMarkets. 2021. Video Surveillance Market with COVID-19 Impact Analysis, By Offering (Hardware (Camera, Storage Device, Monitor), Software (Video Analytics, VMS), Service (VSaaS)), System (IP, Analog), Vertical, and Geography - Global Forecast to 2026. https://www.marketsandmarkets.com/Market-Reports/video-surveillance-market-645.html
[141] Vincenzo Mastronardi and M Dellisanti Fabiano Vilardi. 2014. Ricostruzione della Scena del Crimine in 3D. Mondo Digitale (2014), 2.
[142] Vincenzo Maria Mastronardi and Giuseppe Castellini. 2009. Meredith: luci e ombre a Perugia. Armando Editore.
[143] Lerato Masupha, Tranos Zuva, Seleman Ngwira, and Omobayo Esan. 2015. Face recognition techniques, their advantages, disadvantages and
performance evaluation. In 2015 International Conference on Computing, Communication and Security (ICCCS). IEEE, 1–5.
[144] Brianna Maze, Jocelyn Adams, James A. Duncan, Nathan Kalka, Tim Miller, Charles Otto, Anil K. Jain, W. Tyler Niggel, Janet Anderson, Jordan
Cheney, and Patrick Grother. 2018. IARPA Janus Benchmark - C: Face Dataset and Protocol. In 2018 International Conference on Biometrics (ICB).
158–165. https://doi.org/10.1109/ICB2018.2018.00033
[145] Rachel Metz. 2020. Portland passes broadest facial recognition ban in the US. CNN. Available at: https://edition.cnn.com/2020/09/09/tech/portland-facial-recognition-ban/index.html (2020).
[146] Didier Meuwly, Daniel Ramos, and Rudolf Haraksim. 2017. A guideline for the validation of likelihood ratio methods used for forensic evidence
evaluation. Forensic science international 276 (2017), 142–153.
[147] Hoda Mohaghegh, Farid Boussaid, Hamid Laga, Hossein Rahmani, and Mohammed Bennamoun. 2023. Robust monocular 3D face reconstruction
under challenging viewing conditions. Neurocomputing 520 (2023), 82–93.
[148] Araceli Morales, Gemma Piella, and Federico M Sukno. 2021. Survey on 3D face reconstruction from uncalibrated images. Computer Science Review
40 (2021), 100400.
[149] Emilio Mordini. 2017. Ethics and policy of forensic biometrics. In Handbook of Biometrics for Forensic Science. Springer, 353–365.
[150] Reuben Moreton. 2021. Forensic Face Matching. Forensic Face Matching: Research and Practice (2021), 144.
[151] Reuben Moreton and Johanna Morley. 2011. Investigation into the use of photoanthropometry in facial image comparison. Forensic science
international 212, 1-3 (2011), 231–237.
[152] Vidhyashree Nagaraju and Lance Fiondella. 2016. A survey of homeland security biometrics and forensics research. In 2016 IEEE Symposium on
Technologies for Homeland Security (HST). IEEE, 1–7.
[153] Cedric Neumann, Ian W Evett, and James Skerrett. 2012. Quantifying the weight of evidence from a forensic fingerprint comparison: a new
paradigm. Journal of the Royal Statistical Society: Series A (Statistics in Society) 175, 2 (2012), 371–415.
[154] Neurotechnology. VeriLook. http://www.neurotechnology.com.
[155] Joao Neves, Juan Moreno, and Hugo Proença. 2018. QUIS-CAMPI: an annotated multi-biometrics data feed from surveillance scenarios. IET
Biometrics 7, 4 (2018), 371–379.
[156] Xin Ning, Fangzhe Nan, Shaohui Xu, Lina Yu, and Liping Zhang. 2020. Multi-view frontal face image generation: a survey. Concurrency and
Computation: Practice and Experience (2020), e6147.
[157] Unsang Park and Anil K Jain. 2010. Face matching and retrieval using soft biometrics. IEEE Transactions on Information Forensics and Security 5, 3
(2010), 406–415.
[158] Pascal Paysan, Reinhard Knothe, Brian Amberg, Sami Romdhani, and Thomas Vetter. 2009. A 3D face model for pose and illumination invariant
face recognition. In 2009 sixth IEEE international conference on advanced video and signal based surveillance. IEEE, 296–301.
[159] Yuxi Peng. 2019. Face recognition at a distance: low-resolution and alignment problems. (2019).
[160] Alex Pentland, Baback Moghaddam, Thad Starner, et al. 1994. View-based and modular eigenspaces for face recognition. (1994).
[161] P Jonathon Phillips, Hyeonjoon Moon, Syed A Rizvi, and Patrick J Rauss. 2000. The FERET evaluation methodology for face-recognition algorithms.
IEEE Transactions on pattern analysis and machine intelligence 22, 10 (2000), 1090–1104.
[162] P Jonathon Phillips, Amy N Yates, Ying Hu, Carina A Hahn, Eilidh Noyes, Kelsey Jackson, Jacqueline G Cavazos, Géraldine Jeckeln, Rajeev
Ranjan, Swami Sankaranarayanan, et al. 2018. Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms.
Proceedings of the National Academy of Sciences 115, 24 (2018), 6171–6176.
[163] Paulo Henrique Pisani, Abir Mhenni, Romain Giot, Estelle Cherrier, Norman Poh, André Carlos Ponce de Leon Ferreira de Carvalho, Christophe
Rosenberger, and Najoua Essoukri Ben Amara. 2019. Adaptive biometric systems: Review and perspectives. ACM Computing Surveys (CSUR) 52, 5
(2019), 1–38.
[164] Bo Qiu. 2020. Application Analysis of Face Recognition Technology in Video Investigation. In Journal of Physics: Conference Series, Vol. 1651. IOP
Publishing, 012132.
[165] Siti Zaharah Abd Rahman, Siti Norul Huda Sheikh Abdullah, Lim Eng Hao, Mohammed Hasan Abdulameer, Nazri Ahmad Zamani, and Mohammad Zaharudin A Darus. 2016. Mapping 2D to 3D forensic facial recognition via bio-inspired active appearance model. Jurnal Teknologi 78, 2-2 (2016).
[166] Inioluwa Deborah Raji and Joy Buolamwini. 2019. Actionable auditing: Investigating the impact of publicly naming biased performance results of
commercial ai products. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 429–435.
[167] Daniel Ramos and Joaquin Gonzalez-Rodriguez. 2013. Reliable support: measuring calibration of likelihood ratios. Forensic science international
230, 1-3 (2013), 156–169.
[168] Daniel Ramos, Ram P Krish, Julian Fierrez, and Didier Meuwly. 2017. From biometric scores to forensic likelihood ratios. In Handbook of biometrics
for forensic science. Springer, 305–327.
[169] Karl Ricanek and Tamirat Tesafaye. 2006. Morph: A longitudinal image database of normal adult age-progression. In 7th international conference on
automatic face and gesture recognition (FGR06). IEEE, 341–345.
[170] Edward M Robinson. 2016. Crime scene photography. Academic Press.
[171] Andrea Macarulla Rodriguez, Zeno Geradts, and Marcel Worring. 2022. Calibration of score based likelihood ratio estimation in automated forensic
facial image comparison. Forensic Science International 334 (2022), 111239.
[172] Gemma Rotger Moll, Francesc Moreno-Noguer, Felipe Lumbreras, and Antonio Agudo Martínez. 2019. Detailed 3D face reconstruction from a
single RGB image. Journal of WSCG (Plzen, Print) 27, 2 (2019), 103–112.
[173] Ernestina Sacchetto. 2020. Face to face: il complesso rapporto tra automated facial recognition technology e processo penale. (2020).
[174] Debanjan Sadhya and Sanjay Kumar Singh. 2019. A comprehensive survey of unimodal facial databases in 2D and 3D domains. Neurocomputing
358 (2019), 188–210.
[175] Michael J Saks and Jonathan J Koehler. 2005. The coming paradigm shift in forensic identification science. Science 309, 5736 (2005), 892–895.
[176] Angelo Salici and Claudio Ciampini. 2017. Automatic face recognition and identification tools in the forensic science domain. In International
Tyrrhenian Workshop on Digital Communication. Springer, 8–17.
[177] Soubhik Sanyal, Timo Bolkart, Haiwen Feng, and Michael Black. 2019. Learning to Regress 3D Face Shape and Expression from an Image without
3D Supervision. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). 7763–7772.
[178] Arman Savran, Neşe Alyüz, Hamdi Dibeklioğlu, Oya Çeliktutan, Berk Gökberk, Bülent Sankur, and Lale Akarun. 2008. Bosphorus database for 3D
face analysis. In European workshop on biometrics and identity management. Springer, 47–56.
[179] A Scalfati. 2019. Le indagini atipiche (2. edizione). (2019).
[180] Morgan Klaus Scheuerman, Kandrea Wade, Caitlin Lustig, and Jed R Brubaker. 2020. How we’ve taught algorithms to see identity: Constructing
race and gender in image databases for facial analysis. Proceedings of the ACM on Human-computer Interaction 4, CSCW1 (2020), 1–35.
[181] Torsten Schlett, Christian Rathgeb, Olaf Henniger, Javier Galbally, Julian Fierrez, and Christoph Busch. 2022. Face image quality assessment: A
literature survey. ACM Computing Surveys (CSUR) 54, 10s (2022), 1–49.
[182] Mohamad Firham Efendy Md Senan, Siti Norul Huda Sheikh Abdullah, Wafa Mohd Kharudin, and Nur Afifah Mohd Saupi. 2017. CCTV quality
assessment for forensics facial recognition analysis. In 2017 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence.
IEEE, 649–655.
[183] Sahil Sharma and Vijay Kumar. 2021. 3D landmark-based face restoration for recognition using variational autoencoder and triplet loss. IET
Biometrics 10, 1 (2021), 87–98.
[184] Biao Shi, Huaijuan Zang, Rongsheng Zheng, and Shu Zhan. 2019. An efficient 3D face recognition approach using Frenet feature of iso-geodesic
curves. Journal of Visual Communication and Image Representation 59 (2019), 455–460.
[185] Jiazheng Shi, Ashok Samal, and David Marx. 2006. How effective are landmarks and their geometry for face recognition? Computer vision and
image understanding 102, 2 (2006), 117–133.
[186] Yichun Shi and Anil K Jain. 2019. Probabilistic face embeddings. In Proceedings of the IEEE/CVF International Conference on Computer Vision.
6902–6911.
[187] Raymond P Siljander and Lance W Juusola. 2012. Clandestine Photography: Basic to Advanced Daytime and Nighttime Manual Surveillance Photography Techniques - For Military Special Operations Forces, Law Enforcement, Intelligence Agencies, and Investigators. Charles C Thomas Publisher.
[188] Terence Sim, Simon Baker, and Maan Bsat. 2002. The CMU pose, illumination, and expression (PIE) database. In Proceedings of fifth IEEE international
conference on automatic face gesture recognition. IEEE, 53–58.
[189] Sima Soltanpour, Boubakeur Boufama, and QM Jonathan Wu. 2017. A survey of local feature methods for 3D face recognition. Pattern Recognition
72 (2017), 391–406.
[190] Nicole A Spaun. 2007. Forensic biometrics from images and video at the Federal Bureau of Investigation. In 2007 First IEEE International Conference
on Biometrics: Theory, Applications, and Systems. IEEE, 1–3.
[191] Nicole A Spaun. 2009. Facial comparisons by subject matter experts: Their role in biometrics and their training. In International Conference on
Biometrics. Springer, 161–168.
[192] Ailsa Strathie and Allan McNeill. 2016. Facial wipes don’t wash: Facial image comparison by video superimposition reduces the accuracy of face
matching decisions. Applied Cognitive Psychology 30, 4 (2016), 504–513.
[193] Ailsa Strathie, Allan McNeill, and David White. 2012. In the dock: Chimeric image composites reduce identification accuracy. Applied Cognitive
Psychology 26, 1 (2012), 140–148.
[194] Abby Stylianou, Richard Souvenir, and Robert Pless. 2019. Visualizing deep similarity networks. In 2019 IEEE winter conference on applications of
computer vision (WACV). IEEE, 2029–2037.
[195] Ambika Suman. 2008. Using 3D pose alignment tools in forensic applications of Face Recognition. In 2008 IEEE Second International Conference on
Biometrics: Theory, Applications and Systems. IEEE, 1–6.
[196] Philipp Terhörst, Jan Niklas Kolf, Marco Huber, Florian Kirchbuchner, Naser Damer, Aythami Morales Moreno, Julian Fierrez, and Arjan Kuijper.
2021. A comprehensive study on face recognition biases beyond demographics. IEEE Transactions on Technology and Society 3, 1 (2021), 16–30.
[197] Massimo Tistarelli, Enrico Grosso, and Didier Meuwly. 2014. Biometrics in forensic science: challenges, lessons and new technologies. In
International Workshop on Biometric Authentication. Springer, 153–164.
[198] Pedro Tome, Julian Fierrez, Ruben Vera-Rodriguez, and Mark S Nixon. 2014. Soft biometrics and their application in person recognition at a
distance. IEEE Transactions on information forensics and security 9, 3 (2014), 464–475.
[199] Pedro Tome, Julian Fierrez, Ruben Vera-Rodriguez, and Daniel Ramos. 2013. Identification using face regions: Application and assessment in
forensic scenarios. Forensic science international 233, 1-3 (2013), 75–83.
[200] Pedro Tome, Ruben Vera-Rodriguez, Julian Fierrez, and Javier Ortega-Garcia. 2015. Facial soft biometric features for forensic face recognition.
Forensic science international 257 (2015), 271–284.
[201] Bill Triggs. 1997. Autocalibration and the absolute quadric. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern
Recognition. IEEE, 609–614.
[202] Matthew A Turk and Alex P Pentland. 1991. Face recognition using eigenfaces. In Proceedings. 1991 IEEE computer society conference on computer
vision and pattern recognition. IEEE Computer Society, 586–587.
[203] Petra Urbanová, Zuzana Ferková, Marie Jandová, Mikoláš Jurda, Dominik Černý, and Jiří Sochor. 2018. Introducing the FIDENTIS 3D face database. Anthropological Review 81, 2 (2018), 202–223.
[204] Chris van Dam, Raymond Veldhuis, and Luuk Spreeuwers. 2012. Towards 3d facial reconstruction from uncalibrated cctv footage. In Information
Theory in the Benelux and The 2rd Joint WIC/IEEE Symposium on Information Theory and Signal Processing in the Benelux. 228.
[205] Chris van Dam, Raymond Veldhuis, and Luuk Spreeuwers. 2013. Landmark-based model-free 3d face shape reconstruction from video sequences.
In 2013 International Conference of the BIOSIG Special Interest Group (BIOSIG). IEEE, 1–5.
[206] Chris van Dam, Raymond Veldhuis, and Luuk Spreeuwers. 2016. Face reconstruction from image sequences for forensic face comparison. IET
biometrics 5, 2 (2016), 140–146.
[207] Ruben Vera-Rodriguez, Pedro Tome, Julian Fierrez, Nicomedes Expósito, and Francisco Javier Vega. 2013. Analysis of the variability of facial
landmarks in a forensic scenario. In 2013 International Workshop on Biometrics and Forensics (IWBF). IEEE, 1–4.
[208] Ruben Vera-Rodriguez, Pedro Tome, Julian Fierrez, and Javier Ortega-Garcia. 2013. Comparative analysis of the variability of facial landmarks for
forensics using CCTV images. In Pacific-Rim Symposium on Image and Video Technology. Springer, 409–418.
[209] Rajesh Verma, Navdha Bhardwaj, Arnav Bhavsar, and Kewal Krishan. 2022. Towards facial recognition using likelihood ratio approach to facial
landmark indices from images. Forensic Science International: Reports 5 (2022), 100254.
[210] Frank Wallhoff, Stefan Muller, and Gerhard Rigoll. 2001. Recognition of face profiles from the MUGSHOT database using a hybrid connectionist/HMM
approach. In 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221), Vol. 3. IEEE, 1489–1492.
[211] Haitao Wang, Yangsheng Wang, and Hong Wei. 2003. Face representation and reconstruction under different illumination conditions. In Proceedings
on Seventh International Conference on Information Visualization, 2003. IV 2003. IEEE, 72–78.
[212] Mei Wang, Weihong Deng, Jiani Hu, Xunqiang Tao, and Yaohai Huang. 2019. Racial faces in the wild: Reducing racial bias by information
maximization adaptation network. In Proceedings of the ieee/cvf international conference on computer vision. 692–702.
[213] Wei Wang, Jing Dong, Bo Tieniu Tan, et al. 2017. Position determines perspective: Investigating perspective distortion for image forensics of faces.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 1–9.
[214] Andrew B Watson, Quingmin J Hu, and John F McGowan III. 2001. Digital video quality metric based on human vision. Journal of Electronic
imaging 10, 1 (2001), 20–29.
[215] Craig I Watson. 1993. NIST special database 14. Fingerprint Database, US National Institute of Standards and Technology (1993).
[216] David White, Kristin Norell, P Jonathon Phillips, and Alice J O’Toole. 2017. Human factors in forensic face identification. In Handbook of biometrics
for forensic science. Springer, 195–218.
[217] SM Willis, L McKenna, S McDermott, G O’Donell, A Barrett, B Rasmusson, A Nordgaard, CEH Berger, MJ Sjerps, J Lucena-Molina, et al. 2015.
ENFSI guideline for evaluative reporting in forensic science, European network of forensic science institutes.
[218] Sarah Wu. 2019. Somerville City Council Passes Facial Recognition Ban. Boston Globe (2019).
[219] Wei Xiong, Hongyu Yang, Pei Zhou, Keren Fu, and Jiangping Zhu. 2021. Spatiotemporal correlation-based accurate 3D face imaging using speckle
projection and real-time improvement. Applied Sciences 11, 18 (2021), 8588.
[220] Yanjun Yan and Lisa Ann Osadciw. 2008. Bridging biometrics and forensics. In Security, Forensics, Steganography, and Watermarking of Multimedia
Contents X, Vol. 6819. SPIE, 278–285.
[221] Moi Hoon Yap, Nazre Batool, Choon-Ching Ng, Mike Rogers, and Kevin Walker. 2021. A Survey on Facial Wrinkles Detection and Inpainting:
Datasets, Methods, and Challenges. IEEE Transactions on Emerging Topics in Computational Intelligence (2021).
[222] Bangjie Yin, Luan Tran, Haoxiang Li, Xiaohui Shen, and Xiaoming Liu. 2019. Towards interpretable face recognition. In Proceedings of the IEEE/CVF
International Conference on Computer Vision. 9348–9357.
[223] Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, and Manmohan Chandraker. 2017. Towards large-pose face frontalization in the wild. In Proceedings
of the IEEE international conference on computer vision. 3990–3999.
[224] Mineo Yoshino, Hideaki Matsuda, Satoshi Kubota, Kazuhiko Imaizumi, and Sachio Miyasaka. 2000. Computer-assisted facial image identification
system using a 3-D physiognomic range finder. Forensic science international 109, 3 (2000), 225–237.
[225] Dilovan Asaad Zebari, Araz Rajab Abrahim, Dheyaa Ahmed Ibrahim, Gheyath M Othman, and Falah YH Ahmed. 2021. Analysis of Dense
Descriptors in 3D Face Recognition. In 2021 IEEE 11th International Conference on System Engineering and Technology (ICSET). IEEE, 171–176.
[226] Chris G Zeinstra, Didier Meuwly, A Cc Ruifrok, Raymond NJ Veldhuis, and Lieuwe Jan Spreeuwers. 2018. Forensic face recognition as a means to
determine strength of evidence: a survey. Forensic Sci Rev 30, 1 (2018), 21–32.
[227] Chris G Zeinstra, Raymond NJ Veldhuis, Luuk J Spreeuwers, Arnout CC Ruifrok, and Didier Meuwly. 2017. ForenFace: a unique annotated forensic
facial image dataset and toolset. IET biometrics 6, 6 (2017), 487–494.
[228] Dan Zeng, Shuqin Long, Jing Li, and Qijun Zhao. 2016. A novel approach to mugshot based arbitrary view face recognition. Journal of the Optical
Society of Korea 20, 2 (2016), 239–244.
[229] Dan Zeng, Qijun Zhao, Shuqin Long, and Jing Li. 2017. Examplar coherent 3D face reconstruction from forensic mugshot database. Image and
Vision Computing 58 (2017), 193–203.
[230] Xiaozheng Zhang, Yongsheng Gao, and Maylor KH Leung. 2008. Recognizing rotated faces from frontal and side views: An approach toward
effective use of mugshot databases. IEEE Transactions on Information Forensics and Security 3, 4 (2008), 684–697.
[231] Zhenyu Zhang, Yanhao Ge, Renwang Chen, Ying Tai, Yan Yan, Jian Yang, Chengjie Wang, Jilin Li, and Feiyue Huang. 2021. Learning to aggregate
and personalize 3d face from in-the-wild photo collection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
14214–14224.
[232] Sanqiang Zhao, Wen Gao, Shiguang Shan, and Baocai Yin. 2004. Enhance the alignment accuracy of active shape models using elastic graph
matching. In International Conference on Biometric Authentication. Springer, 52–58.
[233] Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, and Xiaogang Wang. 2020. Rotate-and-render: Unsupervised photorealistic face rotation from single-view
images. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5911–5920.
[234] Michael Zollhöfer, Justus Thies, Pablo Garrido, Derek Bradley, Thabo Beeler, Patrick Pérez, Marc Stamminger, Matthias Nießner, and Christian
Theobalt. 2018. State of the art on monocular 3D face reconstruction, tracking, and applications. In Computer graphics forum, Vol. 37. Wiley Online
Library, 523–550.
[235] Jin Zou, Xu Fu, Chi Gong, and Yi Shi. 2021. Is Face Recognition Being Abused?–A Case Study from the Perspective of Personal Privacy. In 2021 7th
Annual International Conference on Network and Information Systems for Computers (ICNISC). IEEE, 957–962.