Deepfake Detection For Human Face Images and Videos: A Survey
  ABSTRACT Techniques for creating and manipulating multimedia information have progressed to the point where they can now ensure a high degree of realism. DeepFake is a generative deep learning technique that creates or modifies facial features so realistically that it is difficult to distinguish real from fake. This technology has greatly advanced and supports a wide range of applications in television, the video game industry, and cinema, such as improving visual effects in movies; it also enables a variety of criminal activities, such as generating misinformation by mimicking famous people. To identify and classify DeepFakes, research on DeepFake detection using deep neural networks (DNNs) has attracted increased interest. Essentially, a DeepFake is regenerated media obtained by injecting or replacing information within a DNN model. In this survey, we summarize DeepFake detection methods for face images and videos on the basis of their results, performance, methodology and detection type. We review the existing DeepFake creation techniques and sort them into five major categories. Generally, DeepFake models are trained on DeepFake datasets and tested with experiments; accordingly, we summarize the available DeepFake dataset trends, focusing on their improvements. Additionally, we analyze how DeepFake detection research aims to produce a generalized detection model. Finally, the challenges related to DeepFake creation and detection are discussed. We hope that the knowledge encompassed in this survey will accelerate the use of deep learning in face image and video DeepFake detection methods.
maps can be obtained by implementing the convolution operation with different kernels. While training, the convolution operation is called forward propagation; during backpropagation, the gradient descent optimization technique updates the learnable parameters (kernels and weights) according to the loss value. The feature value z^l_{i,j,k} at location (i, j) in the k-th feature map of the l-th layer in [13] is as follows:

    z^l_{i,j,k} = (W^l_k)^T x^l_{i,j} + b^l_k,                      (1)

where W^l_k and b^l_k are the weight vector and bias term of the k-th filter of the l-th layer, respectively, and x^l_{i,j} is the input patch centered at location (i, j) of the l-th layer. Then, a nonlinear activation function, such as the sigmoid, tanh or ReLU, is applied to detect nonlinear features. A nonlinear activation function A(·) can be expressed as:

    a^l_{i,j,k} = A(z^l_{i,j,k}),                                   (2)

where a^l_{i,j,k} is the output value after applying the nonlinear activation function.

A pooling layer provides a typical downsampling operation that reduces the dimensionality of the feature maps, introduces translation invariance to small shifts and distortions, and thereby decreases the number of subsequent learnable parameters. With the pooling function pool(·), for each feature map a^l_{:,:,k} we have:

    y^l_{i,j,k} = pool(a^l_{m,n,k}),    ∀(m, n) ∈ R_{i,j},          (3)

where R_{i,j} is a local neighborhood around location (i, j). The fully connected layers produce the final outputs of the CNN, such as the probabilities for each class in classification tasks; the number of output nodes in the final fully connected layer therefore typically equals the number of classes. The network is trained by minimizing a loss function of the form

    L = (1/N) Σ_{n=1}^{N} ℓ(θ; y^(n), o^(n)),                       (4)

where N denotes the number of input-output relations (x^(n), y^(n)), x^(n) is the n-th input data, y^(n) is its target label, o^(n) is the output of the CNN, and θ denotes the learnable parameters [13]. Training a CNN determines the global minima, which identify the best-fitting set of parameters by minimizing the loss function. Currently, many CNN models exist, such as AlexNet [15], ZFNet [16], VGGNet [17], GoogLeNet/Inception [18] and ResNet [19].

B. RNN BACKGROUND
An RNN is a neural network in which the output from the previous step is used as input in the next step. All inputs and outputs in typical neural networks are independent of one another; however, in some situations, such as when predicting the next word of a phrase, the prior words are necessary and must therefore be remembered. Consequently, RNNs were created, which use a hidden layer to overcome this problem. The hidden state, which remembers certain information about a sequence, is the most significant aspect of RNNs. RNNs have a ''memory'' that stores information about the calculations performed so far. This memory utilizes the same parameters for each input, since it performs the same job on all inputs and hidden states; unlike in other neural networks, this weight sharing reduces the number of parameters. Because simple recurrent cells struggle when the gap between relevant inputs is large, Hochreiter and Schmidhuber [20] proposed long short-term memory (LSTM) in 1997, which handles such long-term dependencies. LSTM has been a focus of deep learning since it accomplishes nearly all the exciting outcomes based on RNNs. The recurrent layers, also known as hidden layers in RNNs, are made up of recurrent cells whose states are influenced by both previous states and the current input via feedback connections. The classic recurrent cell updates its hidden state as h_t = tanh(W_h h_{t-1} + W_x x_t + b), combining the previous state h_{t-1} with the current input x_t.
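To make Eqs. (1)-(4) and the recurrent update concrete, the following NumPy sketch (our illustration; the function and variable names are our own and do not come from [13]) computes a single convolutional feature map, applies the ReLU activation, max-pools the result, and performs one classic recurrent step.

```python
import numpy as np

def conv_feature_map(x, W, b):
    """Eq. (1): z[i, j] = W^T x_patch(i, j) + b for one filter W."""
    kh, kw, _ = W.shape
    h, w, _ = x.shape
    z = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(z.shape[0]):
        for j in range(z.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]   # local input patch x^l_{i,j}
            z[i, j] = np.sum(W * patch) + b    # inner product plus bias
    return z

def relu(z):
    """Eq. (2): elementwise nonlinear activation a = A(z)."""
    return np.maximum(z, 0.0)

def max_pool(a, size=2):
    """Eq. (3): maximum over each local neighborhood R_{i,j}."""
    h, w = a.shape
    a = a[:h - h % size, :w - w % size]
    return a.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def rnn_step(h_prev, x, Wh, Wx, b):
    """Classic recurrent cell: h_t = tanh(W_h h_{t-1} + W_x x_t + b)."""
    return np.tanh(Wh @ h_prev + Wx @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))           # input volume (H, W, channels)
W = rng.standard_normal((3, 3, 3)) * 0.1       # one 3x3 convolution kernel
y = max_pool(relu(conv_feature_map(x, W, b=0.0)))
h = rnn_step(np.zeros(8), y.flatten(), rng.standard_normal((8, 8)),
             rng.standard_normal((8, y.size)), np.zeros(8))
print(y.shape, h.shape)                        # (15, 15) (8,)
```

In practice these loops are replaced by vectorized library kernels, and the parameters are updated by backpropagation as described above.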
B. IDENTITY SWAP
The identity swap technique, also called the face-swap method, is very popular for replacing the face of one person in an image or video with that of another person. An example of an identity swap can be seen in Figure 6, where the source image provides the identity, the target image provides the attributes, and a swapped face image is generated. Such swaps can be divided into two major types: i) graphics-based approaches such as FaceSwap and ii) deep learning-based approaches such as DeepFakes. The existing face-swap datasets are UADFV (49-FakeApp), DF-TIMIT (620-faceswap-GAN), FF++ (1k-FaceSwap, 1k-DeepFake), DFD (3k-DeepFake), Celeb-DF (5k-DeepFake) and DFDC Preview (4k-Unknown). This kind of manipulation might be useful in a variety of industries, including the entertainment industry. However, it might also be used for malicious objectives, such as the production of celebrity pornographic videos and financial fraud.

C. ATTRIBUTE MANIPULATION
Attribute manipulation, also known as face editing or face retouching, entails changing aspects of the face, such as hair or skin color, gender and age, or adding spectacles [31]. An example of attribute manipulation can be seen in Figure 7, where Figure 7(a) shows a source image and the corresponding generated images (blond hair, different gender, aged, and pale skin), and Figure 7(b) shows a source image and the corresponding generated images (angry, happy, and fearful). This manipulation process is usually carried out with a GAN, such as the StarGAN approach proposed in [31]. The popular AI face editor FaceApp, a mobile application, is an example of this type of manipulation. The existing attribute manipulation dataset is DFFD [28] (80K-StarGAN, 12K-FaceAPP). Consumers may utilize this technology to try a wide range of items in a virtual environment, including cosmetics and makeup, spectacles, and hairstyles.

FIGURE 7. Example of attribute manipulation in [31].

D. EXPRESSION SWAP
Expression swap, also known as face reenactment, modifies the facial expression of a person. An example of an expression swap can be seen in Figure 8, where the input expression is transferred to the targeted image, which then generates a reenactment result. The available techniques, such as image-level manipulation through popular GAN architectures [32], [33] and popular video-based manipulation techniques such as Face2Face [34] and neural textures [35], replace one person's facial expression in a video with another person's facial expression. The existing reenactment-based datasets are FF++ (509k-Face2Face [34], 406k-Neural-Textures [36]). This form of fraud could have significant consequences, such as a video of someone saying something that he or she never said.

FIGURE 8. Example of expression swap in [34].

E. MISCELLANEOUS
Regarding miscellaneous manipulation, we identified three types: face morphing, face deidentification, and audio-to-video and text-to-video facial expression swaps.

Face morphing is a technique for creating artificial biometric face samples that mimic the biometric data of multiple people. If a morphed face image is stored as a reference in a facial recognition system database, the faces of the contributing individuals are verified correctly against that manipulated reference. Hence, morphed face images constitute a significant threat to face recognition systems, as they contradict the core principle of biometrics, which is the unique link between a sample and its matching person. A comprehensive study of face morphing, covering both morphing strategies and morphing attack detectors, was presented in [37] in 2019.

Face deidentification is a type of manipulation used to remove biometric identity information from images and videos, which can prevent this information from being exploited for unauthorized verification. This can be accomplished in a variety of ways. The most basic method is face blurring or pixelation. Other methods also exist, such as identity swapping or synthesized identity swapping (applying operations such as changes in pose or expression). An adversarial autoencoder-based video face deidentification method was demonstrated in [38].

Audio-to-video (A2V) and text-to-video (T2V) manipulations are also called lip-sync DeepFakes [39]. Basically, the facial expression in a video is synthesized from audio or text. As an example of a fake video, [40] describes a method for synthesizing high-quality videos of a person (in this case, Barack Obama) speaking with an accurate lip-sync track. Other important state-of-the-art methods are discussed in [41], [42]. In addition, [43] presents a procedure for synthesizing counterfeit recordings from text: it takes a video of an individual talking and the desired content to be spoken and produces another video in which the individual's lips are synchronized with the new words.
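As a toy illustration of the pixel blending at the heart of the face morphing technique described above, the sketch below cross-dissolves two pre-aligned face crops. A realistic morphing attack would first warp both faces so that their facial landmarks coincide; the input file names here are hypothetical, and the plain pixel average is a deliberate simplification.

```python
import numpy as np
from PIL import Image

def cross_dissolve(face_a, face_b, alpha=0.5):
    """Naive morph: pixel-wise blend of two equally sized, pre-aligned faces.
    Real morphing warps both faces to averaged landmark positions first;
    this sketch shows only the final blending step."""
    a = np.asarray(face_a, dtype=np.float32)
    b = np.asarray(face_b, dtype=np.float32)
    blend = alpha * a + (1.0 - alpha) * b
    return Image.fromarray(blend.clip(0, 255).astype(np.uint8))

# 'subject_a.png' and 'subject_b.png' are placeholder pre-aligned face crops.
morph = cross_dissolve(Image.open("subject_a.png"), Image.open("subject_b.png"))
morph.save("morphed.png")   # candidate reference image for a morphing attack
```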
IV. DATASETS
Forensics datasets can be classified into two broad types: traditional and DeepFake datasets. Traditional forensics datasets are created with extensive manual effort under carefully controlled conditions targeting, for example, camera artifacts, splicing, inpainting, resampling and rotation detection. The Dresden Image Database (DID) [59] is based on camera fingerprinting and consists of 14,000 images from 73 cameras of 25 different models, covering both indoor and outdoor scenes. While most traditional datasets address image alteration forensics, only some of them cover video-based manipulation forensics. For example, MICC-F220, MICC-F2000 and MICC-F600 are image datasets used to detect copy-move modifications. MICC-F220 is composed of 110 tampered and 110 original images, MICC-F2000 is composed of 700 tampered and 1,300 original images, and MICC-F600 is composed of 160 tampered and 440 original images. The IEEE Information Forensics and Security Technical Committee (IFS-TC) conducted the First Image Forensics Challenge (2013), an international competition that collected thousands of photographs of varied scenes, both indoors and outdoors, using 25 digital cameras. The Wild Web Dataset (WWD) [45] contains 82 cases of 92 forgery variants and 101 unique mask splice detections; it aims to address a gap in the evaluation of image tampering localization algorithms. The performance of [45] is evaluated in [60]. The CelebFaces Attributes Dataset (CelebA) is a large-scale face attribute dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversity, large quantities, and rich annotations, including 10,177 identities, 202,599 face images, 5 landmark locations, and 40 binary attribute annotations per image.
In 2017, the VISION dataset was created; it contains 11,732 original images and 648 original videos. The images were uploaded to social platforms such as Facebook and WhatsApp, and the videos were uploaded to YouTube and WhatsApp, resulting in a total of 34,427 images and 1,914 videos.

The second main type of forensics dataset is the DeepFake dataset. These datasets are generally created by GAN-based models, which are very popular due to their realistic performance. UADFV [48] consists of 49 real YouTube videos and 49 DeepFake videos. The DeepFake videos are generated using the DNN model of FakeAPP. The average length of these videos is approximately 11.14 seconds, with a typical resolution of 294 × 500. The DeepFake-TIMIT (DF-TIMIT) dataset [49] was created using the VidTIMIT dataset [61] and faceswap-GAN; 16 similar-looking pairs of people from VidTIMIT [61] were selected, and for each of the 32 people, approximately 10 videos were generated with a face-swap GAN model at low quality (64 × 64, DF-TIMIT-LQ) and high quality (128 × 128, DF-TIMIT-HQ). FaceForensics (FF) [50] is a DeepFake dataset that supports forensic tasks of facial identification and segmentation in forged images. It is composed of 1,004 videos (face videos downloaded from YouTube) with over 500,000 frames. Its two types of manipulation are source-to-target, where facial expressions are transferred from a source video to a target video using Face2Face [34], and self-reenactment, where Face2Face reenacts the facial expressions of a source video. The FaceForensics++ (FF++) [51] dataset has 1,000 real videos collected from YouTube, and 1,000 DeepFake videos were generated by applying each of 4 face modification techniques: DeepFake, Face2Face [34], FaceSwap and Neural Textures [36] (4,000 face modification videos were created overall). These fake videos have produced 1.8 million manipulated face images. The Diverse Fake Face Dataset (DFFD) [28] combines multiple forgery types (FaceSwap, DeepFake, DeepFaceLab, FaceAPP, StarGAN and StyleGAN) in a single dataset. DeepFake Detection (DFD) [55] was developed by Google and Jigsaw; 363 original videos were filmed with the assistance of 28 invited actors, from which over 3,600 DeepFake videos were generated using DeepFake techniques. In September 2019, Amazon Web Services, Facebook, Microsoft, and a number of academics collected a large-scale DeepFake dataset for the DeepFake Detection Challenge-Preview (DFDC-P) [53]. A full version of the DFDC-P was then developed with eight manipulation methods and is known as the DeepFake Detection Challenge (DFDC) dataset. The Celeb-DF dataset [55] contains 590 actual videos and 5,639 DeepFake videos. Recently, the DeeperForensics-1.0 dataset (DF-1.0) [56] was introduced, consisting of 60,000 videos with a total of 17.6 million frames for real-world face forgery detection. For its construction, 100 paid actors were invited from 26 countries to collect high-resolution images of size 1920 × 1080; a new end-to-end face-swapping method (DF-VAE) was introduced, and seven types of perturbations at five intensity levels were systematically applied to the fake videos. More recently, the small WildDeepfake dataset (WDF) [57] was introduced, consisting of 7,314 face sequences extracted from 707 DeepFake videos collected entirely from the internet. WildDeepfake can be used to extend the existing datasets; moreover, WDF is used to develop and test the effectiveness of DeepFake detectors against real-world DeepFakes. In addition, research on DeepFakes is also expanding to examine more than one face in a single image, as in the OpenForensics dataset (OF) [58]. The OF dataset consists of 115K unrestricted images with 334K human faces. Table 3 summarizes these existing datasets.

V. DEEPFAKE DETECTION
DeepFake face image and video detection dominates research on monitoring multimedia information, with the positive intention of improving the confidentiality and integrity of multimedia content. Detecting such altered multimedia content is not an easy task, and it has become even more challenging since the emergence of generative models. Basically, forgery detection in multimedia content entails analyzing the content to determine whether it has been tampered with or is original. In the past, forgery detection was considered traditional research; in recent years, however, the detection of DNN-generated (AI-based) multimedia has become more popular. In this section, we discuss both traditional and DeepFake forensics-based techniques.

A. TRADITIONAL FORENSIC-BASED TECHNIQUES
To modify image content, various traditional image processing technologies are employed, such as copy-move (splicing), resampling (resizing, rotating, stretching), and the addition and/or removal of parts of the image. Traditional forensics-based techniques are commonly divided into two types: active and passive.

Active techniques require prior knowledge of the multimedia for the authentication process. Basically, at the time of multimedia generation, some information is encoded, such as a watermark or digital signature. For instance, a watermark is information that is added to a source image without degrading its visible content. A watermark extraction procedure is used to recover the watermark from the target image to discern whether the image has been manipulated, and the manipulated portions of the target image can be detected using the extracted watermark. Over the past few years, mimicking aspects of genuine users or generating hyperrealistic masks at the presentation side for face images and videos has highlighted one kind of biometric vulnerability (biometric attack). To monitor or identify such biometric attacks, a variety of anti-spoofing techniques are used, including eye blink detection in live stream scenarios, challenge-response techniques, 3D cameras, active flash and deep learning.
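As a minimal illustration of the active, watermark-based verification just described (a simplified sketch of our own, not a production scheme), the following code hides a binary watermark in the least significant bits of an image; re-extracting the bits later reveals exactly which pixels no longer match, localizing the manipulation.

```python
import numpy as np

def embed_lsb(image, watermark_bits):
    """Embed a binary watermark into the least significant bit of each pixel."""
    flat = image.flatten()
    bits = np.resize(watermark_bits, flat.size)     # tile the watermark over the image
    return ((flat & 0xFE) | bits).reshape(image.shape)

def extract_lsb(image):
    """Recover the embedded bit plane from a (possibly tampered) image."""
    return image.flatten() & 1

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (64, 64), dtype=np.uint8)   # stand-in source image
wm = rng.integers(0, 2, 512, dtype=np.uint8)           # secret watermark bits

marked = embed_lsb(img, wm)
tampered = marked.copy()
tampered[10:20, 10:20] = 0                             # simulate a local manipulation

expected = np.resize(wm, marked.size)
mismatch = (extract_lsb(tampered) != expected).reshape(img.shape)
print("suspicious pixels:", int(mismatch.sum()))       # concentrated in the edited block
```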
Facial recognition [110] is essential for face image and video analysis before applying a traditional or DeepFake detection method. In this context, many researchers are interested in recognizing face images to identify authentic expressions, i.e., gestures made by the human face that communicate information such as fear, disgust, happiness, sadness, surprise, anger, and neutrality. Umer et al. [111], [112] proposed a method to identify human facial expressions using data augmentation and fine-tuning of a CNN model. A brief survey of biometric anti-spoofing methods for face recognition is available in [113]. To check the validity of face images, Umer et al. [114] proposed a method that combines preprocessing, feature extraction and classification techniques. Initially, landmarks are extracted from the face image to identify the face region of the person; next, features are extracted from the detected face region, and finally, the classifier scores obtained from these features are fused to calculate the final result.

In contrast to active techniques, passive techniques do not require prior knowledge of the multimedia for the authentication process. Instead, statistical information about the source image (multimedia) that is highly consistent across distinct images is used, and this inherent statistical information is utilized to detect any faked areas of the image. Passive forensic techniques are thus applicable in the absence of digital watermarks, signatures, or specialized hardware [115]. In Table 5, passive forensic techniques used in specific types of applications are summarized.

TABLE 5. Traditional forensics methods.

B. DEEPFAKES FORENSICS-BASED TECHNIQUES
Currently, DeepFake forensics-based techniques are a very active research area. Due to the popularity of DeepFake tools on the internet, it is very easy to create fake content that looks highly realistic and is difficult to distinguish with traditional techniques. To mitigate this challenge and classify content as either fake or pristine, researchers are developing DeepFake detection models. In contrast, many researchers are focusing on generating generalized realistic models to create DeepFakes. Creating DeepFakes is fun for users because many web-based tools are available online to perform such manipulations, but these manipulations can still identify people and cause them to be misused for unwanted activities. Moreover, such technology can be employed by cyber attackers to penetrate identification or authentication systems and gain illegitimate access, thus violating privacy and compromising social security and democracy.

To combat the destructive impacts of DeepFakes, researchers have also turned dedicated attention to multimedia forensic techniques to identify DeepFakes. Existing methods have focused on either spatial and temporal artifacts left by the generation process or data-driven classification. Recently, researchers have used features such as those in Figure 9 to build DeepFake detection models. This section reviews these features and the detection methods created from them; a summary of typical approaches is provided in Table 4. Inconsistencies, irregularities in the background, and GAN fingerprints are examples of spatial artifacts. Detecting fluctuations in a person's behavior, physiological signals, coherence, and video frame synchronization are all examples of temporal artifacts.

In this part, we review recent DeepFake detection techniques grouped into three types: (1) traditional-based techniques for DeepFakes, (2) DNN-based techniques for DeepFakes, and (3) artifact analysis for DeepFakes.

1) TRADITIONAL-BASED TECHNIQUES FOR DEEPFAKES
These methods examine pixel-level differences in images and videos to identify DeepFakes. Focusing on pixels and exploiting their correlations is easy to understand and provides hints in the detection process that clarify the variations between real and counterfeit (fake) content. When images or videos are modified by basic transformations, however, these approaches suffer from robustness concerns.

A novel photoresponse nonuniformity (PRNU) analysis method has been tested for its effectiveness at detecting DeepFake video manipulation [62]. This PRNU analysis reveals a statistically significant difference in mean normalized cross-correlation scores between real and DeepFake videos. However, the model was tested on a very small dataset: the DeepFake GUI application OpenFaceSwap was used to create 10 authentic and 16 DeepFake videos. The results show that a cut-off value of 0.05 yields a 3.8% false positive rate and a 0% false negative rate. In [64], a steganalysis method was adopted to identify DeepFake images. Co-occurrence matrices were constructed from RGB images, and the resulting values were used to train a deep convolutional neural network to identify the fakes. The experimental results show 99% classification accuracy for CycleGAN- and StarGAN-based fake images. Li et al. [65] evaluated the statistical properties of deep network-generated images, such as the correlation between adjacent pixels in the HSV and YCbCr color spaces, to distinguish DeepFake images. In Lips Don't Lie, Haliassos et al. [66] suggested a generalizable and robust approach, known as LipForensics, to detect face forgery in videos. The fundamental theme is monitoring lip movements for the high-level semantic inconsistencies that are present in many synthesized videos.
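A minimal sketch in the spirit of the PRNU test of [62] is given below: a noise residual is estimated by subtracting a denoised copy of each frame, and a probe frame is scored against a reference fingerprint with normalized cross-correlation. The Gaussian denoiser and the synthetic data are our simplifications; practical PRNU extraction uses stronger wavelet-based denoising.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(frame, sigma=1.0):
    """PRNU-style residual: frame minus a denoised (here Gaussian-smoothed) copy."""
    frame = frame.astype(np.float64)
    return frame - gaussian_filter(frame, sigma)

def ncc(a, b):
    """Normalized cross-correlation between two residual patterns."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(2)
# Reference fingerprint: average residual over pristine frames (synthetic stand-ins).
pristine = rng.integers(0, 256, (8, 128, 128)).astype(np.float64)
fingerprint = np.mean([noise_residual(f) for f in pristine], axis=0)

probe = pristine[0] + rng.normal(0, 2, (128, 128))     # hypothetical probe frame
score = ncc(noise_residual(probe), fingerprint)
print(f"mean NCC score: {score:.4f}")   # markedly lower scores suggest a swapped face
```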
Lugstein et al. [67] designed a novel pipeline to detect DeepFakes using photoresponse nonuniformity (PRNU). The PRNU technique is well known for detecting facial retouching and face morphing attacks. In [67], PRNU feature detection similar to that in [116], [117] is extended with a face image extraction stage as well as an SVM classification stage. Two mesoscopic models (Meso-4 and MesoInception-4), forming a compact facial video forgery detection network, were proposed by Afchar et al. [63] to classify hyperrealistic videos forged with DeepFake and Face2Face. Because compressed videos severely degrade image noise, microscopic analysis based on image noise is not applicable, and at a purely semantic level such forgeries are hard to judge; the mesoscopic models target the scale in between. The models efficiently detect hyperrealistic forged videos at a low computational cost. The average detection rate was found to be 98% for DeepFake videos and 95% for Face2Face videos under real conditions of diffusion on the internet.

2) DNN-BASED TECHNIQUES FOR DEEPFAKES
These methods use existing DNN models to analyze spatial characteristics, boost detection efficacy and improve the generalization capacity for detecting DeepFakes; they are entirely data-driven. However, all of these DNN-based detection approaches are vulnerable to adversarial attacks, and very few studies have assessed their performance in combating such attacks. Existing studies that use DNNs to detect DeepFakes can be divided into three types: fine-tuning approaches that improve the detection capacity of existing DNN models, approaches that explore artifact clues, and approaches that train DNN models on different types of datasets to improve generalization capacity. Güera and Delp [68] proposed a face-swapping-based detection method combining a CNN and an LSTM. InceptionV3 (a CNN) is used to extract frame-level features, and the output of the CNN is fed to an LSTM to construct a sequence descriptor that is used for classification. The highest accuracy of the model is greater than 97% when classifying a video as pristine or DeepFake.

A capsule network was used to detect forged images and videos in a variety of forging scenarios, including replay attack detection and (both full and partial) computer-generated image/video detection, in [69], where the capsule network was developed to resolve computer vision challenges and digital forensics issues. The ability of a capsule network based on a dynamic routing algorithm [118] to represent hierarchical pose relationships between object pieces has recently been demonstrated. To distinguish between fake and real images, the dynamic routing algorithm routes the outputs of three capsules to the output capsules over a series of iterations. Four datasets covering a wide spectrum of fabricated image and video attacks were used to test the approach, and on all four, the suggested strategy outperformed existing methods. This outcome demonstrates the capsule network's utility in developing a generic detection system that can effectively detect a variety of counterfeit image and video attacks.

A generalized fake face image detection method was proposed by Xuan et al. [71] in 2019. The key idea is to explicitly add a preprocessing step in the training stage to remove low-level unstable artifacts of GAN images and force the forensics classifier to focus on more intrinsic forensic clues. In the preprocessing step, Xuan et al. applied Gaussian blur and Gaussian noise, which suppress low-level unstable artifacts in the pixel data. DCGAN [21], WGAN-GP [22] and PGGAN [23] are used to generate the GAN images, with pristine images taken from CelebA-HQ. The images generated by PGGAN [23] are used to train the CNN, while those of DCGAN [21] and WGAN-GP [22] are used for testing. However, the model shows only a small improvement in generalization ability on unseen types of fake image datasets.

Investigating artifact clues in images and videos is also a prominent scheme for detecting DeepFakes. In [72], a combination of a recurrent convolutional model and a face alignment approach was introduced to detect three types of manipulation: DeepFake, Face2Face and FaceSwap. Initially, preprocessing operations are applied to the video to detect, crop and align faces in a sequence of frames. Next, a combination of an appropriate CNN model, ResNet [19] or DenseNet [119], with alignment and a bidirectional recurrent network is used to test the accuracy. The model of [72] is able to utilize micro-, meso- and macroscopic features for manipulation detection. According to the experimental results, landmark-based face alignment with a bidirectional recurrent DenseNet performs best for detecting face manipulation in videos.
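The frame-features-plus-recurrence pattern shared by [68] and [72] can be sketched in a few lines of PyTorch. The block below is a hedged illustration, not either paper's exact architecture: a small CNN stands in for InceptionV3/ResNet/DenseNet, the face alignment stage is omitted, and all layer sizes are our own choices.

```python
import torch
import torch.nn as nn

class FrameSequenceDetector(nn.Module):
    """Toy CNN+LSTM detector: per-frame features -> temporal model -> one logit."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                  # stand-in frame-level backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # one logit: fake vs. pristine

    def forward(self, clips):                      # clips: (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))      # (batch*frames, feat_dim)
        seq, _ = self.lstm(feats.view(b, t, -1))   # per-frame sequence descriptor
        return self.head(seq[:, -1])               # classify from the last state

model = FrameSequenceDetector()
clips = torch.randn(2, 8, 3, 64, 64)               # 2 clips of 8 aligned face frames
print(model(clips).shape)                          # (2, 1); sigmoid gives P(fake)
```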
Jeon et al. [73] introduced the FDFtNet method to improve the capability of existing CNN models, such as SqueezeNet, ShallowNetV3, ResNetV2, and Xception. In this method, fine-tuning is used to extract features with MBblockV3, and the approach can be regarded as a fine-tuning transformation. It shows higher performance than the existing classical models; however, its performance against unseen types of GAN-based image manipulation attacks has not been evaluated. Jeon et al. [74] also proposed the transferable GAN-image detection framework (T-GD), which efficiently detects DeepFake images. The model works on teacher and student relations, which mutually improve the detection performance.

Hsu et al. [75] proposed a pairwise learning model to detect GAN-generated fake images. The model combines an improved version of the DenseNet backbone network with a Siamese network architecture and is called the common fake feature network (CFFN). To learn discriminative common fake features, pairwise information (a labeled training dataset) is provided to the CFFN. The trained CFFN can then perform the classification task of indicating whether an image is real or fake.

Gandhi and Jain [76] proposed a method to enhance the performance of existing DeepFake models by adding adversarial perturbations to DeepFake images. The fast gradient sign method and the Carlini and Wagner L2 norm attack are used to create adversarial perturbations in both black-box and white-box settings, and Lipschitz regularization and the deep image prior (DIP) are introduced to increase the robustness of CNN (ResNet and VGG)-based DeepFake detectors. Lipschitz regularization improves the detection of perturbed DeepFakes by 10 percent in the black-box scenario, and the DIP defense obtains 95 percent accuracy on perturbed DeepFakes while retaining 98 percent accuracy on the original set. However, the two defenses have limitations: the improvement from Lipschitz regularization in the white-box scenario is only 2.2 percent, and although the DIP method outperforms Lipschitz regularization, its detection process is highly time-consuming even with a high-performance configuration. Wu et al. [77] introduced the SSTNet method, which combines spatial, steganalysis and temporal feature extraction procedures to detect DeepFakes. XceptionNet is used to monitor the spatial features and statistical information of the image, steganalysis operations are applied, and an RNN is used to mine the temporal features. Finally, all the extracted information is combined for binary classification to detect DeepFakes.

Liu et al. [78] used global texture data to increase the robustness and generalization capabilities of existing CNNs in identifying synthetic fake faces. Their Gram-Net shows significant resistance to perturbation attacks such as downsampling, JPEG compression, blur, and noise, according to experimental data. Gram-Net, which has demonstrated encouraging results in the wild, also has a proven generalization capacity in working with various GANs.

Current DeepFake detection methods use small datasets for specific types of manipulation, and because the generated DeepFakes are highly realistic, the detection techniques suffer in performance. To address this issue, Khalid and Woo [79] proposed the OC-FakeDect method, which uses a one-class variational autoencoder (VAE) trained only on real face images and detects nonreal images such as DeepFakes by treating them as anomalies.

Fung et al. [80] introduced a unique unsupervised learning method for detecting facial modification. Two modified copies of a face image are generated using two distinct transformations and fed into two sequential subnetworks (an Xception backbone and a projection head network), and the outputs of the projection head networks are trained to maximize their agreement. The model architecture was inspired by the method proposed by Chen et al. [120], which achieves higher accuracy in visual representation learning than previous state-of-the-art methods.

Conventional DNNs have frequently been used to detect fake faces by improving their generalization ability; however, they can overfit specific manipulation types and suffer from transferability concerns when confronted with unknown manipulation methods. Tariq et al. [81] proposed a generalized method to detect multiple types of DeepFakes, tested also on unseen types of DeepFakes such as the DeepFake-in-the-Wild video dataset (Shahroztariq/CLRNet/blob/main/dataset_samples). The main idea is to trace the spatial and temporal information in DeepFakes with a convolutional LSTM-based residual network (CLRNet), which has a unique training strategy. The best performance of the CLRNet model on the DeepFake-in-the-Wild video dataset is 93.86%.

3) ARTIFACT ANALYSIS FOR DEEPFAKES
DeepFakes frequently contain artifacts that are difficult for humans to identify but are quickly recognized by machine and forensic analysis. Inconsistencies, irregularities in the background, and GAN fingerprints are examples of spatial artifacts. Detecting fluctuations in a person's behavior, physiological signals, coherence, and video frame synchronization are all examples of temporal artifacts. Agarwal et al. [88], [97] proposed combining static biometrics of facial identity with temporal behavioral biometrics of facial expressions and head movements for DeepFake detection. According to Chai et al. [98], redundant artifacts can be evaluated from local patches to identify fake faces. This idea has been tested using different existing models, such as ResNet-18 [19], Xception [121], MesoInception4 [63], and a CNN [122], with p values of 0.1 and 0.5 on the CelebA-HQ and FFHQ datasets, respectively.5 This idea shows generalized characteristics across different network architectures and different datasets.

5 https://github.com/NVlabs/ffhq-dataset
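For reference, the FGSM perturbation used by Gandhi and Jain [76] can be written in a few lines; the sketch below is generic rather than their exact setup, and `detector`, the labels and epsilon are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(detector, faces, labels, epsilon=0.01):
    """Fast gradient sign method: x_adv = x + eps * sign(dL/dx).
    `detector` is any differentiable model producing one logit per image."""
    faces = faces.clone().detach().requires_grad_(True)
    loss = F.binary_cross_entropy_with_logits(detector(faces), labels)
    loss.backward()
    adv = faces + epsilon * faces.grad.sign()   # one-step perturbation
    return adv.clamp(0.0, 1.0).detach()         # keep a valid image range

# Usage sketch (hypothetical trained detector `model`):
# adv = fgsm_attack(model, fake_faces, torch.ones(fake_faces.size(0), 1))
# A robust detector should still assign adv a high probability of being fake.
```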
Zhang et al. [82] raised concerns about applications that can swap faces in less than a minute, which can be a serious problem for face authentication on the internet. To address this issue, they proposed an automated face-swapping method together with a detection method built from basic machine learning techniques. Initially, key points are detected in the face image and represented as descriptors (capturing local information around each key point). Because each key point is independent, a clustering operation is then applied to generate a codebook for each image, and this codebook is taken as input to linear or nonlinear machine learning models to estimate the image's legitimacy. Specifically, the features are extracted using speeded-up robust features (SURF) [123], and the bag-of-words (BoW) [124] method is used to generate the codebook. The codebook information is then fed into support vector machines (SVMs), random forests (RFs) and multilayer perceptrons (MLPs) for binary classification. In the experiments, the best detection accuracy is greater than 92%. Nirkin et al. [109] used the discrepancy between faces and their context to identify fake faces. In other words, two networks are trained: the first network is trained to identify the person's face, and the second, context recognition, network takes the face's context into account, such as the person's hair, ears, and neck. To identify fake faces, discrepancies are calculated by comparing these two networks. This method exhibits a high generalization ability.

Rather than looking at the visual artifacts in fake faces, other researchers are looking at the imperfect designs of current GANs, which offer signals for distinguishing between genuine and DeepFake faces. McCloskey and Albright [89] exploited the architecture of the GAN generator to enhance methods for detecting visual artifacts in DeepFake images. In particular, the generator's normalization processes, which reduce the frequency of saturated and underexposed pixels, are taken into account, and the resulting features are classified by an SVM. Marra et al. [90] proposed GAN fingerprints (unique artifacts, e.g., of ProGAN and CycleGAN) that aim to detect DeepFake images.

Yu et al. [92] studied GAN fingerprints for image attribution and used them to classify images as real or GAN-generated; this study also identified the source of GAN-generated images. If the model is trained with even a small change in the dataset, the model fingerprint will be distinct, which lends greater granularity to model authentication. Additionally, fine-tuning is an effective technique for immunizing the DNN model against adversarial perturbations of fingerprint images.

Analyzing artifacts in biological signals is also gaining prominent attention from researchers aiming to identify DeepFakes. In synthesized fake faces, biological signal artifacts provide evident cues for fake detection. These biological signals can be divided into the following groups: visual-audio inconsistency, visual inconsistency and biological signals in video. Visual-audio irregularity in DeepFake videos is a very important clue for detecting synthesized video, and the techniques of [39], [99], [102] can clearly demonstrate why a video is fake. Mittal et al. [99] distinguish ''real'' and ''fake'' videos using the correlation between modalities and affective signals. To model the visual and audio streams in videos, a Siamese network is used along with a combination of two triplet loss functions to determine similarity. One loss function calculates the similarity between the visual and auditory stimuli, while the other calculates affect cues such as perceived emotion. The experimental results show that estimating the audio-visual correlation is efficient for identifying DeepFake videos. Agarwal et al. [39] introduced a fake video detection method that takes advantage of mismatches between the dynamics of the mouth shape (visemes) and the pronounced phonemes. Mama, baba, and papa are examples of words whose phonemes require the lips to be completely closed to be properly spoken. The authors' recommended strategy worked well, especially as the video became longer. The Modality Dissonance Score (MDS) was proposed by Chugh et al. [102] to detect DeepFake videos. Dissimilarity scores are calculated between audio-visual segments over 1-second video segments, and the MDS is estimated by aggregating over all the segments. The resulting value can efficiently identify DeepFake videos. This method can also be utilized for temporal forgery localization, which identifies the video segments that have been tampered with.

A second group of methods monitors the lack of visual consistency, particularly in the shape, facial features, and landmarks of faces, to identify DeepFake videos [48], [84], [87], [94]. Li et al. [84] proposed an eye blinking-based fake face video detection method using a CNN and an RNN in an LRCN model. The LRCN model consists of three steps: feature extraction from the eye sequence using VGG16; sequence learning using LSTM, a special kind of RNN; and finally, state prediction, which generates the likelihood of eye open and closed states based on the output of the LSTM. The best performance of the model in terms of the area under the ROC curve was 0.99. Li and Lyu [87] described a new deep learning-based model that can distinguish DeepFake videos from real videos. The model takes advantage of the warping step during DeepFake creation: this step leaves a resolution discrepancy between the warped face area and the surrounding context, producing noticeable artifacts, and CNN models are used to detect them. Specifically, a CNN is trained to recognize faces first and then extract landmarks to compute transform matrices that align the faces to a standard configuration. Gaussian blurring is applied to the aligned face, and the inverse of the estimated transformation matrix is then used to affine-warp it back to the original image. Faces are aligned at several scales to boost data diversity and to simulate more varied resolution scenarios of affine-warped faces. The performance was calculated for four CNN models, namely, VGG16, ResNet50, ResNet101 and ResNet152, on DeepFake datasets (UADFV and DF-TIMIT with two qualities, LQ and HQ). The ResNet50-based DeepFake detection model outperforms the other models on these DeepFake datasets.
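The LRCN of [84] learns blink dynamics end to end; a far simpler landmark-based heuristic in the same spirit is the eye aspect ratio (EAR), sketched below. This is our illustration, not the method of [84]: the six-landmark eye layout follows common facial landmark detectors, and the 0.2 blink threshold is an assumed constant.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six eye landmarks p1..p6 (each an (x, y) point):
    (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|); it collapses toward 0 on closure."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal + 1e-12)

def never_blinks(eye_landmarks_per_frame, closed_thresh=0.2):
    """Flag a clip as suspicious if the EAR never falls below the blink threshold,
    a telltale of early DeepFakes whose subjects rarely blinked."""
    ears = [eye_aspect_ratio(eye) for eye in eye_landmarks_per_frame]
    return min(ears) > closed_thresh

# eye_landmarks_per_frame would come from a landmark detector such as dlib:
# an array of shape (num_frames, 6, 2) for one eye (hypothetical input).
```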
Yang et al. [48] suggested a method for detecting inconsistencies in 3D head pose movement, which includes head orientation and position. To estimate orientation and position, 68 facial landmarks of the central face region are used. The 3D head poses are investigated because the DeepFake face generation pipeline has a flaw that introduces such inconsistencies. After the features are retrieved, they are passed to an SVM classifier. Experiments on two datasets (UADFV, DARPA MediFor) reveal that this detection method outperforms the other methods. Guarnera et al. [103] proposed a model for DeepFake detection that monitors hidden forensic traces in images. The expectation maximization (EM) algorithm [125] is used to extract a set of local features that model the underlying convolutional generative process. The model was evaluated on five different types of DeepFake creation techniques, namely, GDWCT, StarGAN, ATTGAN, StyleGAN and StyleGAN2, and on the CELEBA dataset, using naïve classifiers to discriminate between originals and fakes.

Matern et al. [94] investigated a way to exploit DeepFake and face manipulation artifacts based on visual attributes such as the eyes, teeth, and facial contours. The visual artifacts are caused by a lack of global consistency, an incorrect or inadequate estimate of the incident illumination, or an inaccurate estimate of the underlying geometry. To detect DeepFakes, geometric inconsistencies in the reflections of the eye and tooth areas are monitored, and textural characteristics collected from the face region based on facial landmarks are taken into account. Consequently, eye, teeth, and full-face crop features are employed. Following feature extraction, two classifiers, namely, logistic regression and a shallow neural network, are used to distinguish DeepFakes from original videos. The model works well on YouTube videos, with a best result of 0.851 in terms of the area under the receiver operating characteristic curve. The drawback of this method is that it requires pictures that satisfy specific criteria, such as open eyes or visible teeth. Fernandes et al. [104] proposed an attribution-based confidence (ABC) metric [126] for detecting DeepFake videos. Initially, DeepFake videos were created using a commercial website (https://deepfakesweb.com/). The generated DeepFakes were then tested on a pretrained ResNet50 model that had been trained with the VGGFace2 dataset [105]. Based on the obtained attribution scores, a threshold value of 0.94 on the ABC metric was chosen to differentiate pristine from DeepFake videos. Hu et al. [107] analyzed the inconsistency between the two eyes to detect DeepFake face images. The detection model takes advantage of physical/physiological constraints on GAN-based images and estimates the discrepancy between the two eyes to identify fakes. These constraints provide solid assurances for explaining the decision to label an image real or fake; however, they will become invalid as improved GANs emerge. In addition, the model's resistance to perturbation attacks is unknown. Demir and Ciftci [108] proposed a model to detect DeepFakes by analyzing gaze in videos.

The biological signals in such videos are difficult to duplicate. Studies have demonstrated that heart rate is useful for detecting DeepFake videos, although extracting the heart rate from videos is itself a challenging task. Fernandes et al. [96] took advantage of the neural ordinary differential equation (Neural-ODE [127]) to identify DeepFake videos. Qi et al. [106] proposed the DeepRhythm model, which also exposes DeepFake videos using heartbeat rhythms. The authors created a motion-magnified spatial-temporal representation (MMSTR) of the video to highlight heart rhythm signals. Finally, based on the output of the MMSTR, a dual-spatial-temporal attentional network was built to identify fraudulent videos.

VI. CHALLENGES FOR DEEPFAKE CREATION AND DETECTION
In recent years, many DeepFake tools with highly realistic performance have become available, and many more are in development. At the same time, the development of DeepFake generation models is creating large challenges for the forensics experts who must combat them. DeepFakes are AI-generated hyperrealistic images or videos that have been digitally edited using techniques such as swapping faces, changing attributes, and representing individuals saying and doing things that never happened.

GANs, which are popular artificial intelligence (AI) techniques, consist of a discriminative and a generative model that compete against each other to improve their performance in generating believable fakes. These impersonations of real persons frequently go viral and spread swiftly across social media platforms, making them an effective tool for propaganda. In digital forensics, as in other security-related disciplines, it is necessary to account for the presence of an adversary who is actively attempting to fool investigators. In reality, a knowledgeable attacker who understands the concepts on which the forensic tools are based may take a variety of counterforensic steps to avoid detection [128]. Forensics tools should be able to detect such adversarial threats, as well as any real-world conditions that tend to degrade test accuracy. The numerous counterforensic approaches intended to confuse current detectors are therefore a valuable aid in the development of multimedia forensics, as they expose the flaws in current solutions and encourage research toward more robust solutions.

To date, many models are available to create or detect fakes, but they still have weaknesses. In the following subsections, we discuss the main challenges, point by point, in creating or detecting DeepFakes.

A. CHALLENGES FOR DEEPFAKE CREATION
Despite the fact that significant efforts have been made to increase the visual quality of created DeepFakes, a number of hurdles remain. Some challenges related to creating DeepFakes include generalization,
B. CHALLENGES FOR DEEPFAKE DETECTION
• Lack of DeepFake datasets: The performance of a DeepFake detection model depends on the variety of large datasets used during training. If the model is tested on downloaded media with an unknown type of manipulation, then designing the model to identify that manipulation is challenging. Moreover, owing to the popularity of web-based applications, postprocessing operations are applied to DeepFake multimedia with the intention of fooling the DeepFake detector; such manipulation may consist of removing temporal artifacts, blurring, smoothing, cropping, etc.
• Unknown type of attack: Another challenging task is to design a DeepFake detection model that is robust against unknown types of attacks, such as the fast gradient sign method (FGSM) [129] and the Carlini and Wagner L2-norm attack (CW-L2) [130]. These attacks are used to fool classifiers into changing their output. An example of a DeepFake created from source and target faces, with adversarial perturbations, is shown in Figure 13: DeepFakes are accurately classified as fake by a DeepFake detector, but adversarially perturbed DeepFakes are classified as real. A minimal FGSM sketch is given after this list.
• Temporal aggregation: Existing DeepFake detection algorithms use binary frame-level classification, which involves determining whether each video frame is real or fake. However, as these methods do not take interframe temporal consistency into consideration, they may encounter issues such as temporal abnormalities, with real and artificial frames occurring in consecutive intervals. Furthermore, these methods necessitate an extra step to compute the video integrity score, which must be aggregated over all frames to obtain the final result; see the aggregation sketch after this list.
• Unlabeled data: Usually, DeepFake detection models are trained with large datasets. However, in some cases, such as journalism- or law enforcement-based DeepFake detection, only a small dataset may be available. Moreover, this kind of dataset requires additional effort to label each sample according to the type of forgery used. Consequently, further study is required to understand journalism- and law enforcement-based forgery cases.
FIGURE 13. An example of an adversarial attack on a DeepFake detector in [76].
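The extra aggregation step noted under temporal aggregation can be as simple as mean pooling the per-frame fake probabilities, as in the following sketch. Mean pooling is one common choice, not a prescribed method, and it inherits the temporal-consistency blind spot described above.

```python
import torch

def video_integrity_score(frame_logits: torch.Tensor) -> float:
    """Mean-pool per-frame logits from a frame-level detector into a
    single video-level probability of being fake."""
    return torch.sigmoid(frame_logits).mean().item()

frame_logits = torch.randn(120)               # e.g., a 120-frame clip
score = video_integrity_score(frame_logits)
label = "fake" if score > 0.5 else "real"
```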
VII. CONCLUSION
This article offers a comprehensive survey of a new and prominent technology, namely, DeepFake. It communicates the basics, benefits and threats associated with DeepFake, as well as GAN-based DeepFake applications. In addition, DeepFake detection models are discussed. The inability to transfer and generalize is common in most existing deep learning-based detection methods, which implies that multimedia forensics has not yet reached its zenith. Considerable interest has been shown by important organizations and experts, who are contributing to the improvement of applied techniques. However, much effort is still needed to ensure data integrity, hence the need for other protection methods. Furthermore, experts anticipate a new wave of DeepFake propaganda in AI-against-AI encounters in which neither side has an edge over the other.

REFERENCES
[1] H. Farid, ‘‘Image forgery detection,’’ IEEE Signal Process. Mag., vol. 26, no. 2, pp. 16–25, Mar. 2009.
[2] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, ‘‘Generative adversarial nets,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 27, 2014, pp. 1–9.
[3] P. Baldi, ‘‘Autoencoders, unsupervised learning, and deep architectures,’’ in Proc. ICML Workshop Unsupervised Transf. Learn., 2012, pp. 37–49.
[4] T. Karras, S. Laine, and T. Aila, ‘‘A style-based generator architecture for generative adversarial networks,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 4401–4410.
[5] Y. Mirsky and W. Lee, ‘‘The creation and detection of deepfakes: A survey,’’ ACM Comput. Surv., vol. 54, no. 1, pp. 1–41, Jan. 2022.
[6] M. Masood, M. Nawaz, K. M. Malik, A. Javed, and A. Irtaza, ‘‘Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward,’’ 2021, arXiv:2103.00484.
[7] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, and J. Ortega-Garcia, ‘‘Deepfakes and beyond: A survey of face manipulation and fake detection,’’ Inf. Fusion, vol. 64, pp. 131–148, Dec. 2020.
[8] T. T. Nguyen, Q. V. H. Nguyen, D. T. Nguyen, D. T. Nguyen, T. Huynh-The, S. Nahavandi, T. T. Nguyen, Q.-V. Pham, and C. M. Nguyen, ‘‘Deep learning for deepfakes creation and detection: A survey,’’ 2019, arXiv:1909.11573.
[9] L. Verdoliva, ‘‘Media forensics and DeepFakes: An overview,’’ IEEE J. Sel. Topics Signal Process., vol. 14, no. 5, pp. 910–932, Aug. 2020.
[10] K. Fukushima, ‘‘Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,’’ Biol. Cybern., vol. 36, no. 4, pp. 193–202, Apr. 1980.
[11] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel, ‘‘Handwritten digit recognition with a back-propagation network,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 2, 1989, pp. 396–404.
[12] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, ‘‘Gradient-based learning applied to document recognition,’’ Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[13] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen, ‘‘Recent advances in convolutional neural networks,’’ Pattern Recognit., vol. 77, pp. 354–377, May 2018.
[14] S. Dong, P. Wang, and K. Abbas, ‘‘A survey on deep learning and its applications,’’ Comput. Sci. Rev., vol. 40, May 2021, Art. no. 100379.
[15] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, and A. C. Berg, ‘‘ImageNet large scale visual recognition challenge,’’ Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[16] M. D. Zeiler and R. Fergus, ‘‘Visualizing and understanding convolutional networks,’’ in Proc. Eur. Conf. Comput. Vis., Cham, Switzerland: Springer, 2014, pp. 818–833.
[17] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for large-scale image recognition,’’ 2014, arXiv:1409.1556.
[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, ‘‘Going deeper with convolutions,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[19] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[20] S. Hochreiter and J. Schmidhuber, ‘‘Long short-term memory,’’ Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[21] A. Radford, L. Metz, and S. Chintala, ‘‘Unsupervised representation learning with deep convolutional generative adversarial networks,’’ 2015, arXiv:1511.06434.
[22] A. Creswell and A. A. Bharath, ‘‘Inverting the generator of a generative adversarial network,’’ IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 7, pp. 1967–1974, Jul. 2019.
[23] T. Karras, T. Aila, S. Laine, and J. Lehtinen, ‘‘Progressive growing of GANs for improved quality, stability, and variation,’’ 2017, arXiv:1710.10196.
[24] A. Brock, J. Donahue, and K. Simonyan, ‘‘Large scale GAN training for high fidelity natural image synthesis,’’ 2018, arXiv:1809.11096.
[25] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, ‘‘Analyzing and improving the image quality of StyleGAN,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8110–8119.
[26] T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, and T. Aila, ‘‘Training generative adversarial networks with limited data,’’ 2020, arXiv:2006.06676.
[27] Generated Photos. Face Generator—Generate Faces Online Using AI. [Online]. Available: https://generated.photos/face-generator
[28] H. Dang, F. Liu, J. Stehouwer, X. Liu, and A. K. Jain, ‘‘On the detection of digital face manipulation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 5781–5790.
[29] J. C. Neves, R. Tolosana, R. Vera-Rodriguez, V. Lopes, H. Proenca, and J. Fierrez, ‘‘GANprintR: Improved fakes and evaluation of the state of the art in face manipulation detection,’’ IEEE J. Sel. Topics Signal Process., vol. 14, no. 5, pp. 1038–1048, Aug. 2020.
[30] Y. Zhu, Q. Li, J. Wang, C. Xu, and Z. Sun, ‘‘One shot face swapping on megapixels,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 4834–4844.
[31] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, ‘‘StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8789–8797.
[32] Z. He, W. Zuo, M. Kan, S. Shan, and X. Chen, ‘‘AttGAN: Facial attribute editing by only changing what you want,’’ 2017, arXiv:1711.10678.
[33] M. Liu, Y. Ding, M. Xia, X. Liu, E. Ding, W. Zuo, and S. Wen, ‘‘STGAN: A unified selective transfer network for arbitrary image attribute editing,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 3673–3682.
[34] J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, and M. Niessner, ‘‘Face2Face: Real-time face capture and reenactment of RGB videos,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2387–2395.
[35] J. Thies, M. Zollhöfer, and M. Nießner, ‘‘Deferred neural rendering: Image synthesis using neural textures,’’ ACM Trans. Graph., vol. 38, no. 4, pp. 1–12, 2019.
[36] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, ‘‘Image-to-image translation with conditional adversarial networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 1125–1134.
[37] U. Scherhag, C. Rathgeb, J. Merkle, R. Breithaupt, and C. Busch, ‘‘Face recognition systems under morphing attacks: A survey,’’ IEEE Access, vol. 7, pp. 23012–23026, 2019.
[38] O. Gafni, L. Wolf, and Y. Taigman, ‘‘Live face de-identification in video,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9378–9387.
[39] S. Agarwal, H. Farid, O. Fried, and M. Agrawala, ‘‘Detecting deep-fake videos from phoneme-viseme mismatches,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 660–661.
[40] S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Shlizerman, ‘‘Synthesizing Obama: Learning lip sync from audio,’’ ACM Trans. Graph., vol. 36, no. 4, pp. 1–13, 2017.
[41] Y. Song, J. Zhu, D. Li, X. Wang, and H. Qi, ‘‘Talking face generation by conditional recurrent adversarial network,’’ 2018, arXiv:1804.04786.
[42] L. Song, W. Wu, C. Qian, R. He, and C. C. Loy, ‘‘Everybody’s Talkin’: Let me talk as you want,’’ 2020, arXiv:2001.05201.
[43] O. Fried, A. Tewari, M. Zollhöfer, A. Finkelstein, E. Shechtman, D. B. Goldman, K. Genova, Z. Jin, C. Theobalt, and M. Agrawala, ‘‘Text-based editing of talking-head video,’’ ACM Trans. Graph., vol. 38, no. 4, pp. 1–14, Aug. 2019.
[44] I. Amerini, L. Ballan, R. Caldelli, A. Del Bimbo, and G. Serra, ‘‘A SIFT-based forensic method for copy–move attack detection and transformation recovery,’’ IEEE Trans. Inf. Forensics Security, vol. 6, no. 3, pp. 1099–1110, Sep. 2011.
[45] (2015). Wild Web Tampered Image Dataset. [Online]. Available: https://mklab.iti.gr/results/the-wild-web-tampered-image-dataset/
[46] Z. Liu, P. Luo, X. Wang, and X. Tang, ‘‘Deep learning face attributes in the wild,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 3730–3738.
[47] D. Shullani, M. Fontani, M. Iuliani, O. A. Shaya, and A. Piva, ‘‘VISION: A video and image dataset for source identification,’’ EURASIP J. Inf. Secur., vol. 2017, no. 1, pp. 1–16, Dec. 2017.
[48] X. Yang, Y. Li, and S. Lyu, ‘‘Exposing deep fakes using inconsistent head poses,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019, pp. 8261–8265.
[49] P. Korshunov and S. Marcel, ‘‘DeepFakes: A new threat to face recognition? Assessment and detection,’’ 2018, arXiv:1812.08685.
[50] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, ‘‘FaceForensics: A large-scale video dataset for forgery detection in human faces,’’ 2018, arXiv:1803.09179.
[51] A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Niessner, ‘‘FaceForensics++: Learning to detect manipulated facial images,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 1–11.
[52] Google AI Blog. (2019). Contributing Data to Deepfake Detection Research. [Online]. Available: https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html
[53] B. Dolhansky, R. Howes, B. Pflaum, N. Baram, and C. C. Ferrer, ‘‘The deepfake detection challenge (DFDC) preview dataset,’’ 2019, arXiv:1910.08854.
[54] B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, and C. C. Ferrer, ‘‘The DeepFake detection challenge (DFDC) dataset,’’ 2020, arXiv:2006.07397.
[55] Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, ‘‘Celeb-DF: A large-scale challenging dataset for DeepFake forensics,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3207–3216.
[56] L. Jiang, R. Li, W. Wu, C. Qian, and C. C. Loy, ‘‘DeeperForensics-1.0: A large-scale dataset for real-world face forgery detection,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 2889–2898.
[57] B. Zi, M. Chang, J. Chen, X. Ma, and Y.-G. Jiang, ‘‘WildDeepfake: A challenging real-world dataset for deepfake detection,’’ in Proc. 28th ACM Int. Conf. Multimedia, Oct. 2020, pp. 2382–2390.
[58] T.-N. Le, H. H. Nguyen, J. Yamagishi, and I. Echizen, ‘‘OpenForensics: Large-scale challenging dataset for multi-face forgery detection and segmentation in-the-wild,’’ in Proc. Int. Conf. Comput. Vis., Oct. 2021, pp. 10117–10127.
[59] T. Gloe and R. Böhme, ‘‘The ‘Dresden Image Database’ for benchmarking digital image forensics,’’ in Proc. ACM Symp. Appl. Comput. (SAC), 2010, pp. 1584–1590.
[60] M. Zampoglou, S. Papadopoulos, and Y. Kompatsiaris, ‘‘Detecting image splicing in the wild (web),’’ in Proc. IEEE Int. Conf. Multimedia Expo. Workshops (ICMEW), Jun. 2015, pp. 1–6.
[61] C. Sanderson, ‘‘The VidTIMIT database,’’ IDIAP Inst. Res., Martigny, Switzerland, Tech. Rep. Idiap-Com-06-2002, 2002.
[62] M. Koopman, A. M. Rodriguez, and Z. Geradts, ‘‘Detection of deepfake video manipulation,’’ in Proc. 20th Irish Mach. Vis. Image Process. Conf. (IMVIP), Aug. 2018, pp. 133–136.
[63] D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, ‘‘MesoNet: A compact facial video forgery detection network,’’ in Proc. IEEE Int. Workshop Inf. Forensics Secur. (WIFS), Dec. 2018, pp. 1–7.
[64] L. Nataraj, T. M. Mohammed, B. Manjunath, S. Chandrasekaran, A. Flenner, J. H. Bappy, and A. K. Roy-Chowdhury, ‘‘Detecting GAN generated fake images using co-occurrence matrices,’’ Electron. Imag., vol. 2019, no. 5, pp. 1–532, 2019.
[65] H. Li, B. Li, S. Tan, and J. Huang, ‘‘Identification of deep network generated images using disparities in color components,’’ Signal Process., vol. 174, Sep. 2020, Art. no. 107616.
[66] A. Haliassos, K. Vougioukas, S. Petridis, and M. Pantic, ‘‘Lips don’t lie: A generalisable and robust approach to face forgery detection,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 5039–5049.
[67] F. Lugstein, S. Baier, G. Bachinger, and A. Uhl, ‘‘PRNU-based deepfake detection,’’ in Proc. ACM Workshop Inf. Hiding Multimedia Secur., Jun. 2021, pp. 7–12.
[68] D. Guera and E. J. Delp, ‘‘Deepfake video detection using recurrent neural networks,’’ in Proc. 15th IEEE Int. Conf. Adv. Video Signal Based Surveill. (AVSS), Nov. 2018, pp. 1–6.
[69] H. H. Nguyen, J. Yamagishi, and I. Echizen, ‘‘Capsule-forensics: Using capsule networks to detect forged images and videos,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019, pp. 2307–2311.
[70] I. Chingovska, A. Anjos, and S. Marcel, ‘‘On the effectiveness of local binary patterns in face anti-spoofing,’’ in Proc. BIOSIG Int. Conf. Biometrics Special Interest Group (BIOSIG), Sep. 2012, pp. 1–7.
[71] X. Xuan, B. Peng, W. Wang, and J. Dong, ‘‘On the generalization of GAN image forensics,’’ in Proc. Chin. Conf. Biometric Recognit., Cham, Switzerland: Springer, 2019, pp. 134–141.
[72] E. Sabir, J. Cheng, A. Jaiswal, W. AbdAlmageed, I. Masi, and P. Natarajan, ‘‘Recurrent convolutional strategies for face manipulation detection in videos,’’ Interface (GUI), vol. 3, no. 1, pp. 80–87, 2019.
[73] H. Jeon, Y. Bang, and S. S. Woo, ‘‘FDFtNet: Facing off fake images using fake detection fine-tuning network,’’ in Proc. IFIP Int. Conf. ICT Syst. Secur. Privacy Protection, Cham, Switzerland: Springer, 2020, pp. 416–430.
[74] H. Jeon, Y. Bang, J. Kim, and S. S. Woo, ‘‘T-GD: Transferable GAN-generated images detection framework,’’ 2020, arXiv:2008.04115.
[75] C.-C. Hsu, Y.-X. Zhuang, and C.-Y. Lee, ‘‘Deep fake image detection based on pairwise learning,’’ Appl. Sci., vol. 10, no. 1, p. 370, Jan. 2020.
[76] A. Gandhi and S. Jain, ‘‘Adversarial perturbations fool deepfake detectors,’’ in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2020, pp. 1–8.
[77] X. Wu, Z. Xie, Y. Gao, and Y. Xiao, ‘‘SSTNet: Detecting manipulated faces through spatial, steganalysis and temporal features,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2020, pp. 2952–2956.
[78] Z. Liu, X. Qi, and P. H. S. Torr, ‘‘Global texture enhancement for fake face detection in the wild,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8060–8069.
[79] H. Khalid and S. S. Woo, ‘‘OC-FakeDect: Classifying deepfakes using one-class variational autoencoder,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 656–657.
[80] S. Fung, X. Lu, C. Zhang, and C.-T. Li, ‘‘DeepfakeUCL: Deepfake detection via unsupervised contrastive learning,’’ 2021, arXiv:2104.11507.
[81] S. Tariq, S. Lee, and S. Woo, ‘‘One detector to rule them all: Towards a general deepfake attack detection framework,’’ in Proc. Web Conf., Apr. 2021, pp. 3625–3637, doi: 10.1145/3442381.3449809.
[82] Y. Zhang, L. Zheng, and V. L. L. Thing, ‘‘Automated face swapping and its detection,’’ in Proc. IEEE 2nd Int. Conf. Signal Image Process. (ICSIP), Aug. 2017, pp. 15–19.
[83] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, ‘‘Labeled faces in the wild: A database for studying face recognition in unconstrained environments,’’ in Proc. Workshop Faces Real-Life Images, Detection, Alignment, Recognit., 2008, pp. 1–11.
[84] Y. Li, M.-C. Chang, and S. Lyu, ‘‘In ictu oculi: Exposing AI created fake videos by detecting eye blinking,’’ in Proc. IEEE Int. Workshop Inf. Forensics Secur. (WIFS), Dec. 2018, pp. 1–7.
[85] CEW Dataset. [Online]. Available: http://parnec.nuaa.edu.cn/_upload/tpl/02/db/731/template731/pages/xtan/ClosedEyeDatabases.html
[86] EBV Dataset. [Online]. Available: http://www.cs.albany.edu/lsw/downloads.html
[87] Y. Li and S. Lyu, ‘‘Exposing DeepFake videos by detecting face warping artifacts,’’ 2018, arXiv:1811.00656.
[88] S. Agarwal, H. Farid, Y. Gu, M. He, K. Nagano, and H. Li, ‘‘Protecting world leaders against deep fakes,’’ in Proc. CVPR Workshops, vol. 1, Jun. 2019, pp. 1–8.
[89] S. McCloskey and M. Albright, ‘‘Detecting GAN-generated imagery using saturation cues,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2019, pp. 4584–4588.
[90] F. Marra, D. Gragnaniello, L. Verdoliva, and G. Poggi, ‘‘Do GANs leave artificial fingerprints?’’ in Proc. IEEE Conf. Multimedia Inf. Process. Retr. (MIPR), Mar. 2019, pp. 506–511.
[91] D.-T. Dang-Nguyen, C. Pasquini, V. Conotter, and G. Boato, ‘‘RAISE: A raw images dataset for digital image forensics,’’ in Proc. 6th ACM Multimedia Syst. Conf., Mar. 2015, pp. 219–224.
[92] N. Yu, L. Davis, and M. Fritz, ‘‘Attributing fake images to GANs: Learning and analyzing GAN fingerprints,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 7556–7566.
[93] F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao, ‘‘LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop,’’ 2015, arXiv:1506.03365.
[94] F. Matern, C. Riess, and M. Stamminger, ‘‘Exploiting visual artifacts to expose deepfakes and face manipulations,’’ in Proc. IEEE Winter Appl. Comput. Vis. Workshops (WACVW), Jan. 2019, pp. 83–92.
[95] D. P. Kingma and P. Dhariwal, ‘‘Glow: Generative flow with invertible 1x1 convolutions,’’ 2018, arXiv:1807.03039.
[96] S. Fernandes, S. Raj, E. Ortiz, I. Vintila, M. Salter, G. Urosevic, and S. Jha, ‘‘Predicting heart rate variations of deepfake videos using neural ODE,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshop (ICCVW), Oct. 2019, pp. 1721–1729.
[97] S. Agarwal, H. Farid, T. El-Gaaly, and S.-N. Lim, ‘‘Detecting deep-fake videos from appearance and behavior,’’ in Proc. IEEE Int. Workshop Inf. Forensics Secur. (WIFS), Dec. 2020, pp. 1–6.
[98] L. Chai, D. Bau, S.-N. Lim, and P. Isola, ‘‘What makes fake images detectable? Understanding properties that generalize,’’ in Proc. Eur. Conf. Comput. Vis., Cham, Switzerland: Springer, 2020, pp. 103–120.
[99] T. Mittal, U. Bhattacharya, R. Chandra, A. Bera, and D. Manocha, ‘‘Emotions don’t lie: An audio-visual deepfake detection method using affective cues,’’ 2020, arXiv:2003.06711.
[100] Four In-the-Wild Lip-Sync Deep Fakes, Instagram. [Online]. Available: https://www.instagram.com/bill_posters_uk
[101] Four In-the-Wild Lip-Sync Deep Fakes, YouTube. [Online]. Available: https://www.youtube.com/watch?v=VWMEDacz3L4
[102] K. Chugh, P. Gupta, A. Dhall, and R. Subramanian, ‘‘Not made for each other- audio-visual dissonance-based deepfake detection and localization,’’ in Proc. 28th ACM Int. Conf. Multimedia, Oct. 2020, pp. 439–447.
[103] L. Guarnera, O. Giudice, and S. Battiato, ‘‘DeepFake detection by analyzing convolutional traces,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 666–667.
[104] S. Fernandes, S. Raj, R. Ewetz, J. S. Pannu, S. K. Jha, E. Ortiz, I. Vintila, and M. Salter, ‘‘Detecting deepfake videos using attribution-based confidence metric,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 308–309.
[105] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman, ‘‘VGGFace2: A dataset for recognising faces across pose and age,’’ in Proc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2018, pp. 67–74.
[106] H. Qi, Q. Guo, F. Juefei-Xu, X. Xie, L. Ma, W. Feng, Y. Liu, and J. Zhao, ‘‘DeepRhythm: Exposing DeepFakes with attentional visual heartbeat rhythms,’’ in Proc. 28th ACM Int. Conf. Multimedia, Oct. 2020, pp. 4318–4327.
[107] S. Hu, Y. Li, and S. Lyu, ‘‘Exposing GAN-generated faces using inconsistent corneal specular highlights,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Jun. 2021, pp. 2500–2504.
[108] I. Demir and U. A. Ciftci, ‘‘Where do deep fakes look? Synthetic face detection via gaze tracking,’’ in Proc. ACM Symp. Eye Tracking Res. Appl., May 2021, pp. 1–11.
[109] Y. Nirkin, L. Wolf, Y. Keller, and T. Hassner, ‘‘DeepFake detection based on discrepancies between faces and their context,’’ IEEE Trans. Pattern Anal. Mach. Intell., early access, Jun. 29, 2021, doi: 10.1109/TPAMI.2021.3093446.
[110] M. Wang and W. Deng, ‘‘Deep face recognition: A survey,’’ Neurocomputing, vol. 429, pp. 215–244, Mar. 2021.
[111] S. Umer, R. K. Rout, C. Pero, and M. Nappi, ‘‘Facial expression recognition with trade-offs between data augmentation and deep learning features,’’ J. Ambient Intell. Humanized Comput., vol. 13, pp. 721–735, Jan. 2021.
[112] S. Hossain, S. Umer, V. Asari, and R. K. Rout, ‘‘A unified framework of deep learning-based facial expression recognition system for diversified applications,’’ Appl. Sci., vol. 11, no. 19, p. 9174, Oct. 2021.
[113] J. Galbally, S. Marcel, and J. Fierrez, ‘‘Biometric antispoofing methods: A survey in face recognition,’’ IEEE Access, vol. 2, pp. 1530–1552, 2014.
[114] S. Umer, B. C. Dhara, and B. Chanda, ‘‘Face recognition using fusion of feature learning techniques,’’ Measurement, vol. 146, pp. 43–54, Nov. 2019.
[115] H. Farid, ‘‘Digital image forensics,’’ Sci. Amer., vol. 298, no. 6, pp. 66–71, 2008.
[116] C. Rathgeb, A. Botaljov, F. Stockhardt, S. Isadskiy, L. Debiasi, A. Uhl, and C. Busch, ‘‘PRNU-based detection of facial retouching,’’ IET Biometrics, vol. 9, no. 4, pp. 154–164, Jul. 2020.
[117] U. Scherhag, L. Debiasi, C. Rathgeb, C. Busch, and A. Uhl, ‘‘Detection of face morphing attacks based on PRNU analysis,’’ IEEE Trans. Biometrics, Behav., Identity Sci., vol. 1, no. 4, pp. 302–317, Oct. 2019.
[118] S. Sabour, N. Frosst, and G. E. Hinton, ‘‘Dynamic routing between capsules,’’ 2017, arXiv:1710.09829.
[119] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, ‘‘Densely connected convolutional networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.
[120] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, ‘‘A simple framework for contrastive learning of visual representations,’’ in Proc. Int. Conf. Mach. Learn., 2020, pp. 1597–1607.
[121] F. Chollet, ‘‘Xception: Deep learning with depthwise separable convolutions,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1251–1258.
[122] S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros, ‘‘CNN-generated images are surprisingly easy to spot. . . for now,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8695–8704.
[123] H. Bay, T. Tuytelaars, and L. Van Gool, ‘‘SURF: Speeded up robust features,’’ in Proc. Eur. Conf. Comput. Vis., Berlin, Germany: Springer, 2006, pp. 404–417.
[124] G. Qiu, ‘‘Indexing chromatic and achromatic patterns for content-based colour image retrieval,’’ Pattern Recognit., vol. 35, no. 8, pp. 1675–1686, Aug. 2002.
[125] T. K. Moon, ‘‘The expectation-maximization algorithm,’’ IEEE Signal Process. Mag., vol. 13, no. 6, pp. 47–60, Nov. 1997.
[126] S. Jha, S. Raj, S. Fernandes, S. K. Jha, S. Jha, B. Jalaian, G. Verma, and A. Swami, ‘‘Attribution-based confidence metric for deep neural networks,’’ Tech. Rep., 2019.
[127] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, ‘‘Neural ordinary differential equations,’’ in Advances in Neural Information Processing Systems, vol. 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds. Red Hook, NY, USA: Curran Associates, 2018. [Online]. Available: https://proceedings.neurips.cc/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf
[128] T. Gloe, M. Kirchner, A. Winkler, and R. Böhme, ‘‘Can we trust digital image forensics?’’ in Proc. 15th Int. Conf. Multimedia (MULTIMEDIA), 2007, pp. 78–86.
[129] I. J. Goodfellow, J. Shlens, and C. Szegedy, ‘‘Explaining and harnessing adversarial examples,’’ 2014, arXiv:1412.6572.
[130] N. Carlini and D. Wagner, ‘‘Towards evaluating the robustness of neural networks,’’ in Proc. IEEE Symp. Secur. Privacy (SP), May 2017, pp. 39–57.
ASAD MALIK (Member, IEEE) received the B.Sc. degree (Hons.) in computer application from Aligarh Muslim University, Aligarh, India, in 2012, the master's degree in computer application from Jamia Millia Islamia University, India, in 2015, and the Ph.D. degree from the School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China, in 2020. He is currently an Assistant Professor with the Department of Computer Science, Aligarh Muslim University. His research interests include multimedia forensics and security, image processing, information hiding, and deep learning.

MINORU KURIBAYASHI (Senior Member, IEEE) received the B.E., M.E., and D.E. degrees from Kobe University, Japan, in 1999, 2001, and 2004, respectively. From 2002 to 2007, he was a Research Associate at Kobe University, where he was an Assistant Professor from 2007 to 2015. Since 2015, he has been an Associate Professor with the Graduate School of Natural Science and Technology, Okayama University. His research interests include multimedia security, digital watermarking, cryptography, and coding theory. He is a member of the Information Forensics and Security Technical Committee of the IEEE Signal Processing Society. He received the Young Professionals Award from the IEEE Kansai Section in 2014 and the Best Paper Award at IWDW 2015 and 2019. He is the Vice Chair of the APSIPA Multimedia Security and Forensics Technical Committee. He serves as an Associate Editor for IEEE Signal Processing Letters, JISA, and IEICE.

SANI M. ABDULLAHI (Member, IEEE) received the M.Sc. degree from The University of Manchester, U.K., in 2013, and the Ph.D. degree from Southwest Jiaotong University, China, in 2019. He is currently a Postdoctoral Researcher at China Three Gorges University, Yichang, China. He has published a number of papers in reputable journals and conference proceedings, including the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, IEEE AVSS, IWDW, and IWDCF. His research interests include information security, biometric template protection, digital forensics, multimedia security, and digital watermarking. He received the Best Paper Award at the International Workshop on Digital Crime and Forensics (IWDCF-2017).

AHMAD NEYAZ KHAN (Member, IEEE) received the B.Sc. (Hons.) and master's degrees in computer applications from Aligarh Muslim University, India, in 2009 and 2012, respectively, and the Ph.D. degree from the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China. He is currently an Assistant Professor with Integral University, India. His research interests include information security, machine learning, and reversible data hiding in the encrypted domain.