Detecting Deepfake Videos with LSTM
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
Assistant Professor
DEPARTMENT OF
COMPUTER SCIENCE & ENGINEERING
GITAM
(Deemed to be University)
VISAKHAPATNAM
OCTOBER 2023
DECLARATION
I/We hereby declare that the project report entitled “Detection of Deep Fake Morphing in Video Files Using Long Short-Term Memory (LSTM) Based Recurrent Neural Networks (RNN)” is an original work done in the Department of Computer Science and Engineering, GITAM School of Technology, GITAM (Deemed to be University), submitted in partial fulfillment of the requirements for the award of the degree of B.Tech. in Computer Science and Engineering. The work has not been submitted to any other college or university for the award of any degree or diploma.
Date: 16/10/2023
CERTIFICATE
This is to certify that the project report entitled “Detection of Deep Fake Morphing in Video Files Using Long Short-Term Memory (LSTM) Based Recurrent Neural Networks (RNN)” is a bona fide record of work carried out by MEDAPATI HARSHAVARDHAN REDDY (122010325001), PAMARTHI NITHIN (122010325005), SAMBANA TEJA SAI (122010325011), and CHILUKURI KARTHEEK SAI KUMAR (122010325009), submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering.
Project Guide                                Head of the Department

TABLE OF CONTENTS
1. ABSTRACT
2. INTRODUCTION
3. SYNOPSIS
4. TECHNICAL KEYWORDS
5. DEFINITION AND SCOPE OF THE PROBLEM
6. PROJECT PLAN
7. SOFTWARE REQUIREMENT SPECIFICATION
8. DETAILED DESIGN DOCUMENT
9. PROJECT IMPLEMENTATION
10. SOFTWARE TESTING
11. CONCLUSIONS AND FUTURE PERSPECTIVE
12. REFERENCES
1. ABSTRACT
Deep learning algorithms have become so powerful, thanks to increasing processing power, that it is now relatively easy to produce a human-like synthetic video, commonly known as a "deepfake." It is easy to imagine scenarios in which people are blackmailed, made to appear as victims of revenge porn, or exploited to incite political unrest through realistic face-swapping deepfakes. In this study, we present a novel deep learning-based technique that can reliably discriminate between real videos and artificial intelligence-generated fake ones. Our technique can automatically recognize whether a deepfake involves replacement or reenactment. The goal of our effort is to combat artificial intelligence (AI) by using AI. Our approach extracts frame-level features using a ResNext Convolutional Neural Network and then uses these features to train a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) to classify whether or not the video has been altered, that is, whether it is a deepfake or an authentic video. To simulate real-time scenarios and enhance the model's performance on real-time data, we test our technique on a sizable, balanced, mixed dataset created by combining several available datasets, including FaceForensics++, the DFD challenge, and Celeb-DF. We also demonstrate how, using a relatively straightforward and reliable methodology, our system achieves competitive results.
2. INTRODUCTION
PROJECT IDEA
Increasing processing power has made deep learning algorithms so powerful that creating a synthetic video that resembles a human being, a technique referred to as a "deepfake" (DF), has become comparatively simple. It is simple to imagine situations where people are blackmailed, tricked into thinking they are the victims of revenge porn, or used to provoke political unrest through realistic face-swapping deepfakes. In this work, we present a novel deep learning-based method that can accurately distinguish between fake videos produced by artificial intelligence and real ones. Our method can automatically discern whether a deepfake involves replacement or reenactment. Our aim is to use artificial intelligence (AI) to fight artificial intelligence (AI). Using a ResNext Convolutional Neural Network, our method extracts frame-level features. These features are then used to train an LSTM-based Recurrent Neural Network (RNN) to determine whether the video is authentic or a deepfake. We test our method on a large-scale balanced and mixed dataset made by integrating several existing datasets, including FaceForensics++, the DFD Challenge, and Celeb-DF, in order to replicate real-time scenarios and improve the model's performance on real-time data. We also demonstrate the relatively straightforward and reliable output that our system is capable of.
LITERATURE SURVEY
Face Warping Artifacts employed a strategy to detect artifacts by comparing the generated face regions and their surrounding areas with a dedicated Convolutional Neural Network model. This work targeted two distinct kinds of face artifacts.
Their approach is predicated on the observation that current deepfake algorithms can only produce images at a limited resolution, which then require further processing (warping) to match the faces to be replaced in the original video. Their method did not consider the temporal analysis of the frames.
The article "Detection by Eye Blinking" presents a novel technique for identifying deepfakes that uses eye blinking as the key factor for classifying videos as pristine or deepfake. Cropped frames of eye blinking were subjected to temporal analysis using a Long-term Recurrent Convolutional Network (LRCN). However, today's deepfake creation algorithms are so strong that the absence of eye blinking alone is not enough to identify a deepfake. Additional factors need to be taken into account, such as crooked teeth, facial wrinkles, incorrect eyebrow positioning, and so on.
Capsule networks to detect altered images and videos: This technique uses a capsule network to identify manipulated images and videos in many contexts, such as computer-generated video detection and replay attack detection.
They employed random noise in the training phase of their approach, which is not a recommended practice. Even so, the model did well on their dataset; however, noise in the training set could cause it to perform poorly on real-time data. We instead propose training our technique on real-time, noiseless datasets.
3. SYNOPSIS
A method called "deepfake" is used to synthesize human images using neural network tools such as autoencoders and GANs (Generative Adversarial Networks). These applications overlay target images onto source videos using deep learning techniques to create deepfake videos that appear realistic. The differences between these deepfake videos and real ones cannot be noticed with the naked eye. In this paper, we present a deep learning-based technique that can accurately distinguish between authentic videos and fake ones produced by AI. Our method of differentiating between pristine and deepfake videos exploits the limitations of deepfake generation tools: existing deepfake creation techniques leave some recognizable artifacts in the frames during production.
Our system uses ResNext Convolutional Neural Networks to extract frame-level features. These features are used to train a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) to decide whether the video was altered, or put another way, whether it is a deepfake or an authentic video. We propose testing our approach on a sizable collection of deepfake videos gathered from several online video portals. Our goal is to improve the performance of the deepfake detection model on real-time data. To accomplish this, and so that our model can extract features from many types of images, we trained it on a variety of accessible datasets, retrieving sufficient videos from FaceForensics++ and the DFD dataset.
4. TECHNICAL KEYWORDS
AREA OF PROJECT
Our project uses neural network technology inspired by the human brain and falls under deep learning, a subfield of artificial intelligence. It also involves a significant amount of computer vision: OpenCV assists in processing the video and its frames. A model trained with PyTorch determines whether a source video is pristine or deepfake.
TECHNICAL KEYWORDS
• Deep learning
• Computer vision
• Res-Next Convolution Neural Network
• Long short-term memory (LSTM)
• OpenCV, PyTorch
• Face Recognition
• GAN (Generative Adversarial Network)
5. DEFINITION AND SCOPE OF THE PROBLEM
• Our project aims at discovering the distorted truth of the deep fakes.
• Our project will reduce the abuse and misleading of common people on the world wide web.
• Our project will distinguish and classify the video as deep fake or pristine.
• Provide an easy-to-use system to upload the video and distinguish whether the video is
real or fake.
• After extensive research, we found that balanced training of the algorithm is the best way to avoid bias and variance in the algorithm and achieve good accuracy.
• Solution Constraints
We analysed the solution in terms of cost, speed of processing, requirements, level of expertise, and availability of equipment.
• Parameter Identifier
DESIGN
After research and analysis, we developed the system architecture of the solution as described in Chapter 6. We decided on the baseline architecture of the model, including the different layers and their counts.
DEVELOPMENT
Following investigation, we chose to program in Python 3 with the PyTorch framework. PyTorch was selected because it is very customizable and offers strong support for CUDA, enabling training on the Graphics Processing Unit (GPU). Google Cloud Platform was used to train on a large amount of data to create the final model.
EVALUATION
We evaluated a variety of real-time datasets, including YouTube video datasets, to assess our
model. The trained model's accuracy is assessed using the Confusion Matrix technique.
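As a minimal sketch of this evaluation step, scikit-learn's confusion matrix can be used to derive accuracy from predictions; the labels below are hypothetical examples, not our measured results:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for eight test videos: 1 = deepfake, 0 = pristine
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

# ravel() flattens the 2x2 matrix into (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
```

The four cells separate the two kinds of errors (a real video flagged as fake vs. a deepfake that slips through), which a single accuracy number would hide.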
OUTCOME
The outcome of the solution is a trained deepfake detection model that helps users check whether a new video is a deepfake or real.
APPLICATIONS
The user will submit the video for processing by uploading it through the web-based application. The uploaded video will be pre-processed by the model to determine whether it is real or a deepfake.
6. PROJECT PLAN
PROJECT SCHEDULE
The major tasks in the project stages are:
• Task 3: Pre-processing
Pre-processing includes the creation of the new dataset which includes only face cropped
videos.
• Task 8: Testing
The complete application is tested using unit testing.
7. SOFTWARE REQUIREMENT SPECIFICATION
INTRODUCTION
DFD level – 0
Indicates the basic flow of data in the system. In this system, input is given equal importance as output.
• Input: The input to the system is the uploaded video.
• System: The system shows all the details of the video.
• Output: The output indicates whether the video is fake or not.
Hence, the data flow diagram visualizes the system with its input and output flow.
DFD LEVEL-1
DFD LEVEL-2
ACTIVITY DIAGRAM:
TRAINING WORKFLOW:
TESTING WORKFLOW:
NON-FUNCTIONAL REQUIREMENTS:
PERFORMANCE REQUIREMENT
The software should have an effective construction that allows it to be used for more real-world applications and provides consistent identification of fake videos.
• The design is versatile and quick to operate.
• The application saves time and is quick and reliable.
• The system has adaptations for all regions.
• The system is easily integrated and compatible with future upgrades.
SAFETY REQUIREMENT
• Data integrity is preserved: once the video is uploaded to the system, it is processed only by the algorithm. The videos are kept secure from human intervention, as the uploaded video is not accessible for human manipulation.
• To further ensure the safety of the uploaded videos, they are deleted from the server after 30 minutes.
SECURITY REQUIREMENT
• While uploading, the video is encrypted using a symmetric encryption algorithm, and it remains in encrypted form on the server. The video is decrypted only from preprocessing until the output is obtained; after the output is produced, the video is encrypted again.
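The encrypt/decrypt round-trip described by this requirement can be illustrated with a toy XOR keystream cipher. This is only a sketch: the key handling and the cipher itself are illustrative assumptions, and a production deployment would use a vetted symmetric algorithm such as AES-GCM from a maintained cryptography library.

```python
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy symmetric cipher: XOR each byte against a repeating key.
    # Illustrative only -- NOT secure; real systems should use AES-GCM.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = secrets.token_bytes(32)          # per-upload secret key (hypothetical scheme)
video_bytes = b"...uploaded video..."  # placeholder for the real file contents
stored = xor_cipher(video_bytes, key)  # what rests on the server
restored = xor_cipher(stored, key)     # decrypted only during preprocessing
```

Because XOR is its own inverse, applying the same keystream twice restores the original bytes, which is exactly the round-trip the requirement describes.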
SEQUENCE DIAGRAM
8. DETAILED DESIGN DOCUMENT
INTRODUCTION
SYSTEM ARCHITECTURE
CREATING DEEP FAKE VIDEOS
To identify deepfake videos, it is essential to understand the process of deepfake video generation. Numerous techniques, such as GANs and autoencoders, accept input from both a source image and a target video. These programs divide a video into frames, locate faces within the video, and then, on each frame, swap the source face for the target face. The updated frames are then integrated using several pre-trained models. These models help improve video quality by erasing traces left over by the deepfake production technique, so the deepfake appears realistic in nature. We use the same understanding to detect deepfakes.
Deepfakes generated with pre-trained neural network models are so lifelike that it is nearly impossible to tell them apart with the naked eye. However, deepfake generation techniques leave some traces or artifacts in the video that are not visible to the naked eye. The goal of this work is to find these invisible traces and recognizable artifacts in such videos and classify them as deepfake or real footage.
TOOLS FOR DEEP FAKE CREATION.
1. Face swap
2. Face it
3. Deep Face Lab
4. Deepfake Capsule GAN
5. Large resolution face masked
ARCHITECTURAL DESIGN
To enhance the model's capacity to make quick predictions, we acquired data from several publicly accessible datasets, including FaceForensics++ (FF), the Deepfake Detection Challenge (DFDC), and Celeb-DF. The gathered datasets were then combined to develop our own dataset for precise and prompt identification on a range of videos. To eliminate training bias, we used a 50/50 mix of real and fake videos.
The Deepfake Detection Challenge (DFDC) dataset includes audio-altered videos; since audio deepfakes are outside the scope of this work, we pre-processed the DFDC dataset using a Python script and eliminated the audio-altered videos.
After pre-processing the DFDC dataset, 1500 real and 1500 fake videos were extracted from it. The FaceForensics++ (FF) dataset contributes 1000 real and 1000 fake videos, and the Celeb-DF dataset 500 real and 500 fake videos. As a result, our dataset has a total of 6000 videos, of which 3000 are real and 3000 are fake.
In this phase, the videos are prepared by removing any extraneous material and background noise. Only the relevant part of the video, the face, is retained. Frame-by-frame segmentation of the video is the first stage of video preprocessing.
After the video is divided into frames, the face is detected in each frame and the frame is cropped along the face. The cropped frames are then combined to create a new video. Repeating this procedure for every video yields a processed dataset made up exclusively of face videos. Frames that do not contain a face are discarded during preprocessing.
To maintain consistency in the number of frames, we set a threshold value based on the mean of the total frame count of each video. Limited computational power is another reason for choosing a threshold value: analyzing all 300 frames of a 10-second video at 30 frames per second (fps) at once is computationally very demanding in the experimental setting. Given the processing capacity of our Graphics Processing Unit (GPU), we chose 150 frames as the threshold value and, while saving frames to the new dataset, included only the first 150 frames of each video.
To make proper use of the Long Short-Term Memory (LSTM), we analysed the frames sequentially, i.e. the first 150 frames, rather than randomly. The newly generated video has a frame rate of 30 frames per second and a resolution of 112 x 112.
9. PROJECT IMPLEMENTATION
INTRODUCTION
There are many examples where deepfake creation technology has been used to mislead people on social media platforms by sharing false deepfake videos of famous personalities, such as the Mark Zuckerberg video released on the eve of a House A.I. hearing, the Donald Trump "Breaking Bad" parody in which he was introduced as James McGill, Barack Obama's public service announcement, and many more. These kinds of deepfakes create huge panic among ordinary people, which raises the need to spot deepfakes accurately so they can be distinguished from real videos.
Recent advances in technology have changed the field of video manipulation. Advances in modern open-source deep learning frameworks like TensorFlow, Keras, and PyTorch, along with cheap access to high computation power, have driven this paradigm shift. Conventional autoencoders and pretrained Generative Adversarial Network (GAN) models have made the tampering of realistic videos and images very easy. Moreover, access to these pretrained models through smartphone and desktop applications like FaceApp and Face Swap has made deepfake creation child's play. These applications generate highly realistic synthesized transformations of faces in real videos. They also provide the user with further functionality, such as changing hair style, gender, age, and other attributes, and allow the user to create very high-quality, indistinguishable deepfakes. Although some malicious deepfake videos exist, they remain a minority.
So far, the released tools that generate deepfake videos have been used extensively to create fake celebrity pornographic videos; fake nude videos of Brad Pitt and Angelina Jolie are examples. The realistic nature of deepfake videos makes celebrities and other famous personalities targets of pornographic material, fake surveillance videos, fake news, and malicious hoaxes.
Deepfakes are also notorious for creating political tension, which makes it very important to detect deepfake videos and prevent their percolation on social media platforms.
TOOLS AND TECHNOLOGIES USED
PLANNING
• OpenProject
UML TOOLS
• draw.io
PROGRAMMING LANGUAGES
• Python 3
• JavaScript
PROGRAMMING FRAMEWORKS
• PyTorch
• Django
IDE
• Google Colab
• Jupyter Notebook
• Visual Studio Code
VERSION CONTROL
• Git
LIBRARIES
• torch
• torchvision
• os
• NumPy
• cv2 (OpenCV)
• matplotlib
• face_recognition
• json
• pandas
• copy, glob, random, sklearn
ALGORITHM DETAILS
DATASET DETAILS
FaceForensics++
Celeb-DF, Deepfake Detection Challenge
PREPROCESSING DETAILS
• Using glob, we imported all the videos in the directory into a Python list.
• To read the videos and get the average number of frames, we used cv2.VideoCapture.
• A value of 150 was chosen as the optimal frame count for the new dataset in order to ensure consistency.
• The video is split into frames and the frames are cropped to the face location.
• The face-cropped frames are written to a new video using cv2.VideoWriter.
• The new video is an mp4 file with a resolution of 112 x 112 pixels at 30 frames per second.
• Instead of selecting random frames, the first 150 frames are written to the new video to make proper use of the LSTM for temporal sequence analysis.
PRE-PROCESSING STEPS:
MODEL CREATION:
You will be able to preprocess the dataset, train a PyTorch model of your own, and predict on new unseen data using your model.
Note: We recommend using Google Colab for running the above code.
DATASET:
Some of the dataset we used are listed below:
• Face Forensics++
• Celeb-DF
• Deepfake Detection Challenge
PREPROCESSING:
• Load the dataset.
• Divide the video into frames.
• Crop the face out of each frame.
• Keep the face-cropped video.
CODE OUTPUT
SPLIT THE VIDEO INTO FRAMES:
10. SOFTWARE TESTING
FUNCTIONAL TESTING:
1. Unit Testing
2. Integration Testing
3. System Testing
4. Interface Testing
NON-FUNCTIONAL TESTING:
1. Performance Testing
2. Load Testing
3. Compatibility Testing
11. CONCLUSIONS AND FUTURE PERSPECTIVE
RESULTS
We presented a neural network-based method for determining whether a video is a deepfake or genuine, along with the model's level of confidence. After analyzing 1 second of video (10 fps), our method can predict the outcome with excellent accuracy. To extract frame-level features and analyse the temporal sequence for variations between frame t and frame t-1, we employed a pre-trained ResNext CNN model and an LSTM. Our model can handle video frames in the following sequence lengths: 10, 20, 40, 60, 80, 100.
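The reported confidence can be read off the softmax of the classifier's output. The logits below are hypothetical values for illustration, not measured results:

```python
import torch

# Hypothetical logits from the detector for one video, classes [real, fake]
logits = torch.tensor([[0.4, 2.1]])
probs = torch.softmax(logits, dim=1)     # normalize logits into probabilities
prediction = int(probs.argmax(dim=1))    # 1 -> classified as deepfake
confidence = float(probs.max()) * 100    # confidence as a percentage
```

The class with the larger probability is the verdict, and that probability (here roughly 85%) is the confidence shown to the user.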
FUTURE PERSPECTIVE
Any system that has been constructed can always be improved, especially one created with recent technology in a field with a promising future.
• The web-based platform can be upscaled to a browser plugin for ease of access.
• Currently only face deepfakes are detected by the algorithm, but it can be enhanced to detect full-body deepfakes.
12. REFERENCES
1) Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Nießner, "FaceForensics++: Learning to Detect Manipulated Facial Images," arXiv:1901.08971.
2) Deepfake Detection Challenge dataset: https://www.kaggle.com/c/deepfake-detection-challenge/data (Accessed on 26 March, 2020)
3) Yuezun Li, Xin Yang, Pu Sun, Honggang Qi and Siwei Lyu, "Celeb-DF: A Large-scale Challenging Dataset for DeepFake Forensics," arXiv:1909.12962.
4) Deepfake Video of Mark Zuckerberg Goes Viral on Eve of House A.I. Hearing: https://fortune.com/2019/06/12/deepfake-mark-zuckerberg/ (Accessed on 26 March, 2020)
5) 10 deepfake examples that terrified and amused the internet: https://www.creativebloq.com/features/deepfake-examples (Accessed on 26 March, 2020)
6) TensorFlow: https://www.tensorflow.org/ (Accessed on 26 March, 2020)
7) Keras: https://keras.io/ (Accessed on 26 March, 2020)
8) PyTorch: https://pytorch.org/ (Accessed on 26 March, 2020)
9) G. Antipov, M. Baccouche, and J.-L. Dugelay, "Face aging with conditional generative adversarial networks," arXiv:1702.01983, Feb. 2017.
10) J. Thies et al., "Face2Face: Real-time face capture and reenactment of RGB videos," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2387–2395, June 2016. Las Vegas, NV.
11) FaceApp: https://www.faceapp.com/ (Accessed on 26 March, 2020)
12) Face Swap: https://faceswaponline.com/ (Accessed on 26 March, 2020)
13) Deepfakes, Revenge Porn, And The Impact On Women: https://www.forbes.com/sites/chenxiwang/2019/11/01/deepfakes-revenge-porn-and-the-impact-on-women/
14) The rise of the deepfake and the threat to democracy: https://www.theguardian.com/technology/ng-interactive/2019/jun/22/the-rise-of-the-deepfake-and-the-threat-to-democracy (Accessed on 26 March, 2020)
15) Yuezun Li, Siwei Lyu, "Exposing DF Videos By Detecting Face Warping Artifacts," arXiv:1811.00656v3.
16) Yuezun Li, Ming-Ching Chang and Siwei Lyu, "Exposing AI Created Fake Videos by Detecting Eye Blinking," arXiv:1806.02877v2.
17) Huy H. Nguyen, Junichi Yamagishi, and Isao Echizen, "Using capsule networks to detect forged images and videos," arXiv:1810.11215.
18) D. Güera and E. J. Delp, "Deepfake Video Detection Using Recurrent Neural Networks," 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 2018, pp. 1-6.
19) I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, "Learning realistic human actions from movies," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, June 2008. Anchorage, AK.
20) Umur Aybars Ciftci, İlke Demir, Lijun Yin, "Detection of Synthetic Portrait Videos using Biological Signals," arXiv:1901.02212v2.
21) D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980, Dec. 2014.
22) ResNext Model: https://pytorch.org/hub/pytorch_vision_resnext/ (Accessed on 06 April, 2020)
23) Software Engineering COCOMO Model: https://www.geeksforgeeks.org/software-engineering-cocomo-model/ (Accessed on 15 April, 2020)
24) Deepfake Video Detection using Neural Networks: http://www.ijsrd.com/articles/IJSRDV8I10860.pdf
25) International Journal for Scientific Research and Development: http://ijsrd.com/