0% found this document useful (0 votes)

57 views76 pages

Ilovepdf Merged1 Merged

The project report focuses on the development of a deepfake detection system using AI, highlighting the challenges posed by advanced media manipulation techniques. It outlines the methodologies for detecting deepfakes, including the use of deep learning algorithms and the importance of dataset curation. The report emphasizes the need for effective detection systems to combat misinformation and maintain public trust in media.

Uploaded by

Ayush choudhary

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

57 views76 pages

Ilovepdf Merged1 Merged

Uploaded by

Ayush choudhary

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 76

PROJECT REPORT ON

Deepfake Detection using AI

Submitted by

Ayush Choudhary: 1/21/FET/BCS/284

Vani Jain: 1/21/FET/BCS/262
Nishant: 1/21/FET/BCS/302

Under the Guidance of

Krishan Kumar
Associate Professor

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

School of Engineering & Technology

Manav Rachna International Institute of Research and

Studies, Faridabad

July-Dec 2024
Acknowledgement

The successful realization of project is an outgrowth of a consolidated effort of people from

desperate fronts. We are thankful to Krishan Kumar, Associate Professor for his variable
advice and support extended to us without which we could not be able to complete our project
for a success.

We are thankful to Dr. Supriya Panda, Project Coordinator, Professor, CSE department for
her guidance and support.

We express our deep gratitude to Dr. Tapas Kumar, Head of Department (CSE-spl) for his
endless support and affection towards us. His constant encouragement has helped to widen the
horizon of our knowledge and inculcate the spirit of dedication to the purpose.

We would like to express our sincere gratitude to Dr. Geeta Nijhawan, Associate Dean SET,
MRIIRS for providing us the facilities in the Institute for completion of our work.

Words cannot express our gratitude for all those people who helped us directly or indirectly in
our Endeavour. We take this opportunity to express our sincere thanks to all staff members of
CSE-spl department for the valuable suggestion and also to our family and friends for their
support.

Ayush Choudhary: 1/21/FET/BCS/284

Vani Jain: 1/21/FET/BCS/262
Nishant: 1/21/FET/BCS/302

i
Declaration

We hereby declare that this project report entitled “ Deepfake detection using AI” by Ayush
Choudhary (1/21/FET/BCS/284), Vani Jain (1/21/FET/BCS/262), Nishant
(1/21/FET/BCS/302), being submitted in partial fulfilment of the requirements for the degree of
Bachelor of Technology in Computer Science & Engineering under School of Engineering &
Technology of Manav Rachna International Institute of Research and Studies, Faridabad, during
the academic year 2024-2025, is a bonafide record of our original work carried out under the
guidance of Krishan Kumar , Associate Professor , CSE department.

We further declare that we have not submitted the matter presented in this Project for the award
of any other Degree/Diploma of this University or any other University/Institute.

Ayush Choudhary: 1/21/FET/BCS/284

Vani Jain: 1/21/FET/BCS/262
Nishant: 1/21/FET/BCS/302

ii
Manav Rachna International Institute of Research and Studies,
Faridabad
School of Engineering & Technology

Department of Computer Science and Engineering

July-Dec 2024

Certificate

This is to certify that this project report entitled “Deepfake detection using AI” by Ayush
Choudhary (1/21/FET/BCS/284) , Vani Jain (1/21/FET/BCS/262) , Nishant
(1/21/FET/BCS/302), submitted in partial fulfillment of the requirements for the degree of
Bachelor of Technology in Computer Science and Engineering under School of Engineering &
Technology of Manav Rachna International Institute of Research and Studies Faridabad, during
the academic year 2024-2025 , is a bonafide record of work carried out under my guidance and
supervision.

Dr.Krishan Kumar Dr. Tapas Kumar

Associate Professor Professor & Head of Department
Department. of Computer Science & Department. of Computer Science &
Engineering Engineering
SET, MRIIRS, Faridabad SET, MRIIRS, Faridabad

iii
TABLE OF CONTENTS

Acknowledgement i

Declaration ii

Certificate iii

Table of Contents iv-v

List of Tables vi

List of Figures vii

Abstract viii

Chapters

Chapter Page No.

1. Introduction 1-14
1.1. Introduction 1
1.2. Problem Statement 2-6
1.3. Objectives 7-9
1.4. Methodology 10-13
1.5. Organization 14

2. Literature Review 15-16

3. System Development 17-41

3.1. Design 17-21
3.2. Algorithm 21
3.3. Model Development 21-41

4. Performance Analysis 42-54

4.1. Analysis 42
4.2. Results 43-44
4.3. Screenshots of Various Stages of Project 45-53
4.4. Comparisons 54

5. Conclusions 55-57
5.1. Conclusions 55
5.2. Future Work 56
5.3. Applications 57

6.Appendix:Code

7.Referance
iv
List of tables

Table Page No.

Table 1 Method of Deepfake Creation 4

Table 2 Model Comparison 54

v
List of Figures

Figure No. Figure Caption Page No.

Figure 1 Deeepfake Principle 2

Figure 2 Example of Deepfake 3

Figure 3 System Architecture 10

Figure 4 Input Sequence through model 28

Figure 5 Data Flow Diagram 37

Figure 6 Training and Validation Acuuracy 44

Figure 7 Training and Validation Loss 52

vi
Abstract

Audio, video, and image manipulation techniques grow with the advancements in artificial
intelligence and cloud computing. These are referred to as deepfakes, in regard to this type of
media content. The more believable ways in which computers can exercise control over media,
like duplicating a public figure's voice or placing someone's face on another, makes this
possible. The three types of countermeasures used to fight back against deepfakes are media
confirmation, media provenance, and deepfake discovery. Deepfake detection solutions rely on
multi-modal detection to identify any kind of alterations or synthesizing of target media. There
are two kinds of techniques currently in use, which include manual and algorithmic techniques.
Techniques using human media analysts with access to software are known as traditional
manual techniques. Algorithmic detection involves using AI-based algorithms to find
manipulated media.

We are developing a deep learning binary classifier that is hoped to serve one application in
resolving several of the real-world issues created by deepfakes, which involve distortion of
democratic discourse, election manipulation, a weakening of institutional credibility, a
weakening of journalism, social division, The first segment examines the technical architecture
of deepfake generation, primarily focusing on Generative Adversarial Networks (GANs) and
autoencoders. These tools enable the creation of deceptive videos and images that can challenge
human perception and traditional detection methods. Understanding these architectures is vital
to designing countermeasures. Additionally, we discuss the evolution of deepfake creation
techniques, highlighting how improvements in generative AI have exacerbated detection
challenges.
Next, we delve into AI-based deepfake detection methodologies, categorizing them into image-
based, video- based, and audio-based detection. Techniques such as convolutional neural
networks (CNNs) for image analysis, recurrent neural networks (RNNs) for temporal
consistency checks in videos, and spectrogram analysis for audio forgery detection are
explored. Emphasis is placed on identifying anomalies, such as unnatural blinking patterns,
inconsistencies in lighting, or spectral artifacts, that signal manipulation. Moreover, hybrid
models that combine multiple modalities for enhanced detection accuracy are discussed.
The discussion also addresses dataset curation for deepfake detection. Quality datasets, such as
FaceForensics++, DFDC (Deepfake Detection Challenge), and Celeb-DF, play a pivotal role
in training robust AI models. Challenges associated with dataset bias, generalization across
different types of deepfakes, and ethical considerations in dataset creation are examined.
Strategies for generating synthetic datasets to improve model robustness are also considered.
Deepfake technology, leveraging advancements in artificial intelligence and deep learning,
has emerged as a significant tool for creating hyper-realistic but fabricated audio, video, and
image content. While offering potential for creative applications, deepfakes pose severe
ethical, security, and societal threats, particularly in misinformation campaigns, identity theft,
and the erosion of public trust. This paper explores the methodologies for detecting
deepfakes, focusing on machine learning techniques, forensic analysis, and hybrid
approaches.

vii
Chapter-1: INTRODUCTION

1.1 Introduction:

Technologies of image, video, and audio editing are developing at a high pace. There has
been an increase in innovation in the areas of changing images, sound recordings, and
recordings. A wide range of methods for making and controlling advanced substances is
also available. Today, it is possible to generate hyper-reasonable advanced pictures with a
little asset and a simple how-to guide available on the web. It refers to a technique by
which essence of any one in the video is used in replacement of somebody else's essence.
It implies taking one merged portrait of the people and then introducing that image of an
individual person's face into his body. Therefore, that word refers to both product itself,
the reasonably promoted video final product. In addition to creating extraordinary
Computer-Generated Imagery (CGI), Virtual Reality (VR), and Augmented Reality (A),
Deepfakes can be used for educational purposes, animation, art, and cinema.

As the smartphone has become more sophisticated and good internet access is more the
rule, even more social media and media sharing sites have enhanced the creation and
distribution of digital videos. With increasing steady low-cost computing power, deep
learning is more powerful than ever. Such advances will have inevitably brought new
challenges. DeepFake is the manipulation of video and audio by using deep generative
adversarial models. There are many instances where DF spreads over social media
platforms that cause spamming and giving wrong information.

Such DFs are horrible and can scare people into them or mislead people into them.

An autoencoder EA replicates the essences of A from the dataset of pictures of the face of
A and an autoencoder EB replicates the essences of B from the dataset of pictures of the
face of B. To create Deepfake images, you compile changed features of two morphed
persons, An and B, you then train an autoencoder EA on replicas of the essences of A
from the set of images of the face of A, and another encoder EB. We use shared

8
encoding loads of EA and EB while keeping the decoding parts of every encoder
individual. By improving this coder, any picture with face of A can be decoded using
the decoder of EB rather than this common encoder. This principle is summarized in
figures 1 and 2. The ability to detect and counter deepfakes has become a critical area of
research to mitigate these risks. Deepfake detection focuses on identifying subtle
artifacts, inconsistencies, or anomalies in generated media that distinguish it from
authentic content. Despite advancements in detection, evolving deepfake generation
techniques continue to challenge existing models, highlighting the need for robust,
adaptive, and scalable detection systems. This paper explores the methodologies,
challenges, and emerging trends in deepfake detection, emphasizing the importance of
proactive research and collaborative efforts to safeguard against misuse.

Figure 1. Deepfake principle.

This methodology includes an encoder that combines general information regarding the
brightness, position, and appearance of the face, but a specific decoder puts together each
detail of a face and recollects its stable characteristics and features. The concerned
information must be differentiated from the morphological one, otherwise, it is not valid.
Finally, the results are excellent and hence the process is worth doing. The final step
would be to take the objective video, take the objective face from each frame, rectify it so
that it is the same illumination and expression, then use the modified autoencoder to
produce yet another face and then merge it with the objective face.

9
Figure 2. Illustration of a picture (left) being manufactured (right) utilizing the
Deepfake procedure. Note that the manufactured face comes up short on the
expressiveness of the first.

Since the Deepfake peculiarity, different creators have proposed various systems to
distinguish between actual recordings and fake ones. Although each proposed system
has its own solidity, current discovery techniques require generalizability. According to
[1], although each proposed system has its own solidity, current discovery techniques
require generalizability. The developers also accept the fact that current models appear
to be concentrating on Deepfake creation machines to take care of and focus on their
supposed practices. For example, Yuezun et al. [2] and TackHyun et al. [3] worked for
anomalies of the eye blinking in Deepfakes detection. But this can also be replicated as
Konstantinos et al. [4] and Hai et al. [5] had worked for it. A system they proposed in [4]
uses natural facial expressions such as blinking eyes for producing videos of talking
heads. Authors in [5] have come up with a model, which can generate facial expression
from a portrait. Their framework can composite a still picture to express feelings; it
includes a mind flight of eye-flickering movements. Such progressions of Deepfake
innovation make hard to identify the manipulated recording. This is a situation that
demands the acknowledgment of DF.

10
We now suggest another deep learning-based approach that would help to effectively
distinguish AI-generated fake videos, i.e., DF Videos, from the real videos. Our aim is to
develop a CNN model that would effectively classify and label deepfake videos with very
high accuracy. Based on the accuracy metric, we further establish a methodology based
on semi-supervised learning that outperforms CNN. The ResNet50 + LSTM model
developed by us has been experimented on a subset of Deepfake Detection Challenge
data. Since there were many features in the original dataset, it was impossible to train on
it. A subset of the data was taken but the same data and splits were kept.

11
1.2 Problem Statement:

The task involves designing and developing a deep learning algorithm to classify the
video as deepfake or pristine. It predicts the probability that the video is fake or not by
using DF detection, mostly a binary classification, where the input video is mp4 and the
output is a label L E ["REAL, "FAKE, and we will analyze the input video.

We can treat the image classification problem to detect deepfakes as a binary problem of
image classification. The video is 150 frames by input, where each one is 1920x1080
pixels or 1080x2040 if done vertically. We can now define the output as an indicator of
the presence or absence of DF videos, which is a binary class V * [0,1].
The formula p(Y = i|V) is the probability with which the network will tag i.

1.3 Objectives:

Deepfake detection objectives are complex and reflect the increasingly critical nature of
identifying and managing manipulated media in the context of an increasingly digital and
connected world. The ability to effectively identify and mark synthetic media will
become an imperative in a number of different sectors, from social media and news
agencies to law enforcement agencies and cybersecurity professionals. We can break
down the top objectives of deepfake detection into several key goals: identification of
manipulated content, accuracy, and reliability, improvement of generalization,
robustness, real-time detection capacity, increased trust, and responsible AI use. These set
the foundation for effective detection systems that will be used to wage the battle against
the malevolent effects of deepfakes.

One of the most apparent goals of deepfake detection is the identification of manipulated
content. Deepfake algorithms are supported by very advanced machine learning models
that can create really realistic videos and images and mimic the original. These videos,
whether they be for face-swapping, voice synthesis, or puppet-based manipulation, tend
to have some minor inconsistency that humans will

12
It is not easily detected. Central objectives for deepfake detection would include building
systems which could automatically and accurately determine if a video or image had been
tampered with, therefore telling whether content is authentic or artificial. The critical
scenario would be within the arena of news reports, political campaigns, or personalities
because dissemination can lead to great and harmful impacts: spreading misinformation,
defamation, or political maneuvering. It identifies manipulated media, which helps
prevent the spread of false information and maintains the integrity of digital content.

The other critical objective is the accuracy and reliability of detection. Deepfakes'
detection models should be very accurate about both real and fake media with a minimal
occurrence of false positives-that are real videos identified as fake and false negatives-the
fake videos misinterpreted as real. This would be achievable through several performance
metrics-precision, recall, the F1-score, and even the Area Under the Receiver Operating
Characteristic curve, referred to as AUC-ROC-through which deepfake detection models
can measure how well such a model would perform and adapt it for the best possible
efficiency. Evidently, the accuracy in the higher end will not be very close with this
version of the new variations of deepfake generation. However, it does stand as an
objective where such detection systems would then flag the content reliably with an
incredibly wide range of usages that may then potentially keep the digital media
trustworthy.

Another important objective is generalization where the model will detect deepfakes for
other types of datasets, manipulation techniques, and types of videos. Deepfake
generation techniques are highly diverse, and novel techniques are being developed
exponentially in most cases to try and avoid common detection methods. It should be able
to generalize among many manipulation techniques, that include face swapping, lip-
syncing and puppet manipulation so as not to rely greatly on any single dataset and/or
manipulation type. Generally speaking, generalization is extremely important because
tools for generation through deepfakes are no static, therefore, after change happens in
generation methods, detecting algorithms have to respond and stick well with the
methods. That deepfake detection systems can generalize so they stay effective over time
too and find new forms of deepfakes that could never have existed at training. That's an
important goal to make sure deepfake detection systems can be deployed in the dynamic

13
environment . So, there are new forms of manipulations in it.

Another important goal of deepfake detection is robustness against adversarial attacks

and variability in the quality of input. The deepfake detection model has to be robust
against different distortions such as compression artifacts, noise, changes in illumination,
angle, or resolution. Deepfake videos, in the majority of practical settings, would have
been compressed or distorted in the process of being transferred over the internet or
subjected to restrictions from a platform. The other attacks are adversarial, whose aim is
to cheat the deepfake detection system, which is expected to modify the input data in
such a way that it would result in misclassification. A detection system has to handle this
without losing performance. Robustness is extremely important so that deepfake
detection works well in real life, for example in live-stream monitoring, in which video
quality can be very variable and changeable. It means the detection systems are stronger
and more reliable in responding to adversarial attacks and so bad actors cannot somehow
bypass the detection process.

It becomes extremely more significant in real-time detection applications, where videos

should be monitored and analyzed shortly after their creation or dissemination via live
streaming, video conferencing, or uploads through social media. In this regard, a
detection system should process the content of videos in real-time to identify manipulated
media and flag it before the content propagates further. It should also be optimized for
speed and efficiency so that it could detect deepfakes in real time, meaning high-quality
video frames at a good frame rate without sacrificing precision. Pruning, quantization, or
other techniques that help models like using specialized hardware to speed things up,
such as a GPU, would be helpful. Real-time detection further requires that the system be
able to process large amounts of data in a very short time, a relevant fact of environments
where many video contents are uploaded and shared continuously.

The other goal is to increase the users' trust and confidence in a detection system.
Deepfakes break public trust in digital content. The more things the user cannot tell
between what is real and fake, the more reasons the user will have for distrust in the
digital content. Deepfake detection systems are really crucial in restoring trust by
providing transparency at the same time as guaranteeing accuracy in the media they
consume. This will be very important in fields like journalism, law enforcement, and
14
social media . Dealing out deepfake content has critical effects. The accuracy of
explanation with results from detection systems provides assurance to the user to
ascertain that any piece of content in interaction has not been meddled with and
therefore maintains originality. Above all, detection transparency is of greater
importance in developing a raised awareness of the public's danger posed by deepfakes
and of detection technology, in particular, that offers solutions to it.

Responsible use of AI then is another huge goal of deepfake detection systems. This is
because a rising power in AI in developing hyper-realistic deepfakes calls for an ethical
code and responsible use of such technology for deepfake detection. Indeed, the
detection systems of deepfakes must be designed with a social sense of ethics-that is, not
infringe on privacy and free speech rights or unduly suppress legitimate content. Ethical
issues also include detection of the fact that the detection systems are unbiased to be
used in a balanced manner over the categories of content and hence the varied altered
media from various sources or platforms can be detected without promoting any specific
group or agenda. Developing detection systems along with ethical values helps
stakeholders in building responsible usage of AI systems to detect and prevent malicious
manipulations of digital media.

Finally, such continuous research and development in the subject are behind the
incentives that promote cooperation and innovation in deepfake detection. Because
things evolve, deepfake generation has to move, and ahead of its manipulation
techniques, the method of its detection should advance as well. Therefore, such a need
for the approach calls for collaboration between researchers, developers, and
practitioners from all concerned disciplines of computer vision, machine learning,
cybersecurity, and ethics. That brings into play open-source initiatives, shared datasets,
and even collaborations between institutions, all in the name of furthering the
advancement in deepfakes detection, because constantly evolving systems would allow
more inventive solutions to such an increasingly rampant trend. For a challenging
environment such as deepfake challenges, collaboration is also needed.

Conclusion In a nutshell, the objectives of deepfake detection are as diverse as

complicated. Some of them are content identification, accuracy and reliability,
generalization, robustness, real-time detection, trust in user interaction, responsible AI,

15
and spurring innovation in the field. All these components are integral parts of the
efficient deepfake detection systems that counter the increased danger in synthetic media
with the help of advancing deepfakes.

and countermeasures to overcome them, so that society can continue believing in the
authenticity of digital content despite the growing complexity and pervasiveness of
digital mediation.

1.4 Methodology:

There is a host of tools existing for the generation of Deepfakes (DF) but surprisingly
very few tools are developed for detection. We expect that our approach toward the
detection of DFs would play a huge role toward stopping the "percolation" of deepfakes
across the worldwide web. We developed the approach, which detects all types of
Deepfakes: replacement deepfakes, retrenchment deepfakes, and interpersonal
deepfakes.

Our proposed method Figure 3 eliminates the presence of transient elements in faces by
using a combination of a CNN with an RNN because all the visual controls are found
within regions of faces and countenances often appear within a narrow region of the
frames.

Considering the same, we focus strictly on deleting highlights of faces in video areas
where face visibility has been confirmed. So, we first split the video into frames. Then
follows face detection, editing of video frames with that detected face. Finally, it collates
all those newly edited faces into making a brand new video. We then apply the
ResNet50 CNN model to segment highlights from video frames followed by an LSTM
layer for succession handling. We then conduct test-time augmentation and make
predictions. The next subsections illustrate our methodology in full detail, which
includes a helpful test augmentation approach we included within our DFDC
submission.

16
Figure 3. System Architecture

1.5 Organization:

This organization of deepfake detection systems puts together various processes,

methodologies, and technologies in place for the identification of manipulated media.
Deepfake detection indeed is a field that draws computer vision, digital forensics,
cybersecurity, and ethics in developing reliable systems that can distinguish
and automatically and correctly classify whether a piece of content is
either real or fake. The reasons are that AI and ML have been quite trending in recent
times. The normal structure for most deepfakes detection systems has several stages: data
collection and preprocessing, feature extraction, model building, training, evaluation,
and then finally, deployment. Such are the essential steps involved in
the deepfakes detection system as they will help ensure that deepfakes are
effective in every possible manipulation technique, context, and media format.

The first step in organizing deepfake detection is data collection and preprocessing. The
Data collection and preprocessing are the first steps that lead to organizing a deepfake

17
detection.
The quality and diversity of the data on which a deepfake detection system is trained
determine its success for the most part. Deepfakes for the most part have to be detected
based on video and image data, which is relatively large and comprises real as well as
synthetic media. The FaceForensics++, CelebDF, the Kaggle Deepfake Detection
Dataset, and the dataset of the DeepFake Detection Challenge are some of the most
recognized datasets for deepfake research. These datasets contain a lot of tampered
content and involve various kinds of deepfakes while generating them. Preprocessing
merely involves cleaning raw data before sending it to detect models for training. These
are image and video resizing to the same dimension, normalization of pixel values, and
video to frames or sequences which can be processed by the model. Noise reduction,
enhancement of images, and incorporating more data in the dataset through rotation,
flipping, and scaling help diversify the data and hence make it easier to apply to real life.
Feature extraction follows data preprocessing. Feature extraction is the process through
which raw data is transformed into a format that machine learning algorithms can use for
classification within deepfake detection. In the context of deepfake detection, feature
extraction is about focusing on the most subtle artifacts and inconsistencies that deepfake
algorithms inject into media. Such features might include facial expressions, inconsistent
lighting, or even erratic blinking. In feature extraction, most deepfake detection
algorithms utilize CNN's that are specifically designed for image and video analysis.
CNN's automatically learn spatial hierarchies of features like edges, textures, and
patterns, which helps in the detection of inconsistencies in manipulated content. Other
techniques used in advanced systems include RNNs and LSTM networks, particularly
with sequential data, like videos. The temporal features, for instance, the change in facial
expression or lip movement over time, play a significant role in these systems in
distinguishing between the real and fake content. Besides, we are likely to use ResNet
and XceptionNet, which are two types of CNNs, to extract multi-dimensional and
abstract features from video and image.
After feature extraction comes model development in an organization of a deepfake
detection system. This is where we would train some of the machine learning algorithms
to classify data based on features. This can also be done through various approaches that
range from more traditional approaches using machine learning models.
18
Such as support vector machines and random forests to more complicated deep learning
models, such as CNNs, RNNs, and hybrid, or combination of multiple kinds of neural
network architectures. In fact, among several models, the selection process is influenced
by some factors like data characteristics to be evaluated, complexity of targeted
deepfakes, and the massive amount of computational resources available. Deepfake
detection models are mostly used in supervised learning, where the model is trained on
labeled datasets of real and fake media. The model learns which features correspond to
which label during the training process, based on patterns observed in the data. That takes
advantage of the knowledge other images in terms of their categorization can also acquire
pre-trained models fine-tuned into the knowledge base of deepfake datasets; that is,
transfer learning in application.
The last step in organizing deepfake detection systems is training. It tunes the model's
inner parameters to get as close as it can to the true predictions over the training data.
One approach of optimization algorithms like stochastic gradient descent or Adam can
work by iteratively updating model weights based on the loss function. Overfitting is a
training issue in which the model learns the detailed characteristics of the training data
and fails to generalize to new, unseen data. Techniques used to prevent overfitting
include dropout, early stopping, and cross-validation, along with data augmentation and
using diverse datasets in order to train the model in a way that it learns to generalize
better to the real world. We then test the model against a different validation dataset,
taking into account its performance on unseen data and tune the hyperparameters
appropriately. In the overall framework of the training phase, it is aimed to produce a
model that can classify real and fake media accurately even when new techniques of
manipulation are thrown upon it or the quality of input varies.
The second step after training is model evaluation. Model evaluation is a critical
component in organizing a deepfake detection system as it helps evaluate performance
along several dimensions. Some common performance metrics include accuracy,
precision, recall, F1-score, and AUC-ROC. These help assess how well the model is at
detecting fake media while preventing false positives and false negatives. Since
deepfakes, by their very nature, will never be progressive, the old set of parameters just
can't be used to judge the assessment. The generalized application of the system across as
vast a variety
19
Datasets, manipulation techniques, and video qualities. In the case of a model trained on
one particular data, it may not generalize that well to another dataset with the other
manipulation technique. The model tests a number of datasets to ensure they are strong
and applicable in different deepfake generation methods. Adversarial testing is another
evaluation strategy. This is important in determining the vulnerability of the model to
attacks that are designed to deceive it, thus ensuring its reliability and effectiveness in
real-world applications.
The last phase of deepfake detection is how it's organized: deployment and real-world
application. Deployment means placing the trained and tested model available for use in
practical scenarios. Depending on the application, how the deployment might happen
could include its integration into other platforms-one might be social media websites, or
maybe a new news website, police or law enforcement agencies' respective sites, or
streaming. For example, social media can have deep fake detection systems that
determine a manipulated video automatically before spreading virally. Such applications
need to work on real-time processing of videos while having high efficiency and not
causing latency in moderation of content. We often rely on models that are distributed
based on the deployment of both edge computing and the cloud to spread out the work for
real-time feedback. For instance, video data being streamed in from cameras or
smartphones in live streaming can be preprocessed on site before it hits the cloud for
further inspection. Even deployment calls for constant monitoring and maintenance since
the detection system needs to constantly change to keep in step with new deep fake
techniques and latest approaches in media manipulation. It can be achieved either through
periodic retraining of the model using new datasets or fine-tuning it based on feedback
from users or stakeholders.
In conclusion, we outline deepfake detection as a process of multiple steps that begin
with data collection and preprocessing. It then continues to feature extraction, model
development, training, evaluation, and deployment. Every step is crucial in the
effectiveness of the system to detect manipulated media. As methods for making
deepfakes evolve, so must deepfake detection systems. They must accommodate the new,
creative ways to alter the media to find accuracy, reliability, and efficiency. It is likely
that the multiple applications of machine learning techniques will include CNNs, LSTMs,
and hybrid models along with strong evaluation frameworks that will be able to maintain
20
a deepfake detection system efficient enough for real-world use cases.
Among its applications such as content moderation and enforcement law there is an
organized manner in which the detection has to be presented in which developers and
researchers find a way to combat the spreading menace of deepfakes, in the effort to
preserve the integrity of digital media.

Methodology Applied:

• \data Augmentation

• Convolutional Neural Networks

• \transfer Learning

• \tResNet50

• \tLSTM

• Test Time Augmentation

Technologies Implemented:

•Python, matplotlib

•Pandas; Jupyter Notebook

• SKlearn

• Keras

• \TensorFlow Any one of them is free and not needing technical degree. Because of the
time constraint in the development cycle, these technologies are even more easy to use
and faster. Thus, we can easily implement the project.

21
Chapter-2: LITERATURE SURVEY

Deep fake video is emerging everywhere; then it threatens democracy, justice, and public trust.
This makes there a growing demand for video analysis, detection, and intervention. A few
examples are listed below:

• The trained Convolutional Neural Network model in its specific mode to compare the
synthesized faces against their context, Exposing DF Videos by Detecting Face Warping
Artifacts

• The authors are able to separate the artifacts. This paper presents two classes of facial artifacts.
According to the strategy, the currently available implementations of the DF algorithm are
capable only of generating images with a resolution level limited or restricted, which should
then be transformed to suit the size of the faces in the video source.

• Uncover AI-Generated Fake Videos by Detecting Eye Blinking [7] is another approach
that could be used to unveil the forged face recordings produced by deep neural network
models. The approach relies on detecting eye blinking in the recording, which is a
physiological cue that is not high-class in the scripted fake recordings. It has been
experimentally tested over benchmark datasets that evaluate eye-blinking techniques and
shows promising results when applied to recordings created using deep neural network-
based programming. Their technique uses only the absence of flickering as a cue for
discovery. Anyway, there are some other parameters too to be considered for
identification of deep artificial like teeth appeal, wrinkles on faces etc. We present an
approach that considers these many parameters.

• Capsule Network is another way to reach manipulated or forged video and image data
detection. This discovers manipulated and forged videos also images in the replay attack
and computer-generated video scenario. Here, they utilized their method with random
noise that must not have. Their model had a positive output on their dataset, though still
hard to survive with it in real-time data due to noise in training. For training our method,
we propose a noiseless and real-time dataset.
22
• Synthetic Portrait Videos Detection [9] detects and identifies the fake portrait videos
containing biological signals by extracting biological signals from real and fake portrait
video pairs. Change the highlight sets and PPG guides to catch sign attributes and ensure
spatial soundness and transient consistency for training a probabilistic SVM and a CNN.
Once the video has been confirmed to be probably false or true, the next verification is
carried out. This program specifically determines with accuracy whether a commodity
has counterfeit parts regardless of the generator, content, intention, or nature of the
video. It proves to be inconvenient to formulate a differentiable loss function based on
the proposed signal processing steps when lacking a discriminator results in the loss
findings.

23
Chapter-3: SYSTEM DEVELOPMENT

Deepfakes is one of the current and growing threats to security, governance majoritarianism, and
protection. Some of the credible applications of face swapping in video composing,
representation changes as well as character verification include: this is because it's possible to
replace faces within photographs with those selected from a variety of stock pictures. Digital
attackers use face swapping as a tactic where they gain unauthorized access to infiltration
verification frameworks. It becomes really difficult while using deep learning algorithms like
CNN and GAN for forensic modeling because residual images could retain the pose, facial
expression, and lighting of the originals.
Probably, images from GANs are the most difficult of all generated images by deep learning
algorithms since they may be able to be even more realistic and high-quality images as possible,
due to the fact that GAN can learn distributions of inputs and then output distributions that match
those inputs.

3.1 Design:

This makes the design of a deepfake detection system very complex and multi-faceted so as
to effectively identify and differentiate manipulated media from authentic content. At this
rate where the media produced with deep fake technology have been made nearly
indistinguishable from reality at other times impossible to spot, there's need for changes to
happen in the existing systems as a response to new challenges posed by such sophisticated
manipulations. The model design process involves several stages, such as the selection of
appropriate methodologies, model architecture, data preprocessing, feature extraction, model
training, evaluation, and deployment. Each of these stages will contribute to achieving the
goal of developing a robust, accurate, and reliable system that can detect deepfakes across a
wide range of manipulation techniques, content types, and media formats. A system like this
would therefore necessitate deep knowledge in machine learning, computer vision, digital
forensics, and ethics for a perfect design.

The methodologies coupled with models that are used for the deep fake detection systems are
critical in design. In most cases, deepfakes are detected using machine learning but deep
24
learning is the only viable option since it is capable of learning Complex patterns from big
data. Convolutional neural networks (CNNs) are the most widely used models in deepfake
detection, especially for static image and frame-level analysis. The CNNs identify subtle
visual artifacts introduced by deepfake generation methods by detecting spatial hierarchies in
images, including textures, edges, and facial landmarks. More complex models, like RNNs or
LSTM networks, can capture the dependencies and motion patterns across frames in a video
to analyze them. Such models are better suited for sequential data such as expressions, head
movements, and lip synchronization that often break in deepfakes.

Another highly exploited area in deepfake detectors is hybrid architectures. In these, CNN-
LSTM networks take the best from CNN feature extractors and LSTMs for sequential data.
This helps the detector system view video in aspects both spatial and temporal, which in turn
helps catch many of the deepfakes. This hybrid approach makes the detector system aware
not only of single anomalies in frames but also, over time, the temporal inconsistency such as
awkward facial movements or unnatural movement-also a common characteristic for most
deepfake videos. Advanced systems even probe about the possibility of transformer models
and the use of attention for feature extraction for subtle detection about both spatial and
temporal features to ensure a high level of accuracy in their findings.

Apart from the above selection process of suitable models, data collection and processing is
equally important. The success of the deep fake detection system majorly depends on quality
and diversity in training data. For deepfake detection, big-sized high-quality datasets are
necessary, both in authentic and synthetic media. Sometimes, the basis of research and
development would be other prime datasets, such as FaceForensics++, DeepFake Detection
Challenge, and Deepfake Detection Dataset on Kaggle. Such datasets consist of videos over a
very wide range of subjects - from face-swapped celebrity faces to voice synthesis, which
also helps train the models on the various types of media-manipulation.

At the preprocessing stage, raw video and image data will be prepared through various steps
so that it will be appropriate for training the deepfake detection model. In preprocessing, it
involves some tasks like resizing images and videos into uniform sizes, converting video files
into frames, normalizing pixel values, and may involve the change of video format for proper
use with deep

Learning frameworks. Preprocessing would include data augmentation: rotation, flipping,

25
cropping images-to make a dataset look like it is much larger than the actual size, providing
the model with more chances to experience many different types of transformation, making it
more generic. For video content, temporal consistency must be taken into account because
the differences in the frame rates and lengths of videos may significantly impact the analysis
of deepfakes' characteristics.

Feature extraction is the most important part of deepfake detection. After pre-processing
data, meaningful features should be extracted to find whether content is real or fake. Feature
extraction in the traditional image analysis usually deals with low-level features such as
edges, textures, and color histograms. But the detection systems based on deepfakes detect
more complex, higher-level features that are probably unnatural facial movements,
inconsistent lighting, or artifacts coming from the algorithms themselves. Perhaps one or a
combination of those methods might be applied in order to extract that: CNN is applied in
recognizing pixel-level inconsistency, auto-encoders/GANs to establish models about normal
facial and motion patterns. At other times, the systems make use of optical flow analysis in
determining movements of facial muscles or eyes, which are altered in most deepfake videos.

The model gets trained after feature extraction during the design process. In this training
phase, the model learns to identify the difference between real and manipulated media.
Deepfake detection systems are very commonly trained through supervised learning, which is
training a model on a labeled dataset containing both real and fake samples. This model
learns to associate specific features with the real or fake label by using a technique called
backpropagation. In this technique, the model's weights are updated so that the difference
between the predicted and actual labels decreases. Dropout, early stopping, and cross-
validation techniques are applied during training to avoid overfitting. Deepfake datasets are
very large and complex, which means this step demands much computing power.
Accelerated processing can be achieved using Graphics Processing Units or Tensor
Processing Units.

This is done through a process called transfer learning, whereby the model obtained is fine-
tuned, applying a pre-trained model for some task, for instance one trained on a massive
repository of real-world images over a deepfake-specific set, making use of learned
characteristics to adapt to deepfakes.

More efficiently. Transfer learning has especially proved very helpful when labeled deepfake
26
data are scanty as it reduces the volume of the needed training data and increases the speed
with which the model reaches a high accuracy level.

The training process of the system follows the model's evaluation for the second stage; this
will ensure that the system models are actually reliable for distinguishing a real image from
the deepfake ones. Main evaluation metrics include: Accuracy Precision Recall F1-score and
Area Under the Receiver Operating Characteristic Curve. These metrics give insight to the
performance of the model, especially considering how well it identifies fake media without
incorrectly flagging real media as fake. Testing also is performed on the model with various
different datasets, manipulation techniques, and video qualities for testing its generalization
abilities. Because the algorithms behind deepfakes are evolving constantly, the detection
mechanisms need to be flexible enough to adapt to new and unseen manipulation techniques.

We should measure the deepfake detection systems in terms of their robustness against
adversarial attacks, besides other traditional performance metrics. Because the attacker might
try to build deepfakes which are particularly designed to evade the detection system, it
becomes an integral part of the design process to test the robustness of the system. The
robustness tests may include testing a model in conditions such as compressed video, low
resolution, or even noise, which prevail in most real-world scenarios.

The last step in the design process is deployment. In this phase of deployment, we embed the
trained model into the real-world applications to let it detect deepfakes across various media
formats: social media platforms, video streaming services, and news organizations. It
requires ensuring that the model can handle real-time detection, especially in live streaming
or fast-paced media consumption. This will usually be smoothened through the model and
ready to run with video streams holding as little compromise on speed as possible. This will
also call for edge computing and cloud-based systems to deal with volumes of data and
lessen latency.

In conclusion, designing deepfake detection systems has various stages which are important
to ensure that the system can detect manipulated media effectively and correctly. This
includes selecting an appropriate machine learning model, collecting data, preprocessing,
feature extraction, training, evaluation, and eventually deployment-all steps that need to be
thought through properly.

They are designed to address the emerging challenges of deepfake technology. Deepfake
27
technology is always evolving, and detection systems need to evolve as well. Therefore, it
requires the latest research in machine learning, computer vision, and artificial intelligence.
These are robust, scalable, and reliable systems that can help society identify and combat the
harmful impacts of deepfakes on society.

3.2 Algorithm:

1. Beginnings
2. Upload Deepfake Detection Challenge Image dataset
3. Importing all necessary Libraries
4.Data Preprocessing
5. Developing and Training ResNet50 + LSTM Model
6.ResNet50 extracts features
7.LSTM for sequence processing
8.Test Time Augmentation
9.Predict on Test Dataset
10.Conclusion

3.3 Model Development:

Dataset: -

Learning Learning from data is at the core of deep learning. To achieve excellent learning
quality and accurate predictions, careful dataset preparation is vital. MTCN, YouTube, and
Deep Fake Detection Challenge datasets are being used in equal quantities for our mixed
dataset. The nature of these recordings commonly incorporates standing or sitting
individuals, either confronting the camera or not, with a wide scope of foundations, light
conditions, and video quality. Preparation recordings aim at 1920 × 1080 pixels, or 1080 ×
1920 pixels when captured in vertical mode. The dataset is built by an all-out of 119,146
recordings with an extraordinary name, either valid or fake in a prep set, 400 recordings on
the validation
28
The test set contains 4000 private anonymous recordings. We should be able to train,
evaluate the model on that 4000 recordings of test set using the Kaggle infrastructure in spite
of the fact we cannot look at them directly. There are 1: 0.28 manipulated real recordings.
There are only names in the 119,245 prepared recordings, and for that reason, we use all this
dataset to prepare and train our approach. The prepared recordings are divided into 50
numbered sections. Our preparation process uses 30 sections, our validating process uses 10
sections, and our testing process uses 10 sections.

The private set, in testing, evaluates submitted strategies in the Kaggle environment and
returns a log-likelihood loss. Log-likelihood loss penalizes very heavily the phenomenon of
being simultaneously confident and wrong. In the worst case, the assumption that a video is
valid when it actually is manipulated or vice versa adds inﬁnity to your Every video gets an
interesting name telling if it has control or not. It doesn't say though whether that control is
on face or sound, or both. Since our method only is considering video evidence, fixed
recording using audio controls only results in labels being very noisy. Since the video gets
labelled as a fake while its faces will be real, this makes it loud since all those extra people
that get occasionally in the video have one on which is having face control attached. error
score. Given that, in real experiments assuming the worstcase happening the loss becomes
decreased into a super-massive value.

This is another test of this evaluation system because methods that perform well on measures
like accuracy can make some really humongous errors in the case of log-likelihood. Our
newly prepared dataset comprises half of the first video and half of the controlled deepfake
recordings. It was divided into 80% and 20% training and testing sets respectively.

Classification Setup:

In the following model, X is the input set, Y is the output set, and f is the prediction function
of the classifier that takes values in X as input to the action set A, the random variable pair
(X, Y) assumed to take values in X * Y. The chosen classification task for minimization is

the error E(f) = E[l(f(X), Y)], with l(a, y) = 1/2 (a − y) ^2.

29
It defines the problem, collects data, and develops a strong model to set up a classification
system for deepfake detection. Most of the tasks involve binary classification (real vs. fake)
or multi-class classification if it's necessary to classify the types of manipulation. Data
collection is the most important task; the most common datasets are FaceForensics++ or the
DeepFake Detection Challenge dataset. Preprocessing steps such as video frame extraction,
normalization of data, and augmentation will make the model better. The method applied to
the feature extraction process is based on identifying inconsistencies of facial landmarks,
lighting, or temporal patterns in videos. Models that are typically used are CNNs, like ResNet
or XceptionNet, to analyze on images or RNN/Transformers to analyze for temporal patterns
in videos. Training and splitting the dataset with appropriate loss functions and dropout or
early stopping is used to avoid overfitting. Accuracy, precision, recall, and ROC-AUC make
up the evaluation metrics. Once validated, it can then be used for real-time detection or an
application and will be further improved upon through adversarial training and data
collection to further improve its robustness against sophisticated deep fake attacks.
Model selection is important. Convolutional Neural Network such as XceptionNet
They include ResNet, popular in the field of image analysis because it could capture spatial
features. There are recurrent models, LSTMs, and even the newer Transformers, useful in
video analysis for detecting patterns that might be temporal. In a scenario where the dataset is
specific, transfer learning comes into use; most development accelerates by fine-tuning on a
pretrained model. This entails splitting the dataset into train, validation, and test. Loss
functions such as cross-entropy are then applied on classification tasks. Optimizers such as
Adam or SGD speed up convergence, and techniques like early stopping and dropout prevent
overfitting.
The metrics used to evaluate its performance are accuracy, precision, recall, F1-score, and
ROC-AUC. It tests for its generalization by taking its strong system and trying on unseen and
adversarially generated data. It gets deployed in real applications once validated. It often
packages APIs or tools with processes that run in batches or real-time. The system requires
continuous improvement after deployment with the use of more data and adversarial training
to follow the changing deepfake methods. Scalability, ethics, and low latency, in addition to
the above necessities, are also important requirements for practical use cases involving
content moderation, legal investigation, or cybersecurity.

30
These are then developed to use with PyTorch and Keras 2.1.5 using the latter as Python 3.5
and with default parameters for β1 = 0.9 and β2 = 0.999 ADAM used in the optimization of
the weights of the network through frames successive in size 224 × 224 × 3.

Preprocessing: -

First, the video is pre-processed. It splits the video into frames. Then faces are recognized,
and then trimming of video frames containing the recognized faces happens. We find a video
mean dataset and derive a new set of handled face trimmed one which contains the equivalent
of mean frames so that we can equally keep the number of the frames. At stages of pre-
processing frame without face is/ were ignored. Since all the visual controls are confined to
the face regions, and faces generally occur in a small region of the frame, using a system that
focuses features from an entire frames is not good. Instead, we focus on extracting features
only in regions in which a face exists. As processing the 10-second video at 30 frames each
second, for example full out 300 frames will require lots of computational power. Thus, we
propose using only 150 frames for training the

model for experimental use.

In general, preprocessing is the appropriate process for making a deepfake detection model.
The preprocessing ensures the input data to be consistent, clean, and ready for analysis. Such
preprocessing involves gathering a robust and diverse dataset with both manipulated and real
media. Some datasets used are FaceForensics++, CelebDF, and the DeepFake Detection
Challenge, with various contents of real and synthetic items for training. After collecting the
dataset, for video data, we extract frames such that we extract single frames at a fixed rate of
frames per second, here 1 frame per second to maintain coherency in time while also
reducing the computational load; through frame extraction, deep fake techniques may
concentrate their efforts on content areas mainly being faces.

The most important process is face detection and cropping. We often use algorithms, Haar
cascades or MTCNN. Sometimes, it's more about Dlib for single-face detection, isolated as
well in pictures as within video frames, so what we do in practice in such a context is
limiting the model's focus, not to process all unnecessary data from areas where actually
nothing of value would interest us. The other characteristic of standardization is when faces
31
that are detected are aligned with the help of landmarks that can be eyes or even the nose;
this further helps in standardizing data. This ensures uniformity in the orientation of the face
and thus easier to maintain model on detecting incongruence without variability from head
tilt or placement.

This scales the images to a size of 224 x 224 pixels. Coincidentally, this is very popular as an
input dimension for many deep models including CNNs. It simply ensures compatibility in
architectures on some models while conserving scale and consistency within the input data.
Pixel values are standardized to a range; an example might be within the bounds [0, 1] or in
[-1, 1]. Doing this has standardized input data for this problem which makes that faster
convergent when training this kind of model. For many of the standard deep-learning
libraries, standardizing this also has good prevention from being numerically unstable. End.

Another important pre-processing aspect is data augmentation that helps in enhancing the
model's generalization capability. Methods such as random rotation, flipping, scaling, and
translation bring variations to the data, making it more diverse in mimicking real-world
conditions. Flipping a face horizontally could be a simulation of multiple views, and rotation
and scaling

We introduce randomness in camera angles or distances. Color adjustments in brightness,

contrast, or saturation differences are also fed into the model so it doesn't care about the
lighting variation. The noise injection techniques with the inclusion of Gaussian noise
addition or blurring mimic the compression artefacts present commonly in internet videos.
This would add a level of robustness for the model detection of media manipulation.

Subtle yet noticeable deepfake artifacts are crucial in preprocessing. Compression artifacts,
such as pixel irregularities or unnatural textures, are much more prominent in synthetic
content because of the encoding processes involved in deepfakes. Among the most important
areas of attention involves temporal artifacts, especially in videos due to inconsistencies
between frames. For instance, unnatural blinking or abrupt motion transitions indicate
manipulation. We extract and highlight these artifacts during preprocessing so that the model
learns effective patterns to distinguish real content from the fake.

Temporal smoothing preprocesses adjacent video frames so that the model can identify
inconsistencies from one frame to another. It is very useful for the detection of manipulations
like lip-sync mismatches, which are hard to identify through single-frame analysis. Advanced
32
techniques, like spatiotemporal smoothing, combine spatial and temporal features to ensure
the preprocessing pipeline is more inclusive.

Feature extraction is an optional but highly powerful preprocessing step to clean up the
dataset further. We're going to be using OpenCV or Dlib for facial landmark extraction, so
we get a structural view of the face. Those areas are where the manipulation is going to take
place-the eyes, the mouth, and the nose-to focus the model's attention on. Another way to
detect some unusual patterns in an image's frequency is doing the frequency domain analysis
with Fourier transforms, often exposing mistakes made at the moment of creating such a
deepfake. These add extra features, which make the input data richer and contribute to the
improvement of the model for better manipulations detection.

Label encoding is one of the most easy yet basic preprocessing steps, in which all the data is
assigned specific labels. For example, in a binary classification task, real is assigned 0, and
fake is given 1. In multi-class classification, it is further extended by allowing more labels for
manipulation, such as face-swapping as well as lip-syncing. Proper labeling will help train
the model with the correct relation between the input data and their corresponding outputs.

The prepared data is divided into training, validation, and test sets. This ensures that the
model is tested on data it has never encountered before, thus avoiding overfitting and
ensuring generalizability. To prevent biased training, a balance must be achieved in real and
fake content across these sets. Data is also shuffled and organized into batches for improved
training pipeline. It's batching for efficient utilization of memory in training, and shuffling to
avoid unwanted patterns learned from ordering in the data.

Preprocessing is also concerned with pipeline optimization for real-time applications: low
latency and high throughput. Above techniques, including preloading of frames or efficient
face detection algorithms such as RetinaFace, can be used to speed up the preprocessing
time. The preprocessing pipeline is built modular, so that change in one step does not disturb
the entire workflow. Thus, new data augmentation techniques, improved face alignment
algorithms, etc., can be plugged into the workflow without affecting the workflow.

In summary, the preprocessing process for a deepfake detection model is multi-stage

preparation of the dataset. This would start with data acquisition to frame extraction,
followed by face detection and alignment, resizing, normalization, and augmentation. This
ensures the data is ready to learn from, always looking for artifacts, inconsistencies in the
33
domain of time, or even improved features such as facial landmarks or frequency domain
patterns. It equips the detection deepfake model with great reliability and accuracy to pick up
artificially manipulated media with careful stratification, splitting into sets: training,
validation, a test set, efficient batching with shuffling, and data preprocessing steps.
Currently, the pipeline enhances generalization capability of such models and gives it real
robust challenges toward challenges where scalability and speeds form a core part.

Model:

In deepfake videos, inconsistencies are found between frames and within the frames. LSTM
and CNN are utilized in the recognition of deepfake recordings using a temporal-aware
pipeline.

technique. CNN is used to remove frame-level elements which afterwards used in LSTM in
forming a succession descriptor. Model follows one LSTM layer after the residual ResNet50
network, at last a fully connected one for the task of grouping altered with the genuine ones
based on this succession descriptor. In pre-handling the data from recordings, Data Loader
places all face-trimmed pre-handled recordings that break them into train and test sets. In
addition, the frames of the processed recordings are submitted to the model for preparation
and testing in mini-batches. The recognition network formed by fully connected layers is

This is done to take on the category label as a variable, calculate the probabilities of the
frame sequence belonging to either the true or the deepfake class.

A deepfake detection model is an advanced kind of machine learning system specifically

designed to find out manipulated media. It finds visual, temporal, and even artifact-level
inconsistencies through analysis in determining real against synthetic content.

CNNs are essentially the backbones of image-based analysis, and three architectures named
XceptionNet, ResNet, and EfficientNet have proven to be very effective. The models achieve
high accuracy in identifying pixel-level anomalies, irregular lighting, unnatural facial
expressions, and inconsistencies in facial landmarks. Video-based detection uses RNNs and
LSTM networks to scan for anomalies in sequential patterns like blinking, lip-syncing, or

34
motion dynamics that are inconsistent from frame to frame. Hybrid models combining CNNs
with LSTMs or transformers effectively address the problem, because both spatial and
temporal features can be captured, which means manipulations of this complexity can be
detected.

Emerging techniques such as Vision Transformers (ViT) along with spatio-temporal attention
mechanisms are revolutionizing the game by improving the analysis on multi-dimensional
data, giving them the acumen to uncover very subtle and adaptive deepfake methods.

An unsupervised learning model form is autoencoders, pointing out anomalies in

manipulated media through reconstruction of inputs, by which detection mechanisms take
place through reconstruction errors. Another category is adversarial models, such as GAN-
based models; on one hand, GANs are used to make deepfakes, and adversarial training with
GAN-based detectors makes models learn to identify generated artifacts more.

Examples include models like XceptionNet, EfficientNet, fine-tuned to specific datasets such

The strength of these approaches lies in their robustness to the limitations of computational
resources or labeled data. In ensemble learning, predictions are aggregated from multiple
models, hence improving detection accuracy based on the strengths of different architectures.
Real-world applications often demand low-latency, high-throughput systems for real-time
content moderation or forensic analysis. Moreover, adversarial attacks and generalizability
are two major challenges requiring ongoing training with different datasets and evolving
techniques. Last but not least, the final challenge would be to produce systems that are
scalable, ethical, and resilient enough to deal with the continually advancing threat from
deepfake technologies in domains like cybersecurity, media verification, and legal
investigations.

Figure 4. Input Sequence through model

35
ResNet CNN for Feature Extraction:

ResNet is a deep network architecture highly effective for CNN within feature extraction in
deepfake detection because it can model complex patterns within image data and mitigate the
vanishing gradient problem within a deep neural network. ResNet is an innovation of
Microsoft Research. In residual learning, the network learns to map the differences or
residuals between the input and output rather than the full transformation by applying
shortcut connections, where input skips one or more layers directly to output. With shortcut
connections, the model allows training much deeper networks with no degradation in
performance. Here, ResNet is employed owing to its good spatial feature extraction and
detection capability in the context of deepfake detection.

Detect subtle anomalies, and identify artifacts that distinguish real images from manipulated
content.

With hierarchical architecture design, ResNet can have the power of feature extraction: it is
essentially a stack of residual blocks. Each residual block consists of convolutional layers,
batch normalization layers, and ReLU activation layers connected with a shortcut that
directly adds the input and the output of the block. So it learns low-level features like edges,
texture, and high-level ones like facial structures and patterns. These capabilities are
particularly important in the application of deepfake detection where such minute
inconsistencies, including the irregular textures, unnatural lighting conditions, or mismatched
facial landmarks, could differentiate between real and fake images. For example, artifacts
introduced by the generative models like GANs may show themselves as slight pixel-level
inconsistency in which ResNet is well-equipped to detect from hierarchical feature maps.

In practical application, ResNet is mainly used as a backbone model in deepfake detection

systems. Video frames or images are normalized and resized during preprocessing to the
fixed input size, that is 224x224 pixels, for compliance with ResNet architectures. Then it
feeds into the network with convolutional layers extracting spatial features. The deeper layers
of ResNet contain more abstract patterns, such as facial symmetries, lighting consistency, or
object relationships.

One of the key advantages of using ResNet for feature extraction is that it is scalable. There

36
are variants of ResNet-18, ResNet-50, and ResNet-101, and these offer different depths to
help researchers balance computational efficiency with feature complexity according to the
requirements of the detection task. For example, ResNet-50 is a very popular choice, which
has 50 layers and is good enough for detecting deepfake artifacts in moderate-sized datasets.
Conversely, if finer details have to be extracted in a scene then it is used, namely, ResNet-
101. Secondly, pre-trained ResNets can be further used as for deepfake detection which
happens to be pre-trained versions based on the ImageNet but on much lesser computational
time.

Transfer learning for feature extraction from ResNet considers yet another approach that uses
the weights learned from an existing network to fine-tune to work with the task at hand. For
deepfakes,

It freezes the early layers that are used to capture the generic features and fine-tunes the
deeper layers to specialize in catching the deepfake-specific patterns. The idea is that since
there is a massive amount of knowledge encoded within ResNet's pre-trained weights, this
would help the network detect both the general anomalies and the deepfake-specific
anomalies. Transfer learning has proven particularly effective for this type of scenario: a very
common issue for any research into deepfakes.

Apart from the spatial features, ResNet can also be integrated into a temporal model for
deepfake detection on videos. The systems that merge ResNet with an RNN or LSTM
network are capable of evaluating the sequence of frames for identifying temporal
inconsistencies which occur in unnatural blinking patterns as well as abrupt motion shifts. In
these setups, ResNet would work as a feature extractor for independent frames and the high
dimensional feature vectors are passed forward into the temporal model. This hybrid
approach would integrate the strengths of spatial analysis by ResNet with those of RNNs or
LSTMs to provide holistic detection capabilities for manipulations in video data.

The output of feature extraction using ResNet is normally a high-dimensional feature map
applied as input to a classification head, which is a fully connected layer followed by softmax
that assigns probabilities to all classes. For example, applications for multiple-class deepfake
detection shall distinguish different manipulation types, thus, the head of classification may

be taken with the number of output nodes corresponding to each type of class.

37
It takes care of the adversarial examples and other artifacts of deepfake techniques as well,
which the advance deepfake methods come with. For instance, many of the deepfake
algorithms carried compression artifacts or even the error of blending, almost impossible to
detect by traditional ways. It features a deep architecture mechanism based on the residual
learning that is good enough to capture these minor inconsistencies, making it highly
dependable for detecting even some quality deepfakes. The architecture of ResNet further
supports data augmentation techniques such as random cropping, flipping, and rotation,
which improves generalization capabilities and robustness to variations in input data.

The practical applications of ResNet-based models apply in content moderation, legal

investigations, and cybersecurity. They are versatile and highly accurate to apply in
deepfakes detection across all domains such as social media and digital forensic applications.
However, real-time applications of ResNet models necessitate the optimization of these
models regarding computational efficiency. This can be achieved with model pruning,
quantization, and edge inference.

Although ResNet is very robust, it still has weakness: the quality and diversity of training
data determine its strong performance. Model generalization is typically diminished through
overfitting whenever training data have bias or high dependence on deepfakes. Researchers
always incorporate new elements in the training set, including adversarial training to improve
the robust architectures against these constantly evolving algorithms of deepfakes.

In conclusion, ResNet is a very powerful CNN architecture for feature extraction in deepfake
detection. Techniques such as transfer learning, hybrid modeling, and data augmentation can
be used with ResNet-based systems to achieve high performance in the detection of
deepfakes, which address both spatial and temporal inconsistencies. The above problems still
exist, including limited data and computational demands; model optimization and dataset
curation are rapidly evolving and help push ResNet in its ability to provide timely support
against deepfakes.

LSTM for Sequence Processing:

LSTMs, being a subset of Recurrent Neural Networks, have drawn much attention lately in
the area of deepfake detection due to the very fact that they are perfect for sequential data

38
processing. LSTMs don't process the inputs independently in the same way feedforward
networks do; they capture the dependencies in time by holding over some memory for long
sequences. That comes very handy in video-based deepfake detection, because temporal
consistency between

Such sequence is really needed for the detection of content manipulated with regard to
consecutive frames. This, actually allows LSTMs to recognize such inconsistencies that
might not appear if being studied individually, like unnatural face expressions, inconsistent
patterns for blinking eyes, rapid motions and unrealistic movements -all often artifacts of
generation processes at the deepfake stage.

A strong strength is provided through their architecture. LSTMs have memory cells that
assist in retaining information for long time periods; they eliminate the problem known as
vanishing gradients where the traditional RNNs lose the ability to handle long sequences in
the past. There is a cell state within an LSTM and there are three gates, each of input, forget,
and output gates that dictate information flow. The input gate decides what should be fed
into the cell state, the forget gate determines which to forget, and the output gate determines
which should be forwarded to the next layer. This architecture captures very aptly the long-
term dependencies and neglects the unnecessary data: LSTMs are more appropriate for
applications in which the context of what has been fed in previously drives the interpretation
of what goes into the system later on. This means an LSTM can track facial movements or
background elements changes over several frames that may indicate a tampering pattern in
deepfake detection.

LSTMs can, therefore, be used in deepfake detection and process video data frame by
frame. This means that the model learns temporal patterns and detects inconsistencies which
may emerge over time. For instance, in the creation of videos, algorithms associated with
deepfakes cannot create natural, synchronized facial expressions. This creates difficulties in
maintaining constant eye movement or lip-syncing between frames. This will allow the
LSTM to identify anomalies like unnatural blinking patterns, inconsistent eye movement, or

lip movements that do not conform to the temporal characteristics of a real video. In
addition, the background, lighting, and motion patterns of deepfakes will have some subtle
inconsistencies across frames.

In deepfake detection systems, LSTMs usually collaborate with CNNs on feature

39
extraction. Although the CNN is highly efficient with spatial features of individual frames,
the LSTMs are highly efficient in capturing temporal relationships between those frames.
This hybrid approach makes it possible for the system to take Advantages: uses spatial
information and temporal information. In the normal process, a CNN will first extract spatial
features from each frame of video data and then feed these spatial features to an LSTM
network that further processes the temporal sequence. The output of this LSTM will then
be fed into a fully connected layer or a classification head that determines whether the video
is real or fake.

The advantage LSTMs add to deepfake detection is the fact that they can model dynamic
change over more than one frame. Any true video will show smooth changes over frames
with consistent motion and realistic dynamics. Anomalies in a deep fake may include facial
expressions suddenly changed, unnatural motion, and inconsistent illumination-things which
are exactly what the LSTM model will detect. Besides this, in video manipulation,
algorithms may fail to model the relations of a subject's movement and expressions
precisely, therefore creating unnatural sequences. A distorted sequence may be captured
because LSTMs learn those patterns that should have otherwise occurred in the case of
genuine videos and know when that pattern has been broken down by the process of
tampering.

The quality of training data largely determines the success of LSTMs in the task of
deepfake detection. Since LSTMs rely significantly on learning sequential patterns, the
model needs to be exposed to a diverse set of videos, both authentic and deepfake content,
so that it captures temporal dependencies of real-world video sequences. Moreover,
preprocessing of data is high in importance for the improvement of the LSTM model. Video
frames have to be extracted, resized appropriately and normalized, so that the model is
getting input in a similar way. Face detection and alignment can help the model concentrate

on the parts of the video containing the face-most frequently manipulated parts of
deepfakes.

In practice, a deepfake detection system might be composed of multiple layers of LSTM

units, the more abstract temporal patterns and patterns being captured at deeper layers. For
example, the lowest level of abstraction might simply capture frame-to-frame motion but
progressively deeper layers may focus higher-level patterns like transitions from facial
40
expression or long continuity of motion. A common end layer is softmax, which produces
output classification probabilities, for example, on whether a video being shown is real or a
computer-generated fake, sometimes in multi-class format which involves more than two
potential solutions.

It detects various types of deepfakes including face swaps, voice impersonations, and many
more.

It is very difficult to train an LSTM on the deepfakes detection, especially by overfitting in

the model and the fact that the dataset is imbalanced. The majority of datasets for deepfakes
are imbalanced because there tend to be significantly more videos showing real cases than
manipulated ones. Oversampling, undersampling, or using a weighted loss function can
reduce the imbalance. The other issue is overfitting because a model learns its training data
rather than generalizing over new, unseen videos. Regularization is applied to avoid
overfitting for instance dropout or L2 regularization. Moreover, the model generalizes better
if the dataset is augmented using different transformations for instance a rotation, scaling or
variation of brightness, etc.

Even for anomaly-based detectors, LSTMs apply this logic to deepfake detection because
such logic can be used to make the detection of temporal anomalies of facial expressions.
They might also be integrated into deeper, more complex detection architectures like a
hybrid system that blends LSTM networks with other popular machine learning models such
as CNNs, Vision Transformers, and even Generative Adversarial Networks. Especially
when applied to the detection of high-level deepfake manipulations such as voice synthesis
and face swapping, multi-model approaches that contribute either to spatial or semantic
feature extraction as well as LSTM-based hybrid systems extracting temporal relations
among frames are particularly useful.

Recently, LSTMs have also been used in detecting other types of media manipulation that
include audio deepfakes. If used on audio, LSTMs will process the sequential nature of
sound waves and identify unnatural speech patterns different from those found in authentic
records. Hence, by analyzing dependencies in both visual and audio modalities, LSTM-
based systems can identify multimodal deepfakes, including video and audio manipulations.

These LSTM networks can identify fine grain variations that emerge due to deepfakes by
taking into consideration the temporal dependencies that are present in the video frames
41
across different time steps. LSTMs are useful because of its ability to capture the dynamics
that describe real video sequences and the detection of anomalies like unnatural facial
movement or blinking or even lighting.

This is in distinguishing between the actual content and manipulated media. Used with
other models, such as CNNs, LSTMs provide a comprehensive solution to the problems that
appear from deepfakes detection; this allows systems to analyze both spatial and temporal
patterns effectively. Despite the overfitting and optimization problems coupled with poor-
quality training data, the flexibility and efficiency of LSTMs make it a great addition to the
growing area of deepfake detection, especially for video-based applications in which
temporal coherence is critical.

Training Process: -

The essential components of Deepfake detection include training, validation, and testing.
Training is the core of the proposed model. This is where learning actually happens. In order
for DL models to fit specific domains of problems, designs and fine-tuning become
necessary. We need to search for parameters that are optimal to train our dataset. The training
and validation components are also the same. We fine-tune our model during the validation
process. The validation module monitors the performance and accuracy of DeepFake
detection in training. A specific video is categorized and declared by the testing module
through finding the class of the extracted faces. The testing module helps to aid the research
goals.
Thus, feature learning forms part of the model, while classification is the other. FL is
essentially feature extraction that can be learned from face images through analysis.
Classification will take the FL input and transform it into a sequence of pixels that the final
process of detection can then use. FL is basically a method that implies convolutional
operations stacked in a stack-like fashion; the feature learning component uses an
architecture inspired by the ResNet50.

We begin with the pre-trained ResNet50. We then add in the LSTM and Fully Connected
layers with an arbitrary weight system. The network is good to roll from top down with the

42
parallel cross-entropy loss (BCE) work with the LSTM expectation. The BCE loss is run on
trimmed countenances from casings of a randomly chosen video
Note that this loss is controlled by the result probabilities of recordings video level forecast.
The BCE applied to refreshes the loads. The BCE applied to refreshes all weights of the
outfit except ResNet50.
While training the complete group from start to finish, we initialize the process with an
optional pre-training step consisting of 2000 epochs of random crops just to have a
preliminary set of model's parameters. In our experiments, this did not lead to an increased
discovery accuracy but rather to faster convergence and significantly more stable training
procedure. Because of the computational constraint of GPUs, the network size, and the
number of info frames, only one video can be processed at one time. However, after
processing every 64 recordings, the network parameters are updated for the binary cross-
entropy loss. With a learning rate of 0.001, Adam is used as the optimization technique. The
training process for deepfake detection involves a systematic approach to ensure the model
can accurately identify synthetic media. It begins with the collection of diverse datasets
containing real and deepfake content, sourced from publicly available repositories like
FaceForensics++ or Celeb-DF or custom-curated datasets. Preprocessing follows, where data
is optimized through resizing, normalization, and augmentation to enhance variations in
lighting, scaling, and occlusions. Next, the model architecture is selected, often involving
convolutional neural networks (CNNs) for spatial analysis and recurrent neural networks
(RNNs) or transformers for temporal inconsistencies in videos.
The training phase employs supervised learning, where labeled real and fake data help the
model learn distinguishing features. Techniques like cross-entropy loss and backpropagation
refine the model's weights. Validation using a separate dataset ensures performance
monitoring and helps prevent overfitting, employing methods such as early stopping and
regularization. Post-training, the model’s effectiveness is evaluated against unseen data using
metrics like accuracy, precision, recall, and F1 score. To ensure robustness, adversarial
examples and variations in quality and compression are tested.

43
Figure 5. Data Flow Diagram

44
Evaluation: -

The evaluation of deepfake detection systems is an important process in considering the

effectiveness of detection in varied contexts and realistic real-world settings. Deepfakes are
rapidly gaining importance due to the growth of this technology in synthetic media, from
moderation of social media to legal forensic application. Performance metrics, dataset used,
robustness against manipulation techniques, generalization to unseen data, and computational
efficiency constitute the evaluation process. Along with these advancements in
sophistication, there is a critical need for effectiveness under practical conditions which
necessitates evaluating the detection systems in a thorough and comprehensive manner.

One of the important metrics to evaluate a deepfake detection system is accuracy-that is, the
number of true positives and false positives and true negatives and false negatives. It usually
proves inadequate for many applications, however. With such highly imbalanced datasets as
might typically be expected, where one class, this will simply result in most inputs being
classified as real because it's the larger class. That's not terribly helpful for practical deepfake
detection. Thus, precision, recall, and F1-score are highly used as more informative
measures.

• Precision: The measure of True Positives meaning correctly tagged fake videos out of all
the videos tagged as fake. The higher value of precision reflects that the system does not
classify many real videos as fake ones.

• Recall, or sensitivity measures true positives divided by all the true false ones. High recall
points out that most of the correct fakes have been identified; at the same time, a number of
irrelevant or wrong outputs could occur.

• F1-score is the harmonic mean of precision and recall, giving false positives and false
negatives the same weight. It is, therefore, very useful in imbalanced datasets because it does
not bias toward any one class.

Another very good performance evaluation measure is the Area Under the Receiver
Operating Characteristic curve, or AUC-ROC. A ROC curve is plot between true positive
rate vs false positive rate and it tells that what is the possibility that a randomly selected true
instance would be ranked greater than that of a randomly chosen fake one by this model.
Thus a greater
45
AUC stands for better discriminative power; 1 represents perfect separation between actual
and fake content.

Another intriguing generalization of the evaluation criterion of a detection model is how well
the model will work on unseen data or other manipulation techniques. Detection models are
typically trained on specific datasets like FaceForensics++ or CelebDF, with the most varied
range of synthetic media including face-swapping, lip-syncing, and puppet-based
manipulations. However, with the rise of the new emerging deepfake techniques, this
detection system must generalise and effectively detect the new manipulations. One could
check this model by conducting cross-validation over different data sets using varied
generation techniques for deepfakes, or introducing some data augmentation strategies at the
training stages to make sure that the model will not be sensitive to variation in input data.

Robustness is another important assessment parameter of the deepfake detection system.

Deepfakes typically inject some sort of artifacts, like unnatural illumination, facial
movements that don't look like the normal situation, or compression artifacts. Detection
models can then concentrate on the specific artifacts. Yet again, detection systems should be
robust against the artifacts as well when methods of manipulation evolve to outsmart the
traditional detection approach. A good model must be able to detect manipulated content
even in a noisy environment or in an adversarial attack or through video compression,
wherein the appearance distortion causes artifacts. Usually, robustness is checked by testing a
model on different datasets respecting different types of deep fake techniques, noise levels, or
distortions.

With the advancement of deepfakes, adversarial attacks raise considerable concerns for
deepfake detection systems. An adversarial attack can be defined as a set of specially
constructed inputs that could mislead machine learning models to classify real or artificial
content. To ensure deepfakes are reliable enough in real-world applications, susceptibility
towards adversarial manipulation has to be tested for these deepfake detection systems.
Generally, the common strategy applies perturbations directly in the input data and using
adversarial deepfakes, which are built purely to evade detection. The performance of these
systems is then evaluated under these attacks to assess how robust they are and to provide
improvements in their security.

One major issue when applying the technique to assess real data is that a large effect comes

46
from the quality of a particular dataset. Of all popular ones include DeepFake Detection
Challenge, FaceForensics++, CelebDF, and Kaggle's Deepfake Detection Dataset.

These datasets are filled mostly with a combination of real and synthetic media with different
manipulations such as face swaps, lip-syncing, and facial reenactments. This will evaluate the
models on a range of datasets, so it should generalize in cases where deepfake techniques
would be pretty dissimilar from the real ones. In addition, a diverse set of datasets that have
more representative examples of the manipulation techniques help to prevent biases in model
performance and might ensure fairness and robustness.

Regardless, cross-domain generalization is the greatest challenge of the deep fake detection
method. Variations of the background features and changes in the illumination could affect
the outcome of the model. Various domains including films, news broadcasts, social media,
etc., most times have the need to test whether a model can generalize across other domains
when determining if the models work practically since it might have a deep fake presence
across very different scenarios.

It is also very much concerned with computational efficiency, especially in real-time

applications. Deepfake detection requires processing large amounts of video data, often at a
rate of frames per second, which stretches computation. The evaluation of computational
efficiency involves inference time, memory usage, and other factors related to speed in
processing. The ability to run with very little latency in real-time will make systems like live-
stream monitoring, social media moderation, or even the police and enforcement really
crucial. Speed optimizations without impacting accuracy are a good evaluation benchmark
for these detection models. Pruning, quantization, and adding hardware acceleration (like a
GPU or chip) are good to add on later for optimization.

The other area of growing interest in the study of deepfake detection models is
interpretability. Most of the deep learning models, such as CNNs and LSTMs, are black-box
systems that do not explain why a video has been classified as real or fake. This may become
problematic in high-stakes applications such as legal or regulatory use cases. E.g.,
interpretation of such a system: Evaluating deepfakes' detector by providing saliency, Grad-
CAM and attention weights mapping over such a regions for video upon which does model
rely its decision at decision stage. Interpretablility Improvements would in this case assist
developers by fostering the trust upon the developer and other entities towards this

47
application model and enable its users to understand its reasoning process in order to be
adopted into actual practice.

Real-world deployment and user feedback are quite critical in the assessment of deepfake
detection systems as well. Models that look to be performing excellent in the benchmark
datasets may face a variation in real-world deployments because of several reasons such as
video quality, manipulation techniques, and some domain-specific challenges. Feedback
from users within real-world environments can also identify weaknesses in a model and
obtain useful insights which might be needed for further refinement. In addition to these
metrics, the Receiver Operating Characteristic (ROC) curve and the Area Under the
Curve (AUC) are used to analyze the model's ability to distinguish between real and fake
samples across different thresholds. Robustness tests are also conducted, exposing the model
to adversarial examples, varying video resolutions, compression artifacts, and unseen
deepfake techniques to evaluate its adaptability. Generalizability is another critical aspect,
ensuring the model performs well across datasets it was not trained on. A thorough
evaluation process helps identify areas for improvement and ensures the model is reliable for
deployment in real-world scenarios.

Multiple considerations include performance metrics, robustness, generalization,

computational efficiency, and interpretability in overall assessment of deepfake detection
systems. The real-world performance of the detection systems, however, needs to indicate a
high precision not only on the benchmark datasets but also their robust performance across
different manipulation techniques, domains, and environments. The particular interest is in its
models, deployed in highly critical areas like law enforcement, social media, and
cybersecurity, which could be robust to noise, and are real-time capable and work with the
detection of adversarial attacks. In turn, these have evolved so rapidly that assessing and
updating the detection algorithms on an ongoing basis may ensure the newly developed ways
of manipulating them are adapted into them so that this whole concept remains effective and
stable.

48
Chapter-4: PERFORMANCE ANALYSIS

4.1 Analysis:

Deepfakes give you a possibility to open new avenues in computerized media, virtual reality,
mechanics, education, and more. In one way or another, they are innovations that can destroy
and undermine the whole society. This thought in our minds is how we came up with the
model design that combined CNNs and LSTMs for the task of DF video identification.

LSTMs can handle the sequence of sequential frames while CNNs are good in learning local
highlights. Our model exploits the bound in associating each pixel of a given image and
understanding non-local highlights. We have attached equal importance to preprocessing data
in training and aggregation.

Network classification: Networks have been investigated to know how they would classify.
We can do this by applying the weights of different types of convolutional kernels and also
neurons as descriptions to images. Inferences could be interpreted as discrete second orders,
for example using a positive weight, a negative one, and a positive weight again. This indeed
is only an indication of the topmost layer but does not relate much during appearances. To
know what kind of signal is being received by a particular channel [6] another method is to
generate an information picture amplifying the initiation of that channel. The last secret layer
of ResNet50 has been actuated so much for such a long time, as shown in the following
figure:. We can distinguish the neurons, based on weight assigned to their result for the last
arrangement choice, that push toward a negative or positive score, and thereby influencing
either the genuine or produced class. In contrast, induction of positive-weighted neurons
shows photos with an exceptionally clear eye, nose, and mouth areas as compared to the
negative-weighted neurons which contained "differences" on the establishment portion
leaving the face area "smooth." Since Deepfake-made faces are generally blurry, or else
somehow opaque, unless there is focus on details, are otherwise presented differently to the
rest of the photograph that remains the same.

49
The underlying result of a layer can, at the same time, be regarded as a mean after-effect for
groups of certified and manufactured images.

And obviously, differences among the prompts can be considered and interpreted, as well,
are that information photographs which are connected with the course of action. Since in any
case institution actually puts the most essential apices on actual photographs, it is opened
naturally with opened eyes on an actual photograph.

It is quite obvious for us that once again the problem is vagueness, since in real images it is
the most sharp image of the eye, in synthetic images, the eye appears as the first part that due
to the aspect reduction brought by the face.

4.2 Results:
We employ the combination of Conv+LSTMs and test-time augmentation. We apply
Transfer Learning over the pre-trained models like ResNet50, MesoNet and DenseNet121.
The preparation and evaluation for our strategy are carried out by the dataset known as
DFDC. Our comparisons depict that our approach is quite relatively superior to the other
three approaches. In case the networks are involved, then it is possible to normalize
expectations on each frame in order to calculate a video-level forecast. With the help of the
validation set, the configuration that produces the highest adjusted precision is decided.
According to balanced accuracy, Table 1 can be seen. If face regions are only being
preprocessed, the accuracy jumped dramatically. Almost all models were overfitting, as
shown in Figure 3, the loss in the train sets and in the validation sets for ResNet+LSTM. The
validation loss begins to increase at the 5th epoch and that is when the model starts
overfitting. Testing was enhanced using test-time augmentation (TTA). Using TTA, one can
apply data augmentation to a test image and average predictions for multiple versions of it. In
our test of TTA, different transformations were used than that used for the ResNet model.
The model with the highest accuracy is ResNet50 + LSTM, with an accuracy of 94.63%.

50
.
Train and Validation Accuracy Train and Validation loss

4.1 Screen shots of the various stages of the Project

Importing Packages :

51
Model Architecture :

52
Pre-Process:

53
Training Model:

54
55
56
57
Model Accuracy & Loss :

58
4.1 Comparisons:

Our dataset was used to train multiple models. From those, ResNet50 + LSTM offers the most
accurate testing and training results. Our observations have shown that the network regularly
fails to engage in exceptionally practical and effective manipulation based on poorly-lit or
foggy images that are inferior in quality. Despite their difficulty, manipulations made in
excellent recordings appear to be accurately identified.

Table 2. Evaluation Results

Model TrainData Test
Data

Custom Model 0.852 0.8056

3
ResNet50+LST 0.979 0.9464
M 4
MesoNet 0.956 0.8996
7
DenseNet121 0.969 0.9183
5

59
Chapter-5: CONCLUSIONS

5.1 Conclusions:

Generally speaking, today everybody is well aware of face-tampering risks of video. DF

has thrown open to the world of advanced media, virtual reality, mechanical technology,
and much more. In other words, they represent innovative stuff which could ruin and
belittle everything there is in this society.

In the design, neural network-based approach is adopted that classifies video to be deep
fake or real and the confidence of proposed model. The inspiration behind this proposed
method comes from the way GAN creates deep fakes with autoencoders. ResNet50 CNN
has been used for the frame-level detection, followed by RNN and LSTM for video
classification. Therefore, this proposed method is differentiated based on the parameters
of the paper between the fake video and actual video. In our analysis, it depicts that our
method can very surely identify DF accurately within an average of 94.63% in a web
scenario considering actual conditions of dispersion within an average. Real-time
information will rely upon real-time high expectancy. One of the fundamental things with
deep learning is that, the possibility of forming a solution to a presented problem does not
require an earlier hypothetical review. However we also have the option of understanding
this arrangement's outset in order to evaluate the traits and constraints of that, so we spent
considerable amount of time imagining the channels within our network. Most striking
empirical evidence has indicated the significant role played by eyes and mouth while
identifying the aspects or perceptions with the aid of DF. Next future machines will make
our organizations strong and effective with the capabilities to make profound businesses
understandable.

5.2 Future work:

Model Ensembling:
Diverse modeling algorithms or diverse training data set used by ensemble modeling
60
enables various models to predict the outcome. After all the basic models are combined,
an ensemble model then computes a single final prediction from the inconspicuous data.
Ensemble multiple models may be further enhanced our prediction process like
(ResNet50, MesoNet, DenseNet and Custom Model).

Data Augmentation:
It is a technique wherein newer information integrated into prior ones over time or simply
less varied versions of existing ones serve as a method. Also it acts as a general purpose
regularizer of an artificial intelligent model and reduces the problem known to occur as
overfitting in the model.

Early Stopping:

Early stopping can make the model better in training. You can state one of the individual
levels of epochs by training after which preparation of a model will be stopped improving
a holdout validation dataset.

BlockChain:

Researchers According to researchers, this blockchain technology can potentially improve

deepfake detection more effectively. To this day, there is a few study on using blockchain
to detect deepfakes, partly because it was successfully adopted in most industries. The holy
grail of any computerized provenance system, blockchain allows you to create an indelible
chain of individual blocks of metadata.

Image Aggregation:
Videos tend to compress in most cases, especially online-viewed videos; therefore, large
pieces of lost information are met. However the same face presented in frames may aid in

61
the increasing accuracy of the video by its overall score if this allows multiple
experiences. This is the average of the network prediction over the video

This can be achieved through using such elements as theoretically, the relation of frames
of a single

video is so robust that a score improvement or CI indicator cannot be justified. In

practice, Most filmed faces have stable clear frames to alleviate the viewer's discomfort.
Most good predictions on a sample of frames taken from the video suggests that
movement blur, face occlusion, and random misprediction over a sample of frames can be
outweighed.

Audio Tampering:- Until now, our model was only able to work with deepfakes that were
manipulated on the video side. However, due to the fast development of CNNs [4, 20],
the GANs, and their variants, it is now possible to create deepfakes manipulated on the
audio side also. Video with manipulated audio will be declared fake, but faces will
remain genuine, This makes it much more difficult to distinguish between true,
untampered audiovisuals and altered ones. Because of this ability to create a real sound,
video, and images, various stakeholders took action to deter such developments from
being used maliciously.

Therefore, researchers are trying to create deepfake detection mechanisms that will also
detect synthetic audio.

5.3 Applications:

• Prevention of reputational damages to individuals

• Cheat hoaxes detection
• Refrain from demonizing democratic discourse
• Reduces exacerbation of social divisions.
•Prevents Financial Fraud

62
CHAPTER- 6 : APPENDIX

Preprocess:

from PIL import Image

from torchvision import transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader, random_split

def preprocess_image(image, augment=False):

transform_list = [
transforms.Resize((299, 299)),
]

if augment:
transform_list.extend([
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(10),
transforms.ColorJitter(brightness=0.1),
])

transform_list.extend([
transforms.ToTensor(),
transforms.Normalize(mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5])
])

transform = transforms.Compose(transform_list)

# image = Image.open(image_path).convert('RGB')
image = transform(image)
return image

def get_transforms(augment=False):
transform_list = [
transforms.Resize((299, 299)),
]

if augment:
transform_list.extend([
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(10),
transforms.ColorJitter(brightness=0.1),
])
63
transform_list.extend([
transforms.ToTensor(),
transforms.Normalize(mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5])
])

return transforms.Compose(transform_list)

def get_datasets(data_dir, batch_size=32, val_split=0.2, augment=False):

transform = get_transforms(augment=augment)

dataset = ImageFolder(root=data_dir, transform=transform)

val_size = int(len(dataset) * val_split)
train_size = len(dataset) - val_size

train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

return train_loader, val_loader

Model:

from PIL import Image

from torchvision import transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader, random_split

def preprocess_image(image, augment=False):

transform_list = [
transforms.Resize((299, 299)),
]

if augment:
transform_list.extend([
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(10),
transforms.ColorJitter(brightness=0.1),
])

transform_list.extend([
transforms.ToTensor(),
transforms.Normalize(mean=[0.5, 0.5, 0.5],

64
std=[0.5, 0.5, 0.5])
])

transform = transforms.Compose(transform_list)

# image = Image.open(image_path).convert('RGB')
image = transform(image)
return image

def get_transforms(augment=False):
transform_list = [
transforms.Resize((299, 299)),
]

if augment:
transform_list.extend([
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(10),
transforms.ColorJitter(brightness=0.1),
])

transform_list.extend([
transforms.ToTensor(),
transforms.Normalize(mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5])
])

return transforms.Compose(transform_list)

def get_datasets(data_dir, batch_size=32, val_split=0.2, augment=False):

transform = get_transforms(augment=augment)

dataset = ImageFolder(root=data_dir, transform=transform)

val_size = int(len(dataset) * val_split)
train_size = len(dataset) - val_size

train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

return train_loader, val_loader

65
Train:

import torch
from torch.utils.data import DataLoader
import os
import logging

import torch.nn as nn
import torch.optim as optim

from preprocess import get_datasets

def load_data(batch_size):
train_loader, val_loader = get_datasets(data_dir='data')
return train_loader, val_loader

def train_model(model, train_loader, val_loader, epochs, learning_rate):

if not os.path.exists('logs'):
os.makedirs('logs')
logging.basicConfig(filename='logs/training_logs.txt', level=logging.INFO)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
best_val_accuracy = 0.0

for epoch in range(epochs):

model.train()
running_loss = 0.0
running_corrects = 0

for inputs, labels in train_loader:

optimizer.zero_grad()
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()

running_loss += loss.item() * inputs.size(0)

running_corrects += torch.sum(preds == labels.data)

epoch_loss = running_loss / len(train_loader.dataset)

66
epoch_acc = running_corrects.double() / len(train_loader.dataset)

model.eval()
val_running_loss = 0.0
val_running_corrects = 0

with torch.no_grad():
for inputs, labels in val_loader:
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)

val_running_loss += loss.item() * inputs.size(0)

val_running_corrects += torch.sum(preds == labels.data)

val_loss = val_running_loss / len(val_loader.dataset)

val_acc = val_running_corrects.double() / len(val_loader.dataset)

logging.info(f'Epoch {epoch+1}/{epochs}, Train Loss: {epoch_loss:.4f}, '

f'Train Acc: {epoch_acc:.4f}, Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}')

print(f'Epoch {epoch+1}/{epochs}')
print(f'Train Loss: {epoch_loss:.4f}, Train Acc: {epoch_acc:.4f}')
print(f'Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}')

if val_acc > best_val_accuracy:

best_val_accuracy = val_acc
torch.save(model.state_dict(), '.\checkpoints\model.pth')

if name == ' main ':

batch_size = 32
epochs = 20
learning_rate = 0.001

train_loader, val_loader = load_data(batch_size)

train_model(train_loader, val_loader, epochs, learning_rate)

Evaluate:

import torch
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score,
f1_score
67
import json
import os
import numpy as np
import seaborn as sns

import torch.nn as nn
import matplotlib.pyplot as plt
from preprocess import get_transforms

def evaluate_model(model, test_loader):

model.eval()
all_preds = []
all_labels = []

with torch.no_grad():
for images, labels in test_loader:
images = images.to(device)
labels = labels.to(device)
outputs = model(images)
_, preds = torch.max(outputs, 1)
all_preds.extend(preds.cpu().numpy())
all_labels.extend(labels.cpu().numpy())

all_preds = np.array(all_preds)
all_labels = np.array(all_labels)

accuracy = accuracy_score(all_labels, all_preds)

precision = precision_score(all_labels, all_preds, average='binary')
recall = recall_score(all_labels, all_preds, average='binary')
f1 = f1_score(all_labels, all_preds, average='binary')
cm = confusion_matrix(all_labels, all_preds)

metrics = {
'accuracy': accuracy,
'precision': precision,
'recall': recall,
'f1_score': f1,
'confusion_matrix': cm.tolist()
}
return metrics

def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion Matrix'):

if normalize:
cm = cm.astype(float) / cm.sum(axis=1)[:, np.newaxis]

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt=".2f" if normalize else "d", cmap='Blues',

68
xticklabels=classes, yticklabels=classes)
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.title(title)
plt.show()

if name == ' main ':

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the model checkpoint

checkpoint_path = 'checkpoints/model.pth'
model = torch.load(checkpoint_path)
model = model.to(device)

# Load the same image transformations used during training

transform = get_transforms()

# Load the test dataset

test_dir = './data/real-vs-fake/test'
test_dataset = ImageFolder(test_dir, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# Evaluate the model

metrics = evaluate_model(model, test_loader)

# Save metrics to JSON file

os.makedirs('results', exist_ok=True)
with open('results/metrics.json', 'w') as f:
json.dump(metrics, f, indent=4)

# Print confusion matrix

cm = np.array(metrics['confusion_matrix'])
class_names = test_dataset.classes
plot_confusion_matrix(cm, class_names)

69
App:

import sys
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(_file_), '..')))

import streamlit as st
from PIL import Image
import torch
from src.model import get_pretrained_model
from src.preprocess import preprocess_image
from io import BytesIO

@st.cache_data()
def load_model():
model = get_pretrained_model('resnet18')
model.eval()
return model

model = load_model()

def predict_image(model, image):

# Preprocess the uploaded image to match the model's input requirements
processed_image = preprocess_image(image).unsqueeze(0) # Add batch dimension
with torch.no_grad():
outputs = model(processed_image)
probabilities = torch.nn.functional.softmax(outputs, dim=1)
_, predicted = torch.max(outputs, 1)
return predicted.item(), probabilities

st.title("Image Classification")

uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:

# Display the uploaded image
image = Image.open(uploaded_file).convert("RGB")
st.image(image, caption="Uploaded Image", use_column_width=True)
st.write("")

# Classify the uploaded image

st.write("Classifying...")
label, probabilities = predict_image(model, image)

# Display the prediction

st.write(f"Prediction: {'Fake' if label == 1 else 'Real'}")

70
# Display the probabilities
st.write(f"Probabilities: Real: {probabilities[0][0]:.4f}, Fake: {probabilities[0][1]:.4f}")

# Display additional metrics (e.g., confidence score)

confidence_score = probabilities[0][label].item()
st.write(f"Confidence Score: {confidence_score:.4f}")

ground_truth_label = 1
# Calculate precision, recall, F1-score if ground truth is available
if ground_truth_label is not None:
precision = (label == ground_truth_label) / (label + ground_truth_label)
recall = (label == ground_truth_label) / (ground_truth_label)
if(precision + recall):
f1_score = 2 * (precision * recall) / (precision + recall)
st.write(f"Precision: {precision:.4f}")
st.write(f"Recall: {recall:.4f}")
st.write(f"F1 Score: {f1_score:.4f}")

71
71
CHAPTER – 7 :REFERESCE

.[1] Joshua Brockschmidt, Jiacheng Shang, and Jie Wu. On the Generality of Facial Forgery

Detection. In 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems

Workshops (MASSW), pages 43–47. IEEE, 2019.

[2] Yuezun Li, Ming-Ching Chang, and Siwei Lyu. In Ictu Oculi: Exposing AI Generated Fake

Face Videos by Detecting Eye Blinking. arXiv preprint arXiv:1806.02877v2, 2018. [3]

TackHyun Jung, SangWon Kim, and KeeCheon Kim. Deep-Vision: Deepfakes Detection

Using Human Eye Blinking Pattern. IEEE Access, 8:83144–83154, 2020.

[4] Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. Realistic Speech-Driven

Facial Animation with GANs. International Journal of Computer Vision, 128:1398–1413,

2020.

[5] Hai X. Pham, Yuting Wang, and Vladimir Pavlovic. Generative Adversarial Talking Head:

Bringing Portraits to Life with a Weakly Supervised Neural Network. arXiv preprint

arXiv:1803.07716, 2018

[6] Yuezun Li, Siwei Lyu, “ExposingDF Videos By Detecting Face Warping Artifacts,” in

arXiv:1811.00656v3.

[7] Yuezun Li, Ming-Ching Chang and Siwei Lyu “Exposing AI Created Fake Videos by

Detecting Eye Blinking” in arxiv.

[8] Huy H. Nguyen , Junichi Yamagishi, and Isao Echizen “ Using capsule networks to detect

forged images and videos ”.

[9] Umur Aybars Ciftci, ˙Ilke Demir, Lijun Yin “Detection of Synthetic Portrait Videos using

Biological Signals” in arXiv:1901.02212v2.

[10] https://www.kaggle.com/c/deepfake-detection-challenge/data

[11] Liu, M. Y., Huang, X., Mallya, A., Karras, T., Aila, T., Lehtinen, J., and Kautz, J. (2019).

Few-shot unsupervised image-to-image translation. In Proceedings of the IEEE International

Conference on Computer Vision (pp. 10551-10560).

[12] Park, T., Liu, M. Y., Wang, T. C., and Zhu, J. Y. (2019). Semantic image synthesis with

spatially-adaptive normalization. In Proceedings of the IEEE Conference on Computer Vision

and Pattern Recognition (pp. 2337-2346).

[13] DeepFaceLab: Explained and usage tutorial. Available at

https://mrdeepfakes.com/forums/thread-deepfacelab-explained-and-usage-tutorial.

[14] DSSIM. Available at https://github.com/keras-

team/kerascontrib/blob/master/keras_contrib/losses/dssim.py.

[15] Lattas, A., Moschoglou, S., Gecer, B., Ploumpis, S., Triantafyllou, V., Ghosh, A., &

Zafeiriou, S. (2020). AvatarMe: realistically renderable 3D facial reconstruction “in-the-wild”.

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp.

760- 769).

[16] Ha, S., Kersner, M., Kim, B Seo, S., & Kim, D. (2020, April). MarioNETte: few-shot

face reenactment preserving identity of unseen targets. In Proceedings of the AAAI Conference

on Artificial Intelligence (vol. 34, no. 07, pp. 10893-10900).

[17] Deng, Y., Yang, J., Chen, D., Wen, F., & Tong, X.(2020). Disentangled and controllable

face image generationvia 3D imitative-contrastive learning. In Proceedings of the IEEE/CVF

Conference on Computer Vision and Pattern Recognition (pp. 5154-5163).

[18] Tewari, A., Elgharib, M., Bharaj, G., Bernard, F., Seidel, H. P., P´erez, P., ... & Theobalt,

C. (2020). StyleRig: Rigging StyleGAN for 3D control over portrait images. In Proceedings of

the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6142-6151).

[19] Li, L., Bao, J., Yang, H., Chen, D., & Wen, F. (2019). FaceShifter: Towards high fidelity

and occlusion aware face swapping. arXiv preprint arXiv:1912.13457.

[20] Nirkin, Y., Keller, Y., & Hassner, T. (2019). FSGAN: subject agnostic face swapping and

reenactment. In Proceedings of the IEEE/CVF International Conference on Computer Vision

(pp. 7184-7193).

[21] Olszewski, K., Tulyakov, S., Woodford, O., Li, H., & Luo, L. (2019). Transformable

bottleneck networks. In Proceedings of the IEEE/CVF International Conference on Computer

Vision (pp. 7648-7657). [22] Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. (2019). Everybody

dance now. In Proceedings of the IEEE/CVF International Conference on Computer Vision

(pp. 5933-5942).

[23] Thies, J., Elgharib, M., Tewari, A., Theobalt, C., & Nießner, M. (2020, August). Neural

voice puppetry: Audio-driven facial reenactment. In European Conference on Computer Vision

(pp. 716- 731). Springer, Cham.

Mini Project Report
No ratings yet
Mini Project Report
19 pages
Project Report Model
No ratings yet
Project Report Model
53 pages
Deepfake Project
No ratings yet
Deepfake Project
24 pages
Deepfake Video Detection Report
No ratings yet
Deepfake Video Detection Report
45 pages
Deepfake Detection Project Seminar
No ratings yet
Deepfake Detection Project Seminar
14 pages
Deepfake Detector
No ratings yet
Deepfake Detector
32 pages
ABHINAYA
No ratings yet
ABHINAYA
30 pages
Shivanudeepi
No ratings yet
Shivanudeepi
58 pages
CSS MIcro Project (1-5)
No ratings yet
CSS MIcro Project (1-5)
24 pages
DeepFake Synopsis (AutoRecovered)
No ratings yet
DeepFake Synopsis (AutoRecovered)
16 pages
Project Report TY Core - 3 E.B
No ratings yet
Project Report TY Core - 3 E.B
24 pages
Demoppt 2
No ratings yet
Demoppt 2
25 pages
DeepFake Detection System
No ratings yet
DeepFake Detection System
60 pages
Group 4 Review 1
No ratings yet
Group 4 Review 1
13 pages
DF Report
No ratings yet
DF Report
40 pages
Batch-10 Project - PPT
No ratings yet
Batch-10 Project - PPT
28 pages
Deepfake Detection System Using AI
No ratings yet
Deepfake Detection System Using AI
11 pages
Jnana Sangama, Belagavi, Karnataka - 590018: Project Presentation On
No ratings yet
Jnana Sangama, Belagavi, Karnataka - 590018: Project Presentation On
26 pages
Projectreport 1
No ratings yet
Projectreport 1
104 pages
Jnana Sangama, Belagavi, Karnataka - 590018: Project Presentation On
No ratings yet
Jnana Sangama, Belagavi, Karnataka - 590018: Project Presentation On
26 pages
Software Engineering Software Requirements Specification (SRS) Document
No ratings yet
Software Engineering Software Requirements Specification (SRS) Document
13 pages
Deepfake (1) - 2
No ratings yet
Deepfake (1) - 2
11 pages
ML Report
No ratings yet
ML Report
21 pages
21bce5801 53620
No ratings yet
21bce5801 53620
49 pages
Project Report Model
No ratings yet
Project Report Model
52 pages
Vit 3 55 - Merged
No ratings yet
Vit 3 55 - Merged
55 pages
DFDS Project Report (2025)
No ratings yet
DFDS Project Report (2025)
56 pages
Deep Fake Detection Vtu Report
No ratings yet
Deep Fake Detection Vtu Report
41 pages
Deep Fake Paper
No ratings yet
Deep Fake Paper
4 pages
Synopsis 2
No ratings yet
Synopsis 2
7 pages
Final Proposal
No ratings yet
Final Proposal
7 pages
Final - Year - Project - B15.pptx-1
No ratings yet
Final - Year - Project - B15.pptx-1
7 pages
1.deepfake Detection Using LSTM and Resnext
No ratings yet
1.deepfake Detection Using LSTM and Resnext
65 pages
CSS MIcro Project (1-5) - 1
No ratings yet
CSS MIcro Project (1-5) - 1
34 pages
Batch - No - 38 - Survey Paper 01
No ratings yet
Batch - No - 38 - Survey Paper 01
5 pages
Deep Fake Video Detection1 .Major Project File-2
No ratings yet
Deep Fake Video Detection1 .Major Project File-2
28 pages
Document From ?????
No ratings yet
Document From ?????
9 pages
Presentation of Fyp Proposal
No ratings yet
Presentation of Fyp Proposal
27 pages
Deep Fake Project Documentation
No ratings yet
Deep Fake Project Documentation
46 pages
GRP20 DFD PPT
No ratings yet
GRP20 DFD PPT
14 pages
Report of Technical Seminar 6
No ratings yet
Report of Technical Seminar 6
26 pages
Atique
No ratings yet
Atique
4 pages
Deepfake Detection Synopsis
No ratings yet
Deepfake Detection Synopsis
28 pages
DeepFake Synopsis (38,42)
No ratings yet
DeepFake Synopsis (38,42)
3 pages
Working Paper
No ratings yet
Working Paper
5 pages
Anti-Deepfake Tech for Engineers
No ratings yet
Anti-Deepfake Tech for Engineers
20 pages
Deep Fake
100% (1)
Deep Fake
27 pages
KECReport
No ratings yet
KECReport
23 pages
Feature Based Sentiment Analysis Using Stacked Meta Ensemble Learner 2 Copy
No ratings yet
Feature Based Sentiment Analysis Using Stacked Meta Ensemble Learner 2 Copy
8 pages
Paper 4
No ratings yet
Paper 4
5 pages
Ratnesh
No ratings yet
Ratnesh
26 pages
Abstract
No ratings yet
Abstract
1 page
Deepfake Video Detection Using Deep Learning
No ratings yet
Deepfake Video Detection Using Deep Learning
89 pages
Project Phase
No ratings yet
Project Phase
25 pages
Wa0009
No ratings yet
Wa0009
12 pages
Group 4
No ratings yet
Group 4
11 pages
Main Report Draft UNTOCHED
No ratings yet
Main Report Draft UNTOCHED
82 pages
Harsiiiii
No ratings yet
Harsiiiii
34 pages
Shinan
No ratings yet
Shinan
34 pages
Assignment BBA 4 SEM
No ratings yet
Assignment BBA 4 SEM
55 pages
Microsoft Schools List 02012020
No ratings yet
Microsoft Schools List 02012020
100 pages
A Literature Review: Corporate Social Responsibility Practices of Indian Companies
No ratings yet
A Literature Review: Corporate Social Responsibility Practices of Indian Companies
17 pages
Salesperson Performance: Behavior, Role Perceptions, and Satisfaction
No ratings yet
Salesperson Performance: Behavior, Role Perceptions, and Satisfaction
25 pages
Lesson Plan News Item
No ratings yet
Lesson Plan News Item
5 pages
Development of Programme For International Student Assessment (PISA) - Aligned Science Curriculum Framework For K-10 Learners in The Philippines
No ratings yet
Development of Programme For International Student Assessment (PISA) - Aligned Science Curriculum Framework For K-10 Learners in The Philippines
7 pages
Periyar The Great Thinker - Pannan
No ratings yet
Periyar The Great Thinker - Pannan
76 pages
Siwes Project Report - 035932
No ratings yet
Siwes Project Report - 035932
18 pages
NUS MBA Brochure Aug 2025
No ratings yet
NUS MBA Brochure Aug 2025
5 pages
Three-Phase Cogenerator Models Analysis
No ratings yet
Three-Phase Cogenerator Models Analysis
11 pages
Table of Specifications
No ratings yet
Table of Specifications
14 pages
5 Class Area and Its Boundary
No ratings yet
5 Class Area and Its Boundary
2 pages
S.I.W.E.S Experience Report
No ratings yet
S.I.W.E.S Experience Report
4 pages
Frank Lloyd Wright: Fallingwater by
100% (1)
Frank Lloyd Wright: Fallingwater by
20 pages
Smart Systems
No ratings yet
Smart Systems
9 pages
Resume Danielle Kaiser
No ratings yet
Resume Danielle Kaiser
2 pages
Unit - 3 Notes
No ratings yet
Unit - 3 Notes
23 pages
Oncoscope 2025 Official Brochure
No ratings yet
Oncoscope 2025 Official Brochure
16 pages
Admission List After R-1 & R-2 Received From Conv Publish Web 30-12-2024
No ratings yet
Admission List After R-1 & R-2 Received From Conv Publish Web 30-12-2024
9 pages
Test Paper The 9 Form Name
No ratings yet
Test Paper The 9 Form Name
1 page
Why Join PNA
No ratings yet
Why Join PNA
2 pages
Understanding Inattentional Blindness
No ratings yet
Understanding Inattentional Blindness
2 pages
Dr. N.K. Mohammed Memorial M.E.S Central School: Valanchery - 676552, Malappuram (Dist.), Kerala
No ratings yet
Dr. N.K. Mohammed Memorial M.E.S Central School: Valanchery - 676552, Malappuram (Dist.), Kerala
30 pages
Ai Proj SDR
No ratings yet
Ai Proj SDR
9 pages
Academic Performance of Student-Athletes Basis For Project SAW (Student-Athletes For The Win) A School Sports Program
No ratings yet
Academic Performance of Student-Athletes Basis For Project SAW (Student-Athletes For The Win) A School Sports Program
10 pages
Smith Derek Cover Letter Au
No ratings yet
Smith Derek Cover Letter Au
1 page
Grade 8 Dressmaking Measurement Lesson
100% (1)
Grade 8 Dressmaking Measurement Lesson
6 pages
English 11 Honors Narr
No ratings yet
English 11 Honors Narr
4 pages
League Tournament Registration Form
No ratings yet
League Tournament Registration Form
1 page
Etiquettes and Manners
No ratings yet
Etiquettes and Manners
10 pages