A SYNOPSIS ON
Multimodal Deepfake Detection
Submitted in partial fulfilment of the requirement for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING
Submitted by:
Student Name 1: Amol Joshi University Roll No. 2021648
Student Name 2: Ayush Kumar Singh University Roll No. 2021671
Student Name 3: Abhinav University Roll No. 2021635
Under the Guidance of
Dr. Deepak Gaur
Project Team ID: MP2025AI&DS18
Department of Computer Science and Engineering
Graphic Era (Deemed to be University)
Dehradun, Uttarakhand
September-2025
CANDIDATE’S DECLARATION
I/We hereby certify that the work which is being presented in the Synopsis entitled
“Multimodal Deepfake Detection” in partial fulfillment of the requirements for the award
of the Degree of Bachelor of Technology in Computer Science and Engineering in the
Department of Computer Science and Engineering of the Graphic Era (Deemed to be
University), Dehradun shall be carried out by the undersigned under the supervision of Dr.
Deepak Gaur, Associate Professor, Department of Computer Science and Engineering,
Graphic Era (Deemed to be University), Dehradun.
Amol Joshi 2021648 signature
Ayush Kumar Singh 2021671 signature
Abhinav 2021635 signature
The above-mentioned students shall be working under the supervision of the undersigned on
the project “Multimodal Deepfake Detection”.
Signature Signature
Internal Evaluation (By DPRC Committee)
Status of the Synopsis: Accepted / Rejected
Any Comments:
Name of the Committee Members: Signature with Date
1.
2.
Table of Contents
Chapter No. Description Page No.
Chapter 1 Introduction and Problem Statement 4-5
Chapter 2 Background/ Literature Survey 6-7
Chapter 3 Objectives 8
Chapter 4 Possible Approach/ Algorithms 9-10
Chapter 5 References 11
Chapter 1
Introduction and Problem Statement
In the following sections, a brief introduction and the problem statement for the work are
presented.
1.1 Introduction
The rapid growth of the internet and social media platforms has revolutionized the way information is
produced, consumed, and shared. While these advancements have made communication faster and
more accessible, they have also created an environment where misinformation and manipulated
content can spread at an unprecedented scale. Two of the most critical issues in this regard are fake
news and deepfakes.
Fake news fabricates or distorts textual information to mislead readers at scale. Deepfakes, on
the other hand, use advanced AI techniques such as Generative Adversarial Networks
(GANs) to manipulate images, audio, and video content. These fabricated media files can
convincingly mimic real individuals, creating false scenarios that can damage reputations, spread
propaganda, or even threaten national security.
These phenomena pose serious challenges to digital trust, cybersecurity, journalism, and democratic
systems. For instance, fake news articles can mislead public opinion during elections, while deepfake
videos can be weaponized for blackmail or misinformation campaigns. With the increasing
sophistication of these manipulations, traditional detection methods such as manual verification or
single-modality models are no longer sufficient.
To address this growing problem, researchers are turning towards Artificial Intelligence (AI) and
Deep Learning. In particular, Natural Language Processing (NLP) techniques like BERT have proven
highly effective in understanding textual semantics, while Computer Vision methods such as
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are capable of detecting
subtle inconsistencies in manipulated images and videos.
This project proposes a multimodal detection framework that integrates both text and image/video
analysis for a comprehensive solution. By combining BERT-based text classification with CNN +
ViT-based visual detection, the system aims to accurately identify fake news and deepfakes. This
approach not only increases detection accuracy but also mirrors real-world scenarios, where
misinformation often appears in multiple forms simultaneously.
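As an illustration of this design, the score-level (late) fusion of the two pipelines could be prototyped as in the minimal PyTorch sketch below. It assumes the Hugging Face transformers library; the bert-base-uncased checkpoint, the visual feature dimension, and the fusion weight alpha are illustrative placeholders rather than finalized design choices.

import torch
import torch.nn as nn
from transformers import BertModel

class LateFusionDetector(nn.Module):
    def __init__(self, visual_dim=768, alpha=0.5):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.text_head = nn.Linear(self.bert.config.hidden_size, 1)
        self.visual_head = nn.Linear(visual_dim, 1)   # on top of CNN/ViT features
        self.alpha = alpha                            # placeholder fusion weight

    def forward(self, input_ids, attention_mask, visual_features):
        text_feat = self.bert(input_ids=input_ids,
                              attention_mask=attention_mask).pooler_output
        p_text = torch.sigmoid(self.text_head(text_feat))          # P(fake | text)
        p_vis = torch.sigmoid(self.visual_head(visual_features))   # P(fake | visuals)
        # Weighted average of the two modality scores gives the final decision.
        return self.alpha * p_text + (1 - self.alpha) * p_vis

Each modality produces an independent probability that the content is fake; a learned fusion layer over concatenated features is an equally plausible alternative to this fixed weighting.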
1.2 Problem Statement
The problem statement for the present work can be stated as follows:
The main problem lies in the rapid spread of misinformation through fake news articles and deepfake
media content. Fake news manipulates textual information to mislead readers, while deepfakes alter
images and videos so realistically that they are almost impossible to detect with the human eye. This
has created a serious challenge for digital trust, cybersecurity, and public awareness, as false content
can influence opinions, damage reputations, and even destabilize societies.
Traditional detection methods are either manual (slow and unscalable) or single-modality (focusing
only on text or only on images/videos), which makes them insufficient in combating the multimodal
nature of modern misinformation. Moreover, the increasing sophistication of AI-based content
generation tools makes it even harder to identify manipulated media accurately.
To address this problem, we propose building a multimodal AI-based detection system that combines
Convolutional Neural Networks (CNNs) for spatial feature extraction with Vision Transformers
(ViTs) for analyzing relationships across image patches and video frames.
Finally, we will fuse the outputs of both models to produce a unified decision on whether the given
content is real or fake.
Chapter 2
Background/ Literature Survey
The emergence of deepfake technology has introduced a major challenge to digital media authenticity.
Deepfakes are synthetic media — images, audio, or videos — generated or manipulated using deep
learning techniques, especially Generative Adversarial Networks (GANs) and autoencoders. These
manipulations can create hyper-realistic yet false content that is almost indistinguishable from
genuine media.
While deepfake technology has legitimate applications in entertainment, gaming, and accessibility, it
is increasingly misused for malicious purposes such as spreading misinformation, political
propaganda, character defamation, identity fraud, and cybercrime. The ability of deepfakes to erode
trust in visual evidence poses significant risks to cybersecurity, journalism, legal systems, and social
stability.
Traditional manual detection methods, such as visual inspection or forensic analysis, are no longer
effective given the realism and scale of modern deepfakes. This has led to growing interest in AI-
driven deepfake detection systems. Early approaches relied on Convolutional Neural Networks (CNNs)
to identify artifacts, inconsistencies, or unnatural features in images and videos. More recently, Vision
Transformers (ViTs) have been explored to capture global contextual information across image
patches, improving robustness against advanced manipulations.
Several researchers have attempted to tackle the problem of deepfake detection using different
techniques, ranging from traditional CNNs to more advanced transformer-based architectures. Below
is a summary of notable previous works:
I. MesoNet
MesoNet introduced a lightweight CNN architecture designed for real-time deepfake
detection. It focused on capturing mid-level patterns that distinguish fake from real faces.
While efficient, MesoNet struggled with generalization. It performed well on the dataset it
was trained on but failed when tested on unseen manipulation techniques. Its lightweight
nature also limited its ability to detect subtle, high-quality deepfakes.
II. XceptionNet
XceptionNet, a deeper CNN model, was trained on the FaceForensics++ dataset,
leveraging depthwise separable convolutions for efficient feature extraction. It became a
benchmark in early deepfake detection research.
Despite its strong within-dataset accuracy, however, XceptionNet's performance degrades
noticeably under heavy compression and drops sharply when evaluated on manipulation
techniques or datasets not seen during training, limiting its cross-dataset generalization.
III. Capsule Networks
Capsule Networks were used to capture part-whole relationships in facial structures,
aiming to detect inconsistencies caused by manipulation.
While promising, Capsule Networks were computationally expensive and had
unstable training, making them impractical for large-scale real-world use.
Chapter 3
Objectives
The objectives of the proposed work are as follows:
Develop a multimodal deepfake detection framework capable of detecting manipulated content.
For this, we will create a unified pipeline that integrates two complementary models: CNNs and
ViTs.
Leverage CNNs to extract local features such as facial texture inconsistencies, blending errors,
edge artifacts, or pixel-level distortions that typically appear in deepfake images and video
frames.
Incorporate Vision Transformers (ViTs) to capture long-range dependencies and relationships
between image patches, enabling the detection of subtle manipulations in high-quality deepfakes
that CNNs may overlook.
Design a robust and generalizable detection pipeline to ensure the system is not limited to one
dataset or manipulation technique by training and testing on multiple benchmark datasets. The
goal is to minimize overfitting to dataset-specific artifacts and enhance real-world applicability.
Benchmark and validate the proposed solution against state-of-the-art methods to compare the
hybrid CNN + ViT model with existing deepfake detection approaches in terms of accuracy,
precision, recall, scalability, and robustness under different noise and compression settings.
Chapter 4
Possible Approach/ Algorithms
1. Overview & Design Rationale
The goal is a robust deepfake detection system that generalizes across manipulation techniques
and real-world artifacts. Our proposed strategy is a hybrid model that combines:
i. CNN backbone(s) to capture local, pixel-level artifacts (texture, edges, blending), and
ii. Vision Transformer (ViT) layers to model global, long-range dependencies across image
patches (context, inconsistencies across face/background).
The system will operate primarily at the frame level (extracting frames from videos) and
aggregate frame-level predictions into a video-level decision using temporal pooling or a
lightweight temporal model. The pipeline emphasizes dataset diversity, augmentation, and
cross-dataset validation to reduce overfitting to dataset-specific artifacts.
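A minimal sketch of this hybrid design, assuming PyTorch and torchvision, is given below; the ResNet-18 backbone, embedding size, and encoder depth are illustrative choices rather than final hyperparameters.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class HybridCnnVit(nn.Module):
    def __init__(self, embed_dim=256, depth=4, heads=8):
        super().__init__()
        cnn = resnet18(weights=None)
        # Keep the convolutional stages only; drop global pooling and the fc head.
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])  # (B, 512, 7, 7)
        self.proj = nn.Conv2d(512, embed_dim, kernel_size=1)       # patch embedding
        self.pos = nn.Parameter(torch.zeros(1, 49, embed_dim))     # 7x7 = 49 tokens
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, 1)        # frame-level real/fake logit

    def forward(self, x):                          # x: (B, 3, 224, 224)
        feat = self.proj(self.backbone(x))         # (B, embed_dim, 7, 7)
        tokens = feat.flatten(2).transpose(1, 2) + self.pos   # (B, 49, embed_dim)
        tokens = self.encoder(tokens)              # global attention over patches
        return self.head(tokens.mean(dim=1))       # mean-pool tokens -> logit

The CNN stages supply artifact-sensitive local feature maps, which are re-interpreted as 49 patch tokens so that the transformer encoder can relate distant regions of the face and background.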
2. End-to-end Pipeline
Data collection: Obtain public datasets (FaceForensics++, DFDC, Celeb-DF, etc.). Keep
separate splits for cross-dataset testing.
Preprocessing:
i. Video to frame extraction.
ii. Face detection & alignment (crop to face bounding box + margin). Optionally keep whole-
frame variants for context.
iii. Resize frames to a fixed input (e.g., 224×224 or 256×256); a preprocessing sketch is given after this pipeline.
Data augmentation: Random crops, rotations, color jitter, Gaussian blur/noise, JPEG
compression, and random temporal jitter to mimic social-media artifacts.
Modeling:
i. CNN backbone to extract feature maps.
ii. Patch embedding + ViT encoder(s) to process either raw image patches or CNN feature patches
(hybrid).
Temporal aggregation — Average pooling over frame probabilities or a small
transformer/LSTM to produce a video-level prediction.
Evaluation & cross-dataset generalization — test across unseen datasets and compression
levels.
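For concreteness, preprocessing steps (i)-(iii) could look like the sketch below, using OpenCV. The Haar cascade face detector is a lightweight stand-in; a stronger detector such as MTCNN or RetinaFace would likely replace it in the actual pipeline.

import cv2

def extract_face_frames(video_path, every_n=10, size=224, margin=0.2):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap, frames, idx = cv2.VideoCapture(video_path), [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:                     # temporal subsampling
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, 1.1, 5)
            for (x, y, w, h) in faces[:1]:         # first detected face only
                m = int(margin * w)                # margin around the bounding box
                crop = frame[max(0, y - m):y + h + m, max(0, x - m):x + w + m]
                frames.append(cv2.resize(crop, (size, size)))
        idx += 1
    cap.release()
    return frames                                  # list of (size, size, 3) crops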
3. Training Strategy & Losses
Loss function: Binary Cross-Entropy (BCE) loss.
Optimizer: AdamW.
Regularization: dropout in the classification head and label smoothing, both of which help
reduce overconfidence and improve generalization.
Batch size: chosen according to available GPU memory. We plan to train for 20–50 epochs,
monitoring validation AUC and applying early stopping.
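A minimal training-loop sketch reflecting these choices is shown below; the data loader, the validation-AUC callback, and all hyperparameter values are assumptions for illustration, not tuned settings.

import torch

def train(model, train_loader, val_auc_fn, epochs=50, patience=5, eps=0.1):
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    best_auc, stale = 0.0, 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            y = y.float() * (1 - eps) + 0.5 * eps  # label smoothing for BCE
            loss = loss_fn(model(x).squeeze(1), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        auc = val_auc_fn(model)                    # validation AUC each epoch
        if auc > best_auc:                         # keep the best checkpoint
            best_auc, stale = auc, 0
            torch.save(model.state_dict(), "best.pt")
        else:
            stale += 1
            if stale >= patience:                  # early stopping
                break
    return best_auc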
4. Evaluation Protocols and Metrics
Metrics: Accuracy, Precision, Recall, F1-score, AUC-ROC, and Balanced Accuracy. Report
both frame-level and video-level metrics.
Robustness testing: Evaluate with different compression levels, additive noise, and cross-dataset
generalization.
Comparison with previous strategies using the above evaluation metrics.
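The frame-level metrics listed above can be computed with scikit-learn, as in the sketch below; y_true and y_score are assumed to be collected from a held-out (ideally cross-dataset) test split.

from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

def report(y_true, y_score, threshold=0.5):
    y_pred = [int(s >= threshold) for s in y_score]   # binarize model scores
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc_roc": roc_auc_score(y_true, y_score),    # uses raw scores, not labels
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
    }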
Chapter 5
References
[1] Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). MesoNet: A compact facial video forgery detection network. IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–7.
[2] Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019). FaceForensics++: Learning to detect manipulated facial images. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1–11.
[3] Guera, D., & Delp, E. J. (2018). Deepfake video detection using recurrent neural networks. IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6.
[4] Nguyen, H. H., Yamagishi, J., & Echizen, I. (2019). Use of a Capsule Network to detect fake images and videos. arXiv preprint arXiv:1910.12467.
[5] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. International Conference on Learning Representations (ICLR).
[6] Tran, H., He, X., Singh, A., Zheng, C., & Bui, T. (2021). Exploring self-attention for deepfake detection. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2500–2504.
[7] Dolhansky, B., Howes, R., Pflaum, B., Baram, N., & Ferrer, C. C. (2020). The DeepFake Detection Challenge (DFDC) dataset. arXiv preprint arXiv:2006.07397.
[8] Li, Y., Yang, X., Sun, P., Qi, H., & Lyu, S. (2020). Celeb-DF: A large-scale challenging dataset for deepfake forensics. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3207–3216.
[9] Verdoliva, L. (2020). Media forensics and deepfakes: An overview. IEEE Journal of Selected Topics in Signal Processing, 14(5), 910–932.
[10] Tariq, S., Lee, S., Kim, H., Shin, Y., & Woo, S. S. (2018). Detecting both machine and human created fake face images in the wild. ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec), pp. 81–87.