=Paper=
{{Paper
|id=Vol-3900/Paper9
|storemode=property
|title=A Survey: Deepfake and Current Technologies for Solutions
|pdfUrl=https://ceur-ws.org/Vol-3900/Paper9.pdf
|volume=Vol-3900
|authors=Sayan Banerjee,Sumit Kumar Yadav,Ankit Dhara,Md Ajij
|dblpUrl=https://dblp.org/rec/conf/dosier/BanerjeeYDA24
}}
==A Survey: Deepfake and Current Technologies for Solutions==
Sayan Banerjee, Sumit Kumar Yadav, Ankit Dhara and Md Ajij*,†
Department of Computer Science and Technology, University of North Bengal, Raja Rammohunpur, Darjeeling, West Bengal, 734013, India
Abstract
This paper offers a detailed survey of deepfake detection methods, addressing the challenges posed by the fast-
paced advancements in deepfake technology. It provides an overview of various detection techniques, examining
their effectiveness in identifying manipulated content. The survey covers traditional detection strategies, such as
digital forensics and watermarking, as well as modern AI-driven approaches like convolutional and recurrent
neural networks. The study delves into the key features of deepfake technology, which leverages advanced
machine learning models, particularly Generative Adversarial Networks (GANs), to manipulate video, audio, and
images. These techniques have led to the creation of highly realistic synthetic media that is increasingly difficult
to detect, raising serious concerns about privacy, misinformation, and security. Recent progress in deepfake
detection has focused on improving the accuracy and efficiency of real-time solutions. Approaches that integrate
visual, audio, and behavioural cues have demonstrated significant potential in distinguishing authentic content
from fake media. Despite these advancements, there remains an urgent need for detection systems that can
generalize effectively across different types of deepfakes, as many current models struggle with previously unseen
or extremely realistic synthetic content. The survey reviews a broad spectrum of detection methods, assessing
their strengths, weaknesses, and performance on various datasets. It also identifies gaps in the current research
landscape and suggests directions for future work, emphasizing the importance of developing more robust and
scalable detection frameworks.
Keywords
Deepfake, Survey, Advanced machine learning models, Generative Adversarial Networks (GANs), Convolutional
Neural Networks (CNN), Recurrent Neural Networks (RNN)
1. Introduction
Deepfakes, a term combining "deep learning" and "fake", describe highly convincing synthetic media
produced using advanced machine learning techniques. Emerging in 2017, deepfakes initially focused on
facial manipulation in videos. Since then, the technology has expanded to encompass audio and image
alteration. Using algorithms like Generative Adversarial Networks (GANs), deepfakes can realistically
swap faces, modify facial expressions, and even mimic voices, making it increasingly challenging to
distinguish between genuine and synthetic content. Although initially developed for entertainment
purposes, deepfake technology has evolved rapidly, bringing with it significant implications for digital
privacy, security, and the reliability of online information.
The swift advancement of deepfake technology is both impressive and concerning. As the algorithms
become more sophisticated, so does the quality of synthetic content. This progress has sparked
worries about the potential misuse of deepfakes for spreading misinformation, committing fraud, and
facilitating identity theft. Deepfakes have already been used in disinformation campaigns, influencing
public perception and casting doubt on media authenticity. Their potential to erode trust in individuals
and institutions underscores the urgent need for effective detection and prevention measures.
This paper seeks to offer an in-depth survey of the existing methods for detecting and mitigating
deepfakes. By examining various techniques, such as facial feature analysis, biometric inconsistencies,
The 2024 Sixth Doctoral Symposium on Intelligence Enabled Research (DoSIER 2024), November 28–29, 2024, Jalpaiguri, India
* Corresponding author.
† These authors contributed equally.
Email: banerjeesayan554@gmail.com (S. Banerjee); sk9373279@gmail.com (S. K. Yadav); ankitdhara8250@gmail.com (A. Dhara); mdajij@nbu.ac.in (M. Ajij)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
and behavioural patterns, the study assesses the effectiveness of these approaches across different
datasets and scenarios. The goal is to highlight current solutions while identifying research gaps and
suggesting future directions to address the growing sophistication of deepfake technology.
The motivation for this survey stems from the increasing need for reliable systems capable of
accurately and efficiently detecting synthetic media. As deepfakes become more prevalent and easily
accessible, developing robust detection methods is crucial to protect privacy, uphold the integrity of
digital content, and prevent misuse. This paper aims to contribute to this effort by thoroughly analysing
the current state of deepfake detection, supporting the development of more advanced and dependable
solutions.
Deepfake technology, a product of advancements in artificial intelligence (AI), specifically deep
learning, enables the creation of hyper-realistic synthetic media that can manipulate audio, video, and
images to mimic reality convincingly. While this technology offers legitimate applications, such as
in entertainment and education, its misuse poses significant societal threats. Deepfakes have been
used to spread misinformation, perpetuate fraud, violate individual privacy, and destabilize public trust
[1, 2]. The societal implications of deepfake proliferation are profound. For example, deepfakes can
undermine democratic processes by creating fabricated political speeches or events [3]. They can also inflict personal and institutional damage, such as identity theft and reputational harm [4]. Moreover,
the accessibility of deepfake-generating tools exacerbates the problem by enabling individuals with
minimal technical expertise to create deceptive content [5]. These issues necessitate urgent attention
and robust countermeasures to combat the deepfake menace effectively.
Existing reviews on deepfake technologies primarily focus on foundational concepts and early
detection mechanisms. However, the rapid evolution of AI and the growing sophistication of deepfake
creation techniques have rendered many of these reviews outdated [6, 7]. This survey aims to fill the
gap by providing a comprehensive overview of recent advancements in deepfake detection, prevention,
and mitigation strategies. It also emphasizes the importance of addressing the societal and ethical
challenges associated with deepfakes [8].
We hypothesize that advancements in machine learning, AI, and cybersecurity offer promising
solutions to mitigate the threats posed by deepfakes. By leveraging innovative detection techniques,
regulatory frameworks, and collaborative efforts, it is possible to reduce the negative impacts of deepfake
technology effectively [9, 10].
This survey is guided by several objectives: to consolidate and evaluate current solutions to the
challenges posed by deepfakes, to identify gaps and limitations in existing approaches to deepfake
detection and mitigation, and to propose future research directions and strategies for combating
deepfake-related issues. The scope of this survey encompasses deepfake detection techniques, including
machine learning and digital watermarking methods [2, 9], prevention strategies such as AI-generated
content authentication and multi-modal analysis [10], and mitigation efforts, including regulatory
frameworks, ethical considerations, and public awareness campaigns [11, 7].
The remainder of this paper is organized as follows: Section 2 reviews deepfake technology, including
its evolution, societal implications, and research gaps. Section 3 details the workflow of deepfake
detection, highlighting key stages and methodologies. Section 4 outlines detection and mitigation
approaches, comparing techniques and evaluation metrics. Section 5 discusses findings, trends, datasets,
and mathematical foundations. Section 6 identifies challenges and research gaps, including dataset
limitations and real-time detection issues. Section 7 explores recommendations and potential impacts.
Section 8 concludes with a summary of findings and the importance of addressing gaps.
2. Literature Review
The proliferation of deepfake technology has prompted extensive research into its origins, advancements,
and countermeasures. This section provides a structured review, covering the historical background,
key findings, critical analyses, and research gaps in deepfake technology.
2.1. Historical Background
Deepfake technology has transformed the digital landscape, leveraging advancements in artificial
intelligence and deep learning. The early foundation of this field was laid with the development of
Generative Adversarial Networks (GANs), which facilitated the creation of hyper-realistic visual and
audio content [12]. Initially, deepfakes found applications in entertainment and creative industries, such
as enhancing visual effects in movies and creating virtual influencers [13]. However, their malicious
use for spreading misinformation, violating privacy, and manipulating political narratives has garnered
significant attention [14, 15]. The dual-edged nature of this technology highlights both its innovative
potential and the ethical dilemmas it poses.
2.2. Key Findings
2.2.1. Categorization of Detection Methods
Research efforts in deepfake detection have yielded several methodologies, each with distinct approaches
and objectives:
• AI-Based Techniques: Machine learning and deep learning algorithms, particularly Convo-
lutional Neural Networks (CNNs), have achieved notable success in identifying deepfakes by
detecting artifacts introduced during the generation process [16, 17]. Advanced models such as
recurrent neural networks (RNNs) and transformers have also been explored to analyze temporal
inconsistencies in videos [18]. Pre-trained models and transfer learning have further enhanced
the efficiency of these techniques.
• Signal Processing Approaches: Signal processing-based methods focus on identifying spatial
and temporal anomalies in manipulated media. These methods often examine discrepancies in
frame transitions, lighting inconsistencies, and unnatural blending between facial regions [19].
Techniques such as spectral analysis and phase correlation are employed to uncover hidden
manipulations that are otherwise challenging to detect.
• Blockchain Solutions: Blockchain technology is increasingly being adopted for media authenti-
cation and traceability. By leveraging immutable ledgers, these solutions can validate the origin
and integrity of digital content, thereby providing a robust mechanism to counteract deepfake
manipulation [17]. Integration with smart contracts can further automate validation processes,
enhancing reliability.
• Feature Extraction-Based Methods: Feature extraction-based approaches analyze unique
patterns within media to differentiate between authentic and manipulated content. Techniques
such as frequency domain analysis, optical flow analysis, and texture-based methods have been
employed to identify irregularities that are imperceptible to the human eye [20]. In addition,
facial landmark detection and biomechanical consistency checks provide granular insights into
potential manipulations.
• Hybrid Approaches: Hybrid methods combine multiple techniques, such as integrating AI-based
algorithms with signal processing or blockchain frameworks, to enhance detection accuracy.
These approaches aim to capitalize on the strengths of each methodology while mitigating their
individual limitations [21]. Examples include combining temporal analysis with CNN-based
models or integrating blockchain verification with real-time anomaly detection algorithms.
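As an illustration of the hybrid idea, late fusion combines the scores of several independent detectors into one decision. The sketch below is a minimal, hypothetical example (the detector names and weights are assumptions for illustration, not taken from any cited system):

```python
def fuse_scores(scores: dict, weights: dict) -> float:
    """Weighted late fusion of detector scores in [0, 1].

    Each detector (e.g. a spatial CNN, a temporal RNN, a forensic
    frequency check) reports an independent fake-probability; the
    fused score is their weight-normalised average.
    """
    total_w = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_w

# Hypothetical per-modality fake-probabilities for one video.
scores = {"cnn_spatial": 0.92, "rnn_temporal": 0.71, "freq_forensics": 0.64}
weights = {"cnn_spatial": 0.5, "rnn_temporal": 0.3, "freq_forensics": 0.2}
fused = fuse_scores(scores, weights)
verdict = "fake" if fused > 0.5 else "original"
```

Weighting lets a deployment emphasise whichever modality is most reliable for its content mix, which is precisely the strength-balancing the hybrid literature describes.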
The timeline of deepfake evolution, as shown in Figure 1, provides a detailed overview of the
technological advancements that have driven this field. It highlights critical breakthroughs, including
the introduction of Generative Adversarial Networks (GANs) in 2014, which revolutionized content
generation by enabling high-quality synthetic media. Subsequent developments include advanced
autoencoders and transfer learning techniques, which improved model scalability and personalization.
The timeline also emphasizes the rise of real-time face reenactment systems, deep neural networks for
voice synthesis, and advancements in deepfake detection algorithms. These milestones underline the
rapid growth and sophistication of this technology, posing significant challenges and opportunities in
various domains.
Figure 1: Timeline illustrating the evolution of deepfake technology.
Datasets such as the DeepFake Detection Challenge (DFDC) and FaceForensics++ have underpinned
advancements in detection algorithms, providing benchmarks for evaluation [22, 19].
2.3. Critical Analysis
The landscape of deepfake detection is characterized by both significant progress and persistent chal-
lenges. AI-driven methods have achieved high accuracy in controlled environments but often struggle
with generalization to diverse datasets and unforeseen manipulation techniques [14, 15]. Signal pro-
cessing approaches, while effective in controlled scenarios, may lack robustness against sophisticated
deepfake methods. Blockchain solutions, though promising, face scalability and adoption challenges.
Feature extraction techniques are often computationally intensive, limiting their applicability in real-
time settings [20, 18].
Recurring issues include the need for standardized evaluation metrics, improved computational
efficiency, and ethical considerations. Furthermore, the rapid evolution of deepfake generation methods
necessitates continuous adaptation of detection strategies [23, 21]. The absence of datasets that capture
real-world variability remains a bottleneck, as most benchmarks are designed for academic purposes
[22].
2.4. Identification of Research Gaps
While considerable advancements have been made, several critical gaps remain unaddressed:
• Real-Time Detection: The development of lightweight and efficient algorithms capable of
real-time processing remains a significant challenge [24]. Advances in edge computing could
provide a pathway for achieving this goal.
• Robustness Across Domains: Current detection methods require improved generalization to
handle diverse datasets and evolving threats [21]. Domain adaptation techniques and unsupervised
learning approaches could play a pivotal role.
• Ethical and Legal Frameworks: Comprehensive guidelines and regulations addressing the
misuse of deepfake technology are urgently needed [25]. Collaboration between technologists,
policymakers, and ethicists is essential to establish a robust framework.
• Advanced Benchmarks: The lack of standardized and representative datasets hinders the
objective evaluation and comparison of detection methods [22]. Future benchmarks should
incorporate real-world variations, such as diverse lighting, occlusions, and cultural differences.
Addressing these gaps is imperative for advancing the field of deepfake detection and fostering trust
in digital ecosystems. Future research must prioritize the development of scalable, robust, and ethically
aligned solutions to counteract the growing threats posed by deepfake technology. Collaboration across
disciplines and the integration of emerging technologies will be key to overcoming these challenges.
3. Workflow: Deepfake Detection
The process of deepfake detection involves several critical steps, as illustrated in Figure 2. Each step
plays a vital role in accurately distinguishing between original and fake content. Below is a detailed
explanation of the workflow, along with examples of methodologies and techniques commonly employed
at each stage:
Figure 2: Workflow illustrating the steps in deepfake detection (input video frames → feature extraction → classification → output: "Original" or "Fake").
1. Input - Video Frames Extraction
The first step involves splitting the input video into individual frames. These frames serve as the
foundational data for further analysis. High-resolution frames are preferred to ensure the features used
in detection are well-represented.
Example Methodology:
• Frame Sampling: Extract frames at fixed intervals (e.g., every nth frame) to reduce computational
load while maintaining key details.
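A minimal sketch of fixed-interval sampling follows; it computes only the frame indices, which in practice would drive a video reader such as OpenCV's VideoCapture:

```python
def sample_frame_indices(total_frames: int, step: int) -> list:
    """Return the indices of every `step`-th frame, starting at frame 0."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, total_frames, step))

# A 300-frame clip sampled every 30th frame yields 10 frames for analysis.
indices = sample_frame_indices(300, 30)
```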
2. Feature Extraction
Feature extraction involves identifying and isolating the most critical aspects of the video frames that
can reveal inconsistencies or unnatural patterns. These features form the basis for differentiating
between real and fake media.
Example Feature Extraction Methods:
• Pixel-Level Artifacts Detection: Focus on artifacts such as inconsistent lighting, shadows, or
pixel distortions often introduced during deepfake generation.
• Temporal Inconsistencies: Analyze frame-to-frame transitions for unnatural movement or
discontinuities.
• Frequency Domain Analysis: Techniques like Discrete Fourier Transform (DFT) or Wavelet
Transform to detect anomalies in high-frequency bands.
• Biometric Feature Analysis: Focus on facial landmarks, eye movement, and lip-sync patterns
to identify irregularities.
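As a concrete example of frequency-domain analysis, the sketch below computes a simple "high-frequency energy ratio" with NumPy's 2-D FFT. The statistic itself is illustrative (not a published detector), but it reflects the observation that GAN upsampling often leaves periodic artifacts in high-frequency bands:

```python
import numpy as np

def high_freq_energy_ratio(image: np.ndarray, cutoff: int = 4) -> float:
    """Fraction of spectral energy outside a central low-frequency square.

    An unusually high or low ratio relative to natural images can flag
    manipulation artifacts introduced during generation.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    energy = np.abs(spectrum) ** 2
    h, w = energy.shape
    ch, cw = h // 2, w // 2
    low = energy[ch - cutoff:ch + cutoff, cw - cutoff:cw + cutoff].sum()
    return float(1.0 - low / energy.sum())

# A constant image has all of its energy at DC (zero frequency),
# so its high-frequency ratio is zero.
flat = np.ones((32, 32))
ratio = high_freq_energy_ratio(flat)
```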
3. Classification
Once features are extracted, they are fed into a classification model to predict whether the content is
original or fake. This step leverages machine learning and deep learning algorithms to make the final
determination.
Example Classification Techniques:
• Traditional Machine Learning Models:
– SVM (Support Vector Machines): Effective for small datasets and well-defined features.
– Random Forest: Ensemble-based approach for feature importance and classification.
• Deep Learning Models:
– Convolutional Neural Networks (CNNs): Suitable for spatial features like pixel-level
inconsistencies or facial biometrics.
– Recurrent Neural Networks (RNNs): Ideal for temporal features such as frame continuity
and motion consistency.
– EfficientNet, MobileNetV2, and VGG16: Pretrained architectures fine-tuned for deepfake
detection tasks.
• Hybrid Models: Combining CNNs for spatial features with RNNs for temporal consistency
checks.
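A common deployment pattern for the frame-based models above is to aggregate per-frame fake-probabilities into one video-level verdict. A minimal sketch (the 0.5 threshold is an assumption; real systems tune it on validation data):

```python
def classify_video(frame_probs: list, threshold: float = 0.5) -> str:
    """Average per-frame fake-probabilities and threshold the mean.

    `frame_probs` would come from a frame-level classifier such as a
    fine-tuned CNN; averaging smooths out single-frame false alarms.
    """
    if not frame_probs:
        raise ValueError("no frames to classify")
    mean_prob = sum(frame_probs) / len(frame_probs)
    return "Fake" if mean_prob >= threshold else "Original"

verdict = classify_video([0.2, 0.9, 0.8, 0.7])  # mean = 0.65
```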
4. Output - Classification Result
The final step produces a classification result that labels the input as either "Original" or "Fake". The
accuracy and reliability of this output depend on the effectiveness of the previous steps and the quality
of training data used to build the detection model.
Evaluation Metrics:
• Accuracy, Precision, Recall: Measure overall model performance.
• F1 Score: Balance between precision and recall.
• AUC-ROC Curve: Evaluate model sensitivity to different thresholds.
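These metrics follow directly from the confusion counts; a self-contained sketch for the binary (fake = 1) setting:

```python
def detection_metrics(y_true: list, y_pred: list) -> dict:
    """Accuracy, precision, recall and F1 for binary labels (1 = fake)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

m = detection_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```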
4. Methodologies and Approaches
This section outlines the methodologies employed in surveying the research landscape on deepfake
detection and mitigation. It describes the survey methodology, provides detailed insights into various
approaches analyzed, and presents a comparative analysis of these methodologies.
4.1. Survey Methodology
The reviewed papers were selected using a systematic approach to ensure comprehensive coverage of
the field. A database search was conducted across platforms such as IEEE Xplore, Springer, and ACM
Digital Library using keywords like "deepfake detection," "GAN-based manipulation," and "blockchain
authentication." The inclusion criteria prioritized articles published in peer-reviewed journals and
conferences between 2019 and 2025. Studies that lacked empirical results or focused solely on deepfake
generation without discussing detection were excluded. A total of 50 papers met these criteria and were
included in this review.
The evaluation framework for categorizing existing solutions focused on three key dimensions:
• Technique: Classification into AI-based, signal processing-based, blockchain-assisted, hand-
crafted feature extraction, and hybrid approaches.
• Performance Metrics: Accuracy, scalability, and computational efficiency.
• Applicability: Suitability for real-time detection and generalization across datasets.
4.2. Approaches Analyzed
4.2.1. Machine Learning/AI-Based Techniques
Machine learning and AI-based techniques are among the most widely explored methods for deepfake
detection. Convolutional Neural Networks (CNNs) effectively detect spatial inconsistencies, such as
unnatural textures and blending artifacts, in manipulated media [16]. Recurrent Neural Networks
(RNNs) and transformers analyze temporal patterns, making them well-suited for video analysis [18].
Generative Adversarial Networks (GANs), while primarily used for creating deepfakes, are also utilized
for adversarial training to identify and counteract synthetic content [20]. Furthermore, pre-trained
models and transfer learning approaches have improved detection performance by reducing training
requirements and leveraging pre-existing knowledge bases.
4.2.2. Digital Forensics Techniques
Digital forensics relies on analyzing inconsistencies and artifacts in video and audio signals. Techniques
such as phase correlation, frequency domain analysis, and optical flow detection identify discrepancies
that are challenging for deepfake generation algorithms to mimic [19]. For instance, variations in
lighting, unnatural reflections, and irregularities in motion provide telltale signs of manipulation. These
methods are particularly valuable in scenarios where content integrity is under scrutiny.
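One lightweight temporal-consistency check in this spirit is to flag sudden spikes in the mean frame-to-frame pixel difference, which crude splicing can introduce at manipulated transitions. A sketch under that assumption (the spike factor of 3 is illustrative, not a calibrated value):

```python
def motion_spikes(frame_diffs: list, factor: float = 3.0) -> list:
    """Return indices of frame transitions whose mean absolute pixel
    difference exceeds `factor` times the clip's median difference."""
    ordered = sorted(frame_diffs)
    median = ordered[len(ordered) // 2]
    return [i for i, d in enumerate(frame_diffs) if d > factor * median]

# Transition 3 jumps far above the clip's typical motion level.
diffs = [1.1, 0.9, 1.0, 9.5, 1.2, 1.0]
suspicious = motion_spikes(diffs)
```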
4.2.3. Blockchain for Authentication
Blockchain technology offers a robust framework for verifying the authenticity and provenance of
digital content. Immutable ledgers record the history of media, ensuring traceability and preventing
tampering [17]. Smart contracts enable automated verification processes, enhancing the scalability of
blockchain-assisted solutions. This approach is particularly effective in applications requiring real-time
validation, such as social media and news dissemination platforms.
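The core of such a ledger entry is a content hash plus provenance metadata. The toy in-memory sketch below illustrates the registration/verification cycle only; a real deployment would write to an actual blockchain with smart-contract logic, not a Python dictionary:

```python
import hashlib

ledger = {}  # toy stand-in for an immutable distributed ledger

def register_media(content: bytes, source: str) -> str:
    """Record the SHA-256 digest of a media file with its provenance."""
    digest = hashlib.sha256(content).hexdigest()
    ledger[digest] = {"source": source}
    return digest

def verify_media(content: bytes) -> bool:
    """A file verifies only if its digest was registered unmodified."""
    return hashlib.sha256(content).hexdigest() in ledger

original = b"\x00\x01 raw video bytes \x02"
register_media(original, source="newsroom-camera-7")
tampered = original + b"!"  # any single-byte change breaks verification
```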
4.2.4. Handcrafted Feature Extraction Techniques
Handcrafted feature extraction focuses on identifying specific features that distinguish manipulated
from authentic media. These methods analyze elements such as facial landmarks, eye blinking patterns,
and lip synchronization [20]. Techniques like Local Binary Patterns (LBP) and Histogram of Oriented
Gradients (HOG) are used to detect texture inconsistencies and unnatural movements. Although
computationally less intensive than AI-based approaches, handcrafted techniques often struggle with
the subtle sophistication of modern deepfakes.
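As an illustration, the basic 3×3 Local Binary Pattern code for a single pixel can be computed in a few lines (a minimal sketch of the standard LBP definition; texture descriptors are built from histograms of these codes over a region):

```python
def lbp_code(patch: list) -> int:
    """Basic LBP for a 3x3 grayscale patch: threshold the 8 neighbours
    against the centre pixel and pack the bits clockwise from top-left."""
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << bit
    return code

# Uniform patch: every neighbour >= centre, so all eight bits are set.
flat_patch = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
code = lbp_code(flat_patch)  # 255
```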
4.2.5. Hybrid Approaches
Hybrid approaches integrate multiple methodologies to enhance robustness and accuracy. For example,
combining CNNs with optical flow analysis leverages both spatial and temporal insights. Similarly,
blockchain verification can be paired with AI-driven anomaly detection for comprehensive validation
[21]. These approaches aim to balance the strengths of individual techniques while mitigating their
limitations, making them suitable for complex and diverse use cases.
4.3. Comparative Analysis
A comparative analysis of the methodologies is presented in Table 1, highlighting their efficiency,
accuracy, scalability, and suitability for real-time detection.
Table 1
Comparison of Deepfake Detection Approaches

| Approach | Accuracy | Scalability | Real-Time Suitability |
|---|---|---|---|
| AI-Based Techniques | High | Moderate | Limited due to computational intensity |
| Digital Forensics | Moderate | High | Moderate |
| Blockchain Solutions | High | Low | High |
| Handcrafted Feature Extraction | Moderate | High | High |
| Hybrid Approaches | Very High | Moderate | High |
AI-based techniques excel in accuracy but are computationally demanding, making scalability and
real-time application challenging. Digital forensics methods offer high scalability but may struggle
with sophisticated manipulations. Blockchain solutions provide high reliability and real-time suitability
but face scalability issues due to resource requirements. Handcrafted feature extraction methods are
efficient and scalable but less effective against subtle manipulations. Hybrid approaches represent a
balanced solution, combining accuracy, scalability, and real-time suitability.
In conclusion, while each methodology has its strengths and weaknesses, hybrid approaches demon-
strate the most promise for addressing the diverse challenges posed by deepfake technology.
5. Findings and Trends
5.1. Key Insights
Recent advancements in deepfake detection have introduced innovative techniques that significantly
improve accuracy and robustness against increasingly sophisticated deepfake content. Maheshwari et
al. (2024) explored plasmonic nanomaterials with surface plasmon resonance (SPR) for image detection,
achieving over 95% accuracy even in complex scenarios [26]. A hybrid deep learning model combining
MesoNet4 and ResNet101 was proposed by Javed et al. (2024), attaining detection accuracies of 98.73%,
96.89%, and 97.90% on FaceForensics++, CelebV1, and CelebV2 datasets, respectively [27]. Advanced
biosensors integrating plasmonic resonance with convolutional neural networks reached 98.7% accuracy
and demonstrated rapid response times (0.8 seconds per frame) [28].
Blockchain-based federated learning approaches, such as Heidari et al.'s (2024) method, enhanced accuracy by 6.6% compared to benchmarks while maintaining data confidentiality [29]. Gao et al.'s (2024) temporal feature prediction scheme, which focuses on audio-visual modalities, demonstrated strong accuracy (84.33%) on the FakeAVCeleb dataset [30]. Arshed et al. (2024) showed that Vision Transformers (ViTs) hold great promise in multiclass detection tasks, achieving an F1-score of 99.90% and outperforming traditional CNNs [31]. Kingra et al.'s (2024) SFormer architecture, based on spatio-temporal transformers, achieved up to 100% accuracy on datasets such as FF++ and Deeper-Forensics [32]. Almestekawy et al. (2024) demonstrated that incorporating spatiotemporal textures improved reproducibility and raised accuracy to 91.96% [33]. Finally, Guarnera et al. (2024) introduced a hierarchical multi-level approach for deepfake detection, achieving 97% accuracy across multiple GAN and diffusion model tasks [34].
5.2. Statistical Analysis
Table 2 compares the performance metrics, including accuracy, computational cost, and dataset bench-
marks, for different methods. These approaches reflect varying trade-offs in sensitivity, speed, and
dataset applicability.
Table 2
Performance Comparison of Deepfake Detection Techniques

| Technique | Accuracy | Dataset(s) | Strengths/Weaknesses |
|---|---|---|---|
| Plasmonic Nanomaterials (SPR) [26] | 95% | Custom dataset | High sensitivity; robust to lighting conditions but computationally intensive. |
| Hybrid Model (MesoNet4 + ResNet101) [27] | 98.73% (FaceForensics++), 96.89% (CelebV1) | FaceForensics++, CelebV1, CelebV2 | Real-time capability; limited multimodal application. |
| Advanced Plasmonic Biosensor [28] | 98.7% | Custom dataset | Fast response time; integration challenges in real-world scenarios. |
| Blockchain-Based Federated Learning [29] | 6.6% increase over benchmarks | Diverse | Data privacy maintained; high computational complexity. |
| Temporal Feature Prediction [30] | 84.33% | FakeAVCeleb | Novel audio-visual fusion; lower accuracy compared to transformer-based models. |
| Vision Transformers (ViTs) [31] | 99.90% | Multiclass-prepared dataset | High accuracy; robust to compression and resizing. |
| SFormer [32] | Up to 100% | FF++, Deeper-Forensics | Superior generalization; computationally expensive. |
| Spatiotemporal Textures [33] | 91.96% | Celeb-DF, FF++ | Enhanced stability; moderate accuracy in cross-dataset scenarios. |
| Hierarchical Multi-level GAN Analysis [34] | 97% | GAN and Diffusion Model Dataset | Robust to attacks like compression; lacks real-time capabilities. |
| Patch-Wise Deep Learning [31] | 99.90% | GAN, Stable Diffusion Datasets | Impressive F1 rates; computational overhead. |
5.3. Popular Datasets for Deepfake Validation
Datasets play a crucial role in validating and improving deepfake detection solutions. Table 3 highlights
some of the most popular datasets used in this domain, emphasizing their size, types of content, and
primary applications.
5.4. Emerging Trends
Several trends in deepfake detection have emerged:
• Multimodal Solutions: Techniques like temporal feature prediction and hybrid models increasingly integrate multiple modalities (e.g., audio and video) to enhance detection accuracy [30].

Table 3
Popular Datasets Used for Deepfake Validation

| Dataset Name | Description | Size | Types of Content | Source |
|---|---|---|---|---|
| FaceForensics++ | Large-scale dataset for face forgery detection. | 1,000 videos | Deepfake, Neural Rendered, Face Swapping | University of Erlangen-Nuremberg |
| DeepFakeDetection | Focused on detecting deepfake videos. | 3,000 videos | Real, Deepfake | University of California, Berkeley |
| Celeb-DF | High-resolution deepfake videos featuring celebrities. | 5,639 videos | Celebrities, TV Shows | Zhejiang University |
| DFDC (Deepfake Detection Challenge) | Comprehensive dataset for deepfake challenges. | 100,000 videos | Real, Deepfake | Facebook AI |
| The Realism of Deepfakes | Evaluates realism in deepfake generation. | Fully Annotated | Deepfake, GANs | Stanford University |
• Transformer Architectures: Vision Transformers (ViTs) and spatio-temporal transformer
models like SFormer demonstrate exceptional performance, particularly in generalizing across
datasets [31, 32].
• Adversarial Learning: GAN-based methods for deepfake generation have inspired adversarial
learning approaches to detect increasingly realistic fakes.
• Real-Time and Scalable Solutions: Biosensors and hybrid architectures focus on reducing
latency, with potential for real-time applications [28, 27].
• Privacy-Preserving Techniques: Blockchain-based federated learning represents a shift to-
wards safeguarding data privacy while achieving robust detection [29].
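The federated-learning trend above rests on a simple aggregation step: clients train locally and share only model parameters, which a server averages (the FedAvg scheme). A minimal sketch of that averaging step, assuming uniform client weighting:

```python
def fedavg(client_weights: list) -> list:
    """Average corresponding parameters across clients (uniform FedAvg).

    Raw detection data never leaves a client; only these parameter
    vectors are shared, which is what preserves privacy.
    """
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]

# Three clients' (toy) parameter vectors after one round of local training.
global_model = fedavg([[0.2, 1.0], [0.4, 0.8], [0.6, 0.6]])
```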
These trends indicate a paradigm shift towards integrating diverse modalities, leveraging advanced
architectures, and prioritizing real-time and privacy-preserving solutions for scalable and effective
deepfake detection.
5.5. Mathematical Foundations for Detection
In deepfake detection, various mathematical models and techniques are employed to enhance accuracy
and robustness. The key mathematical foundations for these detection models include Generative
Adversarial Networks (GANs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks
(RNNs), Attention Mechanisms, Adversarial Training Loss, and Ensemble Prediction. As shown in
Table 4, GANs leverage an adversarial training approach, where a generator and discriminator interact
to distinguish real from fake data. CNNs, on the other hand, apply convolution operations to extract
spatial features from images, crucial for analyzing image patterns in deepfakes. RNNs are employed for
sequential data, such as video frames, to capture temporal dependencies. The attention mechanism,
often used in Vision Transformers (ViTs), helps models focus on significant features, enhancing the
detection process. Additionally, adversarial training loss is designed to improve model robustness
by exposing it to adversarial examples. Finally, ensemble prediction aggregates results from multiple
models to boost the overall detection accuracy.
6. Challenges and Gaps
6.1. Current Challenges
Despite the advancements in deepfake detection technologies, several technical challenges persist,
limiting the effectiveness of current solutions:
Table 4
Mathematical Formulas for Deepfake Detection Models

1. Generative Adversarial Networks (GANs):
$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$
where $D(x)$ is the discriminator's probability that $x$ is real, $G(z)$ is the data generated by $G$ from noise $z$, $p_{\mathrm{data}}(x)$ is the distribution of real data, and $p_z(z)$ is the noise distribution.

2. Convolutional Neural Networks (CNNs):
$y[i, j] = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x[i+m, j+n] \cdot k[m, n]$
where $x[i, j]$ is the input image pixel at position $(i, j)$, $k[m, n]$ is a filter kernel of size $M \times N$, and $y[i, j]$ is the convolved output.

3. Recurrent Neural Networks (RNNs):
$h_t = \sigma(W_h h_{t-1} + W_x x_t + b)$
where $h_t$ is the hidden state at time $t$, $h_{t-1}$ is the hidden state from the previous time step, $x_t$ is the input at time $t$, $W_h, W_x$ are weight matrices, $b$ is a bias vector, and $\sigma$ is an activation function.

4. Attention Mechanism:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V$
where $Q, K, V$ are the query, key, and value matrices and $d_k$ is the dimension of the key vectors.

5. Adversarial Training Loss:
$\mathcal{L}_{\mathrm{adv}} = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\max_{\delta \in S} \ell(f(x + \delta), y)\right]$
where $\delta$ is a perturbation within the constraint set $S$ and $\ell(f(x), y)$ is the loss comparing prediction $f(x)$ with label $y$.

6. Ensemble Prediction:
$P_{\mathrm{ensemble}} = \frac{1}{N} \sum_{i=1}^{N} P_i$
where $P_i$ is the prediction probability of the $i$-th model and $N$ is the number of models.
• Detection Accuracy for Low-Quality Videos: Many deepfake detection models struggle
with low-resolution or highly compressed videos, which are often encountered on social media
platforms. This degradation in quality obscures telltale artifacts, reducing detection performance.
• Computational Overhead: Deep learning-based detection methods, while highly accurate,
often require significant computational resources. Balancing the need for high detection accuracy
with computational efficiency remains a key challenge, particularly for real-time applications.
• Generalization Across Techniques: As new and more sophisticated deepfake generation
techniques emerge, detection models often fail to generalize, requiring constant retraining on
updated datasets.
• Real-Time Detection: Many existing approaches lack the speed needed for real-time detection,
especially in live-streaming or high-throughput environments, where immediate detection is
crucial.
• Robustness to Adversarial Attacks: Deepfake detection models are vulnerable to adversarial
attacks that subtly alter fake content to evade detection mechanisms.
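The vulnerability described in the last bullet can be sketched with an FGSM-style perturbation against a deliberately simplified linear detector. This is purely illustrative, under the assumption of a toy model with a sigmoid "fake" score; real attacks compute gradients through deep networks, but the mechanism (a small, sign-based step that lowers the detector's score) is the same.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_evade(x, w, eps):
    # For a linear score w.x, the gradient w.r.t. the input is simply w;
    # stepping against its sign lowers the "fake" score while keeping the
    # perturbation inside an L-infinity ball of radius eps.
    return x - eps * np.sign(w)

rng = np.random.default_rng(1)
w = rng.standard_normal(16)             # toy detector weights
x = w + 0.1 * rng.standard_normal(16)   # a sample the detector flags as fake
p_before = sigmoid(w @ x)
x_adv = fgsm_evade(x, w, eps=0.5)
p_after = sigmoid(w @ x_adv)
print(p_before > p_after)  # True: the perturbed sample scores lower
```

Even though every coordinate of `x_adv` differs from `x` by at most 0.5, the detector's confidence drops, which is why adversarial training (minimizing the worst-case loss over such perturbations, as in the adversarial training loss of Table 4) is a common hardening strategy.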
6.2. Research Gaps
In addition to technical challenges, there are several gaps in current research that must be addressed to
advance deepfake detection methodologies:
• Standardized Datasets: While several datasets exist, there is a lack of universally accepted
benchmarks that cover diverse content types, resolutions, and manipulation techniques. Creating
standardized, diverse datasets would enhance model comparability and reliability.
• Legal and Ethical Frameworks: Deepfake detection research often overlooks the legal and
ethical implications of using synthetic media. Establishing guidelines for the responsible use of
detection technologies and addressing privacy concerns is critical.
• Robustness Against Evolving Deepfake Techniques: As generative models continue to
evolve, there is a need for detection methods that can adapt to new manipulation techniques
without requiring frequent retraining.
• Cross-Platform Scalability: Detection methods often perform well on specific datasets but fail
when deployed across different platforms or real-world scenarios. Research into scalable and
robust cross-platform solutions is necessary.
• Human-AI Collaboration: Current systems primarily focus on automated detection, with little
emphasis on integrating human expertise to improve accuracy and interpretability of results.
• Ethical Use of Detection Tools: There is a need to address potential misuse of detection tools
themselves, such as leveraging them to create more advanced deepfakes by understanding their
weaknesses.
Addressing these challenges and research gaps will require a concerted effort from academia, industry,
and policymakers to ensure that deepfake detection technologies remain effective, equitable, and ethical
in the face of evolving threats.
7. Future Directions
7.1. Recommendations
To advance the field of deepfake detection and mitigate the risks associated with synthetic media, the
following actionable steps are recommended:
• Development of Lightweight, Real-Time Models: Future research should focus on creating
computationally efficient deepfake detection models capable of real-time processing. This in-
volves exploring novel architectures, such as transformer-based models optimized for speed and
scalability.
• Building More Diverse and Representative Datasets: Establishing datasets that include a
wide variety of manipulation techniques, demographics, and content types will improve the
robustness and generalizability of detection models. Collaboration among research institutions
and industry can facilitate the creation of comprehensive benchmarks.
• Creating Legal and Ethical Frameworks: Policymakers and researchers should work together
to establish guidelines for the responsible use of generative technologies. This includes defining
acceptable practices, ensuring transparency, and addressing privacy concerns in dataset usage.
• Enhancing Robustness Against Adversarial Attacks: Research should prioritize techniques
to make detection models resilient to adversarial examples, such as adversarial training, ensemble
methods, and anomaly detection frameworks.
• Integration of Multimodal Approaches: Combining audio, video, and textual data can lead to
more comprehensive detection systems. Future work should focus on integrating these modalities
effectively to improve detection accuracy.
• Fostering Human-AI Collaboration: Developing tools that allow human experts to interact
with detection systems can enhance the interpretability and reliability of results, particularly in
high-stakes scenarios.
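The ensemble methods recommended above reduce, in their simplest form, to averaging per-model probabilities (formula 6 of Table 4). A minimal sketch, using hypothetical fake-probabilities from three detectors over four samples:

```python
import numpy as np

# hypothetical fake-probabilities: rows are models, columns are samples
P = np.array([
    [0.91, 0.12, 0.55, 0.80],   # model 1
    [0.85, 0.20, 0.40, 0.75],   # model 2
    [0.95, 0.05, 0.60, 0.90],   # model 3
])

# P_ensemble = (1/N) * sum_i P_i, then threshold at 0.5 for a hard label
p_ensemble = P.mean(axis=0)
labels = (p_ensemble >= 0.5).astype(int)
print(p_ensemble)  # averaged probability per sample
print(labels)      # [1 0 1 1]
```

Note how the third sample, on which the individual models disagree, is decided by the average rather than by any single model; weighted averaging or majority voting are common variants of the same idea.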
7.2. Potential Impact
The proposed advancements in deepfake detection can have far-reaching implications across various
domains:
• Policy-Making: Improved detection methods and standardized datasets can inform regulatory
frameworks, helping governments and organizations address the ethical and legal challenges
posed by deepfake technology.
• Societal Trust: By effectively mitigating the spread of synthetic media, advanced detection
technologies can restore public trust in digital content, reducing the impact of misinformation
and manipulation.
• Adoption of AI Technologies: The development of robust and ethical deepfake detection
systems will encourage the responsible adoption of AI technologies in industries such as media,
entertainment, and cybersecurity.
• Enhanced Security Measures: Real-time detection capabilities can be integrated into digi-
tal platforms, safeguarding users against malicious deepfake content and protecting sensitive
information.
By acting on these recommendations, the research community can keep deepfake detection technologies
a step ahead of evolving generative methods, fostering a safer and more trustworthy digital environment.
8. Conclusion
This survey has explored the current state of deepfake detection technologies, highlighting the rapid
advancements in methods designed to counteract the growing sophistication of generative models. Key
insights include the effectiveness of hybrid approaches, such as combining multimodal analysis with
AI-based techniques, and the potential of transformer-based architectures to improve accuracy and
scalability. Despite these advancements, challenges persist in detecting low-quality or adversarially
manipulated deepfakes, underscoring the need for robust and adaptable solutions.
This work consolidates knowledge from diverse fields, presenting a comprehensive review of the
strengths and limitations of existing deepfake detection methods. By identifying research gaps, such as
the need for standardized datasets and ethical frameworks, this survey provides a roadmap for future
studies. It also emphasizes the importance of integrating human expertise with automated systems to
enhance the interpretability and reliability of detection outcomes.
As deepfake technology continues to evolve, the importance of proactive research and collaboration
cannot be overstated. The development of lightweight, real-time detection models and the establishment
of legal and ethical standards are crucial steps toward combating the misuse of synthetic media. By
fostering cross-disciplinary partnerships and prioritizing innovation, the research community can
address emerging threats and ensure the responsible use of AI technologies, safeguarding societal trust
and digital integrity.
Declaration on Generative AI
The author(s) have not employed any Generative AI tools.
References
[1] S. H. Al-Khazraji, H. H. Saleh, A. I. Khalid, I. A. Mishkhal, Impact of deepfake technology on
social media: Detection, misinformation and societal implications, The Eurasia Proceedings of
Science Technology Engineering and Mathematics 23 (2023) 429–441.
[2] M. Sharma, M. Kaur, A review of deepfake technology: an emerging ai threat, Soft Computing for
Security Applications: Proceedings of ICSCS 2021 (2022) 605–619.
[3] C. Whyte, Deepfake news: Ai-enabled disinformation as a multi-level public policy challenge,
Journal of cyber policy 5 (2020) 199–217.
[4] P. Singh, D. B. Dhiman, Exploding ai-generated deepfakes and misinformation: A threat to global
concern in the 21st century, Available at SSRN 4651093 (2023).
[5] W. Matli, Extending the theory of information poverty to deepfake technology, International
Journal of Information Management Data Insights 4 (2024) 100286.
[6] A. O. Kwok, S. G. Koh, Deepfake: a social construction of technology perspective, Current Issues
in Tourism 24 (2021) 1798–1802.
[7] D. Chapagain, N. Kshetri, B. Aryal, Deepfake disasters: A comprehensive review of technology,
ethical concerns, countermeasures, and societal implications, in: 2024 International Conference
on Emerging Trends in Networks and Computer Communications (ETNCC), IEEE, 2024, pp. 1–9.
[8] D. Sarkar, S. De Sarkar, Combatting deep-fakes in India–an analysis of the evolving legal paradigm
and its challenges (2024).
[9] M. Mustak, J. Salminen, M. Mäntymäki, A. Rahman, Y. K. Dwivedi, Deepfakes: Deceptions,
mitigations, and opportunities, Journal of Business Research 154 (2023) 113368.
[10] M. R. Shoaib, Z. Wang, M. T. Ahvanooey, J. Zhao, Deepfakes, misinformation, and disinformation
in the era of frontier ai, generative ai, and large ai models, in: 2023 International Conference on
Computer and Applications (ICCA), IEEE, 2023, pp. 1–7.
[11] M. Pawelec, Decent deepfakes? professional deepfake developers’ ethical considerations and their
governance potential, AI and Ethics (2024) 1–26.
[12] R. Chataut, A. Upadhyay, Introduction to deepfake technology and its early foundations, in:
Deepfakes and Their Impact on Business, IGI Global Scientific Publishing, 2025, pp. 1–18.
[13] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, J. Ortega-Garcia, Deepfakes and beyond: A
survey of face manipulation and fake detection, Information Fusion 64 (2020) 131–148.
[14] P. Yu, Z. Xia, J. Fei, Y. Lu, A survey on deepfake video detection, IET Biometrics 10 (2021) 607–624.
[15] A. Malik, M. Kuribayashi, S. M. Abdullahi, A. N. Khan, Deepfake detection for human face images
and videos: A survey, IEEE Access 10 (2022) 18757–18775.
[16] A. Heidari, N. Jafari Navimipour, H. Dag, M. Unal, Deepfake detection using deep learning
methods: A systematic and comprehensive review, Wiley Interdisciplinary Reviews: Data Mining
and Knowledge Discovery 14 (2024) e1520.
[17] M. S. Rana, M. N. Nobi, B. Murali, A. H. Sung, Deepfake detection: A systematic literature review,
IEEE Access 10 (2022) 25494–25513.
[18] B. Kaddar, S. A. Fezza, Z. Akhtar, W. Hamidouche, A. Hadid, J. Serra-Sagristà, Deepfake detection
using spatiotemporal transformer, ACM Transactions on Multimedia Computing, Communications
and Applications (2024).
[19] Y. Nirkin, L. Wolf, Y. Keller, T. Hassner, Deepfake detection based on discrepancies between faces
and their context, IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2021)
6111–6121.
[20] T. Zhang, Deepfake generation and detection, a survey, Multimedia Tools and Applications 81
(2022) 6259–6276.
[21] Y. Patel, S. Tanwar, R. Gupta, P. Bhattacharya, I. E. Davidson, R. Nyameko, S. Aluvala, V. Vimal,
Deepfake generation and detection: Case study and challenges, IEEE Access (2023).
[22] B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, C. C. Ferrer, The deepfake detection
challenge (dfdc) dataset, arXiv preprint arXiv:2006.07397 (2020).
[23] J. W. Seow, M. K. Lim, R. C. Phan, J. K. Liu, A comprehensive overview of deepfake: Generation,
detection, datasets, and opportunities, Neurocomputing 513 (2022) 351–371.
[24] M. Weerawardana, T. Fernando, Deepfakes detection methods: A literature survey, in: 2021 10th
International Conference on Information and Automation for Sustainability (ICIAfS), IEEE, 2021,
pp. 76–81.
[25] A. Chadha, V. Kumar, S. Kashyap, M. Gupta, Deepfake: an overview, in: Proceedings of second
international conference on computing, communications, and cyber-security: IC4S 2020, Springer,
2021, pp. 557–566.
[26] R. U. Maheshwari, B. Paulchamy, B. K. Pandey, D. Pandey, Enhancing sensing and imaging
capabilities through surface plasmon resonance for deepfake image detection, Plasmonics (2024)
1–20.
[27] M. Javed, Z. Zhang, F. H. Dahri, A. A. Laghari, Real-time deepfake video detection using eye
movement analysis with a hybrid deep learning approach, Electronics 13 (2024) 2947.
[28] R. U. Maheshwari, S. Kumarganesh, S. KVM, A. Gopalakrishnan, K. Selvi, B. Paulchamy,
P. Rishabavarthani, K. M. Sagayam, B. K. Pandey, D. Pandey, Advanced plasmonic resonance-
enhanced biosensor for comprehensive real-time detection and analysis of deepfake content,
Plasmonics (2024) 1–18.
[29] A. Heidari, N. J. Navimipour, H. Dag, S. Talebi, M. Unal, A novel blockchain-based deepfake
detection method using federated and deep learning models, Cognitive Computation (2024) 1–19.
[30] Y. Gao, X. Wang, Y. Zhang, P. Zeng, Y. Ma, Temporal feature prediction in audio–visual deepfake
detection, Electronics 13 (2024) 3433.
[31] M. A. Arshed, S. Mumtaz, M. Ibrahim, C. Dewi, M. Tanveer, S. Ahmed, Multiclass ai-generated
deepfake face detection using patch-wise deep learning model, Computers 13 (2024) 31.
[32] S. Kingra, N. Aggarwal, N. Kaur, Sformer: An end-to-end spatio-temporal transformer architecture
for deepfake detection, Forensic Science International: Digital Investigation 51 (2024) 301817.
[33] A. Almestekawy, H. H. Zayed, A. Taha, Deepfake detection: Enhancing performance with
spatiotemporal texture and deep learning feature fusion, Egyptian Informatics Journal 27 (2024)
100535.
[34] L. Guarnera, O. Giudice, S. Battiato, Level up the deepfake detection: a method to effectively
discriminate images generated by gan architectures and diffusion models, in: Intelligent Systems
Conference, Springer, 2024, pp. 615–625.