<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>DFDA: An Analysis of Deep Learning Models to Detect Deepfake Videos</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Munleef Bhat</string-name>
          <email>munleefbhat@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prateek Agrawal</string-name>
          <email>dr.agrawal.prateek@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Charu Gupta</string-name>
          <email>charu.wa1987@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Bhagwan Parshuram Institute of Technology</institution>
          ,
          <addr-line>Delhi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science and Engineering, Lovely Professional University</institution>
          ,
          <addr-line>Punjab</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Shree Guru Gobind Singh Tricentenary University</institution>
          ,
          <addr-line>Gurugram</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>2</volume>
      <fpage>9</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>Emerging generative Artificial Intelligence (AI) technology has facilitated the fabrication of counterfeit videos and images that exhibit striking realism. This presents a significant concern, as these simulated visuals, referred to as DeepFakes, can disseminate disinformation and ensnare individuals with ease. To confront this issue, scholars are employing sophisticated computational algorithms and methodologies to identify DeepFakes. This manuscript provides a comprehensive examination of the strategies employed for DeepFake detection. It delves into the integration of diverse forms of media (such as images, videos, and speech) with machine learning to discern counterfeit content, including fabricated vocalizations. Additionally, it discusses the pivotal datasets utilized by researchers for evaluating their DeepFake detection methods. The surveyed studies reveal that amalgamating distinct techniques, such as integrating images and videos or employing varied machine learning methodologies, can yield highly efficacious results in DeepFake detection. Moreover, the paper offers recommendations for prospective investigations to enhance DeepFake detection, thus fostering a safer cyberspace. Additionally, it introduces a novel dataset termed Celeb-DF, comprising numerous high-fidelity counterfeit videos featuring renowned personalities. This dataset is crafted to facilitate researchers in refining their techniques for detecting DeepFakes. In essence, this paper endeavours to augment ongoing efforts to mitigate the proliferation of counterfeit content online by enhancing our capacity to identify DeepFakes.</p>
      </abstract>
      <kwd-group>
        <kwd>Deepfake</kwd>
        <kwd>machine learning</kwd>
        <kwd>generative AI</kwd>
        <kwd>secure society</kwd>
        <kwd>equality</kwd>
        <kwd>deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Fueled by AI and deep learning advancements, DeepFakes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] have emerged as a powerful
instrument for manipulating digital media with unparalleled realism. Their capacity to seamlessly
integrate faces into existing footage has triggered concerns across various sectors, owing to
their ability to easily mislead unsuspecting viewers. These synthetic videos, often
indistinguishable from genuine recordings, present a multifaceted threat to societal trust, personal privacy,
and the credibility of information dissemination platforms. In the political arena, DeepFakes
have the potential to disrupt democratic processes by fabricating speeches or events, fostering
uncertainty, and undermining public trust in institutions. Socially, they could instigate turmoil
or perpetuate detrimental stereotypes by falsely attributing statements or deeds to public
figures. Financially, malicious entities might exploit DeepFakes for fraudulent purposes, such as
impersonating executives or manipulating stock prices through fabricated announcements [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
From a legal standpoint, the proliferation of DeepFakes poses intricate inquiries regarding
authenticity, liability, and the boundaries of free expression. Courts may encounter difficulties
in adjudicating cases involving digitally manipulated evidence, while individuals could find
themselves unfairly targeted or defamed by synthetic content. Moreover, the malevolent
exploitation of DeepFakes for generating illicit material, such as non-consensual pornography
or grooming vulnerable individuals, underscores the pressing necessity for robust protections
and legal frameworks. The potential psychological harm inflicted on victims and the erosion of
trust in digital communications necessitate prompt and decisive action from policymakers, law
enforcement agencies, and technology firms alike. As society wrestles with the ramifications of
this disruptive technology, interdisciplinary collaboration and continuous research are
imperative to devise effective countermeasures. Ethical considerations should steer the responsible
development and deployment of AI tools, ensuring that technological advancements serve the
collective welfare rather than morphing into instruments of mass deceit. Unaided by technology,
humans generally struggle to distinguish authentic videos from DeepFakes. DeepFakes
are generated using a blend of techniques such as merging, substituting, and overlaying images and
video clips, producing exceedingly realistic yet ultimately counterfeit content. Advanced generative
techniques, notably generative adversarial networks (GANs), have progressed
to incorporate audio, further amplifying their authenticity. Detecting altered content
involves analyzing spatial and temporal data in images alongside longitudinal and temporal
cues in video and audio. To propel the frontiers of DeepFake
detection, researchers have made benchmarking datasets publicly accessible. By amalgamating
these datasets with existing techniques, cutting-edge methods now leverage information fusion
to robustly identify counterfeit media. While numerous surveys delve into DeepFake detection
(DFD), elucidating advancements and hurdles, this paper zooms in on media modality fusion in
DFD, supplementing existing critiques. It explores contemporary approaches in DFD, citing
pertinent studies and benchmarking datasets alongside their findings. Additionally, it
deliberates on challenges and potential future trajectories to further advance the current state
of DeepFake detection.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Datasets</title>
      <p>
In the contemporary digital landscape, the notion of "Digital Transformation" has garnered
global attention, offering a plethora of lucrative applications spanning from facial recognition
systems to centralized data management and intelligent automation. This transformation
harnesses advanced technologies to streamline everyday human tasks and bolster efficiency. Since
2018, there has been a noticeable uptick in the advancement of contemporary generative models,
particularly in vision-related realms such as facial and frame synthesis, as well as tone
synthesis. Acknowledging the potential harm posed by manipulated images and videos, numerous
multinational corporations (MNCs) and academic institutions have taken the lead in crafting
their own synthesized datasets tailored specifically for identifying fraudulent media using
deep-learning-based methodologies. Maintaining benchmark datasets regularly updated with diverse
and evolving DeepFake content is imperative to ensure that detection models undergo rigorous
testing against a broad range of manipulative strategies. Assessment measures are included
in each benchmark dataset and research publication to help determine their dependability and
to enable subsequent comparison with enhanced iterations of DeepFake detection algorithms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Although these datasets undergo meticulous training and testing phases, benchmarking serves
to demonstrate the improved efficacy of new or older detection approaches on the updated
dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
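Benchmark comparisons of this kind typically report threshold-free metrics such as the area under the ROC curve (AUC), which recurs throughout the studies surveyed below. As a minimal illustration (the scores here are hypothetical, not drawn from any cited benchmark), AUC can be computed directly from its rank-statistic definition:

```python
def auc_score(labels, scores):
    # AUC via the Mann-Whitney formulation: the probability that a randomly
    # chosen fake sample (label 1) receives a higher score than a real one (label 0).
    pos = [s for l, s in zip(labels, scores) if l == 1]  # detector scores for fakes
    neg = [s for l, s in zip(labels, scores) if l == 0]  # detector scores for reals
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical detector outputs: a higher score means "more likely fake".
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.3, 0.8, 0.35, 0.2, 0.1]
print(f"AUC = {auc_score(labels, scores):.3f}")
```

Because AUC is insensitive to the decision threshold and to class imbalance, it allows detectors evaluated on differently proportioned benchmark datasets to be compared on a common footing.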
      <p>
        In 2018, the collective count of DeepFake videos tallied at 3,038, encompassing 1,669
counterfeit videos and 1,369 authentic videos. By 2020, this number surged to 188,154 videos,
comprising 114,500 counterfeit videos and 73,654 authentic videos [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Remarkably, the UADFV
collection was the smallest in scale, whilst the DeepFake Detection Challenge (DFDC) Full
Dataset was the largest DeepFake repository. The size of reference databases
for DeepFake detection continued to expand, with each dataset surpassing 100,000 videos by
2023, inclusive of the DF-Platter dataset. Researchers commonly employed both historical and
contemporary benchmark datasets to assess DeepFake detection efficacy, ensuring equitable
and comprehensive evaluations across various studies [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
      </p>
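The growth quoted above can be tabulated directly from the counts given in this section; a short sketch:

```python
# Fake/real video counts for the benchmark corpora quoted above.
datasets = {
    2018: {"fake": 1_669, "real": 1_369},
    2020: {"fake": 114_500, "real": 73_654},
}

for year, counts in datasets.items():
    total = counts["fake"] + counts["real"]
    fake_share = counts["fake"] / total
    print(f"{year}: {total} videos, {fake_share:.1%} fake")
```

Note that the fake share rises from roughly 55% to 61% between the two snapshots, which is one reason threshold-free, imbalance-robust metrics are favoured for cross-dataset evaluation.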
    </sec>
    <sec id="sec-3">
      <title>3. Deepfake detection in images and videos</title>
      <p>
In a general sense, DeepFake identification methods may be divided into two primary
groups: frame-forgery assessment for image recognition, and behavioral and spatial
analysis for video classification. Li et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] introduced a new method for discerning altered
faces in images or videos. Their methodology centered on a crucial element of human facial
behaviour, the frequency of eye blinking, to verify physiological cues frequently absent in
synthesized fraudulent videos, as depicted in Fig. 1.
      </p>
      <p>
        In this investigation, scholars explored the blinking frequencies of eyes in genuine videos
versus DeepFake videos, formulating an innovative approach for DeepFake detection. The
results suggested that irregular blinking rates could indicate a fabricated or synthetic video.
Fig. 1 illustrates the meticulous examination of eye blinking across frames in both authentic
and DeepFake videos, utilizing computations based on the average interval between successive
blinks and the average duration of blinks to ascertain authenticity. This technique comprises
two stages: (a) facial recognition through image or frame analysis, identification of facial
landmarks, alignment of faces, and extraction of the eye area, and (b) counting eye blinks by
feeding the features from the first stage into a long-term recurrent convolutional network (LRCN), as shown
in Fig. 2. Afchar et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] focused on deep learning-inspired systems that examine
the mesoscopic features of images for detection. They presented two recognition approaches with
different activation functions, namely Meso-4 and MesoInception-4. In Meso-4,
the authors used four layers of convolutions and pooling, followed by a dense
network with a Rectified Linear Unit (ReLU) activation function, as shown in Fig. 3. On
the other hand, the MesoInception-4 architecture, which was used to evaluate the Face2Face and
DeepFake datasets, replaced the initial convolution layers with inception modules, as shown in
Fig. 4. Excellent detection rates were demonstrated by the results, which achieved 98% on
the DeepFake dataset and 95% on the Face2Face dataset.
      </p>
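Li et al. train an LRCN on eye-region features; purely to illustrate the blink statistics named above (mean blink duration and mean inter-blink interval), the following simplified sketch derives them from a per-frame eye-closed signal. The blink-rate threshold is invented for illustration and is not from the cited paper.

```python
def blink_stats(eye_closed, fps=30.0):
    # Derive blink count, mean blink duration, and mean inter-blink interval
    # (both in seconds) from a per-frame boolean "eye is closed" signal.
    blinks = []   # (start_frame, end_frame) of each closed-eye run
    start = None
    for i, closed in enumerate(eye_closed):
        if closed and start is None:
            start = i
        elif not closed and start is not None:
            blinks.append((start, i))
            start = None
    if start is not None:
        blinks.append((start, len(eye_closed)))
    durations = [(e - s) / fps for s, e in blinks]
    intervals = [(blinks[i + 1][0] - blinks[i][1]) / fps
                 for i in range(len(blinks) - 1)]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return len(blinks), mean(durations), mean(intervals)

def looks_synthetic(eye_closed, fps=30.0, min_blinks_per_min=6):
    # Flag a clip whose blink rate is implausibly low (threshold is illustrative;
    # real systems learn this decision rather than hard-coding it).
    n, _, _ = blink_stats(eye_closed, fps)
    minutes = len(eye_closed) / fps / 60.0
    return n / minutes < min_blinks_per_min if minutes > 0 else True
```

For example, a 30-second clip in which the eyes never close would be flagged, matching the intuition that early DeepFake generators rarely reproduced blinking.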
      <p>
        Hinton et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] introduced an innovative capsule architecture to overcome the restrictions of
convolutional neural networks (CNNs). Building upon this concept, Nguyen et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
extended its application to detect a variety of image and video forgeries, including replay attacks,
utilizing CNNs and Capsule Networks. They integrated dynamic
routing into their system. In their
methodology, video streams are initially divided into frames, followed by face detection, extraction,
and resizing. The extracted faces are then processed to derive latent characteristics
that are fed as inputs to the Capsule Network via the VGG-19 network for forgery detection.
Post-processing involves calculating average probabilities, resulting in a detection rate of 99.23%
at the image level and 95.93% at the frame level for the DeepFake dataset. An automated
pipeline was developed by Rossler et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to identify fake faces, employing tracking
algorithms to trace human faces in videos or images. Subsequently, faces are analyzed by
various classifiers to identify forgery, achieving high precision across multiple DeepFake datasets.
Dolhansky et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used the DeepFake Detection Challenge (DFDC) sample to develop three
detection methods using various characteristics, including TamperNet, a lightweight DNN
model designed for identifying low-level manipulations and digitally fabricated images. They
achieved high accuracy on DeepFake and digitally fabricated images.
      </p>
      <p>
        In the subsequent approach, two additional detection models were deployed utilizing
XceptionNet on both facial and complete image datasets for forensic examination. These frame-oriented
models implemented two thresholds: a per-frame detection threshold and one corresponding
to the video's captured frames per second, dictating the number of frames required to surpass
the per-frame cutoff for categorizing a video as counterfeit. During validation,
maximizing log-WP across each fold unveiled optimal thresholds of -3.352 for XceptionNet
(full), -2.14 for XceptionNet (facial), and -3.044 for TamperNet. Korshunov et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] employed
the VID-TIMIT dataset to fabricate DeepFake videos using open-source GAN-based software,
focusing on the influence of training and blending criteria on video fidelity. They generated low-
and high-quality renditions for each subject and showcased that cutting-edge facial recognition
algorithms, grounded on VGG and FaceNet, were susceptible to DeepFake videos, displaying
false acceptance rates of up to 95.00%. In an audio-visual integrated system, feature extraction
precedes the classification of altered videos from authentic ones making use of a two-classifier
framework. A comparable approach was used by Chugh et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], who used auditory properties
called Mel-frequency cepstral coefficients (MFCCs) and distances between mouth landmarks as visual
features. Detection of digital presentation attacks in DeepFake videos employed PCA, LDA,
image quality measures (IQMs), and SVMs. Feature blocks underwent dimensionality reduction via PCA before being fed to an LSTM to
distinguish altered from unaltered videos. Kaur et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] assessed basic face-swap recognition
techniques, noting the shortcomings of lip-sync-based methods in detecting disparities between
lip movements and speech. They also showed how a Support Vector Machine (SVM) classifier in
conjunction with image quality evaluations could identify high-quality DeepFake videos,
albeit with a comparably high error rate of 8.97%. A statistical technique based on hypothesis
testing was developed by Agarwal and Varshney [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to detect dishonest or faked content in
photographs. As part of their approach, they calculated a mathematical threshold value
that matched the error likelihood of identifying real or GAN-generated pictures. Lyu [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
underscored the difficulties in identifying DeepFakes in audio recordings and synthesized
high-definition videos. The writer highlighted concerns regarding the
incapacity of current DeepFake generation techniques to precisely map hues of hair in
relation to the human face. Considering the aforementioned research, this paper furnishes a succinct
overview of the proposed DeepFake detection framework and stresses the necessity for future
advancements in DeepFake detection methodologies. The author advocates for an adversarial
perturbation-enabled model that reduces dependence on face detectors based on DNNs. The
proposed detection methodology consists of two stages: an AI system for DeepFake detection
and a face detection phase with adversarial disruption. A comparison of the efficacy of several
deep learning (DL) algorithms was conducted by Kumar et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] in DeepFake classification
employing metric learning. They utilized a Multitask Cascaded Convolutional Neural Network
(MTCNN), which consists of three networks (a proposal network, a refinement network, and
an output network) for the purpose of extracting faces from pictures or recordings. The Xception
architecture facilitated transfer learning, while sequence classification employed LSTM and 3D
convolution alongside a triplet network for metric learning. The triplet network embedded
the frames of a video clip, comparing them to authentic video frames to assess realism.
Given the spacing between the triplet embedding vectors, three different triplet-generation
techniques were investigated: easy, semi-hard, and hard. The proposed architecture utilized
XceptionNet and MTCNN, with FaceNet for facial detection and feature extraction. With triplet
loss, semi-hard triplets were able to discern between phony and real frames, achieving an
AUC score of 99.2% on Celeb-DF and a precision of
99.71%. For the purpose of recognizing fake movies, Mittal et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] presented a deep-learning
network structure influenced by Siamese networks and triplet loss. In
their study, Mittal et al. evaluated the model's performance using the AUC measure on the
DF-TIMIT and DFDC large-scale DFD datasets. Their methodology attained a per-video AUC
of 84.4% on the DFDC dataset and 96.6% on the DF-TIMIT dataset, which is better
than numerous state-of-the-art (SOTA) DFD approaches like Two-Stream, MesoNet, HeadPose,
FWA, VA, Xception, Multi-task, Capsules, and DSP-FWA. Interestingly, it is the first technique
to use a hybrid video-and-audio modality for DeepFake detection. In their
study, they elucidated the correlation between audio and visual paradigms taken from the
same footage, using speech and face attributes identified as Sreal and Freal, respectively.
Important characteristics were extracted from speech and visual faces using OpenFace and
pyAudioAnalysis. Huang et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] introduced a counterfeit shallow-reconstruction technique,
a post-processing polishing method that needs no prior knowledge about the GAN,
effectively bypassing existing state-of-the-art (SOA) detection techniques. At present,
GAN-based picture-generating techniques frequently leave artifact patterns in synthesized
images due to inherent limitations. To tackle this issue, the authors proposed techniques to
identify and diminish such artifact patterns. Their approach entails training a dictionary model
to capture genuine image motifs and using sparse coding with linear projection to represent
DeepFake pictures in a low-dimensional space. A shallow reconstruction of the artifact-free
version of the DeepFake image subsequently minimizes artifact patterns. Evaluation involved testing against
three SOA DFD methods (GANFingerprint, DCTA, and CNNDetector) along with 16 popular
GAN-based methods for creating phony pictures to determine an image's legitimacy. A DFD
technique based on a two-stage network structure was presented by Masi et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] with the
goal of separating digitally changed faces by enhancing artifacts and reducing high-level face
information. This methodology is illustrated in Fig. 4.
      </p>
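The metric-learning step described for Kumar et al. [6] can be illustrated with the standard hinge-style triplet loss. The embeddings, margin, and distance function below are illustrative stand-ins, not the paper's actual XceptionNet features:

```python
import math

def l2_dist(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge-style triplet loss: pull the anchor toward the positive (same class,
    # e.g. a real frame) and push it past the negative (a fake frame) by a margin.
    return max(0.0, l2_dist(anchor, positive) - l2_dist(anchor, negative) + margin)

def is_semi_hard(anchor, positive, negative, margin=0.2):
    # Semi-hard triplet: the negative is farther away than the positive,
    # but still inside the margin, so the loss is positive yet informative.
    d_ap, d_an = l2_dist(anchor, positive), l2_dist(anchor, negative)
    return d_ap < d_an < d_ap + margin
```

Selecting semi-hard triplets, as the text notes, avoids both trivial triplets (zero loss, no gradient) and overly hard ones that destabilize training.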
    </sec>
    <sec id="sec-4">
      <title>4. Audio Modality Fusion in Deepfake detection</title>
      <p>Comparable to DeepFakes in pictures and videos, audio-content manipulation presents a
significant hurdle for researchers in distinguishing genuine from counterfeit audio. An impactful
incident in mid-2019 involved criminals utilizing a machine learning program to imitate the
speaking tone of a CEO and carry out a fraudulent transfer of 243,000 USD. Automatic speaker
verification (ASV) systems are especially vulnerable to attacks including voice
conversion (VC), audio phishing, replay, and speech synthesis (SS), which are exploited
for illicit purposes. Advancements in SS and VC techniques have markedly complicated the
differentiation between counterfeit and authentic speech, exacerbating the menace posed by
synthetic audio and DeepFakes. This heightens the risk of misinformation influencing emotions
and viewpoints, potentially culminating in organized and detrimental actions founded on false
initial perceptions. To counteract SS and VC attacks, engineers have combined ASV approaches
with audio-spoofing detection systems that use countermeasure scores to discriminate between
real and fake speech.</p>
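The ASV-plus-countermeasure arrangement described above can be sketched as a tandem decision, where the spoofing countermeasure gates the speaker-verification decision. The scores and thresholds here are hypothetical placeholders, not from any deployed system:

```python
def accept_trial(asv_score, cm_score, asv_threshold=0.5, cm_threshold=0.5):
    # Tandem decision: the countermeasure (CM) gates the ASV system.
    # A trial is accepted only if the CM deems the speech bona fide
    # AND the ASV score matches the claimed speaker.
    if cm_score < cm_threshold:        # flagged as synthetic/converted/replayed speech
        return False
    return asv_score >= asv_threshold  # genuine speech: fall through to speaker match

# Hypothetical trials: (asv_score, cm_score)
trials = [(0.9, 0.8), (0.9, 0.2), (0.3, 0.9)]
print([accept_trial(a, c) for a, c in trials])
```

The second trial is rejected by the countermeasure despite a strong speaker match, which is exactly how spoofing-aware pipelines blunt SS and VC attacks on ASV.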
    </sec>
    <sec id="sec-5">
      <title>5. Deepfake detection methods</title>
      <p>
        Li et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] delved into the advancements in generative modeling techniques
such as GANs and VAEs, which have markedly elevated the realism of DeepFakes
in both visuals and videos. Following suit, Cozzolino et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] demonstrated the utilization of
DeepFake technology in forensic inquiries, scrutinizing classification frameworks
constructed by mining visual anomalies and disparities. They underscored the efficacy
of temporal amalgamation of convolutional representations and deep learning methodologies
in identifying DeepFakes. Vignesh et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] tackled the escalating menace posed by DeepFake
videos, which adeptly portray contrived scenarios or personalities. The increasingly intricate
generation processes of DeepFake videos present hurdles for conventional detection methods.
The authors proposed a multi-attentional method that combines self-attention,
spatial attention, and temporal attention to overcome this. This tactic enables the model to
focus on important areas and motifs while ignoring irrelevant information, allowing for the
efficient extraction of local as well as global context-specific information from videos. By
combining these attention processes, the suggested DeepFake detector model recognizes artifacts,
inconsistencies, or unusual patterns suggestive of DeepFake tampering. The model examines
chronological and visual clues, including motion sequences, eye movements, and facial
expressions, in order to make decisions. The authors stressed how crucial it is to train the multi-attentional
DeepFake detection model using large datasets that include a variety of DeepFake versions. By
using this approach, the model becomes more resilient to new manipulation techniques and
more broadly applicable. Transfer learning and domain-adaptation approaches may be utilized
by the model to achieve superior performance across many DeepFake video formats. However,
the authors pointed out that DeepFake detection techniques and generation processes are
still in an arms race with one another. They emphasized the necessity of ongoing study, creativity,
and cooperation between academic institutions, business, and governmental bodies in
order to remain ahead of hostile actors and guarantee the development of efficient DeepFake
detectors. Mittal et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] underscored the significance of incorporating audio cues in
DeepFake detection, as discrepancies between audio and visual elements frequently arise in
manipulated videos. They suggested a combined audio-visual DeepFake detection method that
simultaneously analyzes both modalities in order to solve this. The model makes use of
convolutional neural networks (CNNs) to gather characteristics from visual data and from
audio via spectrogram analysis, capturing facial expressions and speech patterns, respectively.
These features are fused using attention or concatenation techniques to produce a shared
representation fed into a classification model. Training on diverse datasets enhances the model's
ability to detect various DeepFake variations, resulting in improved precision and resilience to
alteration methods.
      </p>
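The fusion step in such audio-visual pipelines can be sketched with plain concatenation feeding a classification head. The feature vectors and weights below are made-up stand-ins for the CNN and spectrogram embeddings described, not values from the cited work:

```python
import math

def fuse(visual_feat, audio_feat):
    # Late fusion by concatenation: one joint representation for the classifier.
    return visual_feat + audio_feat

def linear_classifier(features, weights, bias=0.0):
    # Toy stand-in for a learned classification head: sigmoid of a dot
    # product, read as the probability that the clip is fake.
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

visual = [0.2, 0.7]   # hypothetical facial-expression embedding (from a CNN)
audio = [0.9]         # hypothetical spectrogram embedding
joint = fuse(visual, audio)
p_fake = linear_classifier(joint, [0.5, -0.8, 1.2])
print(f"P(fake) = {p_fake:.2f}")
```

Concatenation is the simplest fusion choice; attention-based fusion instead learns per-modality weights, which helps when one modality (e.g. the audio track) is the one that was tampered with.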
      <p>
        In 2022, Varma and Rattani [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] addressed gender bias in DeepFake datasets by introducing
the Gender-Balanced DeepFake (GBDF) dataset, tailored for Face-in-Video (FIR) DeepFake
detection. This dataset aims to rectify the gender imbalance, crucial for unbiased model performance.
GBDF encompasses diverse subjects, facial emotions, backdrops, and perspective adjustments
showcasing real and DeepFake footage produced using a variety of editing methods. The paper
delineated the collection and curation process of GBDF, ensuring it provides a thorough and
well-rounded collection for studies on DeepFake detection. It discussed annotation
methods, data preprocessing, and challenges encountered in building a gender-balanced dataset.
Experimental evaluations showcased GBDF's efficacy, demonstrating the advantages of
gender balance in improving detection efficiency and resilience in the training and assessment of
DeepFake detectors.
      </p>
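The balance GBDF targets can be checked, and naively enforced, with a simple per-label ratio; the labels below are hypothetical and the undersampling strategy is illustrative rather than the dataset's actual curation procedure:

```python
from collections import Counter

def gender_balance(labels):
    # Fraction of each gender label; a balanced set is roughly 0.5/0.5.
    counts = Counter(labels)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

def undersample_to_balance(samples):
    # Naive balancing: truncate every class to the minority-class size.
    # samples: list of (item, gender) pairs.
    by_gender = {}
    for item, g in samples:
        by_gender.setdefault(g, []).append(item)
    k = min(len(v) for v in by_gender.values())
    return [(item, g) for g, items in by_gender.items() for item in items[:k]]
```

Undersampling discards data, so curated datasets like GBDF instead balance at collection time; the check itself, however, is the same ratio computation.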
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The above sections discussing image and video features in DeepFake detection research have
shed light on the progress made by researchers since late 2017. While significant strides have
been taken to refine existing models, there remains considerable scope for further research to
improve detection pipelines' cost-effectiveness, performance, and
practical applicability in real-world contexts. A major challenge faced by DeepFake detection
models is their tendency to become less successful in situations where there are
differences in lighting, facial expressions, and video quality. This is because the models cannot
extrapolate across distinct datasets. Additionally, the presence of "unseen classes" in testing
datasets compared to training datasets presents another hurdle. To tackle these challenges,
researchers are exploring various methods, including incorporating attention mechanisms,
using knowledge transferred from pretrained models, and expanding training sets with a variety
of specimens to improve generalization. These endeavors are aimed at driving forward the
progress of DeepFake detection technology and its utility in real-world applications.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tolosana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vera-Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Morales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ortega-Garcia</surname>
          </string-name>
          ,
          <article-title>Deepfakes and beyond: A survey of face manipulation and fake detection</article-title>
          ,
          <source>Information Fusion</source>
          <volume>64</volume>
          (
          <year>2020</year>
          )
          <fpage>131</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Narayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Thakral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vatsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Df-platter: multiface heterogeneous deepfake dataset</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>9739</fpage>
          -
          <lpage>9748</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tariq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Woo</surname>
          </string-name>
          ,
          <article-title>A convolutional LSTM based residual network for deepfake video detection</article-title>
          , arXiv preprint arXiv:2009.07480 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Varshney</surname>
          </string-name>
          ,
          <article-title>Limits of deepfake detection: A robust estimation viewpoint</article-title>
          , arXiv preprint arXiv:1905.03493 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <article-title>Deepfake detection: Current challenges and next steps</article-title>
          ,
          <source>in: 2020 IEEE International Conference on Multimedia &amp; Expo Workshops (ICMEW)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhavsar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <article-title>Detecting deepfakes with metric learning</article-title>
          ,
          <source>in: 2020 8th International Workshop on Biometrics and Forensics (IWBF)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Manocha</surname>
          </string-name>
          ,
          <article-title>Emotions don't lie: An audio-visual deepfake detection method using affective cues</article-title>
          ,
          <source>in: Proceedings of the 28th ACM International Conference on Multimedia</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2823</fpage>
          -
          <lpage>2832</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rössler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cozzolino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Verdoliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Riess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nießner</surname>
          </string-name>
          ,
          <article-title>FaceForensics++: Learning to detect manipulated facial images</article-title>
          ,
          <source>in: International Conference on Computer Vision</source>
          (ICCV),
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rössler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cozzolino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Verdoliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Riess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nießner</surname>
          </string-name>
          ,
          <article-title>FaceForensics++: Learning to detect manipulated facial images</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Dolhansky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Howes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pflaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Baram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Ferrer</surname>
          </string-name>
          ,
          <article-title>The deepfake detection challenge (dfdc) preview dataset</article-title>
          , arXiv preprint arXiv:1910.08854 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Face x-ray for more general face forgery detection</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>5001</fpage>
          -
          <lpage>5010</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <article-title>In ictu oculi: Exposing AI created fake videos by detecting eye blinking</article-title>
          ,
          <source>in: 2018 IEEE International Workshop on Information Forensics and Security (WIFS)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marcel</surname>
          </string-name>
          ,
          <article-title>Vulnerability assessment and detection of deepfake videos</article-title>
          ,
          <source>in: 2019 International Conference on Biometrics (ICB)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K.</given-names>
            <surname>Chugh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dhall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <article-title>Not made for each other-audio-visual dissonance-based deepfake detection and localization</article-title>
          ,
          <source>in: Proceedings of the 28th ACM International Conference on Multimedia</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>439</fpage>
          -
          <lpage>447</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kumaraguru</surname>
          </string-name>
          ,
          <article-title>Deepfakes: temporal sequential analysis to detect faceswapped video clips using convolutional long short-term memory</article-title>
          ,
          <source>Journal of Electronic Imaging</source>
          <volume>29</volume>
          (
          <year>2020</year>
          )
          <fpage>033013</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Vakhshiteh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramachandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nickabadi</surname>
          </string-name>
          ,
          <article-title>Threat of adversarial attacks on face recognition: A comprehensive survey</article-title>
          , arXiv preprint arXiv:2007.11709 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Juefei-Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pu</surname>
          </string-name>
          ,
          <article-title>FakePolisher: Making deepfakes more detection-evasive by shallow reconstruction</article-title>
          ,
          <source>in: Proceedings of the 28th ACM International Conference on Multimedia</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1217</fpage>
          -
          <lpage>1226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>I.</given-names>
            <surname>Masi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Killekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Mascarenhas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Gurudatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>AbdAlmageed</surname>
          </string-name>
          ,
          <article-title>Two-branch recurrent network for isolating deepfakes in videos</article-title>
          ,
          <source>in: Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part VII</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>667</fpage>
          -
          <lpage>684</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>Advancing high fidelity identity swapping for forgery detection</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>5074</fpage>
          -
          <lpage>5083</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>R. U</surname>
          </string-name>
          ,
          <string-name>
            <surname>R. M</surname>
          </string-name>
          ,
          <string-name>
            <surname>R. Vignesh K</surname>
          </string-name>
          ,
          <string-name>
            <surname>T. K</surname>
          </string-name>
          ,
          <article-title>Deepfake video forensics based on transfer learning</article-title>
          ,
          <year>2020</year>
          . URL: http://dx.doi.org/10.35940/ijrte.F9747.038620. doi:10.35940/ijrte.F9747.038620.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Nadimpalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rattani</surname>
          </string-name>
          ,
          <article-title>On improving cross-dataset generalization of deepfake detectors</article-title>
          ,
          <year>2022</year>
          . arXiv preprint arXiv:2204.04285.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>