<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>DeepFakesON-Phys: DeepFakes Detection based on Heart Rate Estimation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Javier Hernandez-Ortega</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruben Tolosana</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julian Fierrez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aythami Morales</string-name>
          <email>aythami.moralesg@uam.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Biometrics and Data Pattern Analytics Lab - BiDA Lab, Universidad Autonoma de Madrid</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This work introduces a novel DeepFake detection framework based on physiological measurement. In particular, we consider information related to the heart rate using remote photoplethysmography (rPPG). rPPG methods analyze video sequences looking for subtle color changes in the human skin, revealing the presence of human blood under the tissues. In this work we investigate to what extent rPPG is useful for the detection of DeepFake videos. The proposed fake detector named DeepFakesON-Phys uses a Convolutional Attention Network (CAN), which extracts spatial and temporal information from video frames, analyzing and combining both sources to better detect fake videos. DeepFakesON-Phys has been experimentally evaluated using the latest public databases in the field: Celeb-DF and DFDC. The results achieved, above 98% AUC (Area Under the Curve) on both databases, outperform the state of the art and prove the success of fake detectors based on physiological measurement for detecting the latest DeepFake videos.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        DeepFakes have become a great public concern
recently
        <xref ref-type="bibr" rid="ref4 ref7">(Citron 2019; Cellan-Jones 2019)</xref>
        . The very popular
term “DeepFake” usually refers to a deep learning-based
technique able to create fake videos by swapping the face of
a person with the face of another person. This type of
digital manipulation is also known in the literature as Identity
Swap, and it is moving forward very fast
        <xref ref-type="bibr" rid="ref22 ref30 ref34 ref40 ref41">(Tolosana et al.
2020b)</xref>
        .
      </p>
      <p>
        Currently, most face manipulations are based on
popular machine learning techniques such as AutoEncoders
(AE)
        <xref ref-type="bibr" rid="ref23">(Kingma and Welling 2013)</xref>
        and Generative
Adversarial Networks (GAN)
        <xref ref-type="bibr" rid="ref13">(Goodfellow et al. 2014)</xref>
        , achieving in
general very realistic visual results, especially in the latest
generation of public DeepFakes
        <xref ref-type="bibr" rid="ref16 ref22 ref30 ref34 ref40 ref41 ref6">(Tolosana et al. 2020a)</xref>
        , and
the present trends
        <xref ref-type="bibr" rid="ref22 ref30 ref34">(Karras et al. 2020)</xref>
        . However, despite
the impressive visual results, do current face manipulations
also consider the physiological aspects of the human
being in the synthesis process?
      </p>
      <p>
        Physiological measurement has provided very
valuable information to many different tasks such as
e-learning
        <xref ref-type="bibr" rid="ref15 ref16 ref22 ref30 ref34 ref6">(Hernandez-Ortega et al. 2020a)</xref>
        , health care
        <xref ref-type="bibr" rid="ref29">(McDuff et al. 2015)</xref>
        , human-computer interaction
        <xref ref-type="bibr" rid="ref39">(Tan and
Nijholt 2010)</xref>
        , and security
        <xref ref-type="bibr" rid="ref27">(Marcel et al. 2019)</xref>
        , among
others.
      </p>
      <p>
        In physical face attacks, a.k.a. Presentation Attacks (PAs),
real subjects are often impersonated using artifacts such
as photographs, videos, and masks
        <xref ref-type="bibr" rid="ref27">(Marcel et al. 2019)</xref>
        .
Face recognition systems are known to be vulnerable against
these attacks unless proper detection methods are
implemented
        <xref ref-type="bibr" rid="ref11 ref12">(Galbally, Marcel, and Fierrez 2014;
Hernandez-Ortega et al. 2019)</xref>
        . Some of these detection methods are
based on liveness detection by using information such as
eye blinking or natural facial micro-expressions (Bharadwaj
et al. 2013). Specifically for detecting 3D mask
impersonation, which is one of the most challenging types of attack,
detecting pulse from face videos using remote
photoplethysmography (rPPG) has been shown to be an effective
countermeasure (Hernandez-Ortega et al. 2018). When applying this
technique to a video sequence with a fake face, the estimated
heart rate signal is significantly different from the heart rate
extracted from a real face
        <xref ref-type="bibr" rid="ref11 ref12">(Erdogmus and Marcel 2014)</xref>
        .
      </p>
      <p>Seeing the good results achieved by rPPG techniques
when dealing with physical 3D face mask attacks, and since
DeepFakes are digital manipulations somewhat similar to
them, in this work we hypothesize that fake detectors based
on physiological measurement can also be used against
DeepFakes after adapting them properly. DeepFake
generation methods have historically tried to mimic the visual
appearance of genuine faces. However, to the best of our
knowledge, they do not emulate the physiology of human
beings, e.g., heart rate, blood oxygenation, or breath rate,
so estimating that type of signals from the video could be a
powerful tool for the detection of DeepFakes.</p>
      <p>The novelty of this work consists in using rPPG
features previously learned for the task of heart rate
estimation and adapting them for the detection of DeepFakes by
means of a knowledge-transfer process, thus obtaining a
novel fake detector based on physiological measurement
named DeepFakesON-Phys. In particular, the information
related to the heart rate is considered to decide whether a
video is real or fake. Our physiological detector intends to be
a robust solution to the weaknesses of most state-of-the-art
DeepFake detectors based on the visual features existing in
fake videos (Matern, Riess, and Stamminger 2019; Agarwal
and Farid 2019) and also on the artifacts/fingerprints inserted
during the synthesis process
        <xref ref-type="bibr" rid="ref22 ref30 ref34">(Neves et al. 2020)</xref>
        , which are
highly dependent on a specific fake manipulation technique.</p>
      <p>[Figure 1: Architecture of DeepFakesON-Phys. After a preprocessing stage, a Convolutional Attention Network with two branches (Motion Model and Appearance Model) applies Conv. 2D layers (32x3x3 and 64x3x3, pool size 2) with tanh activations and average pooling; sigmoid-activated 1x1 convolutions produce attention masks combined by element-wise multiplication, yielding an output score in [0,1].]</p>
      <p>
        DeepFakesON-Phys is based on DeepPhys
        <xref ref-type="bibr" rid="ref14 ref5">(Chen and
McDuff 2018)</xref>
        , a deep learning model trained for heart
rate estimation from face videos based on rPPG. DeepPhys
showed high accuracy even when dealing with challenging
conditions such as heterogeneous illumination or low
resolution, outperforming classic hand-crafted approaches. We
used the architecture of DeepPhys, with changes to
make it suitable for DeepFake detection. We initialized the
weights of the layers of DeepFakesON-Phys with the ones
from DeepPhys (meant for heart rate estimation based on
rPPG) and we adapted them to the new task using
fine-tuning. This process allowed us to train our detector without
the need for a high number of samples (compared to training
it from scratch). Fine-tuning also helped us to obtain a model
that detects DeepFakes by looking at rPPG-related features
from the images in the face videos.
      </p>
      <p>
        In this context, the main contributions of our work are:
• An in-depth literature review of DeepFake detection
approaches with special emphasis on physiological
techniques, including the key aspects of the detection systems,
the databases used, and the main results achieved.
• An approach based on physiological measurement to
detect DeepFake videos: DeepFakesON-Phys1. Fig. 1
graphically summarizes the proposed fake detection
approach based on the original architecture DeepPhys
        <xref ref-type="bibr" rid="ref14 ref5">(Chen
and McDuff 2018)</xref>
        , a Convolutional Attention Network
(CAN) composed of two parallel Convolutional Neural
Networks (CNN) able to extract spatial and temporal
information from video frames. This architecture is adapted
for the detection of DeepFake videos by means of a
knowledge-transfer process.
1https://github.com/BiDAlab/DeepFakesON-Phys
• A thorough experimental assessment of the
proposed DeepFakesON-Phys, considering the latest public
databases of the 2nd DeepFake generation such as
Celeb-DF v2 and DFDC Preview. DeepFakesON-Phys achieves
high-accuracy results, outperforming the state of the art.
In addition, the results achieved prove that current face
manipulation techniques do not pay attention to the
heart-rate-related physiological information of the human being
when synthesizing fake videos.
      </p>
      <p>The remainder of the paper is organized as
follows. Related Works summarizes previous studies
focused on the detection of DeepFakes. Proposed
Method: DeepFakesON-Phys describes the proposed
fake detection approach. Databases
summarizes all databases considered in the experimental
framework of this study. Experiments describes the
experimental protocol and the results achieved in comparison with
the state of the art. Finally, Conclusions draws the final
conclusions and points out future research lines.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>
        Different approaches have been proposed in the literature
to detect DeepFake videos. Table 1 shows a comparison of
the most relevant approaches in the area, paying special
attention to the fake detectors based on physiological
measurement. For each study we include information related to
the method, classifiers, best performance, and databases for
research. It is important to remark that in some cases,
different evaluation metrics are considered, e.g., Area Under
the Curve (AUC) and Equal Error Rate (EER), which
complicates the comparison among studies. Finally, the results
highlighted in italics indicate the generalization ability of
the detectors against unseen databases, i.e., databases
that were not considered for training. Most of these results are
extracted from
        <xref ref-type="bibr" rid="ref22 ref26 ref30 ref34">(Li et al. 2020)</xref>
        .
      </p>
      <p>[Table 1: Comparison of the most relevant DeepFake detection studies: (Matern, Riess, and Stamminger 2019), (Li and Lyu 2019; Li et al. 2020), (Rössler et al. 2019), (Nguyen, Yamagishi, and Echizen 2019), (Dang et al. 2020), (Dolhansky et al. 2019), (Sabir et al. 2019), (Conotter et al. 2014), (Li, Chang, and Lyu 2018), (Agarwal and Farid 2019), (Ciftci, Demir, and Yin 2020), (Jung, Kim, and Kim 2020), and (Qi et al. 2020).]</p>
      <p>
        The first studies in the area focused on the visual
artifacts existing in the 1st generation of fake videos. The authors
of
        <xref ref-type="bibr" rid="ref2 ref25 ref28">(Matern, Riess, and Stamminger 2019)</xref>
        proposed fake
detectors based on simple visual artifacts such as eye colour,
missing reflections, and missing details in the teeth areas,
achieving a final 85.1% AUC.
      </p>
      <p>
        Approaches based on the detection of the face warping
artifacts have also been studied in the literature. For
example,
        <xref ref-type="bibr" rid="ref2 ref22 ref25 ref26 ref30 ref34 ref42">(Li and Lyu 2019; Li et al. 2020)</xref>
        proposed detection
systems based on CNN in order to detect the presence of
such artifacts from the face and the surrounding areas, being
one of the most robust detection approaches against unseen
face manipulations.
      </p>
      <p>
        Undoubtedly, fake detectors based on pure deep
learning features are the most popular ones: feeding the
networks with as many real/fake videos as possible and
letting the networks automatically extract the
discriminative features. In general, these fake detectors have achieved
very good results using popular network architectures such
as Xception
        <xref ref-type="bibr" rid="ref10 ref36">(Rössler et al. 2019; Dolhansky et al. 2019)</xref>
        ,
novel ones such as Capsule Networks
        <xref ref-type="bibr" rid="ref2 ref25 ref32">(Nguyen, Yamagishi,
and Echizen 2019)</xref>
        , and novel training techniques based on
attention mechanisms
        <xref ref-type="bibr" rid="ref22 ref30 ref34 ref9">(Dang et al. 2020)</xref>
        .
      </p>
      <p>
        Fake detectors based on the image and temporal
discrepancies across frames have also been proposed in the
literature.
        <xref ref-type="bibr" rid="ref37">(Sabir et al. 2019)</xref>
        proposed a Recurrent
Convolutional Network similar to
        <xref ref-type="bibr" rid="ref14 ref5">(Güera and Delp 2018)</xref>
        , trained
end-to-end instead of using a pre-trained model. Their
proposed detection approach was tested using FaceForensics++
database
        <xref ref-type="bibr" rid="ref36">(Rössler et al. 2019)</xref>
        , achieving AUC results above
96%.
      </p>
      <p>
        Although most approaches are based on the detection of
fake videos using the whole face, in
        <xref ref-type="bibr" rid="ref16 ref22 ref30 ref34 ref40 ref41 ref6">(Tolosana et al. 2020a)</xref>
        the authors evaluated the discriminative power of each facial
region using state-of-the-art network architectures,
achieving interesting results on DeepFake databases of the 1st and
2nd generations.
      </p>
      <p>
        Finally, we pay special attention to the fake detectors
based on physiological information. The eye blinking rate
was studied in
        <xref ref-type="bibr" rid="ref14 ref21 ref24 ref34 ref5">(Li, Chang, and Lyu 2018; Jung, Kim, and
Kim 2020)</xref>
        .
        <xref ref-type="bibr" rid="ref14 ref24 ref5">(Li, Chang, and Lyu 2018)</xref>
        proposed Long-Term
Recurrent Convolutional Networks (LRCN) to capture the
temporal dependencies existing in human eye blinking. Their
method was evaluated on the UADFV database, achieving
a final 99.0% AUC. More recently,
        <xref ref-type="bibr" rid="ref21 ref34">(Jung, Kim, and Kim
2020)</xref>
        proposed a different approach named DeepVision.
They fused the Fast-HyperFace
        <xref ref-type="bibr" rid="ref35">(Ranjan, Patel, and
Chellappa 2017)</xref>
        and EAR
        <xref ref-type="bibr" rid="ref38">(Soukupova and Cech 2016)</xref>
        algorithms to track the blinking, achieving an accuracy of 87.5%
over an in-house database.
      </p>
      <p>
        Fake detectors based on the analysis of the way we speak
were studied in
        <xref ref-type="bibr" rid="ref2 ref25">(Agarwal and Farid 2019)</xref>
        , focusing on
the distinct facial expressions and movements. These
features were considered in combination with Support Vector
Machines (SVM), achieving a 96.3% AUC over their own
database.
      </p>
      <p>
        Finally, fake detection methods based on the heart rate
have been also studied in the literature. One of the first
studies in this regard was
        <xref ref-type="bibr" rid="ref8">(Conotter et al. 2014)</xref>
        where the
authors preliminarily evaluated the potential of blood flow
changes in the face to distinguish between computer
generated and real videos. Their proposed approach was
evaluated using 12 videos (6 real and fake videos each),
concluding that it is possible to use this metric to detect computer
generated videos.
      </p>
      <p>
        Changes in the blood flow have also been studied
in
        <xref ref-type="bibr" rid="ref22 ref30 ref33 ref34 ref34 ref6">(Ciftci, Demir, and Yin 2020; Qi et al. 2020)</xref>
        using
DeepFake videos. In
        <xref ref-type="bibr" rid="ref34 ref6">(Ciftci, Demir, and Yin 2020)</xref>
        , the authors
considered rPPG techniques to extract robust biological
features. Classifiers based on SVM and CNN were analyzed,
achieving final accuracies of 94.9% and 91.5% for the
DeepFake videos of FaceForensics++ and Celeb-DF,
respectively.
      </p>
      <p>
        Recently, in
        <xref ref-type="bibr" rid="ref22 ref30 ref33 ref34">(Qi et al. 2020)</xref>
        a more sophisticated fake
detector named DeepRhythm was presented. This approach
was also based on features extracted using rPPG
techniques. DeepRhythm was enhanced through two modules:
i) motion-magnified spatial-temporal representation, and ii)
dual-spatial-temporal attention. These modules were
incorporated in order to provide a better adaptation to
dynamically changing faces and various fake types. In general, good
results with accuracies of up to 100% were achieved on
the FaceForensics++ database. However, this method suffers from
a demanding preprocessing stage, needing a precise
detection of 81 facial landmarks and the use of a color
magnification algorithm prior to fake detection. Also, poor results
were achieved on databases of the 2nd generation such as the
DFDC Preview (Acc. = 64.1%).
      </p>
      <p>
        In the present work, in addition to the proposal of a
different DeepFake detection architecture, we enhance previous
approaches, e.g.
        <xref ref-type="bibr" rid="ref22 ref30 ref33 ref34">(Qi et al. 2020)</xref>
        , by keeping the
preprocessing stage as light and robust as possible, composed only of a
face detector and frame normalization. To provide an
overall picture, we include in Table 1 the results achieved with
our proposed DeepFakesON-Phys in comparison with key
related works, which shows that we outperform the state of
the art on Celeb-DF v2 and DFDC Preview databases.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Method: DeepFakesON-Phys</title>
      <p>Fig. 1 graphically summarizes the architecture of
DeepFakesON-Phys, the proposed fake detector based
on heart rate estimation. We hypothesize that rPPG methods
should obtain significantly different results when trying to
estimate the underlying heart rate from a video containing
a real face, compared with a fake face. Since the changes
in color and illumination due to oxygen concentration are
subtle and invisible to the human eye, we think that most
of the existing DeepFake manipulation methods do not
consider the physiological aspects of the human being yet.</p>
      <p>
        The initial architecture of DeepFakesON-Phys is based on
the DeepPhys model described in
        <xref ref-type="bibr" rid="ref14 ref5">(Chen and McDuff 2018)</xref>
        ,
whose objective was to estimate the human heart rate using
facial video sequences. The model is based on deep
learning and was designed to extract spatio-temporal
information from videos mimicking the behavior of traditional
handcrafted rPPG techniques. Features are extracted through the
color changes in users’ faces that are caused by the
variation of oxygen concentration in the blood. Signal processing
methods are also used for isolating the color changes caused
by blood from other changes that may be caused by factors
such as external illumination, noise, etc.
      </p>
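      <p>The kind of signal processing mentioned above can be illustrated with a minimal band-pass sketch. This is an illustrative example only, not the exact processing used by DeepPhys: it keeps spectral components in the typical human heart-rate band of roughly 0.7-4 Hz and discards slower illumination drift.</p>

```python
import numpy as np

def bandpass_heart_rate(signal, fs, low=0.7, high=4.0):
    """Zero out every spectral component outside the typical human
    heart-rate band (low..high Hz) and reconstruct the time signal."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# Toy skin-color trace at 30 fps: a 1.2 Hz pulse (72 bpm) buried
# under a stronger 0.1 Hz illumination drift.
fs = 30.0
t = np.arange(0, 10, 1.0 / fs)
trace = np.sin(2 * np.pi * 1.2 * t) + 2.0 * np.sin(2 * np.pi * 0.1 * t)
clean = bandpass_heart_rate(trace, fs)
```

      <p>After filtering, the dominant frequency of the trace is the 1.2 Hz pulse, while the drift component is suppressed.</p>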
      <p>As can be seen in Fig. 1, after the first preprocessing stage,
the Convolutional Attention Network (CAN) is composed of
two different CNN branches:
• Motion Model: it is designed to detect changes between
consecutive frames, i.e., performing a short-time analysis
of the video for detecting fakes. To accomplish this task,
the input at a time t consists of a frame computed as the
normalized difference of the current frame I(t) and the
previous one I(t-1).
• Appearance Model: it focuses on the analysis of the
static information on each video frame. It has the target
of providing the Motion Model with information about
which points of the current frame may contain the most
relevant information for detecting DeepFakes, i.e., a batch
of attention masks that are shared at different layers of the
CNN. The input of this branch at time t is the raw frame
of the video I(t), normalized to zero mean and unitary
standard deviation.</p>
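      <p>The two branch inputs can be sketched as follows. The (I(t)-I(t-1))/(I(t)+I(t-1)) form of the motion input is the DeepPhys convention and is assumed here, since the text above only states that the frame difference is normalized.</p>

```python
import numpy as np

def can_inputs(frame_prev, frame_curr, eps=1e-8):
    """Build the two CAN branch inputs at time t.

    Motion branch: normalized difference of consecutive frames
    (DeepPhys-style convention, assumed here).
    Appearance branch: raw frame I(t) standardized to zero mean
    and unit standard deviation."""
    motion = (frame_curr - frame_prev) / (frame_curr + frame_prev + eps)
    appearance = (frame_curr - frame_curr.mean()) / (frame_curr.std() + eps)
    return motion, appearance

# Toy 36x36 RGB frames with pixel values in [0, 1].
rng = np.random.default_rng(0)
prev, curr = rng.random((36, 36, 3)), rng.random((36, 36, 3))
motion, appearance = can_inputs(prev, curr)
```

      <p>For positive-valued frames the motion input is bounded in (-1, 1), while the appearance input has zero mean and unit standard deviation.</p>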
      <p>The attention masks coming from the Appearance Model
are shared with the Motion Model at two different points of
the CAN. Finally, the output layer of the Motion Model is
also the final output of the entire CAN.</p>
      <p>
        In the original architecture
        <xref ref-type="bibr" rid="ref14 ref5">(Chen and McDuff 2018)</xref>
        , the
output stage consisted of a regression layer for estimating
the time derivative of the subject’s heart rate. In our case, as
we do not aim to estimate the pulse of the subject, but the
presence of a fake face, we change the final regression layer
to a classification layer, using a sigmoid activation function
for obtaining a final score in the [0,1] range for each instant t
of the video, related to the probability of the face being real.
      </p>
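      <p>The output-stage change amounts to replacing a linear regression output with a sigmoid unit; a minimal sketch follows. The 0.5 threshold below is purely illustrative, since the operating threshold is selected to maximize evaluation accuracy.</p>

```python
import math

def fake_score(activation):
    """Sigmoid mapping the last layer's activation to a score in [0,1];
    scores near 1 indicate that the face is likely real."""
    return 1.0 / (1.0 + math.exp(-activation))

def is_real(activation, threshold=0.5):
    """Per-frame decision against a fixed detection threshold."""
    return fake_score(activation) >= threshold
```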
      <p>
        Since the original DeepPhys model from
        <xref ref-type="bibr" rid="ref14 ref5">(Chen and
McDuff 2018)</xref>
        is not publicly available, instead of
training a new CAN from scratch, we decided to initialize
DeepFakesON-Phys with the weights from the model
pretrained for heart rate estimation presented in
        <xref ref-type="bibr" rid="ref22 ref30 ref34">(Hernandez-Ortega et al. 2020b)</xref>
        , which is also an adaptation of
DeepPhys but trained using the COHFACE database
        <xref ref-type="bibr" rid="ref20">(Heusch,
Anjos, and Marcel 2017)</xref>
        . This model was also shown to have
high accuracy in the heart rate estimation task using real face
videos, so our idea is to benefit from that acquired
knowledge to better train DeepFakesON-Phys through a proper
fine-tuning process.
      </p>
      <p>Once we have initialized DeepFakesON-Phys with the
mentioned weights, we freeze the weights of all the layers of
the original CAN model apart from the new classification
layer and the last fully-connected layer, and we retrain the
model. Through this fine-tuning process we benefit from the
weights learned for heart rate estimation, just adapting them
for the DeepFake detection task. This way, we make sure
that the weights of the convolutional layers keep looking
for information related to the heart rate, while the last layers learn
how to use that information for detecting the existence of
DeepFakes.</p>
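      <p>The freezing scheme described above can be sketched framework-agnostically. The layer names below are hypothetical (the paper does not list them): every layer inherited from the heart-rate model stays frozen, and only the last fully-connected layer and the new classification head remain trainable.</p>

```python
# Hypothetical layer list standing in for the pre-trained CAN.
layers = [
    {"name": "conv_1", "trainable": True},
    {"name": "conv_2", "trainable": True},
    {"name": "conv_3", "trainable": True},
    {"name": "conv_4", "trainable": True},
    {"name": "dense_last", "trainable": True},
    {"name": "classifier", "trainable": True},  # new classification layer
]

# Freeze everything except the last fully-connected layer and the head.
RETRAINED = {"dense_last", "classifier"}
for layer in layers:
    layer["trainable"] = layer["name"] in RETRAINED
```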
    </sec>
    <sec id="sec-4">
      <title>Databases</title>
      <p>Two different public databases are considered in the
experimental framework of this study. In particular, Celeb-DF v2
and DFDC Preview, the two most challenging DeepFake
databases to date. Their videos exhibit a large range of
variations in aspects such as face sizes (in pixels),
lighting conditions (i.e., day, night, etc.), backgrounds, different
acquisition scenarios (i.e., indoors and outdoors), distances
from the person to the camera, and pose variations, among
others. These databases present enough images (fake and
genuine) to fine-tune the original weights meant for heart
rate estimation, obtaining new weights also based on rPPG
features but adapted for DeepFake detection. Table 2
summarizes the main characteristics of the databases.</p>
      <sec id="sec-4-1">
        <title>Celeb-DF v2</title>
        <p>
          The aim of the Celeb-DF v2 database
          <xref ref-type="bibr" rid="ref22 ref26 ref30 ref34">(Li et al. 2020)</xref>
          was
to generate fake videos of better visual quality compared
with the previous UADFV database. This database consists
of 590 real videos extracted from YouTube, corresponding to
celebrities with a diverse distribution in terms of gender, age,
and ethnic group. Regarding fake videos, a total of 5,639
videos were created swapping faces using DeepFake
technology. The final videos are in MPEG4.0 format.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>DFDC Preview</title>
        <p>
          The DFDC database
          <xref ref-type="bibr" rid="ref10">(Dolhansky et al. 2019)</xref>
          is one of the
latest public databases, released by Facebook in
collaboration with other companies and academic institutions such as
Microsoft, Amazon, and MIT. In the present study we
consider the DFDC Preview dataset consisting of 1,131 real
videos from 66 paid actors, ensuring realistic variability in
gender, skin tone, and age. It is important to remark that no
publicly available data or data from social media sites were
used to create this dataset, unlike other popular databases.
Regarding fake videos, a total of 4,119 videos were created
using two different unknown approaches for fake
generation. Fake videos were generated by swapping subjects with
similar appearances, i.e., similar facial attributes such as skin
tone, facial hair, glasses, etc. After a given pairwise model
was trained on two identities, the identities were swapped
onto the other’s videos.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <sec id="sec-5-1">
        <title>Experimental Protocol</title>
        <p>
          Celeb-DF v2 and DFDC Preview databases have been
divided into two non-overlapping datasets: development and
evaluation. It is important to remark that each dataset comprises
videos from different identities (both real and fake), unlike
some previous studies. This aspect is very important in order
to perform a fair evaluation and predict the generalization
ability of the fake detection systems against unseen
identities. Also, it is important to remark that the evaluation is
carried out at frame level as in most previous studies
          <xref ref-type="bibr" rid="ref22 ref30 ref34 ref40 ref41">(Tolosana
et al. 2020b)</xref>
          , not video level, using the popular AUC and
accuracy metrics.
        </p>
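          <p>The frame-level AUC metric used here can be computed directly from per-frame scores and labels via the rank (Mann-Whitney) identity; a minimal sketch:</p>

```python
import numpy as np

def auc_frame_level(scores, labels):
    """AUC as the probability that a randomly chosen real frame
    (label 1) scores higher than a randomly chosen fake frame
    (label 0); ties count as 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    # Compare every real-frame score with every fake-frame score.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

          <p>For perfectly separated scores, e.g. real frames at 0.9 and 0.8 against a fake frame at 0.1, the function returns 1.0.</p>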
        <p>
          For the Celeb-DF v2 database, we consider real/fake
videos of 40 and 19 different identities for the development
and evaluation datasets respectively, whereas for the DFDC
Preview database, we follow the same experimental protocol
proposed in
          <xref ref-type="bibr" rid="ref10">(Dolhansky et al. 2019)</xref>
          as the authors already
considered this concern.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Fake Detection Results: DeepFakesON-Phys</title>
        <p>This section evaluates the ability of DeepFakesON-Phys to
detect the most challenging DeepFake videos of the 2nd
generation. Table 3 shows the fake detection performance
results achieved in terms of AUC and accuracy over the final
evaluation datasets of Celeb-DF v2 and DFDC Preview. It is
important to highlight that a separate fake detector is trained
for each database.</p>
        <p>In general, very good results are achieved on both
DeepFake databases. For the Celeb-DF v2 database,
DeepFakesON-Phys achieves an accuracy of 98.7% and an
AUC of 99.9%. Regarding the DFDC Preview database, the
results achieved are 94.4% accuracy and 98.2% AUC,
similar to those obtained for the Celeb-DF v2 database.</p>
        <p>
          Observing the results, it seems clear that the fake detectors
have learnt to distinguish the spatio-temporal differences
between the real/fake faces of Celeb-DF v2 and DFDC
Preview databases. Since all the convolutional layers of the
proposed fake detector are frozen (the network was originally
initialized with the weights from the model trained to
predict the heart rate
          <xref ref-type="bibr" rid="ref15 ref22 ref30 ref34">(Hernandez-Ortega et al. 2020b)</xref>
          ), and we
only train the last fully-connected layers, we can conclude
that the proposed detection approach based on
physiological measurement is successfully using pulse-related features
for distinguishing between real and fake faces. These results
prove that current face manipulation techniques do not pay
attention to the heart-rate-related physiological information
of the human being when synthesizing fake videos.
        </p>
        <p>Fig. 2 shows some examples of successful and failed
detections when evaluating the proposed approach with
real/fake faces of Celeb-DF v2. In particular, all the failures
correspond to fake faces generated from one particular source video,
which were misclassified as real faces. Fig. 2 shows a frame from
the original real video (top-left), one from a misclassified
fake video generated using that scenario (top-middle), and
another from a fake video correctly classified as fake and
generated using the same real and fake identities but from
other source videos (top-right). The detection threshold is
the same for all the testing databases and videos, and it has
been selected to maximize the accuracy in the evaluation.</p>
        <p>
          Looking at the score distributions along time of the three
examples (Fig. 2, bottom), it can be seen that for the real face
video (left) the scores are close to 1 most of the time and always
above the detection threshold. However, for the fake videos
considered (middle and right), the score changes constantly,
causing the scores of some fake frames to cross the
detection threshold, consequently misclassifying them as real.
Nevertheless, it is important to remark that these mistakes
only happen if we analyze the results at frame level
(traditional approach followed in the literature
          <xref ref-type="bibr" rid="ref22 ref30 ref34 ref40 ref41">(Tolosana et al.
2020b)</xref>
          ). If we considered an evaluation at video level,
DeepFakesON-Phys would be able to detect fake videos by
integrating the temporal information available in short-time
segments, e.g., in a similar way as described in
(Hernandez-Ortega et al. 2018) for continuous face anti-spoofing.
        </p>
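        <p>Such a video-level integration could be as simple as smoothing the per-frame scores over short-time segments and thresholding the result. This is an illustrative sketch of the idea suggested above, not a procedure specified in this work:</p>

```python
def video_level_decision(frame_scores, window=5, threshold=0.5):
    """Smooth per-frame scores with a trailing moving average over
    short-time segments, then classify the video as real if the mean
    smoothed score clears the detection threshold."""
    smoothed = []
    for i in range(len(frame_scores)):
        segment = frame_scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(segment) / len(segment))
    return sum(smoothed) / len(smoothed) >= threshold
```

        <p>With this aggregation, a real video whose score occasionally dips below the threshold for a few frames is still classified as real.</p>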
        <p>We believe that the failures produced in this particular
case are caused by interference from external
illumination. rPPG methods that use handcrafted features are
usually fragile against external artificial illumination in the
frequency and power ranges of the normal human heart rate,
making it difficult to distinguish those illumination changes from
the color changes caused by blood perfusion. Nevertheless, the
proposed physiological approach presented in this work is
more robust to this kind of illumination perturbation than
hand-crafted methods, because the training
process is data-driven, making it possible to identify those
interferences from their presence in the training data.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Comparison with the State of the Art</title>
        <p>
          Finally, we compare in Table 4 the results achieved in the
present work with those of other state-of-the-art DeepFake detection
approaches: head pose variations
          <xref ref-type="bibr" rid="ref2 ref25 ref42">(Yang, Li, and Lyu 2019)</xref>
          ,
face warping artifacts
          <xref ref-type="bibr" rid="ref22 ref26 ref30 ref34">(Li et al. 2020)</xref>
          , mesoscopic
features
          <xref ref-type="bibr" rid="ref1">(Afchar et al. 2018)</xref>
          , pure deep learning features
          <xref ref-type="bibr" rid="ref16 ref22 ref22 ref30 ref30 ref34 ref34 ref40 ref41 ref6 ref9">(Dang
et al. 2020; Tolosana et al. 2020a)</xref>
          , and physiological
features
          <xref ref-type="bibr" rid="ref22 ref30 ref33 ref34 ref34 ref6">(Qi et al. 2020; Ciftci, Demir, and Yin 2020)</xref>
          . The best
results achieved for each database are highlighted in bold.
Results in italics indicate that the evaluated database was
not used for training. Some of these results are extracted
from
          <xref ref-type="bibr" rid="ref22 ref26 ref30 ref34">(Li et al. 2020)</xref>
          .
        </p>
        <p>
          Note that the comparison in Table 4 is not always
carried out under the same datasets and protocols, so it must be
interpreted with care. Despite that, it is clear that the
proposed DeepFakesON-Phys has achieved state-of-the-art
results on both the Celeb-DF and DFDC Preview databases.
In particular, it has also outperformed popular fake
detectors based on pure deep learning approaches such as
Xception and Capsule Networks
          <xref ref-type="bibr" rid="ref16 ref22 ref30 ref34 ref40 ref41 ref6">(Tolosana et al. 2020a)</xref>
          and also other recent physiological approaches based on
SVM/CNN
          <xref ref-type="bibr" rid="ref34 ref6">(Ciftci, Demir, and Yin 2020)</xref>
          .
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>This work has evaluated the potential of physiological
measurement to detect DeepFake videos. In particular, we have
proposed a novel DeepFake detector named
DeepFakesON-Phys, based on a Convolutional Attention Network (CAN)
originally trained for heart rate estimation using remote
photoplethysmography (rPPG). The proposed CAN approach
consists of two parallel CNN networks that extract and share
temporal and spatial information from video frames.</p>
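The attention-sharing step of such a two-branch CAN can be sketched as follows. This follows the soft attention mask of DeepPhys (Chen and McDuff 2018), on which the proposed model builds; the function names and 2-D feature-map shapes are illustrative assumptions.

```python
import numpy as np

def attention_mask(appearance_feat):
    """L1-normalized soft attention mask, DeepPhys-style: sigmoid
    activations scaled so the mask entries sum to H*W/2."""
    h, w = appearance_feat.shape
    sig = 1.0 / (1.0 + np.exp(-appearance_feat))  # element-wise sigmoid
    return (h * w * sig) / (2.0 * np.abs(sig).sum())

def fuse_branches(appearance_feat, motion_feat):
    """Modulate the motion-branch feature map with the attention mask
    computed from the appearance branch (skin regions get more weight)."""
    return motion_feat * attention_mask(appearance_feat)
```

In the full network this fusion is applied at intermediate convolutional layers, so the appearance branch tells the motion branch which pixels carry the physiological signal.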
      <p>DeepFakesON-Phys has been evaluated using Celeb-DF
v2 and DFDC Preview databases, two of the latest and most
challenging DeepFake video databases. Regarding the
experimental protocol, each database was divided into
development and evaluation datasets, considering different
identities in each dataset in order to perform a fair evaluation of
the technology.</p>
      <p>The soundness and competitiveness of
DeepFakesON-Phys have been proven by the very good results achieved:
AUC values of 99.9% and 98.2% for the Celeb-DF and
DFDC databases, respectively. These results
outperform other state-of-the-art fake detectors based on face
warping and pure deep learning features, among others.
Finally, the experimental results of this study reveal that
current face manipulation techniques do not pay attention to
heart-rate-related or blood-related physiological
information.</p>
      <p>
        Immediate future work may consist of replicating state-of-the-art
DeepFake detection approaches and training them with the same
databases as the ones used to train DeepFakesON-Phys, in
order to make a fair comparison of accuracy and show
the actual performance of our method. Another line of future work
will be oriented to the analysis of the robustness of the
proposed fake detection approach against face manipulations
unseen during the training process
        <xref ref-type="bibr" rid="ref22 ref30 ref34 ref40 ref41">(Tolosana et al. 2020b)</xref>
        ,
temporal integration of frame data (Hernandez-Ortega et al.
2018), and the application of the proposed physiological
approach to other face manipulation techniques such as face
morphing
        <xref ref-type="bibr" rid="ref22 ref30 ref34">(Raja and et al. 2020)</xref>
        .
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been supported by projects: IDEA-FAST
(IMI2-2018-15-two-stage-853981), PRIMA
(ITN-2019860315), TRESPASS-ETN (ITN-2019-860813), BIBECA
(RTI2018-101248-B-I00 MINECO/FEDER), and edBB
(Universidad Autonoma de Madrid, UAM). J. H.-O. is
supported by a PhD fellowship from UAM. R. T. is supported
by a Postdoctoral fellowship from CAM/FSE.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Afchar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nozick</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yamagishi</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Echizen</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and Farid,
          <string-name>
            <surname>H.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>Protecting World Leaders Against Deep Fakes</article-title>
          .
          <source>In Proc. IEEE/CVF Conf. on Computer Vision</source>
          and Pattern Recognition Workshops.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          2013.
          <article-title>Computationally Efficient Face Spoofing Detection with Motion Magnification</article-title>
          .
          <source>In Proc. IEEE/CVF Conf. on Comp. Vision and Pattern Recognition Workshops.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Cellan-Jones</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Deepfake Videos Double in Nine Months</article-title>
          . URL https://www.bbc.com/news/technology49961089.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>McDuff</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>DeepPhys: Video-Based Physiological Measurement Using Convolutional Attention Networks</article-title>
          .
          <source>In Proc. European Conf. on Computer Vision</source>
          ,
          <fpage>349</fpage>
          -
          <lpage>365</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Ciftci</surname>
          </string-name>
          , U. A.;
          <string-name>
            <surname>Demir</surname>
            ,
            <given-names>I.;</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>FakeCatcher: Detection of Synthetic Portrait Videos Using Biological Signals</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Citron</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>How DeepFakes Undermine Truth and Threaten Democracy</article-title>
          . URL https://www.ted.com.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Conotter</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bodnari</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Boato</surname>
            , G.; and Farid,
            <given-names>H.</given-names>
          </string-name>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Dang</surname>
            , H.; Liu,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Stehouwer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Liu,
          <string-name>
            <given-names>X.</given-names>
            ; and
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Dolhansky</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Howes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; Pflaum,
          <string-name>
            <given-names>B.</given-names>
            ;
            <surname>Baram</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Ferrer</surname>
            ,
            <given-names>C. C.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>The Deepfake Detection Challenge (DFDC) Preview Dataset</article-title>
          . arXiv preprint:1910.08854.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Erdogmus</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Marcel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Spoofing Face Recognition with 3D Masks</article-title>
          .
          <source>IEEE Transactions on Information Forensics and Security</source>
          <volume>9</volume>
          (
          <issue>7</issue>
          ):
          <fpage>1084</fpage>
          -
          <lpage>1097</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Galbally</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Marcel,
          <string-name>
            <given-names>S.</given-names>
            ; and
            <surname>Fierrez</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2014</year>
          .
          <article-title>Biometric AntiSpoofing Methods: A Survey in Face Recognition</article-title>
          .
          <source>IEEE Access</source>
          <volume>2</volume>
          :
          <fpage>1530</fpage>
          -
          <lpage>1552</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pouget-Abadie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Mirza,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ;
            <surname>WardeFarley</surname>
          </string-name>
          , D.;
          <string-name>
            <surname>Ozair</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and Bengio,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          <year>2014</year>
          .
          <article-title>Generative Adversarial Nets</article-title>
          .
          <source>In Proc. Advances in Neural Information Processing Systems</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Güera</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Delp</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Deepfake Video Detection Using Recurrent Neural Networks</article-title>
          .
          <source>In Proc. Int. Conf. on Advanced Video and Signal Based Surveillance.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Hernandez-Ortega</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Daza,
          <string-name>
            <given-names>R.</given-names>
            ;
            <surname>Morales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ;
            <surname>Fierrez</surname>
          </string-name>
          , J.; and Tolosana,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <year>2020a</year>
          .
          <article-title>Heart Rate Estimation from Face Videos for Student Assessment: Experiments on edBB</article-title>
          .
          <source>In Proc.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          2020b.
          <article-title>A Comparative Evaluation of Heart Rate Estimation Methods using Face Videos</article-title>
          .
          <source>In Proc. IEEE Intl. Workshop on Medical Computing.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Hernandez-Ortega</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fierrez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Morales</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Galbally</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Introduction to Face Presentation Attack Detection</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>In Handbook of Biometric Anti-Spoofing</source>
          ,
          <fpage>187</fpage>
          -
          <lpage>206</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          2018.
          <article-title>Time Analysis of Pulse-Based Face Anti-Spoofing in Visible and NIR</article-title>
          .
          <source>In Proc. IEEE Conf. on Comp. Vision and Pattern Recognition Workshops.</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Heusch</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Anjos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Marcel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>A reproducible study on remote heart rate measurement</article-title>
          .
          <source>arXiv preprint:1709.00962</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Jung</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>DeepVision: Deepfakes Detection Using Human Eye Blinking Pattern</article-title>
          .
          <source>IEEE Access</source>
          <volume>8</volume>
          :
          <fpage>83144</fpage>
          -
          <lpage>83154</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Karras</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; et al.
          <year>2020</year>
          .
          <article-title>Analyzing and Improving the Image Quality of StyleGAN</article-title>
          .
          <source>In Proc. IEEE/CVF Conf. on Comp.</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          ; and Welling,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2013</year>
          .
          <article-title>Auto-Encoding Variational Bayes</article-title>
          .
          <source>In Proc. Int. Conf. on Learning Represent.</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2018</year>
          . In Ictu Oculi:
          <article-title>Exposing AI Generated Fake Face Videos by Detecting Eye Blinking</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Exposing DeepFake Videos By Detecting Face Warping Artifacts</article-title>
          .
          <source>In Proc. IEEE/CVF Conf. on Comp. Vision and Pattern Recognition Workshops.</source>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Qi</surname>
          </string-name>
          , H.; and
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>CelebDF: A Large-Scale Challenging Dataset for DeepFake Forensics</article-title>
          .
          <source>In Proc. IEEE/CVF Conf. on Comp. Vision and Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Marcel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Nixon,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Fierrez</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Handbook of Biometric Anti-Spoofing (2nd Edition)</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Matern</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Riess</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ; and Stamminger,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>Exploiting Visual Artifacts to Expose DeepFakes and Face Manipulations</article-title>
          .
          <source>In Proc. IEEE Winter App. of Comp. Vision Workshops.</source>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>McDuff</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Estepp</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Piasecki</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Blackford</surname>
            ,
            <given-names>E. B.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>A Survey of Remote Optical Photoplethysmographic Imaging Methods</article-title>
          .
          <source>In Proc. Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society.</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Neves</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; et al.
          <year>2020</year>
          .
          <article-title>GANprintR: Improved Fakes and Evaluation of the State of the Art in Face Manipulation Detection</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <source>IEEE Journal of Selected Topics in Signal Processing</source>
          <volume>14</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1038</fpage>
          -
          <lpage>1048</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>H. H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yamagishi</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Echizen</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Use of a Capsule Network to Detect Fake Images and Videos</article-title>
          . arXiv preprint:1910.12467.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Juefei-Xu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ; Ma, L.;
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; Liu,
          <string-name>
            <given-names>Y.</given-names>
            ; and
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2020</year>
          .
          <article-title>DeepRhythm: Exposing DeepFakes with Attentional Visual Heartbeat Rhythms</article-title>
          . arXiv preprint:2006.07634.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <surname>Raja</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and et al.
          <year>2020</year>
          .
          <article-title>Morphing Attack Detection - Database, Evaluation Platform and Benchmarking</article-title>
          .
          <source>IEEE Transactions on Information Forensics and Security</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <surname>Ranjan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>V. M.</given-names>
          </string-name>
          ; and Chellappa,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>Hyperface: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition</article-title>
          .
          <source>IEEE Trans. on Pattern Analysis and Machine Intelligence</source>
          <volume>41</volume>
          (
          <issue>1</issue>
          ):
          <fpage>121</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <surname>Rössler</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cozzolino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ;
            <surname>Verdoliva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ;
            <surname>Riess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ;
            <surname>Thies</surname>
          </string-name>
          , J.; and Nießner,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>FaceForensics++: Learning to Detect Manipulated Facial Images</article-title>
          .
          <source>In Proc. IEEE/CVF Int. Conf. on Comp. Vision.</source>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <surname>Sabir</surname>
          </string-name>
          , E.; Cheng, J.;
          <string-name>
            <surname>Jaiswal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; AbdAlmageed, W.; Masi,
          <string-name>
            <surname>I.;</surname>
          </string-name>
          and Natarajan,
          <string-name>
            <surname>P.</surname>
          </string-name>
          <year>2019</year>
          .
          <article-title>Recurrent Convolutional Strategies for Face Manipulation Detection in Videos</article-title>
          .
          <source>In Proc.</source>
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <string-name>
            <surname>Soukupova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Cech</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Real-Time Eye Blink Detection Using Facial Landmarks</article-title>
          .
          <source>In Proc. Comp</source>
          . Vision Winter Workshop.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Nijholt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Brain-Computer Interfaces and Human-Computer Interaction</article-title>
          . In Brain-Computer Interfaces,
          <fpage>3</fpage>
          -
          <lpage>19</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <string-name>
            <surname>Tolosana</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Romero-Tapiador</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fierrez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Vera-Rodriguez</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2020a</year>
          .
          <article-title>DeepFakes Evolution: Analysis of Facial Regions and Fake Detection Performance</article-title>
          .
          <source>In Proc. International Conference on Pattern Recognition Workshops.</source>
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <surname>Tolosana</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vera-Rodriguez</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fierrez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Morales</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ortega-Garcia</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2020b</year>
          .
          <article-title>DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection</article-title>
          .
          <source>Information Fusion</source>
          <volume>64</volume>
          :
          <fpage>131</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Exposing Deep Fakes Using Inconsistent Head Poses</article-title>
          .
          <source>In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>