<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using GANs to Synthesise Minimum Training Data for Deepfake Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simranjeet Singh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rajneesh Sharma</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan F. Smeaton</string-name>
          <email>smeaton@dcu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Insight Centre for Data Analytics Dublin City University</institution>
          ,
          <addr-line>Glasnevin, Dublin 9</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computing</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>There are many applications of Generative Adversarial Networks (GANs) in fields like computer vision, natural language processing, speech synthesis, and more. Undoubtedly the most notable results have been in the area of image synthesis and in particular in the generation of deepfake videos. While deepfakes have received much negative media coverage, they can be a useful technology in applications like entertainment, customer relations, or even assistive care. One problem with generating deepfakes is the requirement for a lot of image training data of the subject, which is not an issue if the subject is a celebrity for whom many images already exist. If there are only a small number of training images then the quality of the deepfake will be poor. Some media reports have indicated that a good deepfake can be produced with as few as 500 images but in practice, quality deepfakes require many thousands of images, one of the reasons why deepfakes of celebrities and politicians have become so popular. In this study, we exploit the property of a GAN to produce images of an individual with variable facial expressions which we then use to generate a deepfake. We observe that with such variability in facial expressions of synthetic GAN-generated training images and a reduced quantity of them, we can produce near-realistic deepfake videos.</p>
      </abstract>
      <kwd-group>
        <kwd>Deepfake generation</kwd>
        <kwd>Generative Adversarial Networks</kwd>
        <kwd>GANs</kwd>
        <kwd>Variable face images</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Recently we have seen a rise in the presence of deepfake videos on social media and in
entertainment applications. Sometimes these are used for good but it is the misuse of
deepfakes which attracts most media attention and commentary. What makes deepfakes
so important today is their low barrier to entry, meaning that easily available tools and
models can be used by researchers with even moderate programming skills to generate
very realistic deepfake videos. When this is considered in the context of targeted
advertisements for political elections on social media, the impact of deepfakes could be
quite significant.</p>
      <p>A deepfake is a video created by manipulating an original video using advanced
machine learning techniques. This involves replacing the face of an individual from
a source video with the face of a second person in the destination video. A model of
the face of the second person, the one who is superimposed into the destination video,
is created based on a typically large collection of facial images. In the early days of
deepfake videos, celebrities were used in the destination videos because (a) it is easy to
get thousands of images of celebrities from the internet and (b) most of these pictures
are of the subject facing the camera. The Hollywood actor Nicolas Cage became even
more of a celebrity as a model based on images of his face was one of the first to be
made publicly available and was widely used in creating deepfakes when the interest
was in the quality of the generated videos and less in who the subjects were.</p>
      <p>Now that we have reached the point where the quality of deepfakes is almost
indiscernible from real videos, interest returns to how to generate these deepfakes, not
using celebrities as the subjects but using ordinary people. While there are nefarious
applications based on the use of deepfakes of non-celebrity individuals, there are also
useful scenarios. An example of this is using deepfake videos of a non-celebrity as a
sales agent or troubleshooter in an online chat system.</p>
      <p>One characteristic of the non-celebrity subject in a deepfake is that there will
typically be a limited number of images of the subject’s face available for training a
deepfake generator, perhaps even no images to start from. Thus we expect that training data,
i.e. images of the face, may actually be taken from short video clips recorded
specifically for this purpose.</p>
      <p>In this paper we look at how deepfake videos of non-celebrity subjects can be
generated using limited training data, i.e. a small number of training images. In particular
we are interested not just in the limited number of images used but also in the variability
of facial expressions among that limited number of images. To test this we use a large
number of images to create a model of an individual face, and then we generate a small
number of synthetic but realistic images from that model which we use to generate a
deepfake. While it may seem counter-intuitive to use a large number of images of
a celebrity to generate a small number of synthetic images of that celebrity, this allows
the synthetic images to include a lot of variety in facial expression which we could not
obtain easily if we were to use a real collection as the small number of deepfake training
images.</p>
      <p>The rest of this paper is organised as follows. In the next section we present an
overview of Generative Adversarial Networks (GANs) followed by a description of 4
metrics used to evaluate the quality of our output from image-generating GANs. We
then describe how we gathered, or more correctly how we synthesised image data for
training a GAN and we then present an analysis of those images in terms of their quality
and variability of facial expressions. That is followed by a description of how we used
those images to create a deepfake and then some conclusions and plans for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Generative Adversarial Networks (GANs)</title>
      <p>
        The idea behind adversarial networks was first published by Olli Niemitalo; however, his
ideas were never implemented [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and a similar concept was introduced by Li, Gauci
and Gross in 2013 [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Generative Adversarial Network (GAN) implementations were
first described in 2014 by Ian Goodfellow and until 2017 the use of GANs was restricted
to just image enhancement to produce high quality images. In 2017 GANs were used for
the first time for generating new facial images, and the idea began to make its presence
known in the fine arts arena, where such networks were dubbed creative adversarial networks [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
      <p>
        GANs have been widely applied to domains such as computer vision, natural
language processing, etc. GANs have contributed immensely to the field of image
generation [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] where the quality of synthetic images a GAN can produce has improved
significantly over the years since its inception. Other example applications of GANs
include the generation of DNA sequences, 3D models of replacement teeth, impressionist
paintings, and of course video clips, some known as deepfakes.
      </p>
      <p>Deepfakes are a form of video manipulation where two trained networks are pitted
against each other to generate an output of sufficient quality as to be close to
indistinguishable from real video. They operate by taking as input a set of images of a
subject, from which they build a model of the face, and then superimposing this face model
on the target face in an original video.</p>
      <p>One of the challenges faced by deepfake generation, apart from its computational
cost, is the requirement for a large number of training images of the subject to be faked
into the original video. In practice, the quality of the generated deepfake will depend
not only on the number of face images in the training data but also on the amount of facial
variability among those images and the amount of facial variation in the original video.
If the original video has a face with not much emotion shown and very little variation
in facial expression then it follows that the training data for the face to be superimposed
does not need a wide variety of facial expression and thus a smaller number of training
images is needed. If the original video has a lot of facial variation then the model to
be generated to replace this original face will need to be larger and more complex, and
thus require far more training data. Some commentators have said that as few as 500
images of the face of a subject are required for a good deepfake but in practice these
refer to deepfakes without much facial emotion, and the best deepfakes are generated
using many thousands of source images of the face.</p>
      <p>Deepfakes have many applications in the entertainment industry such as movie
production and the Salvador Dali museum in St Petersburg, Florida<sup>1</sup>, but there are also
applications in areas like customer relations where text or audio chatbots are replaced
by real people or deepfakes, or in assistive technology where older people living alone
might interact with generated media which could consist of deepfaked videos of loved
ones. The problem with such applications is that there are usually few images available
from which to train a model to create a deepfake.</p>
      <p>
        In this study we look into how the amount of, and the variety of facial expressions
included in, the face data used to train a deepfake generator affects the quality of the
deepfake. One of the latest GANs, StyleGAN2 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], is used in our study to produce
synthetic facial images for training and various evaluation methods are used to benchmark
the quality of these synthetic images including Inception score [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and the Fréchet
Inception Distance [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and the variety among those faces using OpenFace’s Comparison
method [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and face_recognition’s compare method [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Our implementation of
StyleGAN2 is trained on a dataset of 132,000 images taken from stills of YouTube videos
of TV night show host John Oliver and from this we synthesise 1,000 images in a way
that includes a lot of facial variation. We then use these 1,000 images to train
DeepFaceLab [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] to generate a deepfake where the (synthesised) John Oliver is superimposed on
a subject shown interacting with a chatbot in a dialogue. A schematic of the flow of our
data processing is shown in Figure 1.
      </p>
      <p>As we show later in this paper, when trained with enough facial variation in the input
images, we found that DeepFaceLab is able to produce deepfakes of acceptable quality.</p>
      <p><sup>1</sup> https://thedali.org/</p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation Metrics</title>
      <p>
        There are a number of methods developed to evaluate the quality of output produced by
GANs and to measure the variability in a set of images of faces and we discuss some of
these here. For a more detailed description of GAN evaluation measures see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Inception Score (IS)</title>
        <p>
          Inception Score was first introduced by Salimans et al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], and is the most
common method used for evaluating GAN outputs. It uses a pre-trained inception model
to classify generated images and calculates probabilities of each image belonging to
each class, and looks at the label distribution. Images with high probability towards one
class/label are considered high quality.
        </p>
        <p>
          In summary, Inception Score actually captures two properties of a generated dataset:
1. Image Quality: How highly an image belongs to one class as classified by an
inception classifier . . . do they look similar to a specific object?
2. Image Diversity: How many different images are generated by the GAN . . . is there
a range of different objects generated?
Inception score has a lowest value of 1.0 and higher values indicate an improving
quality of the GAN [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. However, even with these properties, IS has its limitations
as shown in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Firstly, it favours GANs which can store training data and generate
images around centers of data modes and secondly, since this method uses an Inception
Classifier which is trained on the ImageNet dataset with many object classes, it may
uplift those models which produce good images of objects. A third limitation of IS is
that since the score never takes a real dataset into account and evaluates the quality of
a GAN based only on its generated dataset, it can be deceptive. It may favour GANs
which produce clear and diverse images of any object, even if these are far from the real dataset.
        </p>
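        <p>As an illustration of the two properties described above, the following minimal sketch computes an Inception Score from a matrix of class probabilities. It assumes the probabilities p(y|x) have already been produced by a classifier; the function name inception_score and the toy inputs are hypothetical and chosen for illustration, and this is not the implementation used in this study.</p>

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """Return exp(mean_x KL(p(y|x) || p(y))) for an (N, C) array of class probabilities."""
    probs = np.asarray(probs, dtype=np.float64)
    # Marginal label distribution p(y), averaged over all generated images.
    marginal = probs.mean(axis=0, keepdims=True)
    # KL divergence between each conditional p(y|x) and the marginal p(y).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))


# Hypothetical toy inputs: 4 images, 4 classes.
confident_and_diverse = np.eye(4)      # each image clearly belongs to a different class
uninformative = np.full((4, 4), 0.25)  # every image is equally likely to be any class

print(inception_score(confident_and_diverse))  # close to 4, the number of classes
print(inception_score(uninformative))          # close to 1, the lowest possible value
```

        <p>The first toy input is both confident and diverse and so approaches the maximum score for four classes, while the second carries no class information and stays at the minimum of 1.0.</p>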
      </sec>
      <sec id="sec-3-2">
        <title>Fréchet Inception Distance (FID)</title>
        <p>
          FID is another popular method for GAN evaluation introduced by Heusel et al. in
2017 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. It uses feature vectors of real data and generated data and calculates
distances between them. The FID score is used to evaluate the quality of images generated
by GANs, and lower scores have been shown to correlate well with higher quality
generated images [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>
          Unlike Inception Score (IS), FID captures the statistics of generated data and
compares it with the statistics of real data. It is similar to IS in the way that it also uses the
inception v3 model. Instead of using the last output layer of the model, it uses the last
coding layer to capture specific features of the input data. These are collected for both
real and generated data. The distance between two distributions, real and generated, is
then calculated using the Fréchet distance [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] which itself uses the Wasserstein-2 distance
which is a calculation between multi-variate Gaussians fitted to data embedded into a
feature space [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Lower distance values convey that the generated dataset is of high
quality and similar to real dataset [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>
          A model that generates only one image per class will have a bad FID score whereas
the same case will have high IS. FID compares data between real and generated data
sets whereas IS only measures diversity and quality of a generated dataset. Unlike IS,
data scores will be bad on an FID scale in cases where there is noise or other additions
to the data [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
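        <p>The distance between two fitted Gaussians described above can be sketched as follows. This is a minimal illustration which assumes feature vectors have already been extracted; random arrays stand in for Inception features, the function name frechet_distance is hypothetical, and the trace of the matrix square root is obtained from eigenvalues rather than from an explicit matrix square root. It is not the implementation used in this study.</p>

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two (N, D) feature arrays."""
    feats_real = np.asarray(feats_real, dtype=np.float64)
    feats_gen = np.asarray(feats_gen, dtype=np.float64)
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


# Hypothetical stand-ins for Inception features of real and generated images.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shifted = real + 3.0  # same covariance, every feature mean moved by 3

print(frechet_distance(real, real))     # approximately 0
print(frechet_distance(real, shifted))  # approximately 8 * 3**2 = 72
```

        <p>Identical feature sets give a distance of approximately zero, and shifting every feature mean increases the distance through the squared difference of the means, which is consistent with lower values indicating generated data closer to the real data.</p>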
      </sec>
      <sec id="sec-3-3">
        <title>OpenFace Python Library</title>
        <p>
          OpenFace is an open source general-purpose library for face recognition [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] with
various features including dlib’s face landmark detector [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Landmarks are used to crop
images to ensure only facial data is passed to the neural network for training, producing
a low-dimensional face representation for the faces in images [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. OpenFace includes
a function to calculate the squared L2 distance [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] between facial representations,
providing a comparison function among faces in a dataset. An image in the dataset can
be paired with every other image in the dataset and the squared L2 distance computed,
ranging from 0 to 4, with lower values meaning the faces in the two compared images are more likely
to be of the same person [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>In our work we applied OpenFace to an image set generated by StyleGAN2 to
measure the degree of variability among the generated faces and we computed the mean
and variance of inter-image scores among the images. To confirm our approach, two
datasets of facial images were generated, each with 100 images of the same person
taken with a smartphone in burst mode. In one dataset, the facial expressions were
kept the same and we called this the “Monotone” dataset. In the second dataset,
various facial expressions were captured and we called this the “Varied” dataset. Comparing
every pair among 100 images requires 100 × 99 / 2 = 4,950 comparisons for each dataset, from
which we compute the mean and
variance.</p>
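        <p>The pairwise comparison described above can be sketched as follows. This is a minimal illustration which assumes each face has already been reduced to a unit-length embedding vector; random 128-dimensional vectors stand in for real face representations, and the function name pairwise_squared_l2 is hypothetical rather than part of the OpenFace API.</p>

```python
from itertools import combinations

import numpy as np


def pairwise_squared_l2(embeddings):
    """Mean and variance of squared L2 distances over all unordered pairs of rows."""
    emb = np.asarray(embeddings, dtype=np.float64)
    # Normalise each embedding to unit length so distances lie between 0 and 4.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    dists = np.array([np.sum((emb[i] - emb[j]) ** 2)
                      for i, j in combinations(range(len(emb)), 2)])
    return dists.mean(), dists.var(), len(dists)


# Hypothetical embeddings standing in for 100 face images.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(100, 128))
mean, var, n_pairs = pairwise_squared_l2(fake_embeddings)
print(n_pairs)  # 4950 comparisons for 100 images
```

        <p>With real face embeddings, a small mean suggests the images show the same person, while the variance gives an indication of how much the faces differ from one another.</p>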
        <p>
          Figure 2 shows a subset of each dataset with calculated mean and variance in
Table 1. The Monotone dataset gave a smaller mean and variance, which indicates that the
person in the dataset is the same but with less variation in facial expression than in the
Varied dataset. Since the mean for the Varied dataset is still close to zero, the person
in that dataset is also the same throughout.
        </p>
        <p>
          face_recognition is a simple Python library for face recognition which also uses
dlib’s facial landmark detector [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and has a comparison feature which calculates
distance between facial landmarks of two images. Given a certain threshold, it returns a
True/False whether the person in both images is the same or not [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. To show its
capabilities, we compared two images of the same individual shown in Figure 3, the
first taken in 2007 and the second in 2019, and face_recognition [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] detects these as the same person.
For our purposes we iterate through the GAN-generated images and compare each with
the original images used to train StyleGAN2, using face_recognition as another way of
evaluating the GAN-generated dataset.
        </p>
        <p>
          To further validate this method, we took 10 pairs of celebrity face images shown in
Figure 4, each pair of images taken years apart [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and using face_recognition we
compared them, observing that each pair is identified by face_recognition as the same
person.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Facial Image Data Gathering for GAN Training</title>
      <p>
        For training a deepfake video generation system there are numerous datasets available
from other studies [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] where facial data was gathered, but for almost all of these the
data was either not of sufficient quantity or quality to train a GAN, or consisted of faces
of different individuals whereas we require images of the same person.
      </p>
      <p>
        The GAN we use is StyleGAN2 developed by Karras et al. in 2019 with
improvements over its predecessor StyleGAN [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. StyleGAN2 can generate images up
to 1024x1024 pixels in size but this requires hardware intensive training. We worked
at 256x256 pixels image resolution considering the limited hardware available for this
study and we generated our own dataset by extracting frames from videos of an
individual.
      </p>
      <p>
        As stated in [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], John Oliver is “an English-American comedian, writer, producer,
political commentator, actor, and television host”. He is the host of the popular HBO
series “Last Week Tonight with John Oliver” [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. We chose to use videos of him because
he is always in the frame and at the same position on screen and talks with various facial
expressions. His recent videos have a plain background because they were shot in a home
studio due to COVID-19.
      </p>
      <p>
        Using 20 videos from the official YouTube channel<sup>2</sup> we extracted 132,000 frames
cropped to John Oliver’s face area with the remaining part of the frames ignored. We
resized the images to 256x256 pixels for model training, using the Pillow Python library
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. To train StyleGAN2 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] we first converted the images to TFRecord format [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which took
around 30 minutes of processing and around 27GB of storage on a system with 30GB
of memory and 1 NVIDIA Tesla V100 GPU on the Google Cloud Platform.
      </p>
      <p>
        StyleGAN2 training uses the TFRecord format [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for storing a sequence of binary
records, which is advantageous for large datasets that cannot be held in
memory during training and need only part of the dataset at a time (e.g. a batch) to be
loaded from disk and processed [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The original StyleGAN2 training
used 70K images from the Flickr Faces HQ (FFHQ) dataset [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] at 1024x1024 pixels, with training running until 25M
images had been shown to the network. With 70K images in the dataset, the GAN goes over them
repeatedly, about 25M / 70K ≈ 357 times, to learn the salient features from 25M images. The
authors state they performed training with 8 GPUs for almost 10 days to generate high
quality images [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>Using our John Oliver dataset of 132,000 images, training was completed with
the number of images set to 500K. This made the GAN go over the dataset only
500K / 132K ≈ 3.8 times. Since the dataset was large and had variation
in its images even though all images are of a single person, the GAN was able to generate
quality output images and Figure 5 shows some of these images.</p>
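      <p>The number of passes over each dataset quoted above follows from dividing the total number of images shown during training by the dataset size. The short sketch below reproduces that arithmetic; the function name passes_over_dataset is hypothetical and used only for illustration.</p>

```python
def passes_over_dataset(images_shown, dataset_size):
    """Approximate number of times training iterates over the whole dataset."""
    return images_shown / dataset_size


# Figures quoted in the text above.
print(round(passes_over_dataset(25_000_000, 70_000)))   # 357 (FFHQ configuration)
print(round(passes_over_dataset(500_000, 132_000), 1))  # 3.8 (John Oliver dataset)
```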
      <p>After generating 1,000 images of John Oliver from our trained model, we applied
the metrics described in Section 3 and the results are shown in Table 2. Out of 1,000
generated faces, the face_recognition library recognised 929 faces as John Oliver’s face,
rejected 66 images, and did not recognise 5 of the 1,000 images as faces because of
noise around the facial landmarks.</p>
      <p><sup>2</sup> https://www.youtube.com/user/LastWeekTonight</p>
      <p>Table 2. Metric (↑ higher is better, ↓ lower is better):
Inception Score ↑;
Fréchet Inception Distance ↓;
OpenFace library comparison ↓;
face_recognition library comparison.
Value:
Mean: 1.635;
Variance: 0.011.</p>
      <p>
        These figures give us confidence that the images generated by the GAN are
recognised as John Oliver’s face. The comparison feature of the OpenFace library [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] gave
a mean score of 0.401 and variance of 0.068 which means that the facial data is of the
same person and it has sufficient variability. The calculated Inception score [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] for the
same dataset was 1.628.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Deepfake Video Generation</title>
      <p>
        We used the NLP-based chatbot Python library ChatterBot [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to generate dialogues
which we could use for our source videos. We built a simple bot and trained it on an
English language corpus which has various topics of conversation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Since this was
built to give dialogue for our videos, we generated only a few dialogues with the bot
and one conversation with random queries after training goes as follows:
Human: Do you know gossip?
Bot: Gregory said I respond to the current line, not with respect to the entire
conversation. Does that count as gossip?
Human: What is AI?
Bot: Artificial Intelligence is the branch of engineering and science devoted to
constructing machines that think.
      </p>
      <p>Human: What is a super computer?
Bot: Computers which can perform very large numbers of calculations at very high
speed and accuracy are called super computers.</p>
      <p>Human: How does a computer work?
Bot: Computers are very dumb. They only execute instructions given by humans.
Human: What was the first computer?</p>
      <p>
        We recorded a video with our own subject responding in the above dialogue. The
DeepFaceLab [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] face swapping framework was then used to swap the face of our subject
with faces from the 1,000 GAN-generated facial images of John Oliver.
DeepFaceLab’s model was trained to find the facial area of our subject in our recorded video
while the GAN-generated John Oliver facial images were fed to the model. The model
was trained for almost 80,000 iterations to learn the features from our subject’s face
and John Oliver’s face and swap them. The output of this swap was a regenerated video
dialogue at 256x256 pixel resolution with John Oliver’s face speaking the same dialogue as
our subject; in other words, it is a deepfake. A still image is shown in Figure 6 and the
video is available for anonymous viewing at https://bit.ly/31xEjgy.
      </p>
      <p>
        In this study we introduced and developed an idea to consolidate various techniques
available to develop a video dialogue of an individual speaking to camera based on
a limited dataset of images of the individual. We generated a dataset of 132,000 video
frames extracted from late-night TV show host John Oliver’s YouTube videos and trained the
StyleGAN2 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] GAN to generate a sample of 1,000 images. Four evaluation methods
were used to measure the variability and quality of these images. These included the
Python libraries OpenFace and face_recognition which measure facial variability in a
dataset of faces.
      </p>
      <p>
        We then generated several dialogues from a chatbot we trained and recorded a video
with our own subject responding as part of one of these dialogues. We applied a Face
Swapping Framework DeepFaceLab [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] to swap the face of our subject with that of
the GAN-generated John Oliver images. The final video output of swapped dialogues
alongside the original dialogues is publicly and anonymously available at
https://bit.ly/31xEjgy.
      </p>
      <p>We observe that the deepfake video based on a synthetic set of 1,000 images of
John Oliver is of good quality. There is some colour variation across frames which we
could easily have smoothed using a tool like OpenCV but we decided to leave it there
to emphasise to the viewer how the video was created.</p>
      <p>
        Our future work is to repeat the video generation process using a more
homogeneous set of images generated by the GAN which synthesises images of John Oliver,
and then to compare the quality of the generated deepfakes. While most work on
deepfakes has been to detect them, such as [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], there is little work reported to date on
measuring their quality, so ultimately the measure of deepfake quality may be how easily a
video is recognised as a deepfake.
      </p>
      <p>Acknowledgments. We wish to thank Satyam Ramawat for acting as a test subject for
our image generation. AS is part-funded by Science Foundation Ireland under grant
number SFI/12/RC/2289_P2, co-funded by the European Regional Development Fund.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Helmut</given-names>
            <surname>Alt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Godau</surname>
          </string-name>
          .
          <article-title>Computing the Fréchet distance between two polygonal curves</article-title>
          .
          <source>International Journal of Computational Geometry &amp; Applications</source>
          ,
          <volume>5</volume>
          (
          <issue>01n02</issue>
          ):
          <fpage>75</fpage>
          -
          <lpage>91</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Brandon</given-names>
            <surname>Amos</surname>
          </string-name>
          , Bartosz Ludwiczuk, and
          <string-name>
            <given-names>Mahadev</given-names>
            <surname>Satyanarayanan</surname>
          </string-name>
          .
          <article-title>OpenFace: A general-purpose face recognition library with mobile applications</article-title>
          .
          <source>Technical report</source>
          ,
          <source>CMU-CS-16-118</source>
          , CMU School of Computer Science,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Brandon</given-names>
            <surname>Amos</surname>
          </string-name>
          , Bartosz Ludwiczuk, and Mahadev Satyanarayanan.
          <article-title>OpenFace: Free and open source face recognition with deep neural networks</article-title>
          ,
          <year>2016</year>
          (accessed July 07,
          <year>2020</year>
          ). https://cmusatyalab.github.io/openface/demo-2-comparison/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Ali</given-names>
            <surname>Borji</surname>
          </string-name>
          .
          <article-title>Pros and cons of GAN evaluation measures</article-title>
          .
          <source>Computer Vision and Image Understanding</source>
          ,
          <volume>179</volume>
          :
          <fpage>41</fpage>
          -
          <lpage>65</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Jason</given-names>
            <surname>Brownlee</surname>
          </string-name>
          .
          <article-title>Gentle Introduction to Vector Norms in Machine Learning</article-title>
          ,
          <year>2018</year>
          (Last accessed July 26,
          <year>2020</year>
          ). https://machinelearningmastery.com/vector-norms-machine-learning/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Jason</given-names>
            <surname>Brownlee</surname>
          </string-name>
          .
          <article-title>How to Implement the Fréchet Inception Distance (FID) for Evaluating GANs</article-title>
          ,
          <year>2019</year>
          (Last accessed June 26,
          <year>2020</year>
          ). https://machinelearningmastery.com/how-to-implement-the-frechet-inception-distance-fid-from-scratch.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Jason</given-names>
            <surname>Brownlee</surname>
          </string-name>
          .
          <article-title>How to Implement the Inception Score (IS) for Evaluating GANs</article-title>
          ,
          <year>2019</year>
          (Last accessed June 26,
          <year>2020</year>
          ). https://machinelearningmastery.com/how-to-implement-the-inception-score-from-scratch-for-evaluating-generated-images.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Gunther</given-names>
            <surname>Cox</surname>
          </string-name>
          .
          <article-title>Building chatbot using chatterbot</article-title>
          , (Last accessed Aug 09,
          <year>2020</year>
          ). https://chatterbot.readthedocs.io/en/stable/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Gunther</given-names>
            <surname>Cox</surname>
          </string-name>
          .
          <article-title>ChatterBot Language Training Corpus</article-title>
          , (Last accessed Aug 09,
          <year>2020</year>
          ). https://github.com/gunthercox/chatterbot-corpus/tree/master/chatterbot_corpus/data/english.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Brian</given-names>
            <surname>Dolhansky</surname>
          </string-name>
          , Russ Howes, Ben Pflaum, Nicole Baram, and Cristian Canton Ferrer.
          <article-title>The deepfake detection challenge (DFDC) preview dataset</article-title>
          .
          <source>arXiv preprint arXiv:1910.08854</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Bob Fisher.
          <article-title>CVonline: Image Databases</article-title>
          ,
          <year>2019</year>
          (Last accessed July 26,
          <year>2020</year>
          ). http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm#face.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Gamauf</surname>
          </string-name>
          .
          <article-title>Tensorflow Records? What they are and how to use them</article-title>
          ,
          <year>2018</year>
          (Last accessed July 28,
          <year>2020</year>
          ). https://medium.com/mostly-ai/tensorflow-records-what-they-are-and-how-to-use-them-c46bc4bbb564.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Adam</given-names>
            <surname>Geitgey</surname>
          </string-name>
          .
          <source>face_recognition</source>
          ,
          <year>2017</year>
          (accessed July 07,
          <year>2020</year>
          ). https://github.com/ageitgey/face_recognition/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Martin</given-names>
            <surname>Heusel</surname>
          </string-name>
          , Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter.
          <article-title>GANs trained by a two time-scale update rule converge to a local Nash equilibrium</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>6626</fpage>
          -
          <lpage>6637</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Tero</given-names>
            <surname>Karras</surname>
          </string-name>
          , Samuli Laine, and
          <string-name>
            <given-names>Timo</given-names>
            <surname>Aila</surname>
          </string-name>
          .
          <article-title>A style-based generator architecture for generative adversarial networks</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          , pages
          <fpage>4401</fpage>
          -
          <lpage>4410</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Tero</given-names>
            <surname>Karras</surname>
          </string-name>
          , Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and
          <string-name>
            <given-names>Timo</given-names>
            <surname>Aila</surname>
          </string-name>
          .
          <article-title>Analyzing and improving the image quality of StyleGAN</article-title>
          .
          <source>In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>8110</fpage>
          -
          <lpage>8119</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Davis E.</given-names>
            <surname>King</surname>
          </string-name>
          .
          <article-title>Dlib-ml: A machine learning toolkit</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <volume>10</volume>
          :
          <fpage>1755</fpage>
          -
          <lpage>1758</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18. NVidia Labs.
          <source>StyleGAN Github</source>
          ,
          <year>2019</year>
          (Last accessed July 26,
          <year>2020</year>
          ). https://github.com/NVlabs/stylegan2.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Ivan</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Daiheng</given-names>
            <surname>Gao</surname>
          </string-name>
          , Nikolay Chervoniy, Kunlin Liu, Sugasa Marangonda, Chris Umé,
          <string-name>
            <given-names>Jian</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luis</given-names>
            <surname>RP</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sheng</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pingyu</given-names>
            <surname>Wu</surname>
          </string-name>
          , et al.
          <article-title>DeepFaceLab: A simple, flexible and extensible face swapping framework</article-title>
          .
          <source>arXiv preprint arXiv:2005.05535</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20. python-pillow.
          <source>Pillow, friendly PIL fork</source>
          , (Last accessed July 26,
          <year>2020</year>
          ). https://github.com/python-pillow/Pillow.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Tim</given-names>
            <surname>Salimans</surname>
          </string-name>
          , Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and
          <string-name>
            <given-names>Xi</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Improved techniques for training GANs</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>2234</fpage>
          -
          <lpage>2242</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>S.</given-names>
            <surname>Sengupta</surname>
          </string-name>
          , J.C. Cheng,
          <string-name>
            <given-names>C.D.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.M.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chellappa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.W.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          .
          <article-title>Frontal to profile face verification in the wild</article-title>
          .
          <source>In IEEE Conference on Applications of Computer Vision</source>
          ,
          February
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Wikipedia</surname>
          </string-name>
          .
          <article-title>John Oliver</article-title>
          , Last accessed July 28,
          <year>2020</year>
          . https://en.wikipedia.org/wiki/John_Oliver.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Wikipedia</surname>
          </string-name>
          .
          <article-title>Generative adversarial network</article-title>
          , (Last accessed on May 03,
          <year>2020</year>
          ). https://en.wikipedia.org/wiki/Generative_adversarial_network#History.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>