<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scene Change Task: Take me to Paris</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simon Brugman</string-name>
          <email>simon.brugman@cs.ru.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martha Larson</string-name>
          <email>m.larson@cs.ru.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Radboud University</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>21</fpage>
      <lpage>25</lpage>
      <abstract>
        <p>This paper proposes the Scene Change Task benchmark. Task participants would create fun faux photos that take the place of real photos, but are still recognizably not real. The main motivation is to investigate an alternative to recent methods that aim at realism, which are problematic because they can be deceptive. Task participants would be provided with images of people and asked to develop an approach that changes the background scene to Paris. Two annotated subsets of existing datasets serve as starting resources for task participants. The submissions would be evaluated in two user studies, one time-restricted and one unrestricted. Study participants look at a mixture of Scene Change photos and real photos and answer the question, “Who was really there?” Successful Scene Change approaches demonstrate a high user-study error rate on the time-restricted experiment and a low error rate on the unrestricted experiment.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The goal of the Scene Change Task benchmark is to explore Scene
Change photos, which we define to be fun faux photos that fool
you at first, but can be identified to be composites upon closer
inspection. Current research on image composites has a clear focus
on realism [
        <xref ref-type="bibr" rid="ref18 ref19 ref23 ref25">18, 19, 23, 25</xref>
        ]. In contrast, here, we investigate
composite images that are acceptable, but not realistic enough to deceive.
Scene Change photos leverage the flexibility of human
interpretation: in artworks, for example, implausible lighting and
colors are known not to interfere with the viewer’s understanding of the scene
and often go unnoticed at first glance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Scene Change photos
can be considered “shallow fakes” to emphasize the contrast with
deepfakes, which conventionally target complete visual realism.
      </p>
      <p>Participants in the Scene Change Task would be provided with
images of people and asked to change the background scene to Paris.
Participants develop approaches that create image composites. The
background of the composite should be recognizable as the original
background image. Overall, the composite should appear visually
realistic to users at first glance, but be identifiable as a composite
if inspected for more than two seconds.</p>
      <p>
        With this task, we wish to gain a better understanding of
deceptiveness and realism in multimedia. Our goal is to develop methods
that would allow people to enjoy a new genre of creations, while at
the same time being aware of the fabrication. We hope that fun
faux photos will provide a form of intrinsic reward in the sense of the
formal theory of creativity, fun, and intrinsic motivation, cf. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. If people adopt the practice of making fun faux photos, this practice
has clear potential to address negative aspects of tourism, including
environmental impact and personal risk. Social media enthusiasts go to great
lengths to take pictures at popular locations, waiting in line,
going to considerable effort, and sometimes taking extreme risks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Fun
faux photos allow social media users to avoid queues, stay safer
(physically and in terms of privacy), and prevent negative impact
on local ecosystems, without sacrificing their holiday pictures.
      </p>
      <p>
        We propose that the task initially focus on Paris because it
is a highly popular tourist destination. In 2017, France was the
most visited country in the world [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], with Paris recording a total of
23.6 million hotel visits [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Focusing on Paris also allows us to leverage
the existing Paris Dataset [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Other backgrounds, going
beyond landmarks and urban scenes, will be interesting to explore
in the future.
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>This section reviews recent work on image compositing. We selected
the papers that, to the best of our knowledge, are most relevant for participants
developing approaches for the Scene Change Task.</p>
      <p>Realistic compositing. Approaches that aim at realistic
compositing can be partitioned into approaches optimizing
style consistency, spatial consistency, or both jointly.</p>
      <p>
        Style consistency: Several compositing approaches focus
on the style consistency of the composed image, i.e., the position of the
foreground is given, and the algorithm alters only its style. Tsai
et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] propose an end-to-end convolutional neural network for
image harmonization, which takes into account global, contextual,
and semantic information. The approach operates on fixed-size images
(a demo is available<sup>1</sup>). In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Luan et al. improve on this
by incorporating local information. This builds on earlier work
concerning realism in composed images [
        <xref ref-type="bibr" rid="ref24 ref28 ref8">8, 24, 28</xref>
        ].
      </p>
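      <p>The learned harmonization networks above do not fit in a short example. As a much simpler point of reference, and explicitly not the method of [21] or [13], the Python sketch below illustrates what altering only the style of a placed foreground means: each RGB channel of the foreground pixels is shifted and scaled so that its mean and standard deviation match those of the background, a classical global color-transfer heuristic. Representing pixels as plain tuples and the function name <monospace>match_color_stats</monospace> are our own assumptions.</p>
      <preformat preformat-type="code">
```python
from statistics import mean, pstdev

Pixel = tuple[float, float, float]  # (R, G, B), each in [0, 255]


def match_color_stats(fg: list[Pixel], bg: list[Pixel]) -> list[Pixel]:
    """Shift and scale each RGB channel of the foreground pixels so that its
    mean and population standard deviation match those of the background."""
    channels = []
    for c in range(3):
        f = [p[c] for p in fg]
        b = [p[c] for p in bg]
        f_mu, f_sd = mean(f), pstdev(f)
        b_mu, b_sd = mean(b), pstdev(b)
        # A flat (constant) foreground channel is only shifted, not scaled.
        scale = b_sd / f_sd if f_sd > 0 else 1.0
        # Affine adjustment per channel, clamped to the valid 8-bit range.
        channels.append([min(255.0, max(0.0, (v - f_mu) * scale + b_mu)) for v in f])
    # Re-assemble per-pixel tuples from the three adjusted channels.
    return list(zip(*channels))


fg = [(0, 0, 0), (100, 100, 100)]
bg = [(100, 100, 100), (200, 200, 200)]
print(match_color_stats(fg, bg))  # [(100.0, 100.0, 100.0), (200.0, 200.0, 200.0)]
```
      </preformat>
      <p>Unlike the approaches of [21] and [13], this transfer ignores spatial, contextual, and semantic information entirely; it only makes the division of labor concrete: the foreground position is fixed and only its appearance changes.</p>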
      <p>
        Spatial consistency: Other approaches primarily investigate the
spatial consistency of the image composition. Lin et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] propose
ST-GAN, which uses a Generative Adversarial Network (GAN) and
a Spatial Transformer Network operating in the geometric warp
parameter space. The method works with a fixed image resolution
and does not take other factors such as style into account. Tripathi
et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] also use a form of adversarial learning to learn realistic
compositions.
      </p>
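      <p>ST-GAN [12] predicts warp parameters rather than pixels. The Python sketch below illustrates only the bookkeeping side of that idea: how a sequence of predicted parameter updates can be composed into a single warp and applied to points of a foreground segment, such as its corners. We assume a 2D affine warp with six parameters for simplicity; the parameterization used in [12] may differ, and the networks that would predict the updates are not shown. All names are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Six affine parameters (a, b, tx, c, d, ty) map (x, y) to
# (a*x + b*y + tx, c*x + d*y + ty).
Affine = tuple[float, float, float, float, float, float]

IDENTITY: Affine = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)


def apply(p: Affine, points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Warp 2D points, e.g., the corners of a foreground segment."""
    a, b, tx, c, d, ty = p
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def compose(p: Affine, q: Affine) -> Affine:
    """Parameters of the single warp that applies q first and then p."""
    a1, b1, tx1, c1, d1, ty1 = p
    a2, b2, tx2, c2, d2, ty2 = q
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        a1 * tx2 + b1 * ty2 + tx1,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        c1 * tx2 + d1 * ty2 + ty1,
    )


def accumulate(updates: list[Affine]) -> Affine:
    """Fold a sequence of warp updates (applied in order) into one warp."""
    warp = IDENTITY
    for u in updates:
        warp = compose(u, warp)
    return warp


scale_half = (0.5, 0.0, 0.0, 0.0, 0.5, 0.0)
shift = (1.0, 0.0, 10.0, 0.0, 1.0, 20.0)
print(apply(accumulate([scale_half, shift]), [(100.0, 200.0)]))  # [(60.0, 120.0)]
```
      </preformat>
      <p>Working in parameter space keeps the foreground pixels themselves untouched until a single final resampling, which is one reason such methods can be combined with a separate style-adjustment step.</p>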
      <p>
        Joint style and spatial consistency: Zhan et al. [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] and Chen et
al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] propose GAN architectures for joint optimization of style
and spatial consistency.
      </p>
      <p>
        Retrieval. There has also been related retrieval-based research,
in which either the foreground segment or the background scene is
retrieved from a collection of images. Lalonde et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] created a
system that, given a position in a background image, retrieves objects to insert there. The
objects are selected based on properties that match the background,
including camera position, lighting, and resolution.
<sup>1</sup> https://github.com/wasidennis/DeepHarmonization
      </p>
    </sec>
    <sec id="sec-3">
      <title>TASK DEFINITION AND DATA</title>
      <p>The main task of Scene Change is image compositing, defined as:
Given a foreground segment and a background
image, develop an approach to combine them to
create a Scene Change photo.</p>
      <p>The foreground segments are specified and the background images
are sourced from an image collection containing images of several
popular landmarks in Paris.</p>
      <p>It is difficult to define a fair comparison of Scene Change approaches
that introduce radical visual changes when combining
the foreground and background images. For this reason, we
propose adding a constraint to the task formulation: a Scene
Change photo must be created by changing the foreground
segment, but not the background image. In the future, other constraints
can also be explored.</p>
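      <p>Under this constraint, the final compositing step can be illustrated with standard alpha blending. In the minimal Python sketch below, which is our own illustration rather than a reference implementation, all adaptation acts on the foreground segment and its alpha mask; the background pixels are only covered by the foreground, never restyled, and the input background is left unmodified. The image representation and the function name <monospace>composite</monospace> are assumptions.</p>
      <preformat preformat-type="code">
```python
Pixel = tuple[float, float, float]
Image = list[list[Pixel]]  # rows of RGB pixels


def composite(fg: Image, alpha: list[list[float]], bg: Image, x0: int, y0: int) -> Image:
    """Blend a foreground segment with per-pixel alpha (1 = foreground) onto a
    copy of the background, with the segment's top-left corner at (x0, y0)."""
    out = [row[:] for row in bg]  # copy rows: the input background is never modified
    for dy, (fg_row, a_row) in enumerate(zip(fg, alpha)):
        for dx, (f, a) in enumerate(zip(fg_row, a_row)):
            y, x = y0 + dy, x0 + dx
            # Skip foreground pixels that fall outside the background frame.
            if y in range(len(out)) and x in range(len(out[y])):
                out[y][x] = tuple(a * fc + (1 - a) * bc for fc, bc in zip(f, out[y][x]))
    return out


black = (0.0, 0.0, 0.0)
bg = [[black, black], [black, black]]
print(composite([[(100.0, 100.0, 100.0)]], [[0.5]], bg, 1, 0)[0][1])  # (50.0, 50.0, 50.0)
```
      </preformat>
      <p>Keeping the background fixed means that submissions differ only in how they place and adapt the foreground, which is what makes their comparison fair.</p>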
      <p>Participants are also encouraged to develop approaches for two
sub-tasks:</p>
      <p>Background image retrieval: given a foreground segment and
a background image collection, the participant should retrieve a
background image that is suitable for blending with the foreground segment.
The suitability of a background image is determined by, e.g.,
lighting conditions and perspective. Retrieval might
yield acceptable results at far lower complexity than
applying recent techniques, e.g., GANs, to adapt the foreground
segment to a specific background image. Given the availability of
large collections of images of popular landmarks, it is cheaper to
select than to modify.</p>
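      <p>In its simplest form, this sub-task is nearest-neighbor ranking over image descriptors. The Python sketch below is a minimal illustration under stated assumptions: the two descriptor fields (mean luminance as a proxy for lighting, horizon height as a proxy for perspective), the <monospace>Descriptor</monospace> record, and <monospace>rank_backgrounds</monospace> are hypothetical, and participants would need their own estimators for such properties.</p>
      <preformat preformat-type="code">
```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical per-image features; how to estimate them is left open."""
    luminance: float  # mean brightness, normalized to [0, 1]
    horizon: float    # estimated horizon height as a fraction of image height


def rank_backgrounds(fg: Descriptor, collection: dict[str, Descriptor], k: int = 3) -> list[str]:
    """Return the ids of the k background images whose descriptors are closest to fg."""
    def distance(image_id: str) -> float:
        b = collection[image_id]
        return math.hypot(fg.luminance - b.luminance, fg.horizon - b.horizon)

    return sorted(collection, key=distance)[:k]


segment = Descriptor(luminance=0.5, horizon=0.5)
backgrounds = {
    "a.jpg": Descriptor(0.9, 0.5),
    "b.jpg": Descriptor(0.5, 0.45),
    "c.jpg": Descriptor(0.1, 0.1),
}
print(rank_backgrounds(segment, backgrounds, k=2))  # ['b.jpg', 'a.jpg']
```
      </preformat>
      <p>A Euclidean distance treats both features as equally important; in practice, features would need to be weighted or normalized, and richer descriptors could replace the two used here.</p>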
      <p>
        Foreground segmentation: Image segmentation has seen
remarkable advances recently [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], but remains a difficult task,
especially with respect to details [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Participants could refine the
foreground segmentation to study how segmentation quality affects the resulting composites.
Recent unsupervised approaches might also prove interesting, such as [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
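      <p>One simple way to soften a hard segmentation boundary, a common compositing heuristic rather than a method from the cited works, is to feather the binary mask into a soft alpha matte by local averaging. The Python sketch below uses a box filter; the function name <monospace>feather</monospace> and its <monospace>radius</monospace> parameter are our own choices.</p>
      <preformat preformat-type="code">
```python
def feather(mask: list[list[int]], radius: int = 1) -> list[list[float]]:
    """Turn a binary mask (1 = foreground) into alpha values in [0, 1] by
    averaging each pixel's (2*radius + 1)^2 neighborhood, clipped at the border."""
    h, w = len(mask), len(mask[0])
    alpha = []
    for y in range(h):
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        row = []
        for x in range(w):
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [mask[j][i] for j in ys for i in xs]
            row.append(sum(window) / len(window))
        alpha.append(row)
    return alpha


print(feather([[0, 0, 0], [0, 1, 0], [0, 0, 0]])[0][0])  # 0.25
```
      </preformat>
      <p>The interior of a large foreground region keeps an alpha of 1.0, so only pixels near the boundary become partially transparent; the radius controls how wide that transition band is.</p>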
      <p>
        Participants would be provided with two subsets of existing
datasets, containing foreground segments and background images,
respectively. The foreground segments are chosen from the ADE20K
dataset [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Images were manually selected based on the following criteria:
(1) the label is “person, individual, someone, somebody, mortal, soul”;
(2) the foreground segment faces the camera and is in an upright
position; (3) the foreground segment is not occluded by other objects
(e.g., by a guitar or desk); and (4) the foreground segment is a coherent
social group, i.e., not a crowd. This procedure resulted in a subset
of 60 segments (40 for validation and 20 for test). We
chose the ADE20K dataset because of limitations we discovered
while exploring other existing datasets as sources of foreground
images, notably the segmentation quality of MHP-v2, DensePose,
and COCO. Upon inspection, ADE20K segmentations appear to
be more refined than the rough polygon annotations of, e.g.,
COCO 2017.
      </p>
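      <p>The four selection criteria can be read as a single predicate over annotated segments. The Python sketch below makes this explicit. Because the selection described above was done manually, the <monospace>Segment</monospace> record and its boolean fields are hypothetical stand-ins for human judgments, not fields of the ADE20K annotations.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass

PERSON_LABEL = "person, individual, someone, somebody, mortal, soul"


@dataclass(frozen=True)
class Segment:
    """Hypothetical annotation record; the boolean fields stand in for manual judgments."""
    segment_id: str
    label: str
    faces_camera: bool
    upright: bool
    occluded: bool
    is_crowd: bool


def meets_criteria(s: Segment) -> bool:
    """Selection criteria (1)-(4) for foreground segments."""
    return (
        s.label == PERSON_LABEL   # (1) person label
        and s.faces_camera        # (2) facing the camera ...
        and s.upright             #     ... and in an upright position
        and not s.occluded        # (3) not occluded by other objects
        and not s.is_crowd        # (4) a coherent social group, not a crowd
    )


candidates = [
    Segment("s1", PERSON_LABEL, True, True, False, False),
    Segment("s2", PERSON_LABEL, True, True, True, False),  # occluded, e.g., by a desk
]
print([s.segment_id for s in candidates if meets_criteria(s)])  # ['s1']
```
      </preformat>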
      <p>
        The background collection is a subset of the Paris Dataset [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ],
which consists of images, each labelled with a particular Paris landmark.
The images are sampled into two stratified sets of approximately
equal size (2,455 for validation and 2,460 for test).
      </p>
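      <p>A stratified split of this kind can be sketched as follows, assuming that the strata are the landmark labels. This is an illustration rather than the procedure actually used: it balances the two sets to within one image, whereas the reported sizes (2,455 vs. 2,460) suggest that the original sampling handled odd-sized strata differently. The function name and seed are hypothetical.</p>
      <preformat preformat-type="code">
```python
import random
from collections import defaultdict


def stratified_halves(labels: dict[str, str], seed: int = 0) -> tuple[list[str], list[str]]:
    """Split image ids into two sets of (nearly) equal size, stratified by
    landmark label, so that each landmark is split roughly in half."""
    rng = random.Random(seed)
    by_landmark: dict[str, list[str]] = defaultdict(list)
    for image_id, landmark in sorted(labels.items()):
        by_landmark[landmark].append(image_id)

    validation: list[str] = []
    test: list[str] = []
    for landmark in sorted(by_landmark):
        ids = by_landmark[landmark]
        rng.shuffle(ids)
        cut = len(ids) // 2
        # For an odd-sized stratum, give the extra image to the currently smaller split.
        if len(ids) % 2 and len(test) >= len(validation):
            cut += 1
        validation.extend(ids[:cut])
        test.extend(ids[cut:])
    return validation, test


labels = {f"img{i}": landmark for i, landmark in enumerate("AAABBBCC")}
val, tst = stratified_halves(labels)
print(len(val), len(tst))  # 4 4
```
      </preformat>
      <p>Fixing the seed makes the split reproducible, which matters when the same validation and test sets must be distributed to all participants.</p>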
      <p>Furthermore, a small novel dataset was collected for the
evaluation of the approaches. This evaluation set, called the People in Paris
dataset, consists of 147 images of people posing with the landmarks
in the Paris Dataset and serves to evaluate participant
submissions in a user study. The images were collected using Creative
Commons (CC) Search, which aggregates CC-licensed works from providers
such as Flickr. We searched for combinations
of the landmark name and terms such as ‘selfie’ and ‘in front of the’. The
dataset will be publicly released after evaluation.</p>
      <p>The participants use the validation sets to develop their
approaches, which are then evaluated using the test sets.
</p>
    </sec>
    <sec id="sec-4">
      <title>EVALUATING SCENE CHANGE PHOTOS</title>
      <p>
        To evaluate Scene Change photos, we define two unpaired user
studies: one time-restricted and one time-unrestricted. An approach
produces successful Scene Change photos if its photos yield a high
error rate in the time-restricted experiment and a low error rate in
the unrestricted experiment. Recent work indicates that
adversarial examples can fool time-limited humans [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which motivates the time-restricted condition in our
evaluation setup.
      </p>
      <p>
        Setup. Task participants submit Scene Change compositions for
the 20 images in the test set, which are evaluated in the user
studies. During the studies, study participants are shown a random mixture
of real and Scene Change photos and are asked “Who was really
there?”, i.e., to identify the photos that are real. We propose two
unpaired user studies: one time-restricted (2 seconds per photo),
similar to [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and one unrestricted. All study participants, after
being instructed, start with two practice questions. Submissions
are ranked by the difference in error rates between the two
experiments.
      </p>
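      <p>The ranking criterion can be made concrete as follows. In this Python sketch, a submission's score is its error rate in the time-restricted study minus its error rate in the unrestricted study, so that higher is better. The <monospace>Response</monospace> record, the function names, and computing the error rate over all judged photos (real and composite) are our assumptions; the proposal does not specify, for example, whether responses to real photos are included.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    photo_is_real: bool  # ground truth for the photo shown
    judged_real: bool    # participant's answer to "Who was really there?"


def error_rate(responses: list[Response]) -> float:
    """Fraction of responses that misjudge whether the photo is real."""
    if not responses:
        raise ValueError("no responses")
    wrong = sum(r.photo_is_real != r.judged_real for r in responses)
    return wrong / len(responses)


def scene_change_score(restricted: list[Response], unrestricted: list[Response]) -> float:
    """High when photos fool viewers at a glance but not on closer inspection."""
    return error_rate(restricted) - error_rate(unrestricted)


def rank_submissions(results: dict[str, tuple[list[Response], list[Response]]]) -> list[str]:
    """Order submissions by score, best first."""
    return sorted(results, key=lambda name: scene_change_score(*results[name]), reverse=True)


fooled = Response(photo_is_real=False, judged_real=True)
caught = Response(photo_is_real=False, judged_real=False)
real_ok = Response(photo_is_real=True, judged_real=True)
restricted = [fooled] * 3 + [real_ok]    # error rate 0.75
unrestricted = [caught] * 3 + [fooled]   # error rate 0.25
print(scene_change_score(restricted, unrestricted))  # 0.5
```
      </preformat>
      <p>Using a difference rather than the two error rates separately rewards exactly the intended behavior: a photo that is never detected (high error in both studies) or always detected (low error in both) scores close to zero.</p>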
      <p>
        Platform. Amazon Mechanical Turk (MTurk) would be used to
recruit participants, and the survey itself would be run in Qualtrics.
We aim for a sample size of 30 per experiment
(i.e., 60 study participants per submission), chosen based
on the study budget. Participants on mobile phones are excluded.
Photo pairs are randomized. Previous work has identified the issue
of worker seriousness [
        <xref ref-type="bibr" rid="ref15 ref9">9, 15</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], simple measures increased the correlation
between crowd responses and expert ratings. We will
explicitly state that we check for invalid responses. Furthermore,
the dwell time for each page will be measured (e.g., to filter out bots or
people who do not read the instructions). Finally, we add a text box
in which participants can report anything that was unclear about the survey,
so that we can gather feedback.
      </p>
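      <p>The dwell-time check could be implemented as a simple threshold filter. In the Python sketch below, the 5-second threshold, the function name, and the restriction to pages such as the instructions are hypothetical; photo pages in the time-restricted study have a fixed 2-second display and would presumably not be checked this way.</p>
      <preformat preformat-type="code">
```python
def valid_participants(dwell_seconds: dict[str, list[float]], min_seconds: float = 5.0) -> set[str]:
    """Keep participants who spent at least min_seconds on every measured page.

    dwell_seconds maps a participant id to the times recorded on the checked
    pages (e.g., instruction pages); participants without records are dropped."""
    return {pid for pid, times in dwell_seconds.items() if times and min(times) >= min_seconds}


dwell = {"w1": [12.0, 6.5], "w2": [0.8, 20.0], "w3": []}
print(valid_participants(dwell))  # {'w1'}
```
      </preformat>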
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION</title>
      <p>Scene Change is a benchmarking task on fun faux photos. It explores
a different notion of realism than the one commonly targeted by
image compositing approaches. A Scene Change photo is considered
successful if it looks real at first glance, but can be identified as a
composite upon closer inspection. We propose to evaluate Scene
Change photos via two user studies, one time-restricted and one
unrestricted. The difference in error rates between the two studies reflects the
success of approaches creating Scene Change photos.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>The concept for the Scene Change Task was developed within the
‘Pixel Privacy’ project of the Open Mind research program, financed
by the Netherlands Organization for Scientific Research (NWO).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Agam</given-names>
            <surname>Bansal</surname>
          </string-name>
          , Chandan Garg, Abhijith Pakhare, and
          <string-name>
            <given-names>Samiksha</given-names>
            <surname>Gupta</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Selfies: A boon or bane?</article-title>
          .
          <source>Journal of Family Medicine and Primary Care</source>
          <volume>7</volume>
          ,
          <issue>4</issue>
          (
          <year>2018</year>
          ),
          <fpage>828</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Cavanagh</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>The artist as neuroscientist</article-title>
          .
          <source>Nature</source>
          <volume>434</volume>
          ,
          <issue>7031</issue>
          (
          <year>2005</year>
          ),
          <fpage>301</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Bor-Chun</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Kae</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Toward Realistic Image Compositing with Adversarial Learning</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>8415</fpage>
          -
          <lpage>8424</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Liang-Chieh</given-names>
            <surname>Chen</surname>
          </string-name>
          , Yukun Zhu, George Papandreou, Florian Schroff, and
          <string-name>
            <given-names>Hartwig</given-names>
            <surname>Adam</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation</article-title>
          .
          <source>In Proceedings of the European Conference on Computer Vision</source>
          .
          <fpage>801</fpage>
          -
          <lpage>818</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] Office de tourisme et des congrès (Paris).
          <year>2017</year>
          . Le Tourisme à Paris. (June
          <year>2017</year>
          ). https://fr.zone-secure.net/42102/1019605/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Gamaleldin</given-names>
            <surname>Elsayed</surname>
          </string-name>
          , Shreya Shankar, Brian Cheung, Nicolas Papernot, Alexey Kurakin, Ian Goodfellow, and
          <string-name>
            <given-names>Jascha</given-names>
            <surname>Sohl-Dickstein</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Adversarial Examples that Fool both Computer Vision and Time-Limited Humans</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          .
          <fpage>3914</fpage>
          -
          <lpage>3924</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Yossi</given-names>
            <surname>Gandelsman</surname>
          </string-name>
          , Assaf Shocher, and
          <string-name>
            <given-names>Michal</given-names>
            <surname>Irani</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>“Double-DIP”: Unsupervised Image Decomposition via Coupled Deep-Image-Priors</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>11026</fpage>
          -
          <lpage>11035</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Micah K</given-names>
            <surname>Johnson</surname>
          </string-name>
          , Kevin Dale, Shai Avidan, Hanspeter Pfister, William T Freeman, and
          <string-name>
            <given-names>Wojciech</given-names>
            <surname>Matusik</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>CG2Real: Improving the Realism of Computer Generated Images using a Large Collection of Photographs</article-title>
          .
          <source>IEEE Transactions on Visualization and Computer Graphics</source>
          <volume>17</volume>
          ,
          <issue>9</issue>
          (
          <year>2011</year>
          ),
          <fpage>1273</fpage>
          -
          <lpage>1285</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Aniket</given-names>
            <surname>Kittur</surname>
          </string-name>
          , Ed H Chi, and
          <string-name>
            <given-names>Bongwon</given-names>
            <surname>Suh</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Crowdsourcing user studies with Mechanical Turk</article-title>
          .
          <source>In Proceedings of the SIGCHI conference on Human Factors in Computing Systems</source>
          .
          <fpage>453</fpage>
          -
          <lpage>456</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Jean-François</given-names>
            <surname>Lalonde</surname>
          </string-name>
          , Derek Hoiem, Alexei A Efros, Carsten Rother, John Winn, and Antonio Criminisi.
          <year>2007</year>
          .
          <article-title>Photo Clip Art</article-title>
          .
          <source>ACM Transactions on Graphics (TOG)</source>
          <volume>26</volume>
          ,
          <issue>3</issue>
          (
          <year>2007</year>
          ),
          <fpage>3</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Liunian Harold</given-names>
            <surname>Li</surname>
          </string-name>
          , Mark Yatskar, Da Yin, Cho-Jui Hsieh, and
          <string-name>
            <given-names>Kai-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>VisualBERT: A Simple and Performant Baseline for Vision and Language</article-title>
          . arXiv preprint arXiv:1908.03557 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Chen-Hsuan</given-names>
            <surname>Lin</surname>
          </string-name>
          , Ersin Yumer, Oliver Wang, Eli Shechtman, and
          <string-name>
            <given-names>Simon</given-names>
            <surname>Lucey</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>9455</fpage>
          -
          <lpage>9464</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Fujun</given-names>
            <surname>Luan</surname>
          </string-name>
          , Sylvain Paris, Eli Shechtman, and Kavita Bala.
          <year>2018</year>
          .
          <article-title>Deep Painterly Harmonization</article-title>
          . In Computer Graphics Forum, Vol.
          <volume>37</volume>
          . Wiley Online Library,
          <fpage>95</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Rafał K</given-names>
            <surname>Mantiuk</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Quantifying image quality in graphics: Perspective on subjective and objective metrics and their performance</article-title>
          .
          <source>In Human Vision and Electronic Imaging XVIII</source>
          , Vol.
          <volume>8651</volume>
          . International Society for Optics and Photonics,
          <elocation-id>86510K</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Winter</given-names>
            <surname>Mason</surname>
          </string-name>
          and
          <string-name>
            <given-names>Siddharth</given-names>
            <surname>Suri</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Conducting behavioral research on Amazon's Mechanical Turk</article-title>
          .
          <source>Behavior Research Methods</source>
          <volume>44</volume>
          ,
          <issue>1</issue>
          (March
          <year>2012</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Philbin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Chum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sivic</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Jürgen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010)</article-title>
          .
          <source>IEEE Transactions on Autonomous Mental Development</source>
          <volume>2</volume>
          ,
          <issue>3</issue>
          (
          <year>2010</year>
          ),
          <fpage>230</fpage>
          -
          <lpage>247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Justus</given-names>
            <surname>Thies</surname>
          </string-name>
          , Michael Zollhöfer, Marc Stamminger,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Theobalt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Nießner</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Face2Face: Real-time Face Capture and Reenactment of RGB Videos</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>2387</fpage>
          -
          <lpage>2395</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Justus</given-names>
            <surname>Thies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Zollhöfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Theobalt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marc</given-names>
            <surname>Stamminger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Nießner</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>HeadOn: Real-time Reenactment of Human Portrait Videos</article-title>
          .
          <source>ACM Transactions on Graphics (TOG)</source>
          <volume>37</volume>
          ,
          <issue>4</issue>
          (
          <year>2018</year>
          ),
          <fpage>164</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Shashank</given-names>
            <surname>Tripathi</surname>
          </string-name>
          , Siddhartha Chandra, Amit Agrawal, Ambrish Tyagi, James M Rehg, and
          <string-name>
            <given-names>Visesh</given-names>
            <surname>Chari</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Learning to Generate Synthetic Data via Compositing</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>461</fpage>
          -
          <lpage>470</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Yi-Hsuan</given-names>
            <surname>Tsai</surname>
          </string-name>
          , Xiaohui Shen,
          <string-name>
            <given-names>Zhe</given-names>
            <surname>Lin</surname>
          </string-name>
          , Kalyan Sunkavalli, Xin Lu, and
          <string-name>
            <given-names>Ming-Hsuan</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Deep Image Harmonization</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>3789</fpage>
          -
          <lpage>3797</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <collab>World Tourism Organization (UNWTO)</collab>
          .
          <year>2017</year>
          .
          <source>UNWTO Tourism Highlights: 2017 Edition</source>
          .
          <publisher-loc>Madrid</publisher-loc>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Chaowei</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jun-Yan</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bo</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Warren</given-names>
            <surname>He</surname>
          </string-name>
          , Mingyan Liu, and
          <string-name>
            <given-names>Dawn</given-names>
            <surname>Song</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Spatially Transformed Adversarial Examples</article-title>
          .
          <source>arXiv preprint arXiv:1801.02612</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Su</given-names>
            <surname>Xue</surname>
          </string-name>
          , Aseem Agarwala, Julie Dorsey, and
          <string-name>
            <given-names>Holly</given-names>
            <surname>Rushmeier</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Understanding and Improving the Realism of Image Composites</article-title>
          .
          <source>ACM Transactions on Graphics (TOG)</source>
          <volume>31</volume>
          ,
          <issue>4</issue>
          (
          <year>2012</year>
          ),
          <fpage>84</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Jiahui</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhe</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jimei</given-names>
            <surname>Yang</surname>
          </string-name>
          , Xiaohui Shen,
          <string-name>
            <given-names>Xin</given-names>
            <surname>Lu</surname>
          </string-name>
          , and Thomas S Huang.
          <year>2018</year>
          .
          <article-title>Generative Image Inpainting with Contextual Attention</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>5505</fpage>
          -
          <lpage>5514</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Fangneng</given-names>
            <surname>Zhan</surname>
          </string-name>
          , Hongyuan Zhu, and
          <string-name>
            <given-names>Shijian</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Spatial Fusion GAN for Image Synthesis</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>3653</fpage>
          -
          <lpage>3662</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Bolei</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hang</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Puig</surname>
          </string-name>
          , Sanja Fidler, Adela Barriuso, and Antonio Torralba.
          <year>2017</year>
          .
          <article-title>Scene Parsing through ADE20K Dataset</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>633</fpage>
          -
          <lpage>641</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Jun-Yan</given-names>
            <surname>Zhu</surname>
          </string-name>
          , Philipp Krähenbühl, Eli Shechtman, and Alexei A Efros
          .
          <year>2015</year>
          .
          <article-title>Learning a Discriminative Model for the Perception of Realism in Composite Images</article-title>
          .
          <source>In Proceedings of the IEEE International Conference on Computer Vision</source>
          .
          <fpage>3943</fpage>
          -
          <lpage>3951</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>