<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Synthesis and Visualization of Photorealistic Textures for 3D Face Reconstruction of Prehistoric Humans</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Moscow Institute of Physics and Technology (MIPT)</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>State Res. Institute of Aviation Systems (GosNIIAS)</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Reconstruction of the 3D shape and texture of a face is a challenging task in modern anthropology. While a skilled anthropologist can reconstruct the appearance of a prehistoric human from its skull, to date there are no automated methods for anthropological 3D face reconstruction and texturing. We propose a deep learning framework for synthesis and visualization of photorealistic textures for 3D face reconstruction of prehistoric humans. Our framework leverages a joint face-skull model based on generative adversarial networks. Specifically, we train two image-to-image translation models to separate 3D face reconstruction from texturing. The first model translates an input depth map of a human skull into a possible depth map of its face and a semantic labeling of its parts. The second model performs a multimodal translation of the generated semantic labeling into multiple photorealistic textures. We generate a dataset consisting of 3D models of human faces and skulls to train our 3D reconstruction model. The dataset includes paired samples obtained from computed tomography and unpaired samples representing 3D models of skulls of prehistoric humans. We train our texture synthesis model on the CelebAMask-HQ dataset. We evaluate our model qualitatively and quantitatively to demonstrate that it provides robust 3D face reconstruction of prehistoric humans with multimodal photorealistic texturing.</p>
      </abstract>
      <kwd-group>
        <kwd>Photogrammetry</kwd>
        <kwd>3D reconstruction</kwd>
        <kwd>facial approximation</kwd>
        <kwd>machine learning</kwd>
        <kwd>generative adversarial networks</kwd>
        <kwd>anthropology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Reconstruction of the 3D shape and texture of a face is a challenging task in modern
anthropology. While a skilled anthropologist can reconstruct the appearance of a
prehistoric human from its skull, to date there are no automated methods for
anthropological 3D face reconstruction and texturing.</p>
      <p>We propose a deep learning framework for synthesis and visualization of
photorealistic textures for 3D face reconstruction of prehistoric humans. Our framework leverages
a joint face-skull model based on generative adversarial networks. Specifically, we train
two image-to-image translation models to separate 3D face reconstruction from
texturing. The first model translates an input depth map of a human skull into a possible depth
map of its face and a semantic labeling of its parts. The second model performs a
multimodal translation of the generated semantic labeling into multiple photorealistic textures.</p>
      <p>We generate a dataset consisting of 3D models of human faces and skulls to train our
3D reconstruction model. The dataset includes paired samples obtained from computed
tomography and unpaired samples representing 3D models of skulls of prehistoric
humans. We train our texture synthesis model on the CelebAMask-HQ dataset. We evaluate
our model qualitatively and quantitatively to demonstrate that it provides robust 3D face
reconstruction of prehistoric humans with multimodal photorealistic texturing.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>The development of modern technologies and their implementation in computer
vision and deep learning have opened up wide opportunities for advancing human face
3D reconstruction.</p>
      <sec id="sec-2-1">
        <title>Human Face 3D Reconstruction</title>
        <p>
          Manual facial approximation is currently represented by three main techniques: the
anthropometrical (American) method, the anatomical (Russian) method, and the combined (British)
method. The first is based on soft tissue data and requires highly experienced staff.
The Russian method [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is performed by modeling muscles, glands, and cartilage and placing
them onto a skull sequentially. This technique requires substantial anatomical
knowledge for accurate facial approximation. The British method exploits data on both soft
tissue thickness and facial muscles.
        </p>
        <p>
          The use of computer-aided techniques for digital data processing has opened
new possibilities for achieving realistic facial reconstruction. Facial approximation
can be carried out through programmatic face modeling by surface approximation
based on a skull 3D model and tissue thickness [
          <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
          ]. The 3D reconstruction of the face
of Ferrante Gonzaga (1507 – 1557) was performed using a physical model of the
skull obtained by computed tomography of his embalmed body and rapid
prototyping [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The facial approximation of a 3,000-year-old ancient Egyptian woman
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] was made with the use of medical imaging data.
        </p>
        <p>
          Recent possibilities for collecting and processing large amounts of digital
anthropological data make it possible to apply statistical and machine learning techniques to the facial
approximation problem. The application of statistical shape models representing skull and
face morphology to the facial approximation problem has been studied [
          <xref ref-type="bibr" rid="ref6 ref7">6,7</xref>
          ] by fitting
them to a set of magnetic resonance images of the head. A large-scale facial model, a
3D Morphable Model [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], has been automatically constructed from 9,663 distinct facial
identities; it contains statistical information about a huge
variety of the human population. A novel method for co-registration of two independent
statistical shape models was presented in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]: a face model is made consistent with a skull
model using stochastic optimization based on Markov Chain Monte Carlo (MCMC), and
facial reconstruction is posed as a conditional distribution of plausible face shapes given
a skull shape. Deep learning models have also appeared that are capable of multimodal data
translation [
          <xref ref-type="bibr" rid="ref10 ref11">10,11</xref>
          ] or of generating a 3D reconstruction of an object's shape from a single
image [
          <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
          ]. These approaches can also be applied to facial approximation.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Generative Adversarial Networks</title>
        <p>
          A new type of neural network known as the generative adversarial network (GAN) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
made it possible to take a significant step forward in the field of image processing.
A GAN consists of two deep convolutional neural networks: a Generator network tries to
synthesize images that are visually indistinguishable from a given sample of images from
the target domain, while a Discriminator network tries to distinguish the 'fake' images
generated by the Generator from the real images in the target domain. The Generator and
Discriminator networks are trained simultaneously, so this approach can be considered
an adversarial game between two players.
        </p>
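        <p>As a minimal sketch of this adversarial game, the following PyTorch snippet performs one alternating update of a discriminator and a generator with standard binary cross-entropy GAN losses; the tiny fully connected networks and the random batch stand in for real data and are illustrative assumptions, not the architectures used in this work.</p>
        <preformat>
# Minimal sketch of one adversarial training step (placeholder networks and data).
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 16, 64, 32
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(batch, data_dim)   # placeholder batch of real samples
z = torch.randn(batch, latent_dim)    # noise vector fed to the Generator

# Discriminator step: real samples are labeled 1, generated ('fake') samples 0.
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make the Discriminator classify generated samples as real.
loss_g = bce(D(G(z)), torch.ones(batch, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        </preformat>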
        <p>
          One of the first problems addressed with GANs was image synthesis. The image-to-image
translation problem was solved using a conditional GAN termed pix2pix [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Such a
network learns a mapping $G : (x, z) \to y$ from an observed image $x$ and a random noise
vector $z$ to an output $y$. This method also uses a sum of two loss functions: a conditional
adversarial objective and an L1 distance. However, for many tasks it is not
possible to generate paired training datasets for image-to-image translation.
        </p>
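        <p>A minimal sketch of how such an objective can be assembled, assuming hypothetical generator and discriminator modules and the commonly used weight of 100 for the L1 term; it illustrates the combination of a conditional adversarial loss with an L1 reconstruction loss rather than the exact configuration of pix2pix.</p>
        <preformat>
# Sketch of a pix2pix-style generator objective (placeholder modules and weights).
import torch
import torch.nn as nn

adv_loss = nn.BCEWithLogitsLoss()
l1_loss = nn.L1Loss()
lambda_l1 = 100.0  # assumed weight of the L1 term

def generator_loss(generator, discriminator, x, z, y):
    """Conditional adversarial loss plus L1 distance for the mapping G : (x, z) -> y."""
    y_hat = generator(x, z)                 # translated image
    pred_fake = discriminator(x, y_hat)     # the discriminator sees the (input, output) pair
    adversarial = adv_loss(pred_fake, torch.ones_like(pred_fake))
    return adversarial + lambda_l1 * l1_loss(y_hat, y)
        </preformat>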
        <p>
          To overcome this difficulty, CycleGAN [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] was proposed. CycleGAN
leverages a cycle consistency loss for learning a translation from a source domain X to a
target domain Y in the absence of paired examples. The CycleGAN model therefore
detects characteristic features in one image domain and learns to translate them to the
target domain. The StyleGAN model proposed in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] provides superior
performance in terms of perceptual realism and quality of the reconstructed image. Unlike
the common generator architecture that feeds the latent code through the input layer,
StyleGAN adds a mapping of the input to an intermediate latent space, which
controls the generator. Moreover, adaptive instance normalization (AdaIN) is used
at each convolution layer, and Gaussian noise is injected after each convolution, facilitating
the generation of stochastic features such as hair placement or freckles. The problems of the first
StyleGAN model were partially eliminated in the second model, StyleGANv2 [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ],
in which the parameters were optimized and the training pipeline was
adjusted; these changes improved the quality of the results.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <p>
        Our aim is to train two deep generative adversarial models for joint 3D face
reconstruction and photorealistic texturing of prehistoric humans. We use the pix2pixHD [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and
MaskGAN [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] models as a starting point to develop our skull2photo framework, and
we also adopt the assumptions of Knyaz et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. We provide two key contributions to the
original skull2face framework. Firstly, we add a new GAN model for
photorealistic multimodal texturing of the reconstructed 3D face. Secondly, we replace the original
pix2pix generator with the deeper pix2pixHD model.
      </p>
      <sec id="sec-3-1">
        <title>skull2photo Framework Overview</title>
        <p>Our aim is 3D reconstruction and texture generation of a prehistoric human face from
a single depth map of its skull. We consider four domains: the skull depth map domain
$\mathcal{A} \subset \mathbb{R}^{W \times H}$, the face depth map domain $\mathcal{B} \subset \mathbb{R}^{W \times H}$, the face semantic labeling
domain $\mathcal{C} \subset \mathbb{R}^{W \times H \times 3}$, and the face texture domain $\mathcal{D} \subset \mathbb{R}^{W \times H \times 3}$.</p>
        <p>
          We train two generator models: the depth map generator G1 and the texture generator G2.
The aim of our depth map generator G1 is to learn a mapping $G_1 : (A, N) \to (B, C)$,
where $N$ is a random vector drawn from a standard Gaussian distribution $\mathcal{N}(0, I)$,
$A \in \mathcal{A}$ is the input skull depth map, $B \in \mathcal{B}$ is the output face depth map, and $C \in \mathcal{C}$
is the semantic labeling of the face parts similar to [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Our texture generator G2 aims
to learn a mapping $G_2 : \mathcal{C} \to \mathcal{D}$ from the semantic labeling $C$ to the photorealistic
face texture $D \in \mathcal{D}$.
        </p>
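        <p>To make the data flow concrete, the sketch below traces tensor shapes through the two-stage pipeline with stand-in convolutional networks; the 512 × 512 resolution and the 19 semantic label classes are illustrative assumptions rather than values specified in this paper.</p>
        <preformat>
# Shape-level sketch of the two-stage skull2photo pipeline (stand-in networks).
import torch
import torch.nn as nn

H = W = 512          # assumed working resolution (illustrative)
N_CLASSES = 19       # assumed number of semantic face-part labels (illustrative)

class G1(nn.Module):
    """Stand-in for the pix2pixHD-style generator: (skull depth, noise) -> (face depth, labels)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(2, 1 + N_CLASSES, kernel_size=3, padding=1)
    def forward(self, skull_depth, noise):
        out = self.net(torch.cat([skull_depth, noise], dim=1))
        return out[:, :1], out[:, 1:]          # face depth map B, label logits C

class G2(nn.Module):
    """Stand-in for the MaskGAN-style generator: semantic labels -> photorealistic texture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(N_CLASSES, 3, kernel_size=3, padding=1)
    def forward(self, labels):
        return self.net(labels)                # 3-channel texture D

skull = torch.randn(1, 1, H, W)                # input skull depth map, domain A
noise = torch.randn(1, 1, H, W)                # random vector N drawn from N(0, I)
face_depth, label_logits = G1()(skull, noise)
texture = G2()(label_logits.softmax(dim=1))
print(face_depth.shape, texture.shape)         # [1, 1, 512, 512] and [1, 3, 512, 512]
        </preformat>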
      <p>The multimodal adversarial loss governs the training process of our texture
generator G2</p>
      <p>
        $$G^{*}, E^{*} = \arg\min_{G,E}\,\max_{D}\; \mathcal{L}_{\mathrm{GAN}}^{\mathrm{VAE}}(G, D, E) + \lambda\,\mathcal{L}_{1}^{\mathrm{VAE}}(G, E) + \mathcal{L}_{\mathrm{GAN}}(G, D) + \lambda_{\mathrm{latent}}\,\mathcal{L}_{1}^{\mathrm{latent}}(G, E) + \lambda_{\mathrm{KL}}\,\mathcal{L}_{\mathrm{KL}}(E), \quad (1)$$
        where $E(D)$ is the latent code generated by an encoder network similar to [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and
$\mathcal{L}_{\mathrm{KL}}$ is the Kullback–Leibler divergence (KL-divergence) loss
      </p>
      <p>
        $$\mathcal{L}_{\mathrm{KL}}(E) = \mathbb{E}_{D \sim p(D)}\left[ D_{\mathrm{KL}}\big(E(D)\,\|\,\mathcal{N}(0, I)\big) \right], \quad (2)$$
        and $D_{\mathrm{KL}}(p\,\|\,q)$ is an integral over the latent distribution encoded by $E(D)$:
      </p>
      <p>
        $$D_{\mathrm{KL}}(p\,\|\,q) = \int p(z) \log \frac{p(z)}{q(z)}\, dz. \quad (3)$$
      </p>
      <p>An overview of the proposed framework is presented in Figure 1.</p>
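      <p>For a diagonal Gaussian encoder output $E(D) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$, the KL term in Eqs. (2)–(3) has a closed form; the sketch below computes it for placeholder encoder outputs and only illustrates the loss, not the encoder architecture of [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].</p>
      <preformat>
# Closed-form KL divergence between a diagonal Gaussian encoder output and N(0, I).
import torch

def kl_loss(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """L_KL for E(D) = N(mu, diag(exp(logvar))), averaged over the batch
    (the expectation over D ~ p(D) in Eq. (2))."""
    kl_per_sample = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=1)
    return kl_per_sample.mean()

# Reparameterized latent code, as typically fed to the generator during training.
mu, logvar = torch.zeros(8, 16), torch.zeros(8, 16)   # placeholder encoder outputs
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # z = mu + sigma * eps
print(kl_loss(mu, logvar))                            # 0 when E(D) is already N(0, I)
      </preformat>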
      </sec>
      <sec id="sec-3-2">
        <title>Dataset Generation</title>
        <p>
          For training the developed skull2photo framework, a dedicated crania-to-facial (C2F)
dataset was created [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. The C2F dataset includes data of two modalities: skull 3D
models and face 3D models. For model training, these 3D models were converted into
depth maps. The C2F dataset has two parts. The first part is a paired samples subset
containing corresponding 3D models of a face and a skull, generated by processing
computed tomography data; it contains 24 pairs of skull and face 3D models. The second
part is an unpaired samples subset representing 3D models of skulls of prehistoric humans.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>
        We evaluate our skull2photo framework qualitatively and quantitatively using the
C2F and CelebAMask-HQ [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] datasets. Firstly, we present implementation details. Then
we demonstrate qualitative results for face 3D reconstruction and texturing. Finally, we
explore quantitative results in terms of 3D shape accuracy.
      </p>
      <sec id="sec-5-1">
        <title>Network Training</title>
        <p>
          Our framework contains two GAN networks. The first of them is the pix2pixHD
framework [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], which was designed to perform arbitrary
image-to-image transformations. We train the generator G1 to synthesize the face depth map
and semantic labels; the input images are skull depth maps. We collected an original
dataset that includes paired and unpaired skull depth map images. The unpaired samples
subset contains 316 skull depth map images, and the paired samples subset contains 200
pairs of depth map images.
        </p>
        <p>
          The second neural network is MaskGAN [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The generator G2 is trained to
reconstruct realistic photographs of human faces from semantic segmentation images.
For this purpose we used the CelebAMask-HQ dataset built on CelebA [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. It is a large-scale face image dataset
containing 30,000 high-resolution face images, each with a segmentation mask of
facial attributes.
        </p>
        <p>The networks were trained and tested using the PyTorch library. Training was run for
200 epochs on two NVIDIA RTX 2080 Ti GPUs. The dataset was divided
into independent training and test splits. Training of the generator G1 was completed
in 27 hours, and of the generator G2 in 45 hours.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Qualitative Evaluation</title>
        <p>The trained model was tested on an independent test set to reconstruct unseen faces.
Firstly, for the qualitative evaluation we reconstructed modern human faces using
a small part of the CelebAMask-HQ dataset. Secondly, we attempted to reconstruct the
face of an ancient man. This task is not easy because there are significant differences between
the faces of modern and ancient humans.</p>
        <p>Initially, we generated face depth maps and semantic segmentation images
using generator G1. Then, we used the generated segmentation images as input for generator G2
and reconstructed a photorealistic face texture. Finally, we sampled several random face style codes
and synthesized several face samples. Examples are presented in Figure 2.</p>
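        <p>A sketch of this multimodal sampling step, assuming a hypothetical style-conditioned generator g2(labels, style): drawing several random style codes from a standard Gaussian yields several different textures for the same semantic labeling.</p>
        <preformat>
# Illustrative multimodal texture sampling: one labeling, several random style codes.
import torch

def sample_textures(g2, labels, n_samples=3, style_dim=8):
    """Run a hypothetical style-conditioned generator with style codes drawn from N(0, I)."""
    textures = []
    for _ in range(n_samples):
        style = torch.randn(labels.size(0), style_dim)   # random style code per sample
        textures.append(g2(labels, style))
    return textures
        </preformat>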
        <p>Fig. 2. Examples of input data and the results of the neural network: the input skull depth map, the reconstructed face depth map and semantic labeling, and three texture samples for each of the two models.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Quantitative Evaluation</title>
        <p>We present quantitative results on the independent test split of our C2F dataset in
Table 1. Depth maps predicted by the network are normalized to the range [0, 1], where 0
is the front clipping plane located at 0 mm from the virtual camera, and 1 is the far clipping
plane located at 100 mm from the camera. We use the L2 distance between the ground truth
face depth map and the reconstructed depth map. We compare our skull2photo
model to the skull2face [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] baseline. Experimental results demonstrate that the
modified generator G1 improves the quality of 3D reconstruction by 11%.</p>
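        <p>A small sketch of this comparison under the stated normalization, assuming a per-pixel root-mean-square formulation of the L2 distance and randomly generated depth maps purely for illustration:</p>
        <preformat>
# Sketch of the depth-map comparison used for the quantitative evaluation (illustrative).
import numpy as np

DEPTH_RANGE_MM = 100.0   # 0.0 -> front clipping plane (0 mm), 1.0 -> far clipping plane (100 mm)

def l2_depth_error_mm(pred: np.ndarray, gt: np.ndarray) -> float:
    """L2 distance between normalized depth maps, reported as a per-pixel RMSE in millimetres."""
    diff_mm = (pred - gt) * DEPTH_RANGE_MM         # convert the [0, 1] difference back to mm
    return float(np.sqrt(np.mean(diff_mm ** 2)))

pred = np.random.rand(256, 256)                    # placeholder predicted depth map in [0, 1]
gt = np.clip(pred + 0.01, 0.0, 1.0)                # placeholder ground truth about 1 mm away
print(l2_depth_error_mm(pred, gt))                 # roughly 1.0 (mm)
        </preformat>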
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We demonstrated that generative adversarial models can learn the challenging task of
3D face reconstruction and texturing of prehistoric humans. Furthermore, we explored
the possibility of generating several plausible faces from a single skull using the
KL-divergence loss function. Our main observation is that the multimodal texture
reconstruction model trained on images of modern people can generalize to prehistoric humans.
We developed a two-stage framework for reconstruction of the depth map and texture of a
prehistoric human from a single depth map of its skull. The model was implemented
using the PyTorch library and trained using three datasets. A paired dataset consisting
of depth maps of human faces and corresponding skulls was generated from computed
tomography data. An unpaired dataset was developed by generating 3D reconstructions
of skulls of prehistoric humans. The publicly available CelebAMask-HQ dataset
was used for training the texture generation model. Both qualitative and quantitative
evaluation confirmed that our framework is capable of generating realistic 3D reconstructions
of prehistoric human faces from a single depth map of a skull.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>The reported study was funded by Russian Foundation for Basic Research (RFBR)
according to the research project 17-29-04509.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gerasimov</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The face finder</article-title>
          . London: Hutchinson &amp;
          <string-name>
            <surname>Co</surname>
          </string-name>
          (
          <year>1971</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Knyaz</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheltov</surname>
            ,
            <given-names>S.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stepanyants</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saltykova</surname>
            ,
            <given-names>E.B.</given-names>
          </string-name>
          :
          <article-title>Virtual face reconstruction based on 3D skull model</article-title>
          . In: Corner,
          <string-name>
            <given-names>B.D.</given-names>
            ,
            <surname>Pargas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.P.</given-names>
            ,
            <surname>Nurre</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.H</surname>
          </string-name>
          . (eds.)
          <article-title>ThreeDimensional Image Capture</article-title>
          and Applications V. vol.
          <volume>4661</volume>
          , pp.
          <fpage>182</fpage>
          -
          <lpage>190</lpage>
          . International Society for Optics and Photonics,
          <string-name>
            <surname>SPIE</surname>
          </string-name>
          (
          <year>2002</year>
          ), https://doi.org/10.1117/12.460172
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Knyaz</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maksimov</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novikov</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          :
          <article-title>Vision based automated anthropological measurements and analysis</article-title>
          .
          <source>ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W12</source>
          ,
          <fpage>117</fpage>
          -
          <lpage>122</lpage>
          (
          <year>2019</year>
          ), https://www.int
          <article-title>-arch-photogramm-remote-sens-spatial-inf-sci</article-title>
          . net/XLII-2-
          <issue>W12</issue>
          /117/2019/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Benazzi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertelli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lippi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bedini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caudana</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gruppioni</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mallegni</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Virtual anthropology and forensic arts: the facial reconstruction of ferrante gonzaga</article-title>
          .
          <source>Journal of Archaeological Science</source>
          <volume>37</volume>
          (
          <issue>7</issue>
          ),
          <fpage>1572</fpage>
          -
          <lpage>1578</lpage>
          (
          <year>2010</year>
          ), http://www.sciencedirect. com/science/article/pii/S0305440310000233
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lindsay</surname>
            ,
            <given-names>K.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruhli</surname>
            ,
            <given-names>F.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deleon</surname>
          </string-name>
          , V.B.:
          <article-title>Revealing the face of an ancient egyptian: Synthesis of current and traditional approaches to evidence-based facial approximation</article-title>
          .
          <source>The Anatomical Record</source>
          <volume>298</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1144</fpage>
          -
          <lpage>1161</lpage>
          (
          <year>2015</year>
          ), https://anatomypubs.onlinelibrary. wiley.com/doi/abs/10.1002/ar.23146
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Paysan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Lüthi,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Albrecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Lerch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Amberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Santini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Vetter</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>Face reconstruction from skull shapes and physical attributes</article-title>
          . In: Denzler,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Notni</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          , Süße, H. (eds.) Pattern Recognition. pp.
          <fpage>232</fpage>
          -
          <lpage>241</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2009</year>
          ), https://doi.org/10.1007/978-3-
          <fpage>642</fpage>
          -03798-6{_}
          <fpage>24</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Paysan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knothe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amberg</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romdhani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vetter</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A 3d face model for pose and illumination invariant face recognition</article-title>
          .
          <source>In: 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance</source>
          . pp.
          <fpage>296</fpage>
          -
          <lpage>301</lpage>
          (
          <year>Sep 2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Booth</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roussos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponniah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dunaway</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zafeiriou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Large scale 3d morphable models</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>126</volume>
          (
          <issue>2</issue>
          ),
          <fpage>233</fpage>
          -
          <lpage>254</lpage>
          (
          <year>Apr 2018</year>
          ), https: //doi.org/10.1007/s11263-017-1009-7
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Madsen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Lüthi,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Vetter</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>Probabilistic joint face-skull modelling for facial reconstruction</article-title>
          .
          <source>In: 2018 IEEE Conference on Computer Vision</source>
          and Pattern Recognition,
          <string-name>
            <surname>CVPR</surname>
          </string-name>
          <year>2018</year>
          ,
          <article-title>Salt Lake City</article-title>
          ,
          <string-name>
            <surname>UT</surname>
          </string-name>
          , USA, June 18-22,
          <year>2018</year>
          . pp.
          <fpage>5295</fpage>
          -
          <lpage>5303</lpage>
          (
          <year>2018</year>
          ), http://openaccess.thecvf.com/content_cvpr_2018/html/ Madsen_Probabilistic_Joint_
          <article-title>Face-Skull_CVPR_2018_paper</article-title>
          .html
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Isola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Efros</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Image-to-Image Translation with Conditional Adversarial Networks</article-title>
          .
          <source>In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          . pp.
          <fpage>5967</fpage>
          -
          <lpage>5976</lpage>
          . IEEE (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kniaz</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knyaz</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          , Hladůvka, J.,
          <string-name>
            <surname>Kropatsch</surname>
            ,
            <given-names>W.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mizginov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Thermalgan: Multimodal color-to-thermal image translation for person re-identification in multispectral dataset</article-title>
          . In: Leal-Taixé,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Roth</surname>
          </string-name>
          , S. (eds.) Computer Vision - ECCV 2018 Workshops. pp.
          <fpage>606</fpage>
          -
          <lpage>624</lpage>
          . Springer International Publishing,
          <string-name>
            <surname>Cham</surname>
          </string-name>
          (
          <year>2019</year>
          ), https://link.springer. com/chapter/10.1007/978-3-
          <fpage>030</fpage>
          -11024-6_
          <fpage>46</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kniaz</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Remondino</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knyaz</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          :
          <article-title>Generative adversarial networks for single photo 3d reconstruction</article-title>
          . ISPRS - International
          <source>Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W9</source>
          ,
          <fpage>403</fpage>
          -
          <lpage>408</lpage>
          (
          <year>2019</year>
          ), https://www.int
          <article-title>-arch-photogramm-remote-sens-spatial-inf-sci</article-title>
          . net/XLII-2-
          <issue>W9</issue>
          /403/2019/
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Knyaz</surname>
          </string-name>
          , V.:
          <article-title>Machine learning for scene 3d reconstruction using a single image</article-title>
          .
          <source>Proc. SPIE 11353</source>
          ,
          <string-name>
            <surname>Optics</surname>
          </string-name>
          ,
          <source>Photonics and Digital Technologies for Imaging Applications VI</source>
          <volume>11353</volume>
          ,
          <issue>1135321</issue>
          (
          <year>2020</year>
          ), https://doi.org/10.1117/12.2556122
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pouget-Abadie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mirza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warde-Farley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozair</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Generative adversarial nets</article-title>
          . In: Ghahramani,
          <string-name>
            <given-names>Z.</given-names>
            ,
            <surname>Welling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Cortes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Lawrence</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.D.</given-names>
            ,
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.Q</surname>
          </string-name>
          . (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>27</volume>
          , pp.
          <fpage>2672</fpage>
          -
          <lpage>2680</lpage>
          . Curran Associates, Inc. (
          <year>2014</year>
          ), http://papers.nips.cc/ paper/5423-generative-adversarial-nets.pdf
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Isola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Efros</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Image-to-image translation with conditional adversarial networks</article-title>
          .
          <source>CVPR</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Efros</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Unpaired image-to-image translation using cycleconsistent adversarial networks</article-title>
          .
          <source>In: Computer Vision</source>
          (ICCV),
          <year>2017</year>
          IEEE International Conference on (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Karras</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laine</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aila</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A style-based generator architecture for generative adversarial networks</article-title>
          .
          <source>CoRR abs/1812</source>
          .04948 (
          <year>2018</year>
          ), http://arxiv.org/abs/
          <year>1812</year>
          .04948
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Karras</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laine</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aittala</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellsten</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehtinen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aila</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Analyzing and improving the image quality of stylegan (</article-title>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>T.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>M.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kautz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Catanzaro</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>High-resolution image synthesis and semantic manipulation with conditional gans</article-title>
          .
          <source>In: Proceedings of the IEEE Conference on Computer Vision</source>
          and Pattern
          <string-name>
            <surname>Recognition</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : Maskgan:
          <article-title>Towards diverse and interactive facial image manipulation</article-title>
          .
          <source>In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Knyaz</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kniaz</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novikov</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galeev</surname>
            ,
            <given-names>R.M.:</given-names>
          </string-name>
          <article-title>Machine learning for approximating unknown face</article-title>
          . ISPRS - International
          <source>Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2020</source>
          ,
          <fpage>857</fpage>
          -
          <lpage>862</lpage>
          (
          <year>2020</year>
          ), https://www.int
          <article-title>-arch-photogramm-remote-sens-spatial-inf-sci. net/XLIII-B2-</article-title>
          <year>2020</year>
          /857/2020/
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Zhang, R.,
          <string-name>
            <surname>Pathak</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Efros</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shechtman</surname>
          </string-name>
          , E.:
          <article-title>Toward multimodal image-to-image translation</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems</source>
          <year>2017</year>
          ,
          <fpage>4</fpage>
          -9
          <source>December</source>
          <year>2017</year>
          , Long Beach, CA, USA. pp.
          <fpage>465</fpage>
          -
          <lpage>476</lpage>
          (
          <year>2017</year>
          ), http://papers.nips.cc/ paper/6650-toward
          <article-title>-multimodal-image-to-image-translation</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Deep learning face attributes in the wild</article-title>
          .
          <source>In: Proceedings of International Conference on Computer Vision</source>
          (ICCV) (
          <year>December 2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>