<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Two-Stage High-Resolution Image Inpainting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Deep Two-Stage High-Resolution Image Inpainting</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lomonosov Moscow State University</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, the field of image inpainting has developed rapidly, and learning-based approaches show impressive results in filling missing parts of an image. However, most deep methods are strongly tied to the resolution of the images on which they were trained. Even a slight increase in resolution leads to serious artifacts and unsatisfactory filling quality, which makes these methods unsuitable for interactive image processing. In this article, we propose a method that solves the problem of inpainting arbitrary-size images. We also describe a way to better restore texture fragments in the filled area: we use information from neighboring pixels by shifting the original image in four directions. Moreover, this approach can work with existing inpainting models, making them almost resolution independent without the need for retraining. We also created a GIMP plugin that implements our technique. The plugin, code, and model weights are available at https://github.com/a-mos/ High Resolution Image Inpainting.</p>
      </abstract>
      <kwd-group>
        <kwd>Image inpainting</kwd>
        <kwd>Image restoration</kwd>
        <kwd>High-resolution</kwd>
        <kwd>Deep learning</kwd>
        <kwd>CNN</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Image inpainting is the process of realistically filling unknown or damaged regions of
an image. An inpainting algorithm receives as input a corrupted image and a mask; its
output is a restored image.</p>
      <p>In recent years, the progress of neural networks has led to the development of deep
inpainting methods. Neural-network methods are strongly tied to the resolution at which
they are trained, owing to their limited receptive field. Most models have an input size less
than or equal to 512 pixels. As a result, they are unable to handle images of arbitrary
size, such as those encountered in interactive image-processing tools. When the resolution
increases, serious artifacts appear in the models’ output. Fig. 1 shows several examples.</p>
    </sec>
    <sec id="sec-2">
      <p>In this article, we describe a method that can restore images regardless of resolution.
It uses a coarse-to-fine approach, restoring the image structure at low resolution and
the texture at high resolution. To improve texture filling, we also propose using shifted
copies of the original image, which bring known pixels into the hole region and artificially
expand the receptive field by the shift amount. In principle, our approach works with any
inpainting method without retraining.</p>
      <sec id="sec-2-1">
        <title>Related Work</title>
        <p>
          The image inpainting problem can be solved with the classical approach of choosing
the most suitable patch [
          <xref ref-type="bibr" rid="ref1 ref2 ref3">1,2,3</xref>
          ] from the image and sequentially filling the hole. Such
methods are good at filling the texture component but poor at filling the structural
component. In recent years, the development of neural networks has led to the creation of
various deep inpainting methods [
          <xref ref-type="bibr" rid="ref4 ref5 ref6">4,5,6</xref>
          ]. Such algorithms avoid using external memory
and operate only on the basis of knowledge gained during training. They are better than
classical algorithms at restoring an image’s structural features.
        </p>
        <p>
          Adding to the difficulty of training neural-network methods is that the solution to
the inpainting problem is nonunique. Thus, formulating the most suitable loss function
for training is difficult. Deep-neural-network features are the best way to evaluate
hole-filling quality [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          Also, the appearance of generative adversarial networks (GANs) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] formed the
basis for creating generative image-filling methods [
          <xref ref-type="bibr" rid="ref10 ref11 ref4 ref5 ref9">9,4,5,10,11</xref>
          ], which use adversarial
loss as one component of their loss functions.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Gated Convolutions and Contextual Attention Module</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], researchers proposed replacing some of the neural network’s classical
convolutions with gated convolutions, an extension of partial convolutions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. They also
introduced a contextual-attention module, which is a neural-network analog of
patch-based algorithms for filling image areas. Their method employs a GAN. Instead of a
computationally expensive contextual-attention module, we propose simple image shifts as a
means of filling texture.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Deep Fusion Network</title>
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] created a fusion block, which allows the network to
alpha blend each output pixel in accordance with a predicted alpha map. They implemented
the U-Net [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] architecture in a non-generative-adversarial manner, using losses computed on the high-level
features of VGG16 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Our network uses a pretrained DFNet
in the first stage, yielding the structural component in low resolution. We avoided a
fusion block in our refinement network, since it changes even the unmasked area.
      </p>
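      <p>To make the alpha-blending idea concrete, the following is a minimal NumPy sketch, not the DFNet implementation. The function name alpha_blend, the array layout, and the convention that alpha = 1 selects the prediction are assumptions made for illustration. It also shows why blending can alter known pixels: any nonzero alpha outside the hole mixes the prediction into the original.</p>
      <preformat><![CDATA[
```python
import numpy as np

def alpha_blend(original, predicted, alpha):
    """Blend per pixel: alpha = 1 keeps the prediction, alpha = 0 keeps the original.

    original, predicted: (H, W, C) arrays; alpha: (H, W) map in [0, 1].
    The alpha convention here is an assumption for illustration.
    """
    alpha = alpha[..., None]  # broadcast the (H, W) map over the channel axis
    return alpha * predicted + (1.0 - alpha) * original

original = np.zeros((2, 2, 3))
predicted = np.ones((2, 2, 3))
# A small nonzero alpha (0.1) at a "known" pixel still changes that pixel.
alpha = np.array([[0.0, 0.1], [0.5, 1.0]])
blended = alpha_blend(original, predicted, alpha)
```
]]></preformat>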
    </sec>
    <sec id="sec-5">
      <title>High-Resolution Inpainting</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], researchers proposed modified gated convolutions that decrease the
computational complexity by reducing the number of weights. They also suggested splitting the
image into high- and low-frequency components by subtracting a blurred version from the image and
using a modified contextual-attention module for aggregation. They trained their
refinement network on small but full images. The goal of our refinement network, on
the other hand, is only to restore texture, so we train it on small patches of the original
image; when testing, we use the entire image.
      </p>
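      <p>The frequency split mentioned above can be sketched as follows. This is an illustrative NumPy example only: the box filter, its radius, and the function names box_blur and split_frequencies are assumptions, since the text states only that a blurry version is subtracted from the image.</p>
      <preformat><![CDATA[
```python
import numpy as np

def box_blur(image, radius=1):
    """Blur an (H, W) image with a (2r+1) x (2r+1) mean filter and edge padding."""
    padded = np.pad(image, radius, mode="edge")
    size = 2 * radius + 1
    h, w = image.shape
    acc = np.zeros((h, w), dtype=float)
    for dy in range(size):
        for dx in range(size):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / size**2

def split_frequencies(image, radius=1):
    """Return (low, high) such that low + high reproduces the image."""
    low = box_blur(image, radius)
    high = image - low  # residual after subtracting the blurred version
    return low, high

image = np.arange(16, dtype=float).reshape(4, 4)
low, high = split_frequencies(image)
```
]]></preformat>
      <p>By construction the two components sum back to the input, so a method may process them separately and recombine them afterwards.</p>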
      <sec id="sec-5-1">
        <title>Proposed Approach</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Data Preparation</title>
      <p>
        For training, we selected all images from DIV2K [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and some from the Internet.
From each image, we cut out three random squares of the largest possible size. In total, the training set
contained 7,218 images, with an additional 1,650 for validation. We applied to each
image a random irregular mask from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For testing, we used natural images with a square
mask at the center.
      </p>
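      <p>One possible reading of the cropping step is sketched below. It assumes that a "largest square" has side length equal to the shorter image dimension and that its position is drawn uniformly; both points, and the function name random_largest_squares, are assumptions rather than details confirmed by the text.</p>
      <preformat><![CDATA[
```python
import numpy as np

def random_largest_squares(image, count, rng):
    """Cut `count` random squares whose side equals the shorter image dimension."""
    h, w = image.shape[:2]
    side = min(h, w)
    crops = []
    for _ in range(count):
        # The upper bound of rng.integers is exclusive, hence the + 1.
        y = rng.integers(0, h - side + 1)
        x = rng.integers(0, w - side + 1)
        crops.append(image[y:y + side, x:x + side])
    return crops

rng = np.random.default_rng(0)
image = np.zeros((6, 10, 3))
crops = random_largest_squares(image, 3, rng)
```
]]></preformat>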
      <p>[Labels recovered from the pipeline figure (Fig. 2): inputs (mask and shifts); DFNet (512 × 512 coarse result); upscale and replace; generate shifts; patch extraction (train) or full size (test); refinement; output.]</p>
      <p>[Labels recovered from the architecture figure (Fig. 3): input patch; Conv2D, no activation; Conv2D + ReLU; concat + Conv2D + LeakyReLU; upsampling; Conv2D + sigmoid.]</p>
      <p>
        Stage one. The first stage of our algorithm restores the image structure at a low
resolution. We first downscale the image and mask to 512 × 512, then apply the pretrained
DFNet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to get the coarse result in low resolution. Next, we upscale the result and
replace the known region with the corresponding region from the high-resolution image.
      </p>
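      <p>The first stage can be sketched roughly as below. This is a hedged NumPy illustration, not the released code: inpaint_fn stands in for the pretrained DFNet, mean_fill is a toy replacement used only to make the example runnable, nearest-neighbor resizing is an assumption (the interpolation method is not stated), and the boolean hole mask with True marking missing pixels is an assumed convention.</p>
      <preformat><![CDATA[
```python
import numpy as np

def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbor resize of an (H, W) or (H, W, C) array."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows][:, cols]

def coarse_stage(image, hole, inpaint_fn, low_res=512):
    """Inpaint at low resolution, upscale, and restore the known pixels."""
    h, w = hole.shape
    small_image = resize_nearest(image, low_res, low_res)
    small_hole = resize_nearest(hole, low_res, low_res)
    coarse = inpaint_fn(small_image, small_hole)
    upscaled = resize_nearest(coarse, h, w)
    # Keep the original high-resolution pixels wherever they are known.
    return np.where(hole[..., None], upscaled, image)

def mean_fill(image, hole):
    """Toy stand-in for a real inpainting model: fill with the mean known color."""
    out = image.copy()
    out[hole] = image[~hole].mean(axis=0)
    return out

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
hole = np.zeros((64, 64), dtype=bool)
hole[24:40, 24:40] = True
result = coarse_stage(image, hole, mean_fill, low_res=16)
```
]]></preformat>
      <p>Only pixels inside the hole come from the low-resolution result; everything else is copied unchanged from the input.</p>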
      <p>Finally, we generate shifts of the original image in four directions: left, right, down, and
up. For our experiments, the shifts were 20% of the image size. Note that during this
process, we also recompute the masks and mark the pixels in the areas uncovered by the shift as invalid.
Thus, the first stage yields a 20-channel image: five RGB images (main plus four shifts)
and five masks.</p>
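      <p>The construction of the 20-channel tensor can be illustrated with the NumPy sketch below. The channel ordering (five RGB images followed by five masks), the zero fill for uncovered image pixels, and the convention that a mask value of 1 means invalid are assumptions; the names shift and build_shift_stack are hypothetical.</p>
      <preformat><![CDATA[
```python
import numpy as np

def shift(arr, dy, dx, fill):
    """Translate arr by (dy, dx) pixels; pixels uncovered by the shift get `fill`."""
    h, w = arr.shape[:2]
    out = np.full_like(arr, fill)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = arr[src_y, src_x]
    return out

def build_shift_stack(image, invalid, fraction=0.2):
    """Stack the main image, four shifted copies, and their five masks.

    image: (H, W, 3) float array; invalid: (H, W) float mask, 1 = hole or uncovered.
    Returns an (H, W, 20) array: 5 x 3 image channels followed by 5 mask channels.
    """
    h, w = invalid.shape
    sy, sx = int(round(h * fraction)), int(round(w * fraction))
    offsets = [(0, 0), (0, -sx), (0, sx), (sy, 0), (-sy, 0)]  # main, left, right, down, up
    images = [shift(image, dy, dx, 0.0) for dy, dx in offsets]
    # Uncovered areas are filled with 1 so that they are marked invalid.
    masks = [shift(invalid, dy, dx, 1.0)[..., None] for dy, dx in offsets]
    return np.concatenate(images + masks, axis=-1)

rng = np.random.default_rng(0)
image = rng.random((10, 10, 3))
invalid = np.zeros((10, 10))
invalid[4:6, 4:6] = 1.0
stack = build_shift_stack(image, invalid)
```
]]></preformat>
      <p>With a 10 × 10 input and a 20% shift, each shifted copy moves by two pixels, and the two-pixel band it leaves behind is flagged as invalid in the corresponding mask channel.</p>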
      <sec id="sec-6-1">
        <p>Stage two. The second stage restores the texture. To prevent the network from becoming
tied to the image structure and to increase training efficiency, we cut out random 512 × 512
patches from the input tensor of depth 20, with the condition that the masked area (from
the main mask) is at least 10% but not more than 90% of the patch area. Note that
during testing, we skip the patch extraction and simply pass the images in full resolution.
The refinement network’s output is the fine filled result. Fig. 2 shows the pipeline.</p>
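        <p>The patch-selection condition can be sketched with simple rejection sampling, as below. This is an assumption about how such a constraint might be enforced, not the authors' procedure; the function name sample_patch, the retry limit, and placing the main mask in channel 15 are hypothetical choices for the example.</p>
        <preformat><![CDATA[
```python
import numpy as np

def sample_patch(stack, hole, size, rng, low=0.10, high=0.90, max_tries=1000):
    """Draw a random size x size patch whose hole ratio lies in [low, high].

    stack: (H, W, C) input tensor; hole: (H, W) main mask with 1 = missing.
    """
    h, w = hole.shape
    for _ in range(max_tries):
        y = rng.integers(0, h - size + 1)
        x = rng.integers(0, w - size + 1)
        ratio = hole[y:y + size, x:x + size].mean()
        if low <= ratio <= high:
            return stack[y:y + size, x:x + size]
    raise RuntimeError("no patch satisfied the mask-ratio condition")

rng = np.random.default_rng(0)
hole = np.zeros((64, 64))
hole[24:40, 24:40] = 1.0
stack = np.zeros((64, 64, 20))
stack[..., 15] = hole  # assumed position of the main mask channel
patch = sample_patch(stack, hole, 32, rng)
```
]]></preformat>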
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Network Architecture</title>
      <p>
        The refinement-network architecture follows the U-Net [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] approach (see Fig. 3). After each layer except the last one, we used
Batch Normalization [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The figure shows encoder filter sizes; all decoder filters are
3 × 3.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Loss Function</title>
      <p>
        We trained our network in a non-generative-adversarial manner. Following [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we used a linear combination of four terms as the loss
function:
      </p>
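      <disp-formula>
        <tex-math>L = 0.1\,L_{tv} + 6.0\,L_{1} + 0.1\,L_{p} + 240.0\,L_{s} \qquad (1)</tex-math>
      </disp-formula>
      <p>where, for a reference image <inline-formula><tex-math>I</tex-math></inline-formula> and a predicted image <inline-formula><tex-math>\hat{I}</tex-math></inline-formula>, <inline-formula><tex-math>L_{tv}</tex-math></inline-formula> is the total variation distance in the masked area and <inline-formula><tex-math>L_{1}</tex-math></inline-formula> is the distance calculated as</p>
      <disp-formula>
        <tex-math>L_{1} = \frac{1}{CHW}\,\lVert I - \hat{I} \rVert_{1} \qquad (2)</tex-math>
      </disp-formula>
      <p>
        where <inline-formula><tex-math>C</tex-math></inline-formula>, <inline-formula><tex-math>W</tex-math></inline-formula>, and <inline-formula><tex-math>H</tex-math></inline-formula> are the number of channels, width, and height, respectively.
<inline-formula><tex-math>L_{p}</tex-math></inline-formula> and <inline-formula><tex-math>L_{s}</tex-math></inline-formula> are the perceptual and style losses [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]:
      </p>
      <disp-formula>
        <tex-math>L_{p} = \sum_{j \in J} \lVert \psi_{j}(I) - \psi_{j}(\hat{I}) \rVert_{1} \qquad (3)</tex-math>
      </disp-formula>
      <disp-formula>
        <tex-math>L_{s} = \sum_{j \in J} \lVert G_{j}(I) - G_{j}(\hat{I}) \rVert_{1} \qquad (4)</tex-math>
      </disp-formula>
      <p>where <inline-formula><tex-math>J</tex-math></inline-formula> is the set of layer indices in VGG16, <inline-formula><tex-math>\psi_{j}</tex-math></inline-formula> is the output of the <inline-formula><tex-math>j</tex-math></inline-formula>th feature layer, and <inline-formula><tex-math>G_{j}</tex-math></inline-formula> is the Gram matrix of the <inline-formula><tex-math>j</tex-math></inline-formula>th feature layer.</p>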
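      <p>As an illustration of how the loss terms in this section combine, here is a NumPy sketch under several assumptions: the feature maps are supplied by the caller rather than extracted from VGG16, the total variation term uses an anisotropic form restricted to neighboring hole pixels, and the perceptual and style terms are plain unnormalized L1 norms. None of these details is confirmed by the text, and all function names are hypothetical.</p>
      <preformat><![CDATA[
```python
import numpy as np

def l1_term(ref, pred):
    """||ref - pred||_1 / (C * H * W) for (C, H, W) arrays."""
    return np.abs(ref - pred).mean()

def tv_term(pred, hole):
    """Anisotropic total variation over pairs of neighboring hole pixels (assumed form)."""
    dh = np.abs(pred[:, 1:, :] - pred[:, :-1, :]) * (hole[1:, :] * hole[:-1, :])
    dw = np.abs(pred[:, :, 1:] - pred[:, :, :-1]) * (hole[:, 1:] * hole[:, :-1])
    return dh.mean() + dw.mean()

def gram(features):
    """Gram matrix of a (C, H, W) feature map."""
    flat = features.reshape(features.shape[0], -1)
    return flat @ flat.T

def perceptual_term(ref_feats, pred_feats):
    """Sum of L1 distances between corresponding feature maps."""
    return sum(np.abs(r - p).sum() for r, p in zip(ref_feats, pred_feats))

def style_term(ref_feats, pred_feats):
    """Sum of L1 distances between corresponding Gram matrices."""
    return sum(np.abs(gram(r) - gram(p)).sum() for r, p in zip(ref_feats, pred_feats))

def total_loss(ref, pred, hole, ref_feats, pred_feats):
    """Weighted sum with the weights 0.1, 6.0, 0.1, and 240.0 from the text."""
    return (0.1 * tv_term(pred, hole)
            + 6.0 * l1_term(ref, pred)
            + 0.1 * perceptual_term(ref_feats, pred_feats)
            + 240.0 * style_term(ref_feats, pred_feats))

ref = np.zeros((3, 4, 4))
pred = ref + 1.0                 # constant offset: zero total variation, L1 term of 1
hole = np.ones((4, 4))
feats = [np.ones((2, 4, 4))]     # placeholder features, identical for both images
loss = total_loss(ref, pred, hole, feats, feats)
```
]]></preformat>
      <p>In this toy case only the L1 term is nonzero, so the total equals its weight of 6.0.</p>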
      <sec id="sec-8-1">
        <title>Experiments</title>
        <p>
          We trained our network with the Adam [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] optimizer with default settings. Training took two
days on two Nvidia Tesla P100 GPUs with a batch size of 18 images. Note that we
trained only the second-stage network. To save computation, we calculated the
first stage’s output separately.
        </p>
      </sec>
    </sec>
    <sec id="sec-13">
      <p>[Panel labels recovered from the comparison figures: DFNet, DeepFillv2, HiFill, Photoshop, Ours, GT.]</p>
      <p>Fig. 1, Fig. 4, and Fig. 6 show examples of our work. Although the method functions
almost identically at any resolution, we limited ourselves to 1024 × 1024. More pictures
and resolutions appear in the repository.</p>
      <p>
        Subjective evaluation. To conduct a subjective evaluation, we selected a set of 34
natural images, mostly from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], with a full resolution of 2048 × 2048. Participants were
shown two images and asked to choose the one they preferred. We also added two
validation questions comparing the result of DeepFillv2 with the ground-truth image. In
total, 150 people participated, yielding 3,750 valid votes. We used the
Bradley-Terry model [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] for ranking. The results appear in Fig. 5.
      </p>
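      <p>For readers unfamiliar with Bradley-Terry ranking, the sketch below fits scores to a matrix of pairwise wins using the standard minorization-maximization update. It is a generic pure-Python illustration; the win counts are made up for the example and are not the study's data, and the fitting procedure actually used in the evaluation is not described in the text.</p>
      <preformat><![CDATA[
```python
def bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry scores from wins[i][j] = times item i beat item j.

    Assumes every pair was compared at least once and the diagonal is zero.
    Returns scores normalized to sum to 1; a higher score means more preferred.
    """
    n = len(wins)
    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (scores[i] + scores[j])
                        for j in range(n) if j != i)
            updated.append(total_wins / denom)
        norm = sum(updated)
        scores = [s / norm for s in updated]
    return scores

# Hypothetical vote counts for three methods, ten comparisons per pair.
wins = [
    [0, 8, 9],
    [2, 0, 6],
    [1, 4, 0],
]
scores = bradley_terry(wins)
```
]]></preformat>
      <p>Under the model, the probability that item i is preferred over item j is scores[i] / (scores[i] + scores[j]).</p>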
      <p>
        Objective evaluation. Because subjective evaluation is costly, for the other resolutions we
conducted only objective comparisons, although their conclusions may not be confirmed
by observers’ ratings. For the objective comparison, we also added output images from
Adobe Photoshop 2020, a commercial package that implements the classical inpainting
approach. Our quality metrics were the mean L1 distance, SSIM [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and PSNR. To
reduce resolution, we used nearest-neighbor downsampling. Table 1 shows the results
for this objective comparison.
      </p>
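      <p>Two of the metrics and the downsampling step can be sketched in NumPy as follows. The 8-bit peak value of 255, the integer downsampling factor, and the function names are assumptions for illustration; SSIM is omitted because a faithful implementation is longer than a short example allows.</p>
      <preformat><![CDATA[
```python
import numpy as np

def mean_l1(ref, out):
    """Mean absolute difference between two images of equal shape."""
    return float(np.abs(ref.astype(float) - out.astype(float)).mean())

def psnr(ref, out, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((ref.astype(float) - out.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

def downsample_nearest(image, factor):
    """Nearest-neighbor downsampling by an integer factor (keeps every factor-th pixel)."""
    return image[::factor, ::factor]

ref = np.zeros((4, 4), dtype=np.uint8)
out = np.full((4, 4), 10, dtype=np.uint8)
l1_value = mean_l1(ref, out)
psnr_value = psnr(ref, out)
small = downsample_nearest(ref, 2)
```
]]></preformat>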
      <p>[Figure labels: Ground-truth, Ours, HiFill, DFNet, DeepFill v2.]</p>
      <p>
        Inspired by [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], we embedded our method in the GNU Image Manipulation Program
(GIMP). The final plugin, which implements our method, appears in the repository.
      </p>
      <sec id="sec-13-1">
        <title>Conclusion</title>
        <p>This work proposes an inpainting technique that handles images of different sizes. We
conducted both objective and subjective comparisons with existing learning-based
models and a popular commercial package, showing that our method produces a more
satisfactory result. Also, our approach can, in principle, be applied to any inpainting model,
making that model resolution independent. In the future, we would like to train a new model
using the proposed method, but in an end-to-end manner with dynamic shift size.</p>
      </sec>
      <sec id="sec-13-2">
        <title>Acknowledgments</title>
        <p>This work was partially supported by the Russian Foundation for Basic Research under
Grant 19-01-00785 a. Model training for this work employed the IBM Polus computing
cluster of the Faculty of Computational Mathematics and Cybernetics at Moscow State
University.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Drori</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen-Or</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeshurun</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Fragment-based image completion</article-title>
          .
          <source>ACM Transactions on Graphics</source>
          <volume>22</volume>
          (08
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Criminisi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toyama</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Region filling and object removal by exemplar-based image inpainting</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          <volume>13</volume>
          (
          <issue>9</issue>
          ),
          <fpage>1200</fpage>
          -
          <lpage>1212</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Barnes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shechtman</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finkelstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>PatchMatch: A randomized correspondence algorithm for structural image editing</article-title>
          .
          <source>ACM Transactions on Graphics</source>
          <volume>28</volume>
          (08
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reda</surname>
            ,
            <given-names>F.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shih</surname>
            ,
            <given-names>K.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>T.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Catanzaro</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Image inpainting for irregular holes using partial convolutions</article-title>
          .
          <source>In: The European Conference on Computer Vision</source>
          (ECCV) (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>T.S.</given-names>
          </string-name>
          :
          <article-title>Generative image inpainting with contextual attention</article-title>
          .
          <source>2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          (Jun
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Deep fusion network for image completion</article-title>
          .
          <source>In: Proceedings of the 27th ACM International Conference on Multimedia</source>
          . pp.
          <fpage>2033</fpage>
          -
          <lpage>2042</lpage>
          . MM '19,
          <publisher-name>ACM</publisher-name>
          ,
          <publisher-loc>New York, NY, USA</publisher-loc>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Molodetskikh</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erofeev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vatolin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Perceptually motivated method for image inpainting comparison</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pouget-Abadie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mirza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warde-Farley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozair</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Generative adversarial nets</article-title>
          .
          <source>arXiv</source>
          (06
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Iizuka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simo-Serra</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ishikawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Globally and locally consistent image completion</article-title>
          .
          <source>ACM Transactions on Graphics</source>
          <volume>36</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          (07
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Free-form image inpainting with gated convolution</article-title>
          .
          <source>2019 IEEE/CVF International Conference on Computer Vision</source>
          (ICCV) (Oct
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Yi</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azizi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Contextual residual aggregation for ultra high-resolution image inpainting</article-title>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ronneberger</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brox</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>U-Net: Convolutional networks for biomedical image segmentation</article-title>
          .
          <source>Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015</source>
          pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          .
          <source>arXiv preprint arXiv:1409.1556</source>
          (09
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Timofte</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Gool</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haris</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>NTIRE 2018 challenge on single image super-resolution: Methods and results</article-title>
          .
          <source>In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops</source>
          (June
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ioffe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Batch normalization: Accelerating deep network training by reducing internal covariate shift</article-title>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alahi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Perceptual losses for real-time style transfer and super-resolution</article-title>
          .
          <source>Lecture Notes in Computer Science</source>
          pp.
          <fpage>694</fpage>
          -
          <lpage>711</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A method for stochastic optimization</article-title>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Bradley</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terry</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>Rank analysis of incomplete block designs: I. the method of paired comparisons</article-title>
          .
          <source>Biometrika</source>
          <volume>39</volume>
          (
          <issue>3</issue>
          /4),
          <fpage>324</fpage>
          -
          <lpage>345</lpage>
          (
          <year>1952</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bovik</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheikh</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simoncelli</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Image quality assessment: From error visibility to structural similarity</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          <volume>13</volume>
          ,
          <fpage>600</fpage>
          -
          <lpage>612</lpage>
          (05
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Soman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Gimp-ml: Python plugins for using computer vision models in gimp</article-title>
          .
          <source>arXiv preprint arXiv:2004.13060</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>