Perceptually Motivated Method for Image Inpainting Comparison

I.A. Molodetskikh1, M.V. Erofeev1, D.S. Vatolin1
ivan.molodetskikh@graphics.cs.msu.ru | merofeev@graphics.cs.msu.ru | dmitriy@graphics.cs.msu.ru
1 Lomonosov Moscow State University, Moscow, Russia

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

The field of automatic image inpainting has progressed rapidly in recent years, but no one has yet proposed a standard method of evaluating algorithms. This absence is due to the problem's challenging nature: image-inpainting algorithms strive for realism in the resulting images, but realism is a subjective concept intrinsic to human perception. Existing objective image-quality metrics provide a poor approximation of what humans consider more or less realistic. To improve the situation and to better organize both prior and future research in this field, we conducted a subjective comparison of nine state-of-the-art inpainting algorithms and propose objective quality metrics that exhibit high correlation with the results of our comparison.

Keywords: image inpainting, objective quality metric, quality perception, subjective evaluation, deep learning.

1. Introduction

Image inpainting, or hole filling, is the task of filling in missing parts of an image. Given an incomplete image and a hole mask, an inpainting algorithm must generate the missing parts so that the result looks realistic. Inpainting is a widely researched topic. Many classical algorithms have been proposed [5, 26], but over the past few years most research has focused on using deep neural networks to solve this problem [12, 16, 17, 19, 23, 31, 32].

Because of the many avenues of research in this field, the need to evaluate algorithms emerges. The goal of an inpainting algorithm is to make the final image as realistic as possible, but image realism is a concept intrinsic to humans. Therefore, the most accurate way to evaluate an algorithm's performance is a subjective experiment in which many participants compare the outcomes of different algorithms and choose the one they consider the most realistic.

Unfortunately, conducting a subjective experiment involves considerable time and resources, so many authors resort to evaluating their proposed methods using traditional objective image-similarity metrics such as PSNR, SSIM and mean l2 loss relative to the ground-truth image. This strategy, however, is inadequate. One reason is that evaluation by measuring similarity to the ground-truth image assumes that only a single, best inpainting result exists, which is a false assumption in most cases.

Thus, a perceptually motivated objective metric for inpainting-quality assessment is desirable. The objective metric should approximate the notion of image realism and yield results similar to those of a subjective study when comparing outputs from different algorithms.

We conducted a subjective evaluation of nine state-of-the-art classical and deep-learning-based approaches to image inpainting. Using the results, we examine different methods of objective inpainting-quality evaluation, including both full-reference methods (taking both the resulting image and the ground-truth image as an input) and no-reference methods (taking only the resulting image as an input).

2. Related work

Little work has been done on objective inpainting-quality evaluation or on inpainting detection in general. The somewhat related field of manipulated-image detection has seen moderate research, including both classical and deep-learning-based approaches. This field focuses on detecting altered image regions, usually involving a set of common manipulations: copy-move (copying an image fragment and pasting it elsewhere in the same image), splicing (pasting a fragment from another image), fragment removal (deleting an image fragment and then performing either a copy-move or inpainting to fill in the missing area), various effects such as Gaussian blur, and recompression. Among these manipulations, the most interesting for this work is fragment removal with inpainting.

The approaches to image-manipulation detection can be divided into classical [13, 20] and deep-learning-based [2, 21, 34, 35] ones. These algorithms aim to locate the manipulated image regions by outputting a mask or a set of bounding boxes enclosing suspicious regions. Unfortunately, they are not directly applicable to inpainting-quality estimation because they have a different goal: whereas an objective quality-estimation metric should strive to accurately compare realistically inpainted images similar to the originals, a forgery-detection algorithm should strive to accurately tell one apart from the other.
3. Inpainting subjective evaluation

The gold standard for evaluating image-inpainting algorithms is human perception, since each algorithm strives to produce images that look the most realistic to humans. Thus, to obtain a baseline for creating an objective inpainting-quality metric, we conducted a subjective evaluation of multiple state-of-the-art algorithms, including both classical and deep-learning-based ones. To assess the overall quality and applicability of the current approaches and to see how they compare with manual photo editing, we also asked professional photo editors to fill in missing regions of the test photos.

3.1 Test data set

Since human photo editors were to perform inpainting, our data set could not include publicly available images. We therefore created our own private set of test images by taking photographs of various outdoor scenes, which are the most likely target for inpainting.

Each test image was 512 × 512 pixels with a square hole in the middle measuring 180 × 180 pixels. We chose a square instead of a free-form shape because one algorithm in our comparison [30] lacks the ability to fill in free-form holes. The data set comprised 33 images in total. Fig. 1 shows examples.

Fig. 1. Images for the subjective inpainting comparison. The black square in the center is the area to be inpainted.
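To make the hole geometry concrete, the following is a minimal sketch of constructing the centered square hole mask described above. The sizes come from Section 3.1; the 0/1 mask convention and function name are illustrative assumptions rather than part of our evaluation code.

```python
import numpy as np

def center_hole_mask(image_size=512, hole_size=180):
    """Binary mask for a centered square hole (1 = pixel to be inpainted).

    The 512x512 image size and 180x180 hole follow Section 3.1;
    the 0/1 convention is an assumption made for illustration.
    """
    mask = np.zeros((image_size, image_size), dtype=np.uint8)
    start = (image_size - hole_size) // 2
    mask[start:start + hole_size, start:start + hole_size] = 1
    return mask

mask = center_hole_mask()
assert mask.sum() == 180 * 180
```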
3.2 Inpainting methods

We evaluated three classical [1, 5, 7] and six deep-learning-based approaches [10, 16, 27, 29, 30, 32]. Additionally, we hired three professional photo-restoration and photo-retouching artists to manually inpaint three randomly selected images from our test data set.

3.3 Test method

The subjective evaluation took place through the http://subjectify.us platform. Human observers were shown pairs of images and asked to pick from each pair the one they found more realistic. Each pair consisted of two different inpainting results for the same picture (the set also contained the original image). In total, 6945 valid pairwise judgements were collected from 215 participants. The judgements were then used to fit a Bradley-Terry model [3]. The resulting subjective scores maximize the likelihood of the observed pairwise judgements.
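For illustration, the sketch below fits Bradley-Terry scores to a matrix of pairwise preference counts using the standard minorization-maximization update. It shows the idea behind the scoring, not the exact fitting procedure used by the subjectify.us platform; the data layout and names are assumptions.

```python
import numpy as np

def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry scores from a pairwise win matrix.

    wins[i, j] = number of times option i was preferred over option j.
    Returns log-scores (higher = judged more realistic). This is the
    standard minorization-maximization update, shown for illustration.
    """
    wins = np.asarray(wins, dtype=float)
    games = wins + wins.T                      # total comparisons per pair
    p = np.ones(wins.shape[0])                 # strength parameters
    for _ in range(n_iter):
        denom = games / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p = wins.sum(axis=1) / denom.sum(axis=1)
        p /= p.sum()                           # fix the overall scale
    return np.log(p)

# Toy example: option 0 was preferred over option 1 in 7 of 10 pairs.
print(fit_bradley_terry([[0, 7], [3, 0]]))
```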
3.4 Results of the subjective comparison

Fig. 2 shows the results for the three images inpainted by the human artists. The artists outperformed all automatic algorithms, and among the deep-learning-based methods, only generative image inpainting [32] outperformed the classical inpainting methods.

Fig. 2. Subjective-comparison results across three images inpainted by human artists.

The individual results for each of these three images appear in Fig. 5. In only one case did an algorithm beat an artist: statistics of patch offsets [7] scored higher than one artist on the "Urban Flowers" photo. Fig. 4 shows the respective results. Additionally, for the "Splashing Sea" photo, two artists actually "outperformed" the original image: their results turned out to be more realistic.

Fig. 4. Comparison of inpainting results from Artist #1 and statistics of patch offsets [7] (preferred in the subjective comparison).

Fig. 5. Results of the subjective study comparing images inpainted by human artists with images inpainted by conventional and deep-learning-based methods.

We additionally performed a subjective comparison of the inpainting algorithms on the entire 33-image test set, collecting 3969 valid pairwise judgements from 147 participants. The overall results appear in Fig. 3. They confirm our observations from the first comparison: among the deep-learning-based approaches we evaluated, generative image inpainting [32] seems to be the only one that can outperform the classical methods.

Fig. 3. Subjective-comparison results for 33 images inpainted using automatic methods.

4. Objective inpainting-quality estimation

Using the results we obtained from the subjective comparison, we evaluated several approaches to objective inpainting-quality estimation. In particular, we used these objective metrics to estimate the inpainting quality of the images from our test set and then compared the metric values with the subjective results. For each of the 33 images, we applied every tested metric to every inpainting result (as well as to the ground-truth image) and computed the Pearson and Spearman correlation coefficients with the subjective results. The final value was an average of the correlations over all 33 test images.

4.1 Full-reference metrics

To construct a full-reference metric that encourages semantic rather than per-pixel similarity, as in [11], we evaluated metrics that compute the difference between the ground-truth and inpainted-image feature maps produced by an image-classification neural network. We selected five of the most popular architectures: VGG [22] (16- and 19-layer variants), ResNet-V1-50 [8], Inception-V3 [25], Inception-ResNet-V2 [24] and Xception [4]. We used models pretrained on the ImageNet [6] data set. The mean squared error between the feature maps was the metric result.

We additionally included the structural-similarity (SSIM) index [28] as a full-reference metric. SSIM is widely used to compare image quality, but it falls short when applied to inpainting-quality estimation.
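As a concrete illustration of this feature-map comparison, the sketch below scores an inpainted image against the ground truth as the mean squared error between VGG-16 feature maps. It uses the ImageNet-pretrained Keras model with the block5_conv3 layer (one of the layers appearing in Fig. 6); treat it as a sketch of the idea under these assumptions, not the exact code used in our experiments.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import vgg16

def feature_mse(reference, inpainted, layer="block5_conv3"):
    """Full-reference score: MSE between deep feature maps.

    reference, inpainted: HxWx3 uint8 RGB arrays of the same size.
    Lower values mean the inpainted image is closer to the original
    in the feature space of an ImageNet-pretrained classifier.
    """
    base = vgg16.VGG16(weights="imagenet", include_top=False)
    extractor = tf.keras.Model(base.input, base.get_layer(layer).output)
    batch = np.stack([reference, inpainted]).astype(np.float32)
    feats = extractor(vgg16.preprocess_input(batch)).numpy()
    return float(np.mean((feats[0] - feats[1]) ** 2))
```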
4.2 No-reference metrics

We picked several popular image-classification neural-network architectures and trained them to differentiate images without any inpainting from partially inpainted images. The architectures included VGG [22] (16- and 19-layer variants), ResNet-V1-50 [8], ResNet-V2-50 [9], Inception-V3 [25], Inception-V4 [24] and PNASNet-Large [15].

For training, we used clean and inpainted images based on the COCO [14] data set. To create the inpainted images, we used five inpainting algorithms [5, 7, 10, 29, 32] in eight total configurations.

The networks take a square image as an input and output a score: a single number where 0 means the image contains inpainted regions and 1 means the image is "clean." The loss function was mean squared error. Some networks were additionally trained to output the predicted class using one-hot encoding (similar to binary classification); the loss function in this case was softmax cross-entropy.

The network architectures were identical to the ones used for image classification, with one difference: we altered the number of outputs of the last fully connected layer. This change allowed us to initialize the weights of all previous layers from the models pretrained on ImageNet, which greatly improved the results compared with training from random initialization. For some experiments we also tried the RGB-noise features [34] and spectral weight normalization [18].

In addition to the typical validation on part of the data set, we also monitored the correlation of network predictions with the subjective scores collected in Section 3. We used the networks to estimate the inpainting quality of the 33-image test set, then computed correlations with the subjective results in the same way as in the final comparison. The training of each network was stopped once the correlation of the network predictions with the subjective scores peaked and started to decrease (possibly because the networks were overfitting to the inpainting results of the algorithms we used to create the training data set).
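To make this training setup concrete, here is a minimal Keras sketch: an ImageNet-pretrained backbone whose classification head is replaced by a single output trained with mean squared error to predict 1 for clean images and 0 for inpainted ones. The choice of Inception-V3 (one of the architectures listed above), the input size, the sigmoid output and the optimizer are illustrative assumptions, not our exact training configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionV3

def build_noref_metric(input_size=299):
    """No-reference inpainting-quality network (sketch).

    ImageNet-pretrained backbone; the original classification head is
    replaced by a single output in [0, 1]: 1 = "clean" image,
    0 = image containing inpainted regions. Trained with MSE loss.
    """
    backbone = InceptionV3(weights="imagenet", include_top=False,
                           pooling="avg",
                           input_shape=(input_size, input_size, 3))
    score = layers.Dense(1, activation="sigmoid")(backbone.output)
    model = Model(backbone.input, score)
    model.compile(optimizer="adam", loss="mse")
    return model

# Training pairs clean crops (label 1.0) with automatically inpainted
# versions of the same crops (label 0.0), e.g.:
# model.fit(images, labels, validation_data=(val_images, val_labels))
```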
4.3 Results

Fig. 6 shows the overall results. The no-reference methods achieve slightly weaker correlation with the subjective-evaluation responses than the best full-reference methods do, but the results of most no-reference methods are still considerably better than those of the full-reference SSIM. The best correlation among the no-reference methods came from the Inception-V4 model with spectral weight normalization.

Fig. 6. Mean Pearson and Spearman correlations between objective inpainting-quality metrics and subjective human comparisons. The error bars show the standard deviations.

It is important to emphasize that we did not train the networks to maximize correlation with human responses. We trained them to distinguish "clean" images from inpainted images, yet their output showed good correlation with human responses. This confirms the observations made in [33] that deep features are good for modelling human perception.

5. Conclusion

We have proposed a number of perceptually motivated no-reference and full-reference objective metrics for image-inpainting quality. We evaluated the metrics by correlating them with human responses from a subjective comparison of state-of-the-art image-inpainting algorithms.

The results of the subjective comparison indicate that although a deep-learning-based approach to image inpainting holds the lead, classical algorithms remain among the best in the field.

We achieved good correlation with the subjective-comparison results without specifically training our proposed objective quality-evaluation metrics on the subjective-comparison response data set.

6. Acknowledgement

This work was partially supported by the Russian Foundation for Basic Research under Grant 190100785 a.

7. References

[1] https://research.adobe.com/project/content-aware-fill/.
[2] J. H. Bappy, A. K. Roy-Chowdhury, J. Bunk, L. Nataraj, and B. S. Manjunath. Exploiting spatial structure for localizing manipulated image regions. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
[3] R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
[4] F. Chollet. Xception: Deep learning with depthwise separable convolutions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[5] A. Criminisi, P. Pérez, and K. Toyama. Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing, 13(9):1200–1212, 2004.
[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[7] K. He and J. Sun. Statistics of patch offsets for image completion. In European Conference on Computer Vision, pages 16–29. Springer, 2012.
[8] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[9] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645, 2016.
[10] S. Iizuka, E. Simo-Serra, and H. Ishikawa. Globally and locally consistent image completion. ACM Transactions on Graphics (ToG), 36(4):107, 2017.
[11] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer, 2016.
[12] H. Li, G. Li, L. Lin, H. Yu, and Y. Yu. Context-aware semantic inpainting. IEEE Transactions on Cybernetics, 2018.
[13] H. Li, W. Luo, X. Qiu, and J. Huang. Image forgery localization via integrating tampering possibility maps. IEEE Transactions on Information Forensics and Security, 12(5):1240–1252, 2017.
[14] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755, 2014.
[15] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. In The European Conference on Computer Vision (ECCV), September 2018.
[16] G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro. Image inpainting for irregular holes using partial convolutions. In The European Conference on Computer Vision (ECCV), 2018.
[17] P. Liu, X. Qi, P. He, Y. Li, M. R. Lyu, and I. King. Semantically consistent image completion with fine-grained details. arXiv preprint arXiv:1711.09345, 2017.
[18] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
[19] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[20] C.-M. Pun, X.-C. Yuan, and X.-L. Bi. Image forgery detection using adaptive oversegmentation and feature point matching. IEEE Transactions on Information Forensics and Security, 10(8):1705–1716, 2015.
[21] R. Salloum, Y. Ren, and C.-C. J. Kuo. Image splicing localization using a multi-task fully convolutional network (MFCN). Journal of Visual Communication and Image Representation, 51:201–209, 2018.
[22] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[23] Y. Song, C. Yang, Z. Lin, H. Li, Q. Huang, and C. J. Kuo. Image inpainting using multi-scale feature image translation. arXiv preprint arXiv:1711.08590, 2, 2017.
[24] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[25] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[26] A. Telea. An image inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1):23–34, 2004.
[27] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[28] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, et al. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[29] Z. Yan, X. Li, M. Li, W. Zuo, and S. Shan. Shift-Net: Image inpainting via deep feature rearrangement. In The European Conference on Computer Vision (ECCV), September 2018.
[30] C. Yang, X. Lu, Z. Lin, E. Shechtman, O. Wang, and H. Li. High-resolution image inpainting using multi-scale neural patch synthesis. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[31] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang. Free-form image inpainting with gated convolution. arXiv preprint arXiv:1806.03589, 2018.
[32] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang. Generative image inpainting with contextual attention. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[33] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang. The unreasonable effectiveness of deep features as a perceptual metric. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[34] P. Zhou, X. Han, V. I. Morariu, and L. S. Davis. Learning rich features for image manipulation detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[35] X. Zhu, Y. Qian, X. Zhao, B. Sun, and Y. Sun. A deep learning approach to patch-based image inpainting forensics. Signal Processing: Image Communication, 67:90–99, 2018.