<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ORCID:</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An Effective Approach to Image Embeddings for E-Commerce</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Volodymyr Kubytskyi</string-name>
          <email>vk@lun.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Taras Panchenko</string-name>
          <email>taras.panchenko@knu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>Akademika Glushkova ave., 4d, Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>There are vast numbers of images in the e-commerce world. They need to be analyzed, classified, stored efficiently, compared, described in text, searched through, and so on. The task of image deduplication, or near-duplicate image search, is important and challenging. One efficient approach to these tasks is an image descriptor that helps to identify distinctive features of pictures and to organize them. A model of such a descriptor is proposed in this work. We describe its structure and the successful experience of its application to real-life tasks at LUN.UA. Measurements of effectiveness in comparison with other approaches are also provided: the F1 score turned out to be higher for the proposed model. Estimations and directions for future work are also given.</p>
      </abstract>
      <kwd-group>
        <kwd>eCommerce</kwd>
        <kwd>image embedding</kwd>
        <kwd>image representation vector</kwd>
        <kwd>convolutional neural network</kwd>
        <kwd>CNN</kwd>
        <kwd>CNN layer combination</kwd>
        <kwd>image classification</kwd>
        <kwd>image descriptor</kwd>
        <kwd>image deduplication</kwd>
        <kwd>near duplicate image search</kwd>
        <kwd>de-duplication problem</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The e-commerce sphere covers many topics, including internet marketing, automated data
gathering, and others. It deals heavily with different kinds of multimedia materials: images and videos.
Given the huge amount of such data, there are many challenges in handling it
efficiently: storing, processing, searching, and so on. The modern world produces hundreds of
millions of images every day. There is a general question about the possibility of “comparing” images
with each other, for example, to optimize storage, because on hundreds of terabytes of data such
optimization saves thousands of dollars. Qualitative image embedding also helps to solve the tasks of
image “stylistics” recognition, classification of scanned copies of documents, visual navigation,
identification of diseases from X-ray or MRI images, and even three-dimensional reconstruction from
a set of two-dimensional images. In e-commerce we meet such situations everywhere:
 e-commerce platforms wish to reject copies of the same goods (or to group them), which
can be identified by comparing their images in particular, or, at least, to aggregate such “duplicates”;
 it is useful to have text descriptions of images (the image-to-text, image description or
annotation, or title generation task) to compare automatically with a given description;
 de-duplication of images, goods, advertisements, etc. makes the content more systemized,
provides the end user with a better experience, makes the platform look better and more solid, and
also helps to organize sellers and to avoid cheating and prohibited behavior on the platform;
 search-by-image functionality can be of real interest for e-commerce platforms, giving users
the ability to find similar goods.</p>
      <p>For example, LUN.UA, a leading e-commerce platform in Ukrainian real estate, faced these same
problems of image and advertisement duplication, and the need to organize all real estate objects
efficiently was a crucial point and one of the key motivations for this research. So, we use this case as
a running example for our research and development.</p>
      <p>2022 Copyright for this paper by its authors.</p>
      <p>
        Image de-duplication, or near-duplicate search [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], has been an important issue for many tasks and is indeed a much broader problem,
and there is much research on this topic for different applications, for example:
      </p>
      <p>
         web search [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6 ref7">2-7</xref>
        ], image search [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ] and video search [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4-6</xref>
        ], and even web documents as a
whole [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ];
 consumer and personal photo management [
        <xref ref-type="bibr" rid="ref8 ref9">8,9</xref>
        ];
 image clustering [
        <xref ref-type="bibr" rid="ref10 ref11">10,11</xref>
        ];
 semantic indexing [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and deep semantic features analysis [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ];
 real-time image protection and analysis [
        <xref ref-type="bibr" rid="ref14 ref2">2,14</xref>
        ] and large-scale high-load and fast
analysis [
        <xref ref-type="bibr" rid="ref15 ref16">15,16</xref>
        ];
 IoT applications, for example for visual sensors [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ];
 in biology [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], medicine [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and agriculture (cropping);
 for plagiarism detection [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ];
 and even for spam detection [
        <xref ref-type="bibr" rid="ref21 ref22">21,22</xref>
        ].
      </p>
      <p>Google and other services provide search by image, functionality powered by near-duplicate
image search behind the scenes. In our research, we aim to solve the near-duplicate image search
task. We prepared an appropriate dataset and built an adequate model as a solution, and
we measured the efficiency of the proposed model. As a result, the descriptive image vector is built by
combining different levels of layers of a constructed and modified Convolutional Neural Network
(CNN), a kind of artificial neural network. In this work, we describe the structure of the proposed
solution, the results obtained, and its benefits based on its application to image
classification problems at LUN.UA. We also draft the next research items on the topic.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The image embedding model proposed</title>
      <p>
        To solve the issues mentioned in the Introduction, a universal image embedding system
was developed, and its application helped to solve the challenges we faced at LUN.UA. There are
existing approaches: some of them are described in accessible sources, while others remain
closed-source. They are based on different ideas and heuristics:
 sub-image retrieval [
        <xref ref-type="bibr" rid="ref23 ref24">23,24</xref>
        ];
 local-based binary representation [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ];
 keyframe identification with interest point matching and pattern learning [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ];
 keypoint-based with scale-rotation invariant pattern entropy analysis [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ];
 geometric invariant features [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ];
 color histogram, local complexity based on entropy [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], which is fast enough, as its authors claim;
 min-hash and TF-IDF weighting [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], other signatures [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ];
 affinity propagation [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ];
 CNN-based methods and ideas [
        <xref ref-type="bibr" rid="ref33 ref34 ref35">33-35</xref>
        ]: global and local features matching, and intermediate
layers aggregation;
      </p>
      <p>
         colour histograms and locality-sensitive hashing [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ], SIFT [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ], computing approximate set intersections
between documents [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ], as well as newly coined datasets for state-of-the-art method development
and benchmarks for progress tracking [
        <xref ref-type="bibr" rid="ref18 ref37 ref38 ref39 ref40 ref9">9,18,37-40</xref>
        ].
      </p>
      <p>
        There is also a common approach for such a class of tasks named “embeddings” [
        <xref ref-type="bibr" rid="ref41 ref42 ref43 ref44 ref45 ref46">41-46</xref>
        ]. This means
building a description numeric vector for each image, which is distinctive enough to catch the specifics
of each particular image. We can find similar models (“embeddings”) for texts, audio, and other kinds
of media. The developed image embedding system with a decision block consists of 3 nodes:
1. Image feature extractor – embedding builder.
2. Distributed embeddings storage.
3. Decision-making unit.
      </p>
      <p>Let’s describe it in more detail.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1. Image feature extractor – embedding builder</title>
      <p>
        The extractor is formed by inheriting a pre-trained convolutional neural network for image
classification – its architecture and weights are used. For example, let's take a ResNet50 (residual neural
network) [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ] architecture having multiple convolutional blocks and downsampling blocks. We take a
pre-trained model, trained to recognize 1000 ImageNet [
        <xref ref-type="bibr" rid="ref48">48</xref>
        ] categories.
      </p>
      <p>The fully connected output layer responsible for the image class is removed from the existing
network. The critical point of the image feature extractor is the union of N intermediate layers of the
convolutional neural network into a single resulting vector. This step is essential to obtain a qualitatively
new level of image description, because different layers of the CNN highlight different
feature types, from the most abstract at the beginning to more specific ones at the end. So, the initial
layers of the convolutional neural network select basic shapes (point, line, circle), while towards
the end the layers can pick out complex objects and attributes (teapot, sofa, iron).</p>
      <p>Concatenating convolutional layers with different receptive fields into a single normalized
vector is one of the ways to form a vector of characteristics sufficient for
comparisons. The resulting extractor is resistant to linear image transformations, to brightness, contrast,
and rotation changes by a given angle, and is also insensitive to image noise and watermarks.</p>
      <p>An image is given at the input of the characteristics extractor. The output is a one-dimensional vector
of real numbers describing the characteristics of the picture – image embedding.</p>
      <p>The proposed extractor can universally describe the critical characteristics of images. Existing
approaches with keypoint descriptors, pixel comparison, or the second-to-last layer of a
convolutional neural network do not give such a high-quality result, even in a linear combination of
the ones mentioned above. See the embedding builder neural network architecture in Figure 1.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Distributed embeddings storage</title>
      <p>For some tasks, like multi-million-scale or near-real-time image comparison, storing the image embeddings
in distributed storage is essential. Since generating an image embedding takes approximately
1-2 seconds on an Nvidia 1080Ti GPU, with subsequent use in the decision block there is no need to
re-process the image through the feature extractor. It is proposed to use a document-oriented database, since
no relation between the compared objects is expected, and storing data in JSON documents is an
advantage. Thus, any key-value storage that allows storing large objects as values is suitable for
storing feature vectors. We used MongoDB in all our experiments.</p>
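      <p>A hedged sketch of the storage format: each embedding is serialized as a JSON document keyed by an image identifier (the field names are illustrative assumptions). With MongoDB, such a document would simply be passed to a collection insert; the round trip below shows that the embedding is restored without re-running the feature extractor.</p>
      <preformat>
```python
import json
import numpy as np

def to_document(image_id, embedding):
    """Serialize an embedding as a JSON document (the stored value)."""
    return json.dumps({"image_id": image_id, "embedding": embedding.tolist()})

def from_document(doc):
    """Restore the embedding from the stored JSON document."""
    data = json.loads(doc)
    return data["image_id"], np.asarray(data["embedding"], dtype=np.float32)

vec = np.array([0.1, 0.2, 0.3], dtype=np.float32)
doc = to_document("img-001", vec)
image_id, restored = from_document(doc)
```
      </preformat>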
    </sec>
    <sec id="sec-5">
      <title>2.3. Decision-making unit</title>
      <p>The decision-making unit entirely determines the application area of the image embedding. Let us
consider two practical applications: finding near-duplicate images, and clustering rooms based on photos
from different shooting angles. This model was applied primarily to the internal private datasets of
images of LUN.UA, a country-leading portal in real estate. To solve the stated problem, it is
necessary to collect a dataset of pairs of sample images. In the first case, we needed to collect pairs
that are considered near-duplicates and pairs that are not. In the second, there are
pairs where a photo of the same room is taken from different angles, and pairs of various rooms taken
from random view angles. Then we run the embeddings builder and form a vector for each image in the
sample dataset. Each pair from the sample forms a new vector, obtained by combining the
Euclidean metric, the L1 metric, the cosine distance, etc. (It is permissible to use the proposed metrics, taking into
account the equivalence of norms in finite-dimensional spaces; however, there are non-equivalent
norms in infinite-dimensional spaces, and using a combination of metrics in the resulting vector can
significantly improve the quality of the comparison.) Image feature vectors often turn out to be sparse,
so the additional use of the cosine distance is highly influential.</p>
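      <p>The pair-vector construction can be sketched as follows; the exact set and order of metrics in the production system are not specified, so the three components below (Euclidean, L1, and cosine distance) are illustrative.</p>
      <preformat>
```python
import numpy as np

def pair_vector(a, b):
    """Combine several distances between two embeddings into one
    feature vector for the decision-making unit."""
    l2 = float(np.linalg.norm(a - b))   # Euclidean distance
    l1 = float(np.abs(a - b).sum())     # L1 (Manhattan) distance
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    cos = 1.0 - float(a @ b) / denom if denom else 1.0  # cosine distance
    return np.array([l2, l1, cos], dtype=np.float32)

# Orthogonal unit vectors: maximal cosine distance for non-negative data
v = pair_vector(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```
      </preformat>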
      <p>The new image pair vector formed in this way will be used to train a new fully-connected N-layer
neural network with one output neuron. As a result of training, the decision block is trained to compare
pairs of images. Depending on the training sample, the block can solve a particular problem.</p>
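      <p>Such a decision block can be sketched with PyTorch as below; the depth, layer widths, and sigmoid output are illustrative assumptions, since the work only fixes a fully-connected N-layer network with one output neuron.</p>
      <preformat>
```python
import torch
import torch.nn as nn

# Input size 3 matches the illustrative pair vector (L2, L1, cosine distance).
decision_block = nn.Sequential(
    nn.Linear(3, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),  # one output neuron: match probability
)

# Trained with a binary target (duplicate / not duplicate) per pair.
score = decision_block(torch.tensor([[0.4, 1.1, 0.05]]))
```
      </preformat>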
    </sec>
    <sec id="sec-6">
      <title>3. Results, estimations, and the discussion</title>
      <p>Building the image embedding in the proposed way and applying it to the task of near-duplicate
image detection showed excellent results on the private datasets (LUN.UA real estate images): 8-10x fewer
mistakes in duplicate determination compared to the SIFT, SURF, and ORB keypoint algorithms. We
conducted experiments on the two mentioned sub-types of the image comparison task, namely:
 near-duplicate images of various graphical contexts (dataset size: 80 000 image pairs, see
examples in Figure 2),</p>
      <p> multi-angle photos of the (same) rooms (dataset size: 12 500 image pairs, see examples in
Figure 3).</p>
      <p>These datasets are private now, but we are working on making these data publicly available. The
comparison was done for our solution and 3 alternative techniques:</p>
      <p> image embedding formed by taking previous before the last layer of pre-trained ResNet50 –
the image feature vector,
 SIFT / SURF / ORB descriptors,
 perceptual DCT hash,
 image embedding formed by the combination of intermediate layers of ResNet50 (the proposed
method, see the scheme in Figure 4).</p>
      <p>The Precision, Recall, and F1 measure values are presented in Table 1 and Table 2 for these two
tasks.</p>
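      <p>For reference, the reported metrics follow the standard definitions computed from the counts of true positives, false positives, and false negatives over the labeled pairs:</p>
      <preformat>
```python
def precision_recall_f1(tp, fp, fn):
    """Standard Precision / Recall / F1 from pair-comparison counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 90 correctly found duplicate pairs, 10 false alarms, 30 missed pairs
p, r, f1 = precision_recall_f1(90, 10, 30)
# p = 0.9, r = 0.75, f1 is about 0.818
```
      </preformat>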
      <p>Thus, we can conclude that the proposed model is precise enough for the stated task, and the
method also works quite efficiently on proper hardware (up to 2 s for images up to 12 megapixels on
an NVIDIA 1080Ti GPU). Tables 1 and 2 show that the proposed model outperforms the other known
techniques and gives the best benchmarks in the tests conducted. The model should still be tested on
other available datasets to ensure generality; the authors plan to do this in future work.</p>
      <p>We expect the proposed model to also be effective for the similar tasks mentioned in the
Introduction section. This should be checked in future research.</p>
    </sec>
    <sec id="sec-7">
      <title>4. Further research</title>
      <p>We anticipate much broader applications of the proposed model to other tasks and more
applications in other spheres for similar tasks. So, further fundamental research is needed on the topic:
 to investigate the influence of the initial architecture of the convolutional neural network
whose layers we combine to build the image embedding;
 to investigate the influence of pre-training the selected architecture on an image classification
task over 1000 categories, since this previous training lets us treat the convolutional neural network
as a feature extractor;</p>
      <p> to investigate possible options for choosing layers, their number, and the method of
combinations (concatenation, averaging, difference) – often called the meta-parameters tuning;
 to investigate the applicability of vector representation for image “compression” (packing into
a vector – then transfer – and then unpacking) – to enhance the application possibilities of the model
proposed, a kind of transfer learning technique;</p>
      <p> to analyze the effectiveness of application on such classes of tasks as recognition of “stylistics”
of images, classification of forged scanned copies of documents, visual navigation, and recognition of
diseases by X-ray or MRI images, which would extend the model applicability dramatically.</p>
    </sec>
    <sec id="sec-8">
      <title>5. Conclusions</title>
      <p>In this work, we proposed a new model for image description, the construction of an image
embedding vector, and demonstrated its applications.</p>
      <p>The task and its applications were reviewed. The known methods were outlined, namely the DCT
hash approach, the SIFT, SURF, and ORB keypoint methods, and CNNs, with ResNet-50 as the most
promising among them. The model, its inner structure, and the motivation for it were presented here.
The main idea of the proposed model is to combine selected low-level, mid-level, and high-level
features from the constructed CNN to achieve better precision and F1 score over LUN’s dataset.</p>
      <p>
        This model was then tested on an e-commerce task [
        <xref ref-type="bibr" rid="ref49 ref50 ref51 ref52 ref53">49-53</xref>
        ] and applied to a real-world dataset of
LUN.UA, namely the private set of real estate images, where it obtained an excellent result that exceeded
expectations and, as estimated by the F1 measure, turned out to be much better than the competitors:
previously known models and approaches. The benchmarks and calculations supported this
conclusion. The promising experimental results demonstrate the validity and effectiveness of the
proposed model. This model is now in production use at LUN.UA; its research and development
continue. Questions for further research were also highlighted here.
      </p>
    </sec>
    <sec id="sec-9">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K. K.</given-names>
            <surname>Thyagharajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kalaiarasi</surname>
          </string-name>
          ,
          <article-title>A Review on Near-Duplicate Detection of Images using Computer Vision Techniques</article-title>
          ,
          <source>Archives of Computational Methods in Engineering 28.3</source>
          (
          <year>2021</year>
          ):
          <fpage>897</fpage>
          -
          <lpage>916</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.J.</given-names>
            <surname>Foo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zobel</surname>
          </string-name>
          ,
          <article-title>SICO: a system for detection of near-duplicate images during search</article-title>
          ,
          <source>in: 2007 IEEE International Conference on Multimedia and Expo</source>
          , IEEE (
          <year>2007</year>
          ):
          <fpage>595</fpage>
          -
          <lpage>598</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.J.</given-names>
            <surname>Foo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zobel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.M.M.</given-names>
            <surname>Tahaghoghi</surname>
          </string-name>
          ,
          <article-title>Detection of near-duplicate images for web search</article-title>
          ,
          <source>in: Proceedings of the 6th ACM International Conference on Image and Video Retrieval</source>
          (
          <year>2007</year>
          ):
          <fpage>557</fpage>
          -
          <lpage>564</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Hauptmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.W.</given-names>
            <surname>Ngo</surname>
          </string-name>
          ,
          <article-title>Practical elimination of near-duplicates from web video search</article-title>
          ,
          <source>in: Proceedings of the 15th ACM international conference on Multimedia</source>
          (
          <year>2007</year>
          ):
          <fpage>218</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.W.</given-names>
            <surname>Ngo</surname>
          </string-name>
          ,
          <article-title>On the annotation of web videos by efficient near-duplicate search</article-title>
          ,
          <source>IEEE Transactions on Multimedia 12.5</source>
          (
          <year>2010</year>
          ):
          <fpage>448</fpage>
          -
          <lpage>461</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.W.</given-names>
            <surname>Ngo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Hauptmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.K.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Real-time near-duplicate elimination for web video search with content and context</article-title>
          ,
          <source>IEEE Transactions on Multimedia 11.2</source>
          (
          <year>2009</year>
          ):
          <fpage>196</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bhavani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.A.</given-names>
            <surname>Narayana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sreevani</surname>
          </string-name>
          ,
          <article-title>A novel approach for detecting near-duplicate web documents by considering images, text, size of the document and domain</article-title>
          ,
          <source>in: ICCCE 2020</source>
          , Springer, Singapore (
          <year>2021</year>
          ):
          <fpage>1355</fpage>
          -
          <lpage>1366</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.T.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Consumer photo management and browsing facilitated by near-duplicate detection with feature filtering</article-title>
          ,
          <source>Journal of Visual Communication and Image Representation 21.3</source>
          (
          <year>2010</year>
          ):
          <fpage>256</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jinda-Apiraksa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vonikakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Winkler</surname>
          </string-name>
          ,
          <article-title>California-ND: An annotated dataset for near-duplicate detection in personal photo collections</article-title>
          ,
          <source>in: 2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX)</source>
          ,
          <source>IEEE</source>
          (
          <year>2013</year>
          ):
          <fpage>142</fpage>
          -
          <lpage>147</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.J.</given-names>
            <surname>Foo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zobel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <article-title>Clustering near-duplicate images in large collections</article-title>
          ,
          <source>in: Proceedings of the International Workshop on Multimedia Information Retrieval (MIR'07)</source>
          , Association for Computing Machinery, New York, NY, USA (
          <year>2007</year>
          ):
          <fpage>21</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kalaiarasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.K.</given-names>
            <surname>Thyagharajan</surname>
          </string-name>
          ,
          <article-title>Clustering of near duplicate images using bundled features</article-title>
          ,
          <source>Cluster Computing 22.5</source>
          (
          <year>2019</year>
          ):
          <fpage>11997</fpage>
          -
          <lpage>12007</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.G.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.W.</given-names>
            <surname>Ngo</surname>
          </string-name>
          ,
          <article-title>Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval</article-title>
          ,
          <source>Computer Vision and Image Understanding 113.3</source>
          (
          <year>2009</year>
          ):
          <fpage>405</fpage>
          -
          <lpage>414</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <source>in: International Conference on Multimedia Modeling</source>
          , Springer, Cham (
          <year>2020</year>
          ):
          <fpage>752</fpage>
          -
          <lpage>763</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Secure real-time image protection scheme with near-duplicate detection in cloud computing</article-title>
          ,
          <source>Journal of Real-Time Image Processing 17.1</source>
          (
          <year>2020</year>
          ):
          <fpage>175</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kordopatis-Zilos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Patras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          ,
          <article-title>Finding near-duplicate videos in large-scale collections</article-title>
          ,
          <source>in: Video Verification in the Fake News Era</source>
          , Springer, Cham (
          <year>2019</year>
          ):
          <fpage>91</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Charikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>High-confidence near-duplicate image detection</article-title>
          ,
          <source>in: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval</source>
          (
          <year>2012</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Fast and accurate near-duplicate image elimination for visual sensor networks</article-title>
          ,
          <source>International Journal of Distributed Sensor Networks 13.2</source>
          (
          <year>2017</year>
          ):
          <fpage>12p</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.E.</given-names>
            <surname>Koker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.S.</given-names>
            <surname>Chintapalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.A.</given-names>
            <surname>Talbot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wainstock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cicconet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.C.</given-names>
            <surname>Walsh</surname>
          </string-name>
          ,
          <article-title>On Identification and Retrieval of Near-Duplicate Biological Images: a New Dataset and Protocol</article-title>
          ,
          <source>in: 2020 the 25th International Conference on Pattern Recognition</source>
          , ICPR, IEEE (
          <year>2021</year>
          ):
          <fpage>3114</fpage>
          -
          <lpage>3121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hadipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sadeghian</surname>
          </string-name>
          ,
          <article-title>Similar multi-modal image detection in multi-source dermatoscopic images of cancerous pigmented skin lesions</article-title>
          ,
          <source>in: Advances in Computer Vision and Computational Biology</source>
          , Springer, Cham (
          <year>2021</year>
          ):
          <fpage>109</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lall</surname>
          </string-name>
          ,
          <article-title>imPlag: Detecting image plagiarism using hierarchical near duplicate retrieval</article-title>
          ,
          <source>in: 2015 Annual IEEE India Conference (INDICON)</source>
          ,
          <source>IEEE</source>
          (
          <year>2015</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nangia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          ,
          <article-title>Detecting image spam using visual features and near duplicate detection</article-title>
          ,
          <source>in: Proceedings of the 17th international conference on World Wide Web</source>
          (
          <year>2008</year>
          ):
          <fpage>497</fpage>
          -
          <lpage>506</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.K.</given-names>
            <surname>Josephson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Charikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Filtering image spam with near-duplicate detection</article-title>
          ,
          <source>in: CEAS</source>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sukthankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huston</surname>
          </string-name>
          ,
          <article-title>Efficient near-duplicate detection and sub-image retrieval</article-title>
          ,
          <source>ACM Multimedia 4.1</source>
          (
          <year>2004</year>
          ):
          <fpage>5p</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sukthankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huston</surname>
          </string-name>
          ,
          <article-title>An efficient parts-based near-duplicate and sub-image retrieval system</article-title>
          ,
          <source>in: Proceedings of the 12th annual ACM International Conference on Multimedia</source>
          (
          <year>2004</year>
          ):
          <fpage>869</fpage>
          -
          <lpage>876</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>F.</given-names>
            <surname>Nian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Efficient near-duplicate image detection with a local-based binary representation</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          <volume>75</volume>
          .5 (
          <year>2016</year>
          ):
          <fpage>2435</fpage>
          -
          <lpage>2452</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>W.L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.W.</given-names>
            <surname>Ngo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.K.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Near-duplicate keyframe identification with interest point matching and pattern learning</article-title>
          ,
          <source>IEEE Transactions on Multimedia 9.5</source>
          (
          <year>2007</year>
          ):
          <fpage>1037</fpage>
          -
          <lpage>1048</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>W.L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.W.</given-names>
            <surname>Ngo</surname>
          </string-name>
          ,
          <article-title>Scale-rotation invariant pattern entropy for keypoint-based near-duplicate detection</article-title>
          ,
          <source>IEEE Transactions on Image Processing 18.2</source>
          (
          <year>2009</year>
          ):
          <fpage>412</fpage>
          -
          <lpage>423</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Geometric invariant features in the Radon transform domain for near-duplicate image detection</article-title>
          ,
          <source>Pattern Recognition 47.11</source>
          (
          <year>2014</year>
          ):
          <fpage>3630</fpage>
          -
          <lpage>3640</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A Fast Algorithm for Near-Duplicate Image Detection</article-title>
          ,
          <source>in: 2021 IEEE International Conference on Artificial Intelligence and Industrial Design</source>
          , AIID, IEEE (
          <year>2021</year>
          ):
          <fpage>360</fpage>
          -
          <lpage>363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>O.</given-names>
            <surname>Chum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Philbin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Near duplicate image detection: Min-hash and TF-IDF weighting</article-title>
          ,
          <source>in: BMVC</source>
          <volume>810</volume>
          (
          <year>2008</year>
          ):
          <fpage>812</fpage>
          -
          <lpage>815</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.Y.</given-names>
            <surname>Suen</surname>
          </string-name>
          ,
          <article-title>Variable-length signature for near-duplicate image matching</article-title>
          ,
          <source>IEEE Transactions on Image Processing 24.4</source>
          (
          <year>2015</year>
          ):
          <fpage>1282</fpage>
          -
          <lpage>1296</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Fast and accurate near-duplicate image search with affinity propagation on the ImageWeb</article-title>
          ,
          <source>Computer Vision and Image Understanding</source>
          <volume>124</volume>
          (
          <year>2014</year>
          ):
          <fpage>31</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Near-duplicate image detection system using coarse-to-fine matching scheme based on global and local CNN features</article-title>
          ,
          <source>Mathematics</source>
          <volume>8</volume>
          .4 (
          <year>2020</year>
          ):
          <fpage>644</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kordopatis-Zilos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Patras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          ,
          <article-title>Near-duplicate video retrieval by aggregating intermediate CNN layers</article-title>
          , in: International Conference on Multimedia Modeling, Springer, Cham (
          <year>2017</year>
          ):
          <fpage>251</fpage>
          -
          <lpage>263</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Single- and cross-modality near duplicate image pairs detection via spatial transformer comparing CNN</article-title>
          ,
          <source>Sensors</source>
          <volume>21</volume>
          .1 (
          <year>2021</year>
          ):
          <fpage>255</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>O.</given-names>
            <surname>Chum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Philbin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Scalable near identical image and shot detection</article-title>
          ,
          <source>in: Proceedings of the 6th ACM International Conference on Image and Video Retrieval</source>
          (
          <year>2007</year>
          ):
          <fpage>549</fpage>
          -
          <lpage>556</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>B.</given-names>
            <surname>Barz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Denzler</surname>
          </string-name>
          ,
          <article-title>Do we train on test data? Purging cifar of near-duplicates</article-title>
          ,
          <source>Journal of Imaging</source>
          ,
          <volume>6</volume>
          .6 (
          <year>2020</year>
          ):
          <fpage>41</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>H.</given-names>
            <surname>Matatov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Amir</surname>
          </string-name>
          ,
          <article-title>Dataset and case studies for visual near-duplicates detection in the context of social media</article-title>
          , arXiv preprint arXiv:2203.07167 (
          <year>2022</year>
          )
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Dijana</given-names>
            <surname>Tralic</surname>
          </string-name>
          , Ivan Zupancic, Sonja Grgic, Mislav Grgic,
          <article-title>CoMoFoD: New database for copy-move forgery detection</article-title>
          ,
          <source>in: Proceedings of the 55th International Symposium ELMAR</source>
          (
          <year>2013</year>
          ):
          <fpage>49</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>L.</given-names>
            <surname>Morra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lamberti</surname>
          </string-name>
          ,
          <article-title>Benchmarking unsupervised near-duplicate image detection</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>135</volume>
          (
          <year>2019</year>
          ):
          <fpage>313</fpage>
          -
          <lpage>326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Bjorn</given-names>
            <surname>Barz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joachim</given-names>
            <surname>Denzler</surname>
          </string-name>
          ,
          <article-title>Hierarchy-based Image Embeddings for Semantic Image Retrieval</article-title>
          (
          <year>2019</year>
          ) URL: https://arxiv.org/pdf/1809.09924v4.pdf
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Maxim</given-names>
            <surname>Berman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hervé</given-names>
            <surname>Jégou</surname>
          </string-name>
          , Andrea Vedaldi, Iasonas Kokkinos and Matthijs Douze,
          <article-title>MultiGrain: a unified image embedding for classes and instances</article-title>
          (
          <year>2019</year>
          ) URL: https://arxiv.org/pdf/1902.05509v2.pdf
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>Zehao</given-names>
            <surname>Yu</surname>
          </string-name>
          , Jia Zheng, Dongze Lian, Zihan Zhou and Shenghua Gao,
          <article-title>Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding</article-title>
          (
          <year>2019</year>
          ) URL: https://arxiv.org/pdf/1902.09777v3.pdf
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Anita</given-names>
            <surname>Rau</surname>
          </string-name>
          , Guillermo Garcia-Hernando, Danail Stoyanov, Gabriel J. Brostow and
          <string-name>
            <given-names>Daniyar</given-names>
            <surname>Turmukhambetov</surname>
          </string-name>
          ,
          <article-title>Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings</article-title>
          (
          <year>2020</year>
          ) URL: https://arxiv.org/pdf/2008.05785v1.pdf
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>Guang</given-names>
            <surname>Feng</surname>
          </string-name>
          , Zhiwei Hu, Lihe Zhang and Huchuan Lu,
          <article-title>Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation</article-title>
          (
          <year>2021</year>
          ) URL: https://arxiv.org/pdf/2105.01839v1.pdf
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>Maryam</given-names>
            <surname>Asadi-Aghbolaghi</surname>
          </string-name>
          , Reza Azad, Mahmood Fathy and Sergio Escalera,
          <article-title>Multi-level Context Gating of Embedded Collective Knowledge for Medical Image Segmentation</article-title>
          (
          <year>2020</year>
          ) URL: https://arxiv.org/pdf/2003.05056v1.pdf
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          (
          <year>2015</year>
          ) URL: https://doi.org/10.48550/arXiv.1512.03385
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <article-title>ImageNet: A Large-Scale Hierarchical Image Database</article-title>
          ,
          <source>in: IEEE Computer Vision and Pattern Recognition, CVPR</source>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>V.L.</given-names>
            <surname>Pleskach</surname>
          </string-name>
          ,
          <source>E-commerce technologies</source>
          , Kyiv, KNTEU
          (
          <year>2004</year>
          ):
          <fpage>226p</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>T.I.</given-names>
            <surname>Lytvynenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.V.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.D.</given-names>
            <surname>Redko</surname>
          </string-name>
          ,
          <article-title>Sales Forecasting using Data Mining Methods</article-title>
          , Bulletin of Taras Shevchenko National University of Kyiv,
          <source>Series: physical-mathematical sciences 4</source>
          (
          <year>2015</year>
          ):
          <fpage>148</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>I.</given-names>
            <surname>Bieda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <article-title>A Systematic Mapping Study on Artificial Intelligence Tools Used in Video Editing</article-title>
          ,
          <source>International Journal of Computer Science and Network Security 22.3</source>
          (
          <year>2022</year>
          ):
          <fpage>312</fpage>
          -
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>I.</given-names>
            <surname>Bieda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kisil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <article-title>An Approach to Scene Change Detection</article-title>
          ,
          <source>in: Proceedings of the 11th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications</source>
          (IDAACS'
          <year>2021</year>
          ) volume
          <volume>1</volume>
          (
          <year>2021</year>
          ):
          <fpage>489</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>I.</given-names>
            <surname>Bieda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <article-title>A Comparison of Scene Change Localization Methods over the Open Video Scene Detection Dataset</article-title>
          ,
          <source>International Journal of Computer Science and Network Security 22.6</source>
          (
          <year>2022</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>