<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Multimodal Approach for Semantic Patent Image Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kader Pustu-Iren</string-name>
          <email>kader.pustu@tib.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerrit Bruns</string-name>
          <email>gerrit.bruns@tib.eu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralph Ewerth</string-name>
          <email>ralph.ewerth@tib.eu</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>TIB - Leibniz Information Centre for Science and Technology</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TIB - Leibniz Information Centre for Science and Technology</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>TIB - Leibniz Information Centre for Science and Technology</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>45</fpage>
      <lpage>49</lpage>
      <abstract>
        <p>Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval are largely focused on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress for a novel approach for patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image to subsequently identify references to corresponding sentences in the patent document. Furthermore, we use a state-of-the-art neural CLIP model to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search, we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, while the best results are achieved when combining the features of both modalities.</p>
      </abstract>
      <kwd-group>
        <kwd>Patent Image Similarity Search</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Multimodal Feature Representations</kwd>
        <kwd>Scene Text Spotting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Image search; Content analysis and
feature selection; • Computing methodologies → Visual
content-based indexing and retrieval; Image representations.</p>
    </sec>
    <sec id="sec-2">
      <label>1</label>
      <title>INTRODUCTION</title>
      <p>
        Patent experts and researchers often encounter language and
terminology barriers when conducting searches to identify research
or patent gaps, (newly) emerging technology developments, or to
check the patentability of research results. Existing patent retrieval
methods are primarily based on textual searches and largely exclude
illustrations and the relationship between text and image. Often,
however, the innovation of a patent can be identified with the help
of an illustration, and patents with similar or related innovations
can be quickly analysed by looking at illustrations in a comparative
way. In this context, a survey with patent experts confirms the
high informative value of illustrations and the
demand for an image-based search [
        <xref ref-type="bibr" rid="ref71">8</xref>
        ]. Moreover, with the
continuous refinement of already patented research, the terminology
used changes [
        <xref ref-type="bibr" rid="ref26 ref73">3</xref>
        ], making it more difficult to find corresponding
patents. This problem is exacerbated when cross-linguistic searches
are conducted. Therefore, illustrations provide an alternative way
to enable the identification of relevant results in patents,
regardless of language and terminology. The use of illustrations is also
advantageous for searches that are independent of domain and patent class.
In this way, intellectual property (IP) rights can be evaluated for
further application domains, which is only possible to a limited
extent with a purely textual search. This is especially relevant for
basic and technical patents, whose scope of application is often not
clear at the beginning of the creation of an exploitation strategy.
      </p>
      <p>In this paper, we present a novel multimodal system for semantic
patent image retrieval in a query-by-example scenario. To extract
visually relevant features from images, we use embeddings from
pre-trained deep neural networks. Furthermore, scene text spotting
is applied in order to extract numerals from the images and map
them to their mentions in the patent text. Next, we derive textual
features from the relevant sentences in the text utilizing sentence
transformers. Finally, textual and visual features are used to index
the represented illustrations. Experimental results are presented for
semantic image search investigating both unimodal and multimodal
feature sets.</p>
      <p>The rest of the paper is organized as follows. We review related
work in Section 2. Section 3 introduces the proposed approach
for multimodal patent image search. We provide an experimental
evaluation of the proposed solution in Section 4 and conclude the
paper with a short discussion of results in Section 5.</p>
    </sec>
    <sec id="sec-3">
      <label>2</label>
      <title>RELATED WORK</title>
      <p>
        Previous approaches to patent information retrieval have been
largely limited to textual information [
        <xref ref-type="bibr" rid="ref10">19</xref>
        ]. However, terminology
in patents changes continuously due to the constant evolution
of the presented content and is inconsistent for this reason [
        <xref ref-type="bibr" rid="ref26 ref73">3</xref>
        ].
Often, innovative terminology is "invented" along with the actual
invention. One result of this evolution is that search results are often
incomplete and do not display all relevant patents. The (additional)
evaluation of non-textual information in the form of illustrations,
such as technical drawings, graphs and diagrams, can facilitate and
significantly improve the search for similar or relevant patents. In
addition, references to the relevant text passages are often given in
numerical form in these illustrations, so that automatic recognition
of these image-text references can also significantly improve the
quality of the (multimodal) search results.
      </p>
      <p>
        The more general problem of searching in image databases
(image retrieval) has been intensively researched in the last decades.
Simpler methods for search in image databases are usually based on
so-called low-level features, which are technical descriptions of shape,
color, or texture. However, results based on such features very often
do not meet the search needs of users, which are mostly of a content
or semantic nature ("semantic gap") [21]. In recent years,
significant progress has been made to automatically recognize content in
images (denoted as object recognition or visual concept detection)
[22], especially through deep learning approaches [
        <xref ref-type="bibr" rid="ref3 ref82">5, 10, 30</xref>
        ]. In
this way, search queries of a content-related nature can be more
accurately answered.
      </p>
      <p>An important aspect of the presented approach is the similarity
search that follows feature extraction. Current similarity search
approaches learn compact codes to replace images [18, 27, 28]. The
compact codes usually compress high-dimensional features of a
Convolutional Neural Network (CNN) trained on specific datasets
suitable for the given task. However, these methods are not
optimized for the technical and schematic illustrations in patents, so
there is a need for research and development in this area.</p>
      <p>
        So far, there are relatively few specific approaches for searching
visual information in patents [29]. Examples are the Patmedia
method for similarity search [25], its extensions [20, 23, 24], and
other approaches for concept-based graphical search [
        <xref ref-type="bibr" rid="ref21">11, 13</xref>
        ]. These
methods generally extract textual and visual low-level features from
patent images and train detectors that identify a limited number
of predefined concepts. Experiments in these works show that the
combination of visual and textual features works best for the task
of concept detection. More recent approaches [
        <xref ref-type="bibr" rid="ref78">9, 14</xref>
        ] establish the
references of figures and related text passages using an automatic
detection of the corresponding numerical referencing in the figures.
Another approach [4] uses SIFT-like local histograms as features
and represents the images in patents using Fisher vectors. In the
experiments based on the 2011 CLEF-IP evaluation [15], the best
retrieval results were achieved by late fusion of textual and
non-textual results. Bhatti and Hanbury [
        <xref ref-type="bibr" rid="ref26 ref73">3</xref>
        ] provide an overview of
further research regarding specific figure types (photo, flow chart,
technical drawings, diagrams, graphs) that may also be relevant
for patent retrieval. However, to date, no integrated patent retrieval
system exploiting multimodal search exists. The representation
and quality of the images in patents as well as their schematic and
sketchy character require specific approaches or the recognition of
special objects that are particularly relevant in patents.
      </p>
    </sec>
    <sec id="sec-4">
      <label>3</label>
      <title>MULTIMODAL PATENT IMAGE SEARCH</title>
      <p>We now discuss the proposed system that incorporates multimodal
patent features to establish a similarity search based on illustrations.
Figure 1 illustrates the individual steps. First, we extract visual and
textual features (Sections 3.1 and 3.2) from the patent images. Then,
for each modality, an index of the corresponding image feature
vectors is built (Section 3.3). Finally, the most similar results to a
query image can be retrieved by re-ranking results based on both
indexes.</p>
    </sec>
    <sec id="sec-5">
      <label>3.1</label>
      <title>Image Feature Extraction</title>
      <p>Patent images are a special category of images that have
sketch-like characteristics. They usually consist of technical drawings,
diagrams, or graphs and are mostly black and white. While smaller
details can often be of great relevance for interpretation, the images often
also contain redundant patterns. To represent this kind of image,
features are extracted using a deep neural network. We use the
Contrastive Language-Image Pre-training (CLIP) [16] model that was
trained on a multimodal dataset of 400 million image-text pairs
collected from the internet. The CLIP model is aimed at learning
visual concepts from natural language supervision and is primarily
designed for flexible zero-shot computer vision classification on
arbitrary image datasets by providing simple textual image
descriptions. This powerful approach has improved the state of the art
on several benchmark datasets including ImageNet Sketch [26],
which contains sketch images with characteristics similar to patent
images. This motivates us to utilize CLIP embeddings for the task of
patent image similarity search. In particular, we use the pre-trained
vision transformer (ViT-B/32) to extract visual features and embed
the patent images.</p>
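      <p>The embedding step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: it assumes the OpenAI clip Python package together with torch and Pillow, and the helper names load_clip_encoder, embed_images, and l2_normalize are hypothetical. The L2 normalization is an added assumption that makes a later dot product equal to cosine similarity.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def load_clip_encoder(device="cpu"):
    """Return a function that maps an image path to a CLIP ViT-B/32 feature vector.

    Assumption: the `clip`, `torch`, and `Pillow` packages are installed.
    They are imported lazily so the rest of the sketch runs without them.
    """
    import clip
    import torch
    from PIL import Image

    model, preprocess = clip.load("ViT-B/32", device=device)
    model.eval()

    def encode(path):
        # Preprocess one image and add a batch dimension of size 1.
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        with torch.no_grad():
            features = model.encode_image(image)
        return features[0].float().tolist()

    return encode


def embed_images(paths, encode):
    """Embed each image with `encode` and L2-normalize the result."""
    return {path: l2_normalize(encode(path)) for path in paths}
```
]]></preformat>
      <p>Passing the encoder as an argument keeps the embedding loop independent of the model, so embed_images(paths, load_clip_encoder()) can be replaced by any other function that returns a feature vector per image.</p>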
    </sec>
    <sec id="sec-6">
      <label>3.2</label>
      <title>Textual Feature Extraction</title>
      <p>Patent figures usually contain image text, particularly numbers that
can be used to link illustrated concepts to a description in the patent
document. To use these textual descriptions, we first apply scene
text spotting methods (Section 3.2.1). After relevant sentences have
been identified, they are embedded using sentence transformers
(Section 3.2.2).</p>
      <p>3.2.1 Image-Text Relations using OCR. Optical Character
Recognition (OCR) aims to recognize characters in images. Recently, scene
text recognition methods based on neural networks have emerged.
We use a two-step approach in which we first detect text blocks
and then recognize the text they contain. For scene text detection,
the CRAFT (Character Region Awareness For Text detection) [2]
model for character-level text detection is applied. Subsequently,
a four-stage deep scene text recognition (STR) framework [1] is
employed to extract the text. Although these methods were trained
for recognizing text in real-world scenes, they prove to be very
accurate on patent images, for which text detection and
recognition is generally easier than for scene text. Once the image text
is extracted, we keep the numbers and prune irrelevant text. The
numbers are then used to identify the relevant sentences in the
XML file of the corresponding patent document. For this purpose,
we tokenize the text, search for the numbers, and keep all matching
sentences that provide a description for the illustrated concepts.
Exemplary text mappings resulting from the scene text recognition
can be seen in Figure 2.</p>
      <p>3.2.2 Sentence Transformers. Sentence transformer neural
networks were recently introduced and can be used to compute dense
vector representations for sentences. We use a RoBERTa [12] model
that was pre-trained to produce semantically meaningful sentence
embeddings (according to Sentence-BERT [17]) and optimized
for semantic textual similarity (STS) in the English language. We
embed all the sentences found in the previous image-text mapping
step. Finally, an average vector over all related sentences is created
to represent a patent image.</p>
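      <p>The mapping from recognized numerals to patent sentences, and the averaging of sentence vectors into one image representation, can be illustrated with the following minimal sketch. It is written under stated assumptions rather than taken from the paper: the sentence splitter is deliberately naive, a numeral counts as a match only when it appears as a standalone token, and the function names are hypothetical. The sentence vectors are assumed to come from any sentence embedding model.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_for_numerals(numerals, text):
    """Map each reference numeral to the sentences that mention it as a whole token."""
    sentences = split_sentences(text)
    mapping = {}
    for numeral in numerals:
        # Reject matches inside longer numbers ("12" in "112") or decimals ("2" in "2.5").
        pattern = re.compile(r"(?<![\w.])" + re.escape(numeral) + r"(?!\w|\.\d)")
        mapping[numeral] = [s for s in sentences if pattern.search(s)]
    return mapping


def mean_vector(vectors):
    """Average equally sized vectors element-wise (one vector per sentence)."""
    if not vectors:
        raise ValueError("need at least one vector")
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```
]]></preformat>
      <p>In a full pipeline, the sentences returned for all numerals of one figure would be embedded and passed to mean_vector to obtain a single textual feature vector for that figure. Abbreviations such as "Fig." would be split incorrectly by this simple splitter, so a proper sentence tokenizer is preferable in practice.</p>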
    </sec>
    <sec id="sec-7">
      <label>3.3</label>
      <title>Similarity Search</title>
      <p>Based on the extracted feature representations, indexes are built
using the FAISS library [7]. Each index is based on product
quantization [6], allows for efficient comparisons between query
vectors and stored vectors based on cosine similarity, and returns the
nearest neighbors. We built separate indexes for the image and
textual feature modalities based on a dataset comprising 30,379
patent images. Subsequently, the nearest neighbors of a query
image can be retrieved by similarity search based a) on the stored
visual features, b) on the stored textual features, or c) on
a combination of the ranking results of both indexes. For the last
option, we explore two different re-ranking approaches. The first
one averages the similarity scores of the two
modalities, whereas the second one reorders the final ranking
according to the maximum scores.</p>
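      <p>The retrieval and re-ranking logic can be sketched in a few lines. This is an illustration only: it replaces the product-quantization index of FAISS with an exhaustive cosine-similarity scan, and it assumes that an image missing from one modality's result list contributes a score of 0.0 for that modality, a detail the text does not specify. All function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity of two equally sized vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query, index, k):
    """Exhaustive nearest-neighbor search; returns {image_id: score} for the top k."""
    scored = sorted(
        ((cosine(query, vector), image_id) for image_id, vector in index.items()),
        reverse=True,
    )
    return {image_id: score for score, image_id in scored[:k]}


def rerank(visual_hits, textual_hits, mode="avg"):
    """Fuse two {image_id: score} result sets by average or maximum score.

    Assumption: an image missing from one result set scores 0.0 there.
    """
    if mode not in ("avg", "max"):
        raise ValueError("mode must be 'avg' or 'max'")
    fused = {}
    for image_id in set(visual_hits) | set(textual_hits):
        v = visual_hits.get(image_id, 0.0)
        t = textual_hits.get(image_id, 0.0)
        fused[image_id] = (v + t) / 2 if mode == "avg" else max(v, t)
    # Highest fused score first; ties are broken by image id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```
]]></preformat>
      <p>Averaging rewards images that score well in both modalities, whereas taking the maximum lets a single strong modality decide the rank, which is one way to read the difference between the two strategies.</p>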
    </sec>
    <sec id="sec-8">
      <label>4</label>
      <title>EVALUATION AND DISCUSSION</title>
      <p>In this section, the patent image retrieval approaches are evaluated
according to the experimental setup (Section 4.2) and based on a
predefined patent collection (Section 4.1). We discuss the outcomes of
the experiments in Section 4.3.</p>
    </sec>
    <sec id="sec-9">
      <label>4.1</label>
      <title>Patent Dataset</title>
      <p>We conduct our retrieval experiments on a patent collection from
the European Patent Office (EPO) focusing on the exemplary fields
of autonomous driving and wind power. To this end, we download
patents from the time period 2007 to 2020 and ensure that each
patent contains an XML file to parse the structured text and image
information. After excluding formulas, our final patent collection
comprises 2,858 patent documents with a total of 30,379 figures
of technical drawings, diagrams and graphs. Analogously, another
3,770 images from 300 patent documents are reserved as test data.</p>
    </sec>
    <sec id="sec-10">
      <label>4.2</label>
      <title>Experiments</title>
      <p>The performance of our system is evaluated using the average
precision (AP) score which is the most commonly used quality
measure for retrieval approaches. The AP score is calculated from
a list of ranked documents as follows:</p>
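      <p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>\mathrm{AP} = \sum_{n} (R_n - R_{n-1}) \, P_n</tex-math>
        </disp-formula>
      </p>
      <p>where <inline-formula><tex-math>P_n</tex-math></inline-formula> and <inline-formula><tex-math>R_n</tex-math></inline-formula> are the precision and recall at the <inline-formula><tex-math>n</tex-math></inline-formula>-th threshold.
In general, AP is the average of the precision scores at each
relevant document. To evaluate the overall performance, the mean AP
(mAP) score is calculated by taking the mean value of the AP scores
across different queries. To verify the performance of our system,
we randomly selected 20 query images along with their
descriptions (described in Section 3.2.1) from the test data and evaluated</p>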
      <p>Table 1: mAP results up to rank 50 for randomly chosen
queries. Re-ranking (avg) denotes the averaging of the
different modalities’ scores. Re-ranking (max) denotes the
reordering according to maximum scores. Rows: Textual Features, Visual Features, Re-ranking (avg), Re-ranking (max).</p>
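      <p>For readers who want to reproduce the metric, the AP definition above can be computed directly from a ranked list of binary relevance labels. The sketch below is an illustration under one explicit assumption: recall is measured against the relevant items found within the evaluated ranks, since the total number of relevant images in the collection is not stated. The function names and the default cutoff argument are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def average_precision(relevance):
    """AP for one ranked list of 0/1 labels, as the sum over n of (R_n - R_{n-1}) * P_n.

    Assumption: recall is relative to the relevant items inside the evaluated
    list, so it reaches 1.0 at the last relevant rank.
    """
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    ap, hits, previous_recall = 0.0, 0, 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant
        precision = hits / rank
        recall = hits / total_relevant
        # Recall only changes at relevant ranks, so other ranks add nothing.
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(relevance_lists, cutoff=50):
    """Mean of the per-query AP scores, each computed on the top `cutoff` ranks."""
    if not relevance_lists:
        raise ValueError("need at least one query")
    scores = [average_precision(labels[:cutoff]) for labels in relevance_lists]
    return sum(scores) / len(scores)
```
]]></preformat>
      <p>For the labels [1, 0, 1, 0], precision at the two relevant ranks is 1 and 2/3, each weighted by a recall step of 1/2, which gives an AP of 5/6.</p>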
    </sec>
    <sec id="sec-11">
      <label>4.3</label>
      <title>Discussion</title>
      <p>The results of our experiments are shown in Table 1. Using only
visual features for image retrieval yields a slightly higher mAP score
of 0.696 compared to using textual features. The combination of
both modalities yields the highest mAP score of 0.715 when scores
of the textual and visual similarity search are averaged. Reordering
the similarity values according to the maximum scores for both
feature sets had a smaller effect on the similarity search performance.
The results suggest that combining both modalities can help
increase the quality of retrieval results. In general, results based on
visual features were easier to annotate since the visual embeddings
retrieve mostly visually similar results. It should also be noted that
results based on textual features were harder to inspect and thus
annotated with the additional help of the sentences representing
the retrieved image. In general, it was observed that textual
features retrieved semantically relevant images. Thus, the combination
of both feature representations presents a good mixture of both
visually and semantically related patent images.</p>
    </sec>
    <sec id="sec-12">
      <label>5</label>
      <title>CONCLUSIONS</title>
      <p>The discussion of related work for patent image retrieval revealed
that existing work is either outdated or insufficient when it comes
to exploiting the multimodal information that patents provide. In
this paper, we have presented a framework that exploits multimodal
features to enable semantic patent image search. Image-text
relations are identified through scene text spotting and OCR, yielding
a mapping of in-figure numbers to the corresponding text. This
allowed us to embed relevant text passages in feature vector
representations. Additionally, we successfully embedded the shape
and topological information in images using powerful deep neural
networks. We exploit both textual and image features in order to
facilitate semantic similarity for patent images. Experimental results
demonstrated the feasibility of the approach, while suggesting that
the combination of both modalities is beneficial.</p>
      <p>In the future, we plan to exploit further information in images
such as non-numeric image text. Moreover, we plan to incorporate
multimodal information in an end-to-end network and have a joint
framework to conduct patent search. Thereby, we intend to fuse
features by exploiting multimodal machine learning architectures.</p>
    </sec>
    <sec id="sec-13">
      <title>ACKNOWLEDGEMENTS</title>
      <p>We would like to sincerely thank the reviewers for their valuable and
comprehensive comments. This work is financially supported by
the Federal Ministry of Education and Research (BMBF,
Bundesministerium für Bildung und Forschung, project reference 01IO2004A).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>2019 IEEE/CVF International Conference on Computer Vision</source>
          , ICCV 2019, Seoul,
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Korea</surname>
          </string-name>
          (South),
          <source>October 27 - November 2</source>
          ,
          <year>2019</year>
          . IEEE,
          <fpage>4714</fpage>
          -
          <lpage>4722</lpage>
          . https://doi.org/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          10.1109/ICCV.
          <year>2019</year>
          .
          <volume>00481</volume>
          [2]
          <string-name>
            <given-names>Youngmin</given-names>
            <surname>Baek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bado</given-names>
            <surname>Lee</surname>
          </string-name>
          , Dongyoon Han,
          <string-name>
            <given-names>Sangdoo</given-names>
            <surname>Yun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hwalsuk</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Vision</surname>
            and Pattern Recognition,
            <given-names>CVPR</given-names>
          </string-name>
          <year>2019</year>
          , Long Beach, CA, USA, June 16-20,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Computer</surname>
          </string-name>
          Vision Foundation / IEEE,
          <fpage>9365</fpage>
          -
          <lpage>9374</lpage>
          . https://doi.org/10.1109/CVPR.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <year>2019</year>
          .
          <volume>00959</volume>
          [3]
          <string-name>
            <given-names>Naeem</given-names>
            <surname>Bhatti</surname>
          </string-name>
          and
          <string-name>
            <given-names>Allan</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Image search in patents: a review.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>International Journal on Document Analysis and Recognition</source>
          <volume>16</volume>
          ,
          <issue>4</issue>
          (
          <year>2013</year>
          ),
          <fpage>309</fpage>
          -
          <lpage>329</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          https://doi.org/10.1007/s10032-012-0197-
          <issue>5</issue>
          [4]
          <string-name>
            <given-names>Gabriela</given-names>
            <surname>Csurka</surname>
          </string-name>
          ,
          <string-name>
            <surname>Jean-Michel Renders</surname>
            , and
            <given-names>Guillaume</given-names>
          </string-name>
          <string-name>
            <surname>Jacquet</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>XRCE's</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>Tasks of the Clef-IP 2011</article-title>
          .
          <source>In CLEF 2011 Labs and Workshop</source>
          , Notebook Papers,
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          19-
          <issue>22</issue>
          <year>September 2011</year>
          , Amsterdam, The Netherlands (CEUR Workshop Proceedings,
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          Vol.
          <volume>1177</volume>
          ), Vivien Petras, Pamela Forner, and Paul D. Clough (Eds.).
          <source>CEUR-WS.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>1177</volume>
          /
          <article-title>CLEF2011wn-CLEF-IP-CsurkaEt2011</article-title>
          .pdf [5]
          <string-name>
            <given-names>Gao</given-names>
            <surname>Huang</surname>
          </string-name>
          , Zhuang Liu, Laurens van der Maaten, and
          <string-name>
            <surname>Kilian</surname>
            <given-names>Q.</given-names>
          </string-name>
          <string-name>
            <surname>Weinberger</surname>
          </string-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          2017.
          <article-title>Densely Connected Convolutional Networks</article-title>
          . In 2017 IEEE Conference on
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Computer</given-names>
            <surname>Vision</surname>
          </string-name>
          and Pattern Recognition,
          <string-name>
            <surname>CVPR</surname>
          </string-name>
          <year>2017</year>
          ,
          <article-title>Honolulu</article-title>
          ,
          <string-name>
            <surname>HI</surname>
          </string-name>
          , USA, July
          <volume>21</volume>
          -26,
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          2017. IEEE Computer Society,
          <fpage>2261</fpage>
          -
          <lpage>2269</lpage>
          . https://doi.org/10.1109/CVPR.
          <year>2017</year>
          .
          <volume>243</volume>
          [6]
          <string-name>
            <given-names>Hervé</given-names>
            <surname>Jégou</surname>
          </string-name>
          , Matthijs Douze, and
          <string-name>
            <given-names>Cordelia</given-names>
            <surname>Schmid</surname>
          </string-name>
          .
          <year>2011</year>
          . Product Quantization
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>Intelligence</source>
          <volume>33</volume>
          ,
          <issue>1</issue>
          (
          <year>2011</year>
          ),
          <fpage>117</fpage>
          -
          <lpage>128</lpage>
          . https://doi.org/10.1109/TPAMI.
          <year>2010</year>
          .
          <volume>57</volume>
          [7]
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Johnson</surname>
          </string-name>
          , Matthijs Douze, and
          <string-name>
            <given-names>Hervé</given-names>
            <surname>Jégou</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Billion-scale similarity</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <article-title>search with GPUs</article-title>
          .
          <source>CoRR abs/1702</source>
          .08734 (
          <year>2017</year>
          ). arXiv:
          <volume>1702</volume>
          .08734 http://arxiv.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>org/abs/1702</source>
          .08734 [8]
          <string-name>
            <given-names>Hideo</given-names>
            <surname>Joho</surname>
          </string-name>
          , Leif Azzopardi, and
          <string-name>
            <given-names>Wim</given-names>
            <surname>Vanderbauwhede</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>A survey of</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          requirements. In Information Interaction in Context Symposium,
          <source>IIiX</source>
          <year>2010</year>
          , New
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Brunswick</surname>
          </string-name>
          , NJ, USA,
          <year>August</year>
          18-
          <issue>21</issue>
          ,
          <year>2010</year>
          ,
          <string-name>
            <given-names>Nicholas J.</given-names>
            <surname>Belkin</surname>
          </string-name>
          and Diane Kelly (Eds.).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>ACM</surname>
          </string-name>
          ,
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          . https://doi.org/10.1145/1840784.1840789 [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kramer</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Döring</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>CLEF-IP 2011: Tool zur Unterstützung der bil-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Viewers</surname>
          </string-name>
          .
          <source>In Big Data - Chancen und Herausforderungen</source>
          .
          <volume>38</volume>
          . Kolloquium der Tech-
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Proceedings</surname>
          </string-name>
          . PATINFO.
          <fpage>209</fpage>
          -
          <lpage>2019</lpage>
          . [10]
          <string-name>
            <surname>Alex</surname>
            <given-names>Krizhevsky</given-names>
          </string-name>
          , Ilya Sutskever, and
          <string-name>
            <given-names>Geoffrey E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>2012</year>
          . ImageNet
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <source>Neural Information Processing Systems</source>
          <volume>25</volume>
          : 26th Annual Conference on Neural
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <source>Information Processing Systems 2012. Proceedings of a meeting held</source>
          December 3-6,
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <year>2012</year>
          , Lake Tahoe, Nevada, United States,
          <string-name>
            <given-names>Peter L.</given-names>
            <surname>Bartlett</surname>
          </string-name>
          , Fernando
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <source>berger (Eds.)</source>
          .
          <fpage>1106</fpage>
          -
          <lpage>1114</lpage>
          . https://proceedings.neurips.cc/paper/2012/hash/
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          c399862d3b9d6b76c8436e924a68c45b-Abstract.html [11]
          <string-name>
            <given-names>Dimitris</given-names>
            <surname>Liparas</surname>
          </string-name>
          , Anastasia Moumtzidou, Stefanos Vrochidis, and
          <string-name>
            <given-names>Ioannis</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <year>2014</year>
          .
          <article-title>Concept-oriented labelling of patent images based on Random Forests</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <source>Workshop on Vision and Language, VL@COLING 2014</source>
          , Dublin, Ireland, August 23,
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          2014,
          <string-name>
            <given-names>Anja</given-names>
            <surname>Belz</surname>
          </string-name>
          , Darren Cosker, Frank Keller, William Smith, Kalina Bontcheva,
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <source>tion for Computational Linguistics</source>
          ,
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          . https://doi.org/10.3115/v1/W14-5404 [12]
          <string-name>
            <given-names>Yinhan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          Levy,
          <string-name>
            <given-names>Mike</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2019</year>
          . RoBERTa: A
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <article-title>Robustly Optimized BERT Pretraining Approach</article-title>
          . CoRR abs/1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          arXiv:1907.11692 http://arxiv.org/abs/1907.11692 [13]
          <string-name>
            <given-names>Hui</given-names>
            <surname>Ni</surname>
          </string-name>
          , Zhenhua Guo, and
          <string-name>
            <given-names>Biqing</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2015</year>
          . Binary Patent Image Retrieval
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <source>ence on Service Science, ICSS</source>
          <year>2015</year>
          , Weihai, Shandong, China, May 8-9
          ,
          <year>2015</year>
          . IEEE
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <publisher-name>Computer Society</publisher-name>
          ,
          <fpage>23</fpage>
          -
          <lpage>27</lpage>
          . https://doi.org/10.1109/ICSS.2015.12 [14]
          <string-name>
            <given-names>Jeong Beom</given-names>
            <surname>Park</surname>
          </string-name>
          , Thomas Mandl, and Do Wan Kim.
          <year>2017</year>
          . Patent Document
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <source>International Journal of Contents</source>
          <volume>13</volume>
          (
          <issue>4</issue>
          ) (
          <year>2017</year>
          ),
          <fpage>70</fpage>
          -
          <lpage>79</lpage>
          . [15]
          <string-name>
            <given-names>Florina</given-names>
            <surname>Piroi</surname>
          </string-name>
          , Mihai Lupu, Allan Hanbury, and
          <string-name>
            <given-names>Veronika</given-names>
            <surname>Zenz</surname>
          </string-name>
          .
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <article-title>CLEF-IP 2011: Retrieval in the Intellectual Property Domain</article-title>
          .
          <source>In CLEF 2011 Labs and</source>
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          Workshop, Notebook Papers, 19-22 September
          <year>2011</year>
          , Amsterdam, The Netherlands
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <source>(CEUR Workshop Proceedings</source>
          , Vol.
          <volume>1177</volume>
          ), Vivien Petras, Pamela Forner, and
          <string-name>
            <given-names>Paul D.</given-names>
            <surname>Clough</surname>
          </string-name>
          (Eds.).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          CEUR-WS.org. http://ceur-ws.org/Vol-1177/CLEF2011wn-CLEF-IP-PiroiEt2011.pdf
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh,
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <string-name>
            <given-names>Gretchen</given-names>
            <surname>Krueger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <year>2021</year>
          . Learning Transferable Visual
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <article-title>Models From Natural Language Supervision</article-title>
          . CoRR abs/2103.00020 (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          arXiv:2103.00020 https://arxiv.org/abs/2103.00020 [17]
          <string-name>
            <given-names>Nils</given-names>
            <surname>Reimers</surname>
          </string-name>
          and
          <string-name>
            <given-names>Iryna</given-names>
            <surname>Gurevych</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          .
          <source>In Proceedings of the 2019 Conference</source>
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <source>on Empirical Methods in Natural Language Processing and the 9th International</source>
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          <source>Joint Conference on Natural Language Processing, EMNLP-IJCNLP</source>
          <year>2019</year>
          , Hong
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          Kong, China, November 3-7
          ,
          <year>2019</year>
          ,
          <string-name>
            <given-names>Kentaro</given-names>
            <surname>Inui</surname>
          </string-name>
          , Jing Jiang, Vincent Ng, and
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <string-name>
            <given-names>Xiaojun</given-names>
            <surname>Wan</surname>
          </string-name>
          (Eds.).
          <source>Association for Computational Linguistics</source>
          ,
          <fpage>3980</fpage>
          -
          <lpage>3990</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          https://doi.org/10.18653/v1/D19-1410 [18]
          <string-name>
            <given-names>Josiane</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          , Marco Cristo, and Juan G Colonna.
          <year>2020</year>
          .
          <article-title>Deep hashing for</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          <article-title>multi-label image retrieval: a survey</article-title>
          .
          <source>Artificial Intelligence Review</source>
          <volume>53</volume>
          ,
          <issue>7</issue>
          (
          <year>2020</year>
          ),
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          <fpage>5261</fpage>
          -
          <lpage>5307</lpage>
          . [19]
          <string-name>
            <given-names>Walid</given-names>
            <surname>Shalaby</surname>
          </string-name>
          and
          <string-name>
            <given-names>Wlodek</given-names>
            <surname>Zadrozny</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Patent retrieval: a literature review</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          <source>Knowl. Inf. Syst.</source>
          <volume>61</volume>
          ,
          <issue>2</issue>
          (
          <year>2019</year>
          ),
          <fpage>631</fpage>
          -
          <lpage>660</lpage>
          . https://doi.org/10.1007/s10115-018-1322-7 [20]
          <string-name>
            <given-names>Panagiotis</given-names>
            <surname>Sidiropoulos</surname>
          </string-name>
          , Stefanos Vrochidis, and
          <string-name>
            <given-names>Ioannis</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          histogram.
          <source>Pattern Recognition</source>
          <volume>44</volume>
          ,
          <issue>4</issue>
          (
          <year>2011</year>
          ),
          <fpage>739</fpage>
          -
          <lpage>750</lpage>
          . https://doi.org/10.1016/j.patcog.2010.09.014
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Arnold WM</given-names>
            <surname>Smeulders</surname>
          </string-name>
          , Marcel Worring, Simone Santini, Amarnath Gupta, and
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          <string-name>
            <given-names>Ramesh</given-names>
            <surname>Jain</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Content-based Image Retrieval at the End of the Early Years</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>22</volume>
          ,
          <issue>12</issue>
          (
          <year>2000</year>
          ),
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          <fpage>1349</fpage>
          -
          <lpage>1380</lpage>
          . [22]
          <string-name>
            <given-names>Cees G. M.</given-names>
            <surname>Snoek</surname>
          </string-name>
          and
          <string-name>
            <given-names>Arnold W. M.</given-names>
            <surname>Smeulders</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Visual-Concept Search Solved?</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          <source>Computer</source>
          <volume>43</volume>
          ,
          <issue>6</issue>
          (
          <year>2010</year>
          ),
          <fpage>76</fpage>
          -
          <lpage>78</lpage>
          . https://doi.org/10.1109/MC.2010.183 [23]
          <string-name>
            <given-names>Stefanos</given-names>
            <surname>Vrochidis</surname>
          </string-name>
          , Anastasia Moumtzidou, and
          <string-name>
            <given-names>Ioannis</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          <article-title>Concept-based patent image retrieval</article-title>
          .
          <source>World Patent Information</source>
          <volume>34</volume>
          (
          <year>2012</year>
          ),
          <fpage>292</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Stefanos</given-names>
            <surname>Vrochidis</surname>
          </string-name>
          , Anastasia Moumtzidou, and
          <string-name>
            <given-names>Ioannis</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <year>2014</year>
          . En-
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          <source>in the Modern World - COST Action IC1002 on Multilingual and Multifaceted</source>
          In-
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          <string-name>
            <surname>Hansen</surname>
          </string-name>
          (Eds.).
          <source>Lecture Notes in Computer Science</source>
          , Vol.
          <volume>8830</volume>
          . Springer,
          <fpage>250</fpage>
          -
          <lpage>273</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          https://doi.org/10.1007/978-3-319-12511-4_12 [25]
          <string-name>
            <given-names>Stefanos</given-names>
            <surname>Vrochidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          , Anastasia Moumtzidou, Panagiotis
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          Sidiropoulos,
          <string-name>
            <given-names>Emanuelle</given-names>
            <surname>Pianta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ioannis</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <year>2010</year>
          . Towards content-
        </mixed-citation>
      </ref>
      <ref id="ref68">
        <mixed-citation>
          <volume>32</volume>
          (
          <year>2010</year>
          ),
          <fpage>94</fpage>
          -
          <lpage>106</lpage>
          . [26]
          <string-name>
            <given-names>Haohan</given-names>
            <surname>Wang</surname>
          </string-name>
          , Songwei Ge, Zachary C. Lipton, and
          <string-name>
            <given-names>Eric P.</given-names>
            <surname>Xing</surname>
          </string-name>
          .
          <year>2019</year>
          . Learn-
        </mixed-citation>
      </ref>
      <ref id="ref69">
        <mixed-citation>
          <source>In Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          : Annual Conference on Neural Information Processing Systems 2019,
        </mixed-citation>
      </ref>
      <ref id="ref70">
        <mixed-citation>
          NeurIPS 2019,
        </mixed-citation>
      </ref>
      <ref id="ref71">
        <mixed-citation>
          December 8-14
          ,
          <year>2019</year>
          , Vancouver, BC, Canada,
          <string-name>
            <given-names>Hanna M.</given-names>
            <surname>Wallach</surname>
          </string-name>
          , Hugo Larochelle,
        </mixed-citation>
      </ref>
      <ref id="ref72">
        <mixed-citation>
          <source>nett (Eds.)</source>
          .
          <fpage>10506</fpage>
          -
          <lpage>10518</lpage>
          . https://proceedings.neurips.cc/paper/2019/hash/
        </mixed-citation>
      </ref>
      <ref id="ref73">
        <mixed-citation>
          3eefceb8087e964f89c2d59e8a249915-Abstract.html [27]
          <string-name>
            <given-names>Jun</given-names>
            <surname>Wang</surname>
          </string-name>
          , Wei Liu, Sanjiv Kumar, and
          <string-name>
            <given-names>Shih-Fu</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Learning to hash for indexing big data - A survey</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref74">
        <mixed-citation>
          .
          <source>Proc. IEEE</source>
          <volume>104</volume>
          ,
          <issue>1</issue>
          (
          <year>2015</year>
          ),
          <fpage>34</fpage>
          -
          <lpage>57</lpage>
          . [28]
          <string-name>
            <given-names>Jingdong</given-names>
            <surname>Wang</surname>
          </string-name>
          , Ting Zhang, Nicu Sebe, Heng Tao Shen, et al.
          <year>2017</year>
          . A survey on
        </mixed-citation>
      </ref>
      <ref id="ref75">
        <mixed-citation>
          <volume>40</volume>
          ,
          <issue>4</issue>
          (
          <year>2017</year>
          ),
          <fpage>769</fpage>
          -
          <lpage>790</lpage>
          . [29]
          <string-name>
            <given-names>Liping</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming</given-names>
            <surname>Gong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Vijayan K.</given-names>
            <surname>Asari</surname>
          </string-name>
          .
          <year>2020</year>
          . Diagram Image Retrieval
        </mixed-citation>
      </ref>
      <ref id="ref76">
        <mixed-citation>
          <article-title>and Analysis: Challenges and Opportunities</article-title>
          . In 2020 IEEE/CVF Conference on
        </mixed-citation>
      </ref>
      <ref id="ref77">
        <mixed-citation>
          <source>Computer Vision and Pattern Recognition, CVPR Workshops 2020</source>
          , Seattle, WA, USA,
        </mixed-citation>
      </ref>
      <ref id="ref78">
        <mixed-citation>
          June 14-19,
          <year>2020</year>
          . IEEE,
          <fpage>685</fpage>
          -
          <lpage>698</lpage>
          . https://doi.org/10.1109/CVPRW50498.2020.00098 [30]
          <string-name>
            <given-names>Barret</given-names>
            <surname>Zoph</surname>
          </string-name>
          , Vijay Vasudevan, Jonathon Shlens, and
          <string-name>
            <given-names>Quoc V.</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref79">
        <mixed-citation>
          <article-title>Learning Transferable Architectures for Scalable Image Recognition</article-title>
          . In 2018 IEEE
        </mixed-citation>
      </ref>
      <ref id="ref80">
        <mixed-citation>
          <source>Conference on Computer Vision and Pattern Recognition, CVPR 2018</source>
          , Salt Lake
        </mixed-citation>
      </ref>
      <ref id="ref81">
        <mixed-citation>
          City, UT
          , USA, June 18-22,
          <year>2018</year>
          . IEEE Computer Society,
          <fpage>8697</fpage>
          -
          <lpage>8710</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref82">
        <mixed-citation>
          https://doi.org/10.1109/CVPR.2018.00907
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>