<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fine-TOM Matcher Results for OAEI 2021</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leon Knorr</string-name>
          <email>leon.knorr@sap.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Portisch</string-name>
          <email>jan.portisch@sap.com</email>
          <email>jan@informatik.uni-mannheim.de</email>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>SAP SE</institution>
          , Walldorf,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Web Science Group, University of Mannheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>In this paper, the Fine-Tuned Transformers for Ontology Matching (Fine-TOM) matching system is presented along with the results it achieved during its first participation in the Ontology Alignment Evaluation Initiative (OAEI) campaign (2021). The system uses the publicly available albert-base-v2 model, which has been fine-tuned on a training dataset that includes 20% of each reference alignment from the Anatomy, Conference, and Knowledge Graph tracks, as well as a wide variety of generated negative examples. The model is then used by a separate matching pipeline, which calculates a confidence score for each correspondence. The submitted Docker container includes only the matching pipeline with an already fine-tuned model.<sup>3</sup></p>
      </abstract>
      <kwd-group>
        <kwd>Ontology Matching</kwd>
        <kwd>Ontology Alignment</kwd>
        <kwd>Language Models</kwd>
        <kwd>Transformers</kwd>
        <kwd>Fine-Tuning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Presentation of the System</title>
      <sec id="sec-2-1">
        <title>State, purpose, general statement</title>
        <p>
          Fine-Tuned Transformers for Ontology Matching (Fine-TOM) is a
transformer-based matching system. It consists of two separate pipelines: a pipeline for
generating training data and training the model, and a matching pipeline which performs
the actual matching task. The two can be executed individually or one after the other. Each
pipeline uses predefined components from the Matching
Evaluation Toolkit (MELT) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], a framework for ontology matching and evaluation.
In particular, the new transformer extension of MELT [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is used.
For the submission, only the matching pipeline was packaged in a Docker
container using the MELT Web Interface<sup>4</sup>, together with a fine-tuned albert-base-v2 model.
This model was fine-tuned beforehand on a training set that included
20% of the reference alignments of the Anatomy, Conference, and Knowledge
Graph tracks, as well as generated negative examples. This year's submission
marks the first participation of the Fine-TOM system in the OAEI.
        </p>
        <p><sup>3</sup> Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
        <p><sup>4</sup> https://dwslab.github.io/melt/matcher-packaging/web#web-interface-http-matching-interface</p>
      </sec>
      <sec id="sec-2-2">
        <title>Specific Techniques Used</title>
      </sec>
      <sec id="sec-2-3">
        <title>Transformer-based language models</title>
        <p>
          One possible solution to NLP
problems is the use of transformers. The original transformer was introduced by
Google in 2017 and uses a so-called self-attention architecture [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], which is
reported to be more parallelizable and to require significantly less time to train than earlier architectures. Today,
the NLP domain has largely adopted transformers, and they have become the de
facto standard for most NLP tasks such as text translation and classification [
          <xref ref-type="bibr" rid="ref13 ref4">13,4</xref>
          ].
As a result, many different transformer models are available today, e.g.
bert-base-cased [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and gpt-2 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. All of them use different variations of
the original self-attention architecture.
        </p>
        <p>
          Fine-Tuning. In order to achieve good results, a transformer first needs to be
trained on a large amount of training data. This process is called
pre-training. Because pre-training requires a vast amount of data as well as processing power,
most models are pre-trained on a specific task,
such as next sentence prediction, and then uploaded to Hugging Face<sup>5</sup> [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], where they
are available for download and can be tested in web demos. This initial
training process has a great impact on how the selected model will perform later
on. As most transformers are trained for conventional tasks such as text
summarization, next sentence prediction, or review classification [
          <xref ref-type="bibr" rid="ref13 ref14">14,13</xref>
          ], they are not
suitable for other tasks, in this case ontology matching, out of the box.
Therefore, transformers can be re-trained, or fine-tuned, to perform other or
similar tasks. This process is usually computationally cheaper than
pre-training. However, it requires high-quality training data consisting of positive
as well as negative examples. Because such training data is currently not available,
Fine-TOM includes a training pipeline that generates training data from
a fraction of already known reference alignments.
        </p>
        <p>
          During the development of Fine-TOM, different BERT models were fine-tuned
and evaluated on the Anatomy [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], Conference [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], and Knowledge Graph [
          <xref ref-type="bibr" rid="ref5 ref8">8,5</xref>
          ]
tracks. Based on the data gathered, the best-performing configuration was
determined; it uses the albert-base-v2 model and is explained in the
following.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>Fine-TOM architecture</title>
        <p>
          The Fine-TOM matching system consists of two individual pipelines, as shown
in Figure 1:
          <list list-type="bullet">
            <list-item>
              <p>A training pipeline, which handles the fine-tuning of a transformer model
and saves the model to disk.</p>
            </list-item>
            <list-item>
              <p>A matching pipeline, which performs the actual matching task and is
based on the architecture presented in the TOM paper [
              <xref ref-type="bibr" rid="ref9">9</xref>
              ].</p>
            </list-item>
          </list>
        </p>
        <p>This architecture can also be used to run transformers in a zero-shot fashion
by executing only the matching pipeline with a pre-trained model.</p>
        <p><sup>5</sup> https://huggingface.co</p>
        <p>
          Training Pipeline. The training pipeline, shown in Figure 2, consists of
several predefined components of the MELT framework [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. First, a recall matcher
creates an alignment between the two ontologies O1 and O2, which serves as
the basis for generating training data. This alignment usually has low
precision but good recall, so many of its correspondences are not
correct matches. That makes it a good source of negative training examples.
Next, a negatives generator creates the actual
training dataset. It samples a configurable fraction f of an already known
reference alignment. Internal experiments showed that sampling 20-40% of a reference
alignment gives the best work-to-performance ratio; the model included
in Fine-TOM was therefore trained with a sampling rate of 20%. The sampled
correspondences are the positive examples of the training set.
To add negative examples, the generator takes
the alignment produced by the recall matcher as input. Under the assumption
that the perfect solution has one-to-one arity, and because the correct match of some entities
is known from the sampled reference alignment, every other correspondence
that the recall matcher proposes for such an entity can be treated as a negative example.
The result is a training set that contains positive as well as negative examples. This
training set is then passed to the transformer fine-tuning component of the
MELT framework [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which fine-tunes the selected model and saves it
to disk.
        </p>
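        <p>The sampling and negative-generation steps of the training pipeline can be illustrated with a minimal Python sketch. It is not the MELT implementation: the function name build_training_set, the representation of correspondences as (source, target) tuples, and the fixed random seed are assumptions made for illustration only.</p>
        <preformat><![CDATA[
```python
import random


def build_training_set(recall_alignment, reference, fraction=0.2, seed=42):
    """Return (source, target, label) triples; label 1 = match, 0 = non-match.

    Hypothetical helper: recall_alignment and reference are collections of
    (source, target) tuples, and the reference is assumed to be one-to-one.
    """
    rng = random.Random(seed)
    reference = sorted(reference)
    # Sample a fraction f of the reference alignment as positive examples.
    k = max(1, round(fraction * len(reference)))
    positives = set(rng.sample(reference, k))
    known_sources = {s for s, _ in positives}
    known_targets = {t for _, t in positives}

    examples = [(s, t, 1) for s, t in sorted(positives)]
    # One-to-one assumption: a candidate that pairs an entity whose correct
    # match is known with any other partner must be a non-match.
    for s, t in sorted(recall_alignment):
        if (s, t) in positives:
            continue
        if s in known_sources or t in known_targets:
            examples.append((s, t, 0))
    return examples


# Toy data: five true matches and a recall matcher that proposes every pair.
reference = [(f"a{i}", f"b{i}") for i in range(5)]
candidates = [(f"a{i}", f"b{j}") for i in range(5) for j in range(5)]
training_set = build_training_set(candidates, reference, fraction=0.2)
print(len(training_set))
```
]]></preformat>
        <p>In this toy setting one positive example is sampled, and the eight candidates that share its source or target entity become negatives. Candidates that involve only entities without a known match are left out, because nothing can be concluded about them.</p>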
        <p>
          Matching Pipeline. The matching pipeline, shown in Figure 3, also consists
of several predefined components of the MELT framework. As in the training
pipeline, a recall matcher is used as a starting point; its output therefore marks the
theoretically highest recall that this matching system can achieve. The resulting
alignment is then processed by a confidence splitter, which removes all
correspondences that are simple string matches with a confidence of
1.0, together with every other correspondence involving their entities, from the alignment returned by the recall matcher.
The removed exact matches are stored temporarily in a separate alignment
so that they are not reclassified by the transformer model. The cleaned-up
alignment is then passed to a transformer filter, which loads the previously
fine-tuned transformer model from disk and adds another confidence score to
each correspondence in the alignment. To make use of this newly added
confidence score and to eliminate correspondences that the transformer classified as
bad matches, a confidence filter is applied. It cuts off
the alignment at a configurable threshold. Fine-TOM uses
the same threshold of 0.8 as proposed in the TOM paper [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. After all correspondences
with a lower confidence score have been removed from the processed alignment,
the previously set-aside correspondences with a confidence score of 1.0 are added
back to the alignment. Since most OAEI datasets are of one-to-one
arity, an efficient implementation of the Hungarian method for Maximum
Weight Bipartite Matching (MWBM) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is used to create the final alignment.
All matching components are explained in more
detail below.
        </p>
        <p>Recall Matcher. The recall matcher uses a variety of string comparisons to
generate an alignment with high recall at the expense of
rather low precision. It includes a simple string matching mechanism that
compares the textual representations of two entities character by character. If they
match, the correspondence is added to the result alignment with a confidence of 1.0.
In addition, the matcher counts how many
words of one textual representation also occur in the other. If this similarity
surpasses a configurable threshold, the correspondence is also added to the result
alignment, but only with a low confidence of 0.1.</p>
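        <p>The two-stage behaviour of the recall matcher can be sketched as follows. This is an illustrative Python approximation, not the MELT component: the names recall_matcher and token_overlap, the overlap measure (shared tokens divided by the size of the smaller token set), and the default threshold of 0.5 are assumptions.</p>
        <preformat><![CDATA[
```python
def token_overlap(a, b):
    """Share of the smaller label's tokens that also occur in the other label."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / min(len(tokens_a), len(tokens_b))


def recall_matcher(source_labels, target_labels, overlap_threshold=0.5):
    """Map (source_id, target_id) to a confidence of 1.0 or 0.1.

    source_labels and target_labels are dicts from entity id to label.
    """
    alignment = {}
    for s, s_label in source_labels.items():
        for t, t_label in target_labels.items():
            if s_label == t_label:
                # Exact string match: treated as a safe correspondence.
                alignment[(s, t)] = 1.0
            elif token_overlap(s_label, t_label) >= overlap_threshold:
                # Partial overlap: kept as a low-confidence candidate.
                alignment[(s, t)] = 0.1
    return alignment


mouse = {"m1": "heart", "m2": "left lung"}
human = {"h1": "heart", "h2": "lung", "h3": "liver"}
print(recall_matcher(mouse, human))
```
]]></preformat>
        <p>The low-confidence candidates are deliberately permissive: later components decide which of them survive, so the recall matcher only has to avoid missing correct pairs.</p>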
        <p>Confidence Splitter. As described earlier, the confidence splitter takes an
alignment as input and removes every correspondence with a confidence score of
1.0, as well as every other correspondence involving the entities of a removed
correspondence. This prevents these rather
"safe" matches from being reclassified by another component in the pipeline. The confidence
splitter can also merge the alignment that was set aside during splitting
back into an alignment passed to it as input.</p>
        <p>Transformer Filter. The transformer filter iterates over the alignment
passed to it as input and processes each correspondence
individually by calling a separate Python server that runs locally in the
background. This is necessary because the transformer models are
implemented in Python, whereas the matching components and the pipeline are
implemented in Java. Each pair of textual representations received by the Python
server is processed by the selected model, which can either be loaded from
disk or obtained from the Hugging Face library. The transformer model
returns a confidence score, which is sent back to the transformer
filter and added to the corresponding correspondence in the alignment, thereby
classifying each correspondence.</p>
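        <p>The interplay of the confidence splitter and the transformer filter described above can be sketched in Python. The functions split_by_confidence, rescore, and merge are hypothetical names, and score_pair is a stand-in for the fine-tuned model behind the Python server; the toy scorer used here is not a transformer.</p>
        <preformat><![CDATA[
```python
def split_by_confidence(alignment, safe_confidence=1.0):
    """Separate safe matches from the candidates that still need scoring."""
    safe = {pair: c for pair, c in alignment.items() if c >= safe_confidence}
    safe_sources = {s for s, _ in safe}
    safe_targets = {t for _, t in safe}
    # Drop every other correspondence that touches an already matched entity.
    rest = {
        (s, t): c
        for (s, t), c in alignment.items()
        if s not in safe_sources and t not in safe_targets
    }
    return safe, rest


def rescore(alignment, source_labels, target_labels, score_pair):
    """Replace each confidence with the score returned for the label pair."""
    return {
        (s, t): score_pair(source_labels[s], target_labels[t])
        for (s, t) in alignment
    }


def merge(safe, rest):
    """Add the set-aside safe matches back to a processed alignment."""
    merged = dict(rest)
    merged.update(safe)
    return merged


def toy_score(a, b):
    # Placeholder scorer (assumption): high score if the last words agree.
    return 0.9 if a.split()[-1] == b.split()[-1] else 0.2


src = {"a": "heart", "b": "left lung", "c": "heart valve"}
tgt = {"x": "heart", "y": "lung"}
candidates = {("a", "x"): 1.0, ("a", "y"): 0.1, ("b", "y"): 0.1, ("c", "x"): 0.1}
safe, rest = split_by_confidence(candidates)
print(merge(safe, rescore(rest, src, tgt, toy_score)))
```
]]></preformat>
        <p>Only the candidate ("b", "y") reaches the scorer in this example, because the other two low-confidence candidates involve an entity that already has an exact match.</p>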
        <p>
          Confidence Filter. The confidence filter excludes every correspondence with
a confidence score lower than a configurable threshold. This step is needed because the
transformer filter itself does not remove any correspondences from the alignment;
it only assigns them new confidence scores. The confidence filter therefore removes the
correspondences that the transformer marked as bad matches with a low confidence.
        </p>
        <p>
          Max Weight Bipartite Extractor. The alignment generated by the matching
components can include multiple correspondences for one ontology element.
However, the solution of the ontology
matching problem is assumed to be of one-to-one arity. The alignment provided as
input to the max weight bipartite extractor therefore needs to be converted into an
alignment with one-to-one arity. For this purpose, an efficient
implementation of the Hungarian method for Maximum Weight Bipartite Matching
(MWBM) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is used.
        </p>
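        <p>The last two steps can be sketched in Python as shown below. The function names are hypothetical. Note that the one-to-one extraction here is an exact but exponential search over small inputs, used only to illustrate the objective (maximum total confidence under one-to-one arity); it is not the Hungarian method used by the actual system.</p>
        <preformat><![CDATA[
```python
from functools import lru_cache


def filter_by_confidence(alignment, threshold=0.8):
    """Keep only correspondences whose confidence reaches the threshold."""
    return {pair: c for pair, c in alignment.items() if c >= threshold}


def extract_one_to_one(alignment):
    """Pick a one-to-one subset of the alignment with maximum total confidence.

    Exhaustive search with memoisation; suitable for toy inputs only.
    """
    sources = sorted({s for s, _ in alignment})
    targets = sorted({t for _, t in alignment})

    @lru_cache(maxsize=None)
    def best(i, used):
        # used is a bitmask of the targets that are already assigned.
        if i == len(sources):
            return 0.0, ()
        # Option 1: leave sources[i] unmatched.
        best_score, best_pairs = best(i + 1, used)
        # Option 2: match sources[i] with any free candidate target.
        for j, t in enumerate(targets):
            conf = alignment.get((sources[i], t))
            if conf is None or used & (1 << j):
                continue
            score, pairs = best(i + 1, used | (1 << j))
            if conf + score > best_score:
                best_score = conf + score
                best_pairs = ((sources[i], t),) + pairs
        return best_score, best_pairs

    return {pair: alignment[pair] for pair in best(0, 0)[1]}


candidates = {("a", "x"): 0.9, ("a", "y"): 0.85, ("b", "x"): 0.88, ("b", "z"): 0.3}
kept = filter_by_confidence(candidates, threshold=0.8)
print(extract_one_to_one(kept))
```
]]></preformat>
        <p>A greedy strategy would keep ("a", "x") because it has the highest single confidence and would then have to drop ("b", "x"). Maximising the total weight instead keeps ("a", "y") and ("b", "x"), which is why a global assignment method is preferable to picking the best correspondence per entity.</p>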
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>
        This section discusses the results of Fine-TOM during the OAEI 2021 campaign.
Only the Anatomy [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Conference [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and Knowledge Graph [
        <xref ref-type="bibr" rid="ref5 ref8">8,5</xref>
        ] tracks are
included, since the matching system was only designed and trained for them.
      </p>
      <sec id="sec-3-1">
        <title>Anatomy</title>
        <p>
          The results<sup>6</sup> of Fine-TOM on the Anatomy track are shown in Table 1.
Fine-TOM outperformed the OAEI StringEquiv matcher
in terms of recall and F-measure, although its precision was lower. This
shows that the Fine-TOM matching system is able to find matches that
cannot be found by checking for string equivalence. However, compared to the
TOM matching system, which is closely related to Fine-TOM because the two share a
similar matching pipeline architecture, Fine-TOM achieved
slightly lower scores (approximately 1 to 1.5%) on all measures shown. This is a rather
interesting result, as the transformers used in the TOM paper were neither re-trained with
domain-specific data nor pre-trained with data from an ontology
matching task. Nevertheless, TOM has one advantage: it uses the Sentence-BERT
transformer model paraphrase-TinyBERT-L6-v2 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], whereas Fine-TOM uses
a fine-tuned version of the albert-base-v2 model. Sentence-BERT models
are pre-trained and designed to find semantic textual similarities between input
sequences [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The albert-base-v2 model, on the other hand, is a variation of
the BERT model and was trained for masked language modelling [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], which
is a completely different task from ontology matching. It is therefore
remarkable that Fine-TOM achieved scores so close to those of TOM.
This demonstrates the impact that fine-tuning has on the performance
of a matching system that includes a transformer model. Since MELT did not
support Sentence-BERT transformers at the time of Fine-TOM's development,
they could not be evaluated in time for Fine-TOM's OAEI 2021 submission.
        </p>
        <p><sup>6</sup> Official result page: http://oaei.ontologymatching.org/2021/results/anatomy/index.html</p>
        <p><sup>7</sup> Official result page: http://oaei.ontologymatching.org/2021/results/conference/</p>
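        <table-wrap>
          <label>Table 2</label>
          <caption>
            <p>Results on the Conference track according to the OAEI 2021 campaign</p>
          </caption>
          <table>
            <thead>
              <tr><th>Matcher</th><th>Precision</th><th>Recall</th><th>F-Measure</th></tr>
            </thead>
            <tbody>
              <tr><td>StringEquiv</td><td>0.76</td><td>0.41</td><td>0.53</td></tr>
              <tr><td>TOM</td><td>0.69</td><td>0.48</td><td>0.57</td></tr>
              <tr><td>Fine-TOM</td><td>0.64</td><td>0.53</td><td>0.58</td></tr>
            </tbody>
          </table>
        </table-wrap>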
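        <p>The F-measure values in Table 2 are consistent with the harmonic mean of precision and recall (F1). The paper does not state the formula, so this is an assumption; the short Python check below recomputes the column from the reported precision and recall values.</p>
        <preformat><![CDATA[
```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) as listed in Table 2.
conference_results = {
    "StringEquiv": (0.76, 0.41, 0.53),
    "TOM": (0.69, 0.48, 0.57),
    "Fine-TOM": (0.64, 0.53, 0.58),
}

for matcher, (p, r, reported) in conference_results.items():
    print(matcher, round(f_measure(p, r), 2), reported)
```
]]></preformat>
        <p>The recomputed values agree with the reported ones after rounding to two decimals, which also makes the trade-off visible: Fine-TOM gives up precision relative to StringEquiv and TOM but gains enough recall to reach the highest F-measure of the three.</p>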
      </sec>
      <sec id="sec-3-2">
        <title>Knowledge Graph</title>
        <p>On the Knowledge Graph track, Fine-TOM achieved slightly better results than
the OAEI baseline, as shown in Table 3.</p>
        <p>In this paper, the Fine-TOM matching system has been presented. First, a new
pipeline architecture that includes a dedicated training pipeline and a
matching pipeline was introduced. The training pipeline first generates a training set
based on reference alignments and a high-recall matcher; this training set is then used to
re-train a selected model. The model is then plugged into the matching
pipeline, which performs the actual matching process using different filters.
The results showed that transformers can improve the overall performance of
matching systems in terms of recall and F-measure. In addition, the
similar results of TOM and Fine-TOM indicate that fine-tuning has a great impact
on the performance of transformer models, since the model used by Fine-TOM
was not pre-trained for ontology matching or for finding semantic similarities
between input sequences. The presented approach therefore offers considerable
potential for further performance gains, for example by using a different
model such as a Sentence-BERT model, or by improving or replacing
pipeline components such as the high-recall matcher. Finally, this year's
submission marks the first participation of the Fine-TOM matching system in
an OAEI campaign. The reported results are promising and motivate further
research in the area of transformer-based ontology and instance matching.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayamizu</surname>
            ,
            <given-names>T.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ringwald</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>de Coronado</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Zhang, S.:
          <article-title>Of mice and men: Aligning mouse and human anatomies</article-title>
          .
          <source>In: AMIA</source>
          <year>2005</year>
          , American Medical Informatics Association Annual Symposium, Washington, DC, USA, October
          22-26
          ,
          <year>2005</year>
          . AMIA (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cheatham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Conference v2.0: An uncertain version of the OAEI conference benchmark</article-title>
          .
          <source>In: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23</source>
          ,
          <year>2014</year>
          . Proceedings,
          <source>Part II. Lecture Notes in Computer Science</source>
          , vol.
          <volume>8797</volume>
          , pp.
          <fpage>33</fpage>-<lpage>48</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antonelli</surname>
            ,
            <given-names>F.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Efficient selection of mappings and automatic quality-driven combination of matching methods</article-title>
          .
          <source>In: Proceedings of the 4th International Workshop on Ontology Matching (OM-2009) collocated with the 8th International Semantic Web Conference (ISWC-2009), Chantilly, USA, October 25, 2009</source>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>551</volume>
          .
          CEUR-WS.org
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          . CoRR abs/
          <year>1810</year>
          .04805 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>The knowledge graph track at OAEI - gold standards, baselines, and the golden hammer bias</article-title>
          .
          <source>In: The Semantic Web - 17th International Conference, ESWC</source>
          <year>2020</year>
          , Heraklion, Crete, Greece, May 31-June 4,
          <year>2020</year>
          ,
          <source>Proceedings. Lecture Notes in Computer Science</source>
          , vol.
          <volume>12123</volume>
          , pp.
          <fpage>343</fpage>-<lpage>359</lpage>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Portisch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>MELT - matching evaluation toolkit</article-title>
          .
          <source>In: Semantic Systems. The Power of AI and Knowledge Graphs - 15th International Conference, SEMANTiCS</source>
          <year>2019</year>
          , Karlsruhe, Germany, September 9-12
          ,
          <year>2019</year>
          ,
          <source>Proceedings. Lecture Notes in Computer Science</source>
          , vol.
          <volume>11702</volume>
          , pp.
          <fpage>231</fpage>-<lpage>245</lpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Portisch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>Matching with transformers in MELT</article-title>
          .
          <source>CoRR abs/2109</source>
          .07401 (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perchani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Portisch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>DBkWik: Towards knowledge graph creation from thousands of wikis</article-title>
          .
          <source>In: Proceedings of the ISWC 2017 Posters &amp; Demonstrations and Industry Tracks co-located with 16th International Semantic Web Conference (ISWC 2017), Vienna, Austria, October 23rd to 25th, 2017</source>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1963</volume>
          .
          CEUR-WS.org (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kossack</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borg</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knorr</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Portisch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>TOM matcher results for OAEI 2021</article-title>
          . In: OM@ISWC
          <year>2021</year>
          (
          <year>2021</year>
          ), to appear
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimpel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soricut</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>ALBERT: A lite BERT for self-supervised learning of language representations</article-title>
          . CoRR abs/
          <year>1909</year>
          .11942 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Child</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amodei</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Language models are unsupervised multitask learners</article-title>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Reimers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          .
          <source>In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing</source>
          . Association for Computational Linguistics (11
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          .
          <source>CoRR abs/1706</source>
          .03762 (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debut</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaumond</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delangue</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cistac</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rault</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Funtowicz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brew</surname>
          </string-name>
          , J.:
          <article-title>Huggingface's transformers: State-of-the-art natural language processing</article-title>
          . CoRR abs/
          <year>1910</year>
          .03771 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>