<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>VinAI at ChEMU 2020: An accurate system for named entity recognition in chemical reactions from patents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mai Hoang Dao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dat Quoc Nguyen</string-name>
          <email>v.datnq9g@vinai.io</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Posts and Telecommunications Institute of Technology</institution>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>VinAI Research</institution>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our VinAI system for the ChEMU task 1 of named entity recognition (NER) in chemical reactions. Our system employs a BiLSTM-CNN-CRF architecture [6] with additional contextualized word embeddings. It achieves very high performance, officially ranking second with regard to both exact- and relaxed-match F1 scores at 94.33% and 96.84%, respectively. In a post-evaluation phase, fixing a bug in the mapping script that converts the column-based format into the brat standoff format helps our system obtain higher results. In particular, we obtain an exact-match F1 score of 95.21% and a relaxed-match F1 score of 97.26%, the latter being the highest relaxed-match F1 among all participating systems. We believe our system can serve as a strong baseline for future research and downstream applications of chemical NER over chemical reactions from patents.</p>
      </abstract>
      <kwd-group>
        <kwd>Named entity recognition</kwd>
        <kwd>Chemical reactions</kwd>
        <kwd>Patents</kwd>
        <kwd>Neural network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The discovery of new chemical compounds plays a key role in the
chemical industry. Newly discovered chemical compounds are often first disclosed
in patent documents; only a small fraction of these compounds are later
published in journals, usually up to 3 years after the patent disclosure [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Thus patents containing critical and
timely information about the new chemical compounds serve as starting pointers
for chemical research in both academia and industry [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Due to a huge volume
of new chemical patent applications [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], it is becoming increasingly important
to develop automatic information extraction approaches for large-scale mining
of chemical information from these patent documents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Chemical named-entity recognition (NER) is a fundamental step for
information extraction from chemical patents, supporting many downstream tasks such
as chemical reaction prediction [
        <xref ref-type="bibr" rid="ref12 ref17">12,17</xref>
        ], chemical syntheses [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and the like. The
ChEMU (Cheminformatics Elsevier Melbourne University) task 1 provides
participants with opportunities to develop automatic chemical NER systems for
chemical reactions in chemical patents. The task is to identify crucial elements of
a chemical reaction, including compounds, conditions and yields, as well as their
specific roles in the reaction. Details of this task can be found in the overview
paper of the ChEMU lab [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In this paper, we present our VinAI team's system for the ChEMU task
1. Our system is based on the well-known BiLSTM-CNN-CRF architecture [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
with additional contextualized word embeddings. Our system officially obtains
the second-best results in terms of both exact- and relaxed-match
F1 scores at 94.33% and 96.84%, respectively. In a post-evaluation phase,
fixing a column-brat conversion bug helps our system obtain even better
results: 95.21% exact-match F1 and 97.26% relaxed-match
F1. We thus obtain the highest relaxed-match F1 score among all
participating systems. We also provide an ablation study to investigate
the contributions of different types of input word representations in the full
system, reconfirming the effectiveness of contextualized word embeddings for
chemical NER [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Task description</title>
      <p>
        The ChEMU task 1 of "Named entity recognition" involves identifying chemical
compounds and their specific types. In particular, the task assigns the label of
a chemical compound according to the role it plays within a chemical
reaction. In addition to identifying chemical compounds, the task also requires
identification of the label of the chemical reaction, the temperatures and reaction
times at which the reaction is carried out, as well as the yields obtained for the final
chemical product. The task defines 10 different entity type labels, as listed in
Table 1, and involves both entity boundary prediction and entity label classification.
See [
        <xref ref-type="bibr" rid="ref10 ref3">3,10</xref>
        ] for more details.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Our system</title>
      <p>
        In this section, we present our VinAI system for the ChEMU task 1. We formulate
this task as a sequence labeling problem with the BIO tagging scheme. Following
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], our system employs the well-known BiLSTM-CNN-CRF model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] with
additional contextualized word embeddings.
      </p>
      <p>
        Figure 1 illustrates the architecture of our participating system. In particular,
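our system represents each word token <italic>w</italic><sub>i</sub> in an input sequence <italic>w</italic><sub>1</sub>, <italic>w</italic><sub>2</sub>, ..., <italic>w</italic><sub>n</sub> by
a vector <italic>v</italic><sub>i</sub>, which is obtained by concatenating the pre-trained word embedding,
the CNN-based character-level word embedding [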
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and the contextualized word
embedding of the word token <italic>w</italic><sub>i</sub>. Here, we utilize the pre-trained word
embeddings released by [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], which are trained on a corpus of 84K chemical patents
(1B word tokens) using the Word2Vec skip-gram model [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In addition, we also
utilize the contextualized word embeddings generated by a pre-trained ELMo
language model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which is trained using the same corpus of 84K chemical
patents [
        <xref ref-type="bibr" rid="ref18">18</xref>
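        ].<sup>1</sup> The vector representations <italic>v</italic><sub>i</sub> are then fed into a BiLSTM encoder
to extract latent feature vectors <italic>r</italic><sub>i</sub> for the input words <italic>w</italic><sub>i</sub>. Each latent feature
vector <italic>r</italic><sub>i</sub> is then linearly transformed into <italic>h</italic><sub>i</sub> before being fed into a linear-chain
CRF layer for NER label prediction [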
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A cross-entropy loss is computed during
training while the Viterbi algorithm is used for decoding.
      </p>
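      <p>As a rough illustration of the decoding step, the following minimal sketch runs the Viterbi algorithm over per-token tag scores (playing the role of the transformed vectors described above) and a tag-transition table, as a linear-chain CRF layer would at prediction time. The function name viterbi_decode, the toy tag set and all scores are assumptions made for illustration only; they are not taken from our trained model or from the AllenNLP implementation.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[i][t] is the score of tag t at position i;
    transitions[(p, t)] is the score of moving from tag p to tag t.
    Missing transitions default to 0.0.
    """
    if not emissions:
        return []
    # best[i][t]: best score of any path ending in tag t at position i
    best = [{t: emissions[0][t] for t in tags}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, t), 0.0))
            scores[t] = best[-1][prev] + transitions.get((prev, t), 0.0) + emissions[i][t]
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy example: three tokens, one entity type.
TAGS = ["O", "B-SOLVENT", "I-SOLVENT"]
# Strongly penalise an I- tag that directly follows O.
TRANSITIONS = {("O", "I-SOLVENT"): -10.0}
EMISSIONS = [
    {"O": 2.0, "B-SOLVENT": 0.1, "I-SOLVENT": 0.0},
    {"O": 0.5, "B-SOLVENT": 1.0, "I-SOLVENT": 1.2},
    {"O": 0.2, "B-SOLVENT": 0.1, "I-SOLVENT": 1.5},
]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(decoded)
```
]]></preformat>
      <p>In this toy example the transition penalty is what steers the decoder away from a sequence in which an I- tag follows O, which is the kind of label-consistency constraint a CRF layer can learn on top of the BiLSTM outputs.</p>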
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <sec id="sec-4-1">
        <title>Experimental setup</title>
        <p>
          Dataset: For system development, the ChEMU task 1 provides a corpus of 1125
chemical reaction snippets with gold standard NER annotations using the brat
standoff format [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Although this corpus is pre-split into a training set of 900
snippets and a validation set of 225 snippets, participants are free to use this
corpus in any manner they find useful when training and tuning their systems,
e.g. using a different split or performing cross-validation. We therefore employ only
the first 100 snippets in the provided validation set for validation,<sup>2</sup> and merge
the remaining 125 snippets into the provided training set, resulting in a new
training set of 1025 snippets in total. Following [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], we employ the OpenNLP
toolkit [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] for sentence segmentation and the OSCAR4 tokenizer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] to tokenize
training and validation sentences, then convert these sentences into the CoNLL
column-based format with the BIO tagging scheme.
        </p>
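        <p>The conversion from character-offset annotations to the column-based BIO format can be sketched as follows. This is a minimal illustration, not our preprocessing code: whitespace_tokenize is a hypothetical stand-in for the OSCAR4 tokenizer, the helper names spans_to_bio and to_conll are invented here, and the example sentence and its two entities are made up.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Tuple

Token = Tuple[str, int, int]   # (text, start offset, end offset)
Entity = Tuple[str, int, int]  # (label, start offset, end offset)


def whitespace_tokenize(text: str) -> List[Token]:
    """Hypothetical stand-in tokenizer that keeps character offsets."""
    tokens, pos = [], 0
    for piece in text.split():
        start = text.index(piece, pos)
        end = start + len(piece)
        tokens.append((piece, start, end))
        pos = end
    return tokens


def spans_to_bio(tokens: List[Token], entities: List[Entity]) -> List[str]:
    """Assign one BIO tag per token from character-offset entity spans."""
    tags = ["O"] * len(tokens)
    for label, ent_start, ent_end in entities:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the entity if it lies fully inside the span.
            if tok_start >= ent_start and tok_end <= ent_end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags


def to_conll(tokens: List[Token], tags: List[str]) -> str:
    """Render tokens and tags as two tab-separated columns, one token per line."""
    return "\n".join(f"{text}\t{tag}" for (text, _, _), tag in zip(tokens, tags))


TEXT = "stirred in THF for 2 h"
ENTITIES = [("SOLVENT", 11, 14), ("TIME", 19, 22)]  # made-up annotations
tokens = whitespace_tokenize(TEXT)
tags = spans_to_bio(tokens, ENTITIES)
conll = to_conll(tokens, tags)
print(conll)
```
]]></preformat>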
        <p>
          Implementation: Our system is implemented based on the AllenNLP
framework [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. For training, we use exactly the same hyper-parameters used in [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]
with the exception of the batch size, which we set to 24. Pre-trained word embeddings
and the pre-trained ELMo are fixed while other model parameters are updated
during training. We train our system for 50 epochs and compute the standard
exact-match F1 score after each training epoch on the validation set. We select
the model with the highest exact-match F1 score on the validation set.
        </p>
        <p>
          Evaluation phase: For the final evaluation phase, the ChEMU task 1 provides
a raw test set consisting of 375 patent snippets. Each test snippet is
sentence-segmented and tokenized using OpenNLP and OSCAR4, respectively. We then
convert tokenized test sentences into the column-based format and apply our
selected model to predict NER labels. We then use our own mapping script to
convert the predicted BIO-based NER outputs into the brat standoff format,
and submit the brat-formatted test outputs for evaluation.
        </p>
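        <p>To make the BIO-to-brat mapping step concrete, the sketch below groups per-token BIO tags into entity spans and writes them as brat text-bound annotation lines (an identifier, then the label with start and end offsets, then the covered text, separated by tabs). It is an illustrative reconstruction under our own assumptions, not our actual mapping script; the function name bio_to_brat and the example data are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Tuple

Token = Tuple[str, int, int]  # (text, start offset, end offset)


def bio_to_brat(text: str, tokens: List[Token], tags: List[str]) -> List[str]:
    """Convert per-token BIO tags into brat text-bound annotation lines."""
    spans = []      # each item: [label, start, end]
    current = None  # entity currently being built, if any
    for (_, start, end), tag in zip(tokens, tags):
        label = tag[2:] if tag != "O" else None
        if tag.startswith("I-") and current is not None and current[0] == label:
            current[2] = end           # extend the open entity
            continue
        if current is not None:
            spans.append(current)      # close the open entity
            current = None
        if label is not None:
            # "B-" starts a new entity; a stray "I-" is treated as a start too.
            current = [label, start, end]
    if current is not None:
        spans.append(current)
    # The surface string is sliced from the raw text so that the original
    # whitespace between tokens is preserved.
    return [
        f"T{n}\t{label} {start} {end}\t{text[start:end]}"
        for n, (label, start, end) in enumerate(spans, start=1)
    ]


TEXT = "stirred in THF for 2 h"
TOKENS = [
    ("stirred", 0, 7), ("in", 8, 10), ("THF", 11, 14),
    ("for", 15, 18), ("2", 19, 20), ("h", 21, 22),
]
TAGS = ["O", "O", "B-SOLVENT", "O", "B-TIME", "I-TIME"]
lines = bio_to_brat(TEXT, TOKENS, TAGS)
print("\n".join(lines))
```
]]></preformat>
        <p>Because the offsets must refer to the raw snippet text rather than to the tokenized sentence, keeping token offsets through segmentation and tokenization is the part of such a conversion that is easiest to get wrong.</p>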
        <p>
          Evaluation metrics: The ChEMU task 1 uses three metrics, namely
precision, recall and F1 scores for evaluation, under both "exact" and "relaxed" span
matching conditions [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p><sup>1</sup> https://github.com/zenanz/ChemPatentEmbeddings</p>
        <p><sup>2</sup> Sorted by file names: 0050-0690.</p>
        <p>
          Table 2 shows the official results of our system's outputs on the test set, which were
submitted during the evaluation phase. By employing a standard neural
architecture, our system obtains high performance and is officially ranked second
among 11 participating systems in terms of both exact- and relaxed-match F1 scores.
        </p>
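        <p>The exact and relaxed span matching conditions used for evaluation can be sketched as follows. This is a simplified illustration that assumes an exact match requires identical label and offsets, while a relaxed match requires the same label and any character overlap; the official scorer may differ in its details. The function name prf and the example spans are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, start offset, end offset)


def _match(pred: Span, gold: Span, relaxed: bool) -> bool:
    """Check whether a predicted span matches a gold span."""
    if pred[0] != gold[0]:
        return False
    if relaxed:
        return pred[1] < gold[2] and gold[1] < pred[2]  # spans overlap
    return pred[1:] == gold[1:]


def prf(pred: List[Span], gold: List[Span], relaxed: bool = False):
    """Return (precision, recall, F1) under exact or relaxed span matching."""
    matched_pred = sum(any(_match(p, g, relaxed) for g in gold) for p in pred)
    matched_gold = sum(any(_match(p, g, relaxed) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: the TIME prediction covers only part of the gold span.
GOLD = [("SOLVENT", 11, 14), ("TIME", 19, 22)]
PRED = [("SOLVENT", 11, 14), ("TIME", 21, 22)]
exact_scores = prf(PRED, GOLD, relaxed=False)
relaxed_scores = prf(PRED, GOLD, relaxed=True)
print(exact_scores, relaxed_scores)
```
]]></preformat>
        <p>In this example the partially overlapping TIME prediction counts as an error under exact matching but as a hit under relaxed matching, which is why relaxed-match scores are never lower than exact-match scores for the same output.</p>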
        <p>Note that in the evaluation phase, we were unfortunately unaware of a bug
in our mapping script, which converts the predicted test outputs in the
column-based format into the brat standoff format. Right after the evaluation phase,
we fixed the bug, reran our column-brat conversion script to produce a new
submission, and asked the ChEMU organizers to help evaluate the new
submission. Table 3 details our post-evaluation results. Fixing the mapping bug
improves our exact-match F1 by about 0.9 points (94.33% to 95.21%) and our
relaxed-match F1 by about 0.4 points (96.84% to 97.26%) in absolute terms, thus
leading to the highest relaxed-match F1 score compared to other
participating systems.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Ablation study</title>
        <p>Table 4 presents ablation tests over 3 factors of our system on the
development set, including (a) removing the Word2Vec-based pre-trained word
embeddings, (b) removing the CNN-based character-level word embeddings and
(c) removing the ELMo-based contextualized word embeddings. Factor (a)
degrades the exact-match F1 score by 0.8%, while factors (b) and (c) degrade the
exact-match F1 score by 0.1% and 1.0%, respectively. The contribution of the
CNN-based character-level word embeddings is not substantial because the
pre-trained ELMo language model we employ also builds on character embeddings.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Error analysis</title>
        <p>To understand the source of errors, we perform error analysis on the development
set. Among 56 error cases in total, 34 cases are predicted with correct entity
boundaries (i.e. exact span) but with incorrect labels (see the corresponding
confusion matrix in Figure 2), while 17 cases have correct
entity labels but overlapping, inexact spans. Figures 3 and 4 show examples of
these two types of errors. In particular, Figure 3 shows an example of an exact span
with an incorrect label, where a reagent catalyst entity "HCL" is predicted
as another compound type. The reason is probably that "HCL" and other
common chemical compounds such as "water", "citric acid" and the like play
different or multiple roles in chemical reactions. Note that there is no error case
with both an incorrect label and an overlapping, inexact span. The remaining
5 errors are predicted entities whose spans do not
overlap with the span of any gold standard entity, i.e. non-chemical
"O"-labeled words are predicted as REACTION PRODUCT (RP) chemical entities,
as shown in the column O in Figure 2.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we have presented our VinAI system for
the ChEMU task 1 of named entity recognition in chemical reactions from patents.
We use a BiLSTM-CNN-CRF architecture with additional ELMo-based
contextualized word embeddings to handle the task. Our system is officially ranked
second with regard to both the exact- and relaxed-match
F1 scores. In addition, fixing the column-brat conversion bug helps our
system obtain the highest relaxed-match F1 score in a post-evaluation phase.
We believe our system can serve as a strong baseline for future work on chemical
NER in chemical reactions from patents.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , Schworer,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Maier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Toomey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.P.</given-names>
            ,
            <surname>Nau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Ilchmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Sheehan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Irmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Bobach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Doornenbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            ,
            <surname>Gregory</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Kors</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.A.</surname>
          </string-name>
          :
          <article-title>Automatic identification of relevant chemical compounds from patents</article-title>
          .
          <source>Database</source>
          <volume>2019</volume>
          , baz001 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grus</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tafjord</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dasigi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmitz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          :
          <article-title>AllenNLP: A Deep Semantic Natural Language Processing Platform</article-title>
          . In: arXiv:1803.07640
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Druckenbrodt</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorne</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoessel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Afzal</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshikawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Albahem</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cavedon</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verspoor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Overview of ChEMU 2020: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents</article-title>
          . In:
          <source>Proceedings of the Eleventh International Conference of the CLEF Association (CLEF 2020)</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jessop</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willighagen</surname>
            ,
            <given-names>E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hawizy</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murray-Rust</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Oscar4: a flexible architecture for chemical text-mining</article-title>
          .
          <source>Journal of Cheminformatics</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.C.N.</given-names>
          </string-name>
          :
          <article-title>Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data</article-title>
          .
          <source>In: Proceedings of the Eighteenth International Conference on Machine Learning</source>
          . pp.
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF</article-title>
          . In:
          <source>Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . pp.
          <fpage>1064</fpage>
          -
          <lpage>1074</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Morton</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kottmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldridge</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bierner</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Opennlp: A java-based nlp toolkit</article-title>
          .
          <source>In: Proceeding of the 10th Conference of the European Chapter of the Association of Computational Linguistics</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Muresan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Southan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kjellberg</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kogej</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tyrchan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varkonyi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>P.H.</given-names>
          </string-name>
          :
          <article-title>Making every sar point count: the development of chemistry connect for the large-scale integration of structure and bioactivity data</article-title>
          .
          <source>Drug Discovery Today</source>
          <volume>16</volume>
          (
          <issue>23-24</issue>
          ),
          <fpage>1019</fpage>
          -
          <lpage>1030</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshikawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Druckenbrodt</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorne</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoessel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verspoor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents</article-title>
          .
          <source>In: Proceedings of the 42nd European Conference on Information Retrieval</source>
          . pp.
          <fpage>572</fpage>
          -
          <lpage>579</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers). pp.
          <fpage>2227</fpage>
          –
          <lpage>2237</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Schwaller</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaudin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanyi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bekas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laino</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>"Found in translation": predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models</article-title>
          .
          <source>Chemical science</source>
          <volume>9</volume>
          (
          <issue>28</issue>
          ),
          <fpage>6091</fpage>
          –
          <lpage>6098</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Segler</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Preuss</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waller</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          :
          <article-title>Planning chemical syntheses with deep neural networks and symbolic AI</article-title>
          .
          <source>Nature</source>
          <volume>555</volume>
          (
          <issue>7698</issue>
          ),
          <fpage>604</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Senger</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bartek</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papadatos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaulton</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Managing expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents</article-title>
          .
          <source>Journal of cheminformatics</source>
          <volume>7</volume>
          (
          <issue>1</issue>
          ),
          <fpage>49</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Stenetorp</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pyysalo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Topic</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohta</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ananiadou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsujii</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>brat: a web-based tool for NLP-assisted text annotation</article-title>
          .
          <source>In: Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics</source>
          . pp.
          <fpage>102</fpage>
          –
          <lpage>107</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Verspoor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimeno Yepes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cavedon</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McIntosh</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herten-Crabb</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plazzer</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>Annotating the biomedical literature for the human variome</article-title>
          .
          <source>Database</source>
          <volume>2013</volume>
          ,
          <elocation-id>bat019</elocation-id>
          (
          <month>04</month>
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Yoshikawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Druckenbrodt</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorne</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verspoor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Detecting Chemical Reactions in Patents</article-title>
          .
          <source>In: Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association</source>
          . pp.
          <fpage>100</fpage>
          –
          <lpage>110</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorne</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Druckenbrodt</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gregory</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verspoor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings</article-title>
          .
          <source>In: Proceedings of the 18th BioNLP Workshop</source>
          . pp.
          <fpage>328</fpage>
          –
          <lpage>338</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>