=Paper= {{Paper |id=Vol-2909/paper1 |storemode=property |title=Chemical Reaction Reference Resolution in Patents |pdfUrl=https://ceur-ws.org/Vol-2909/paper1.pdf |volume=Vol-2909 |authors=Hiyori Yoshikawa,Saber Akhondi,Camilo Thorne,Christian Druckenbrodt,Ralph Hoessel,Zenan Zhai,Jiayuan He,Timothy Baldwin,Karin Verspoor }} ==Chemical Reaction Reference Resolution in Patents== https://ceur-ws.org/Vol-2909/paper1.pdf
              Chemical Reaction Reference Resolution in Patents
              Hiyori Yoshikawa*                                          Saber Akhondi                                             Camilo Thorne
                  Fujitsu Limited                                         Elsevier BV                                 Elsevier Information Systems GmbH
                   Tokyo, Japan                                     Amsterdam, Netherlands                                     Frankfurt, Germany
               y.hiyori@fujitsu.com                                 s.akhondi@elsevier.com                                  c.thorne.1@elsevier.com

          Christian Druckenbrodt                                         Ralph Hoessel                                                 Zenan Zhai
    Elsevier Information Systems GmbH                       Elsevier Information Systems GmbH                            The University of Melbourne
             Frankfurt, Germany                                      Frankfurt, Germany                                      Melbourne, Australia
       c.druckenbrodt@elsevier.com                                 r.hoessel@elsevier.com                             zenan.zhai@student.unimelb.edu.au

                    Jiayuan He†                                        Timothy Baldwin                                            Karin Verspoor‡
                  RMIT University                                The University of Melbourne                                    RMIT University
               Melbourne, Australia                                 Melbourne, Australia                                      Melbourne, Australia
               estrid.he@rmit.edu.au                              tbaldwin@unimelb.edu.au                                karin.verspoor@unimelb.edu.au
ABSTRACT                                                                                  1    INTRODUCTION
Many new chemical compounds are reported each year in patent                              Patents represent a critical source of information about new chemi-
documents, leading to increasing demand for methods for automatic                         cal compounds, and lead journal publications in both volume and
information extraction of chemical compounds and reactions from                           time [2]. Patents provide not only the name and characteristics of
patents. Chemical patents often detail a number of similar com-                           new chemical compounds, but also the reaction details for their
pounds that have a common substructure and can be synthesized in                          synthesis, including starting materials, product, reagents, catalysts,
analogous ways, and therefore contain many references connecting                          solvents, and the conditions of the reactions such as temperature
descriptions of similar chemical reactions, to avoid redundancy in                        and time. Databases such as Reaxys®1 and CASREACT [5] store
describing common reaction conditions. This leads to the problem                          a large volume of such chemical reaction information. Given the
of reaction reference resolution, where, given a reaction description,                    commercial and research value of the information in patents, as well
we need to identify links to other reaction descriptions it refers to.                    as the large numbers of available patents, developing methods for
In this paper, we formally introduce the task and propose baseline                        automatic information extraction from chemical patents has been a
methods to address it in analogy with co-reference resolution. To                         focus of recent research [17, 24].
evaluate the performance, we create a large-scale silver-standard                            A text mining system for chemical reaction information extraction
dataset based on a commercial database of chemical reactions. The                         should aim to obtain details of each reaction, requiring: (1) the
experimental results show that the approach based on a state-of-                          identification of specific spans of text describing reactions; and (2)
the-art co-reference resolution method struggles to outperform a                          the extraction of the compounds participating in, and the conditions
simple heuristic in detecting reference links, demonstrating the diffi-                   of, a reaction. The second step involves named entity recognition [6,
culty of the proposed task and its fundamentally different nature to                      8, 13, 16, 24], relation extraction [15, 31], entity linking [3, 27], and
co-reference resolution.                                                                  event extraction [8, 24, 30], which have been explored intensively in
                                                                                          previous studies. A relatively small number of studies have addressed
CCS CONCEPTS                                                                              the chemical reaction extraction task as a whole [1, 10, 19, 21].
                                                                                             The first step of locating all information relevant to a chemical re-
• Computing methodologies → Information extraction.
                                                                                          action has received relatively little attention. This step is a challenge
                                                                                          as it requires processing very long and semantically unstructured
KEYWORDS                                                                                  documents, in which a number of similar chemical reactions are
information extraction, reaction reference resolution, natural lan-                       presented. Yoshikawa et al. [32] proposed the chemical reaction
guage processing                                                                          detection task to locate descriptions of individual chemical reactions
                                                                                          in patents, and achieved promising results using a contextualized
                                                                                          document modeling method. Jessop et al. [10] also addressed the de-
* Also with The University of Melbourne.
† Also with The University of Melbourne.                                                  tection of chemical reaction descriptions in a heuristic way. However,
‡ Also with The University of Melbourne.                                                  previous work has not addressed the important step of resolving ref-
                                                                                          erences between reaction descriptions. In general, a chemical patent
                                                                                          reports a number of similar compounds that have a common chemi-
                                                                                          cal substructure and can be synthesized in analogous ways. Synthesis
PatentSemTech, July 15th, 2021, online                                                    steps that are common across similar reactions may be described
© 2021 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
                                                                                          1 https://www.reaxys.com. Copyright ® 2021 Elsevier Life Sciences IP Limited. Reaxys®
                                                                                          is a trademark of Elsevier Life Sciences IP Limited, used under license.


only once, and the paragraphs describing individual reactions refer back to those descriptions to provide specific details without repeating information. In our dataset described below, approximately 17% of reaction descriptions contain references to other reaction descriptions.
   To fill this gap in previous work and take a step towards an end-to-end method for chemical reaction information extraction, this paper focuses on the task of reaction reference resolution. Our contributions are three-fold:
     • We define the new task of reaction reference resolution, where, given a reaction description, we aim to identify other reaction descriptions it refers to.
     • In the absence of a large-scale gold-standard dataset, we leverage Reaxys®, a large commercial chemical reaction database, to create a silver-standard dataset, of a size that supports the development of not only traditional rule-based methods but also neural network models.
     • We propose several baseline methods for the task and evaluate them on our silver-standard dataset. The baselines include an extension of the state-of-the-art end-to-end neural co-reference resolution method of Lee et al. [20]. Experimental results show that, while it successfully detects reaction descriptions that refer to other textual spans, it struggles to relate them correctly.

2    TASK DESCRIPTION
In this section, we formally introduce the reaction reference resolution task. We assume a pipeline system where, given the Description section of a patent, text spans corresponding to individual reaction descriptions are extracted in advance (following the methods of Jessop et al. [10] or Yoshikawa et al. [32]). Then, in a second step, all relationships between reaction descriptions that are relevant are identified to fully locate the details of each reaction. The output of this task could then be passed to a targeted information extraction system to pull out individual reaction details.
    An input document is represented by a tuple $d = (P, R)$, where $P$ is a sequence of paragraphs $P = (p_1, \ldots, p_{|P|})$ and $R$ is a sequence of all textual spans of reaction descriptions $R = (r_1, \ldots, r_{|R|})$, dubbed reaction spans. A reaction span $r_i$ is a sequence of consecutive paragraphs, represented by the first and last indices of paragraphs of the span $r_i = (\mathrm{start}(i), \mathrm{end}(i))$. For each reaction span $r_i$, the task is to: (1) predict whether the reaction refers to one or more previous reaction descriptions; and if so, (2) detect the reaction span(s) $r_j$ (for $j < i$) that $r_i$ refers to. Hereinafter, we call $r_j$ the parent reaction and $r_i$ the child reaction.
    Figure 1 shows typical examples of reaction references. A reference relation is often indicated by an example ID, as in example (a), or the compound label used in the previous reaction description, as in example (b). However, it can also be the case that there is no direct referential expression between reaction descriptions, as in example (c). Therefore, we formulate the task as reference resolution between reaction descriptions (paragraphs) rather than between a referring expression such as Example 1 and its referent, as in traditional co-reference resolution tasks. Subtasks (1) and (2) are analogous to the mention detection and mention clustering subtasks of co-reference resolution.

3     METHODS
In this section, we introduce a baseline neural model that captures the context and detects reference relations between reaction descriptions. As illustrated in Figure 2, our model is based on the end-to-end neural co-reference model of Lee et al. [20]. The original model takes a contextualized representation for each entity mention and computes the likelihood that a pair of mentions is co-referential, whereas our model extends it to a hierarchical architecture that first encodes individual paragraphs and then feeds them to a document-level encoder, so that the model can capture references between longer text spans (i.e. multiple paragraphs).

3.1    Paragraph and Span Representation
Our model has a hierarchical architecture that first encodes each paragraph and then encodes the entire document to obtain a contextualized representation of each paragraph. We first encode each input paragraph to obtain a context-independent paragraph encoding. A paragraph $p_t$ consists of a sequence of words $p_t = \{w_1^{(t)}, \ldots, w_{|p_t|}^{(t)}\}$. Each word is represented by:
        $$\boldsymbol{x}_{t,k} = \mathrm{WE}(w_k^{(t)}) \oplus \mathrm{ELMo}(p_t)_k \oplus \mathrm{CC}(w_k^{(t)}) \oplus \mathrm{NER}(p_t)_k, \tag{1}$$
where $\mathrm{WE}(w_k^{(t)}) \in \mathbb{R}^{d_{\mathrm{WE}}}$ is a pretrained word embedding, $\mathrm{ELMo}(p_t)_k \in \mathbb{R}^{d_{\mathrm{ELMo}}}$ is a contextualized word embedding [25], $\mathrm{CC}(w_k^{(t)}) \in \mathbb{R}^{d_{\mathrm{CC}}}$ is a CNN-based character-level word encoding, and $\mathrm{NER}(p_t)_k \in \mathbb{R}^{d_{\mathrm{NER}}}$ is a trainable embedding of the named entity label (here, chemical compound type). The details of the embeddings and named entity labels are provided in Section 5. We then encode $p_t$ into a vector using a bidirectional LSTM:
        $$\boldsymbol{h}_t = \overrightarrow{\boldsymbol{h}}_{t,|p_t|} \oplus \overleftarrow{\boldsymbol{h}}_{t,1}, \tag{2}$$
        $$\overrightarrow{\boldsymbol{h}}_{t,k} = \mathrm{LSTM}(\boldsymbol{x}_{t,k}, \overrightarrow{\boldsymbol{h}}_{t,k-1}; \theta_{\mathrm{PF}}), \tag{3}$$
        $$\overleftarrow{\boldsymbol{h}}_{t,k} = \mathrm{LSTM}(\boldsymbol{x}_{t,k}, \overleftarrow{\boldsymbol{h}}_{t,k+1}; \theta_{\mathrm{PB}}), \tag{4}$$
where $\theta_{\mathrm{PF}}$ and $\theta_{\mathrm{PB}}$ denote model parameters of the forward and the backward LSTMs, respectively.
   To obtain the representation of each reaction span, we first obtain a contextualized representation of each paragraph using a document-level bidirectional LSTM:
        $$\boldsymbol{h}_t^* = \overrightarrow{\boldsymbol{h}}_t^* \oplus \overleftarrow{\boldsymbol{h}}_t^*, \tag{5}$$
        $$\overrightarrow{\boldsymbol{h}}_t^* = \mathrm{LSTM}(\boldsymbol{h}_t, \overrightarrow{\boldsymbol{h}}_{t-1}^*; \theta_{\mathrm{DF}}), \tag{6}$$
        $$\overleftarrow{\boldsymbol{h}}_t^* = \mathrm{LSTM}(\boldsymbol{h}_t, \overleftarrow{\boldsymbol{h}}_{t+1}^*; \theta_{\mathrm{DB}}), \tag{7}$$
where $\theta_{\mathrm{DF}}$ and $\theta_{\mathrm{DB}}$ denote model parameters of the forward and the backward LSTMs, respectively. These paragraph representations are combined to obtain each reaction span representation:
        $$\boldsymbol{g}_i = \boldsymbol{h}^*_{\mathrm{start}(i)} \oplus \boldsymbol{h}^*_{\mathrm{end}(i)}. \tag{8}$$

3.2    Training Objectives
Our goal is to identify the parent reaction span for each child reaction span. As described in Section 2, this can be further divided into two subtasks: predicting whether the given span is a child of some reaction span, and linking the child spans to their parent spans.
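As a rough illustration of the span representation in Equation (8), the following minimal sketch concatenates the contextualized vectors of the first and last paragraph of a span. It assumes the document-level BiLSTM outputs are already available as plain lists of floats; the names `ReactionSpan` and `span_representation` and the toy vectors are hypothetical and not taken from the paper, and paragraph indices are 0-based here for convenience.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReactionSpan:
    """A reaction span given by its first and last paragraph index (0-based, inclusive)."""
    start: int
    end: int


def span_representation(paragraph_vectors: List[List[float]], span: ReactionSpan) -> List[float]:
    """Equation (8): g_i is the concatenation of h*_start(i) and h*_end(i)."""
    return list(paragraph_vectors[span.start]) + list(paragraph_vectors[span.end])


# Toy contextualized paragraph vectors h*_t; in the model these would be
# produced by the document-level bidirectional LSTM of Equations (5)-(7).
h_star = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
spans = [ReactionSpan(0, 1), ReactionSpan(2, 2), ReactionSpan(3, 3)]

g = [span_representation(h_star, r) for r in spans]
```

For a single-paragraph span the start and end vectors coincide, so the same vector is concatenated with itself; the span representation therefore has a fixed size regardless of how many paragraphs the span covers.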


     ID     Text
    (a) Reference with the example ID
    RX1     Example 26 11-Dioxo-3,3-dibutyl-5-phenyl-7-methylthio-8-(N-{(R)-𝛼 -[N-((S)-1-carboxy-(R)-hydroxypropyl)carbamoyl]-4-...
            1,1-Dioxo-3,3-dibutyl-5-phenyl-7-methylthio-8-[N-((R)-𝛼 -carboxy-4-hydroxybenzyl) carbamoylmethoxy]- 2,3,4,5-tetrahydro-1,2,5-benzothiadiazepine (Example 18; 100mg, 0.152mmol)
            was dissolved in ...
    RX2     Example 27 1,1-Dioxo-3,3-dibutyl-5-phenyl-7-methylthio-8-(N-{(R)-𝛼 -[N-((S)-1-carboxy-2-, methylpropyl)carbamoyl]-4-
            The title compound was synthesized by the procedure described in Example 26 starting from ...
    (b) Reference with the compound label
    RX1     A mixture of the obtained ester, ... was stirred under argon and heated at 110° C. for 24 h. ... Column chromatography of the residue (silica gel-hexane/ethyl acetate, 9:1) gave Compound
            B11, ...
            ...
    RX2     Using 2-ethoxyethanol and following the procedure for Compound B11 gave Compound B13, bis(2-ethoxyethyl) 3,3’-((2-(bromomethyl)-2-((3-((2-
            ethoxyethoxy)carbonyl)phenoxy)methyl)propane-1,3-diyl)bis(oxy))dibenzoate, (3.82 g, 35% yield). ...
    (c) Reference with no direct referential expression
    RX1     III: 5-fluoro-N2-(4-methyl-3-propionylaminosulfonylphenyl)-N4-[4-(prop-2-ynyloxy)phenyl]-2,4- pyrimidinediamine mono-sodium salt
            5-Fluoro-N2-(4-methyl-3-propionylaminosulfonylphenyl)-N4-[4-(prop-2-ynyloxy)phenyl]-2,4-pyrimidinediamine, II, (0.125 g, 0.258 mmol) was suspended in ...
            The following compounds were made in a similar fashion to those above.
    RX2     IV: 5-Fluoro-N2-[4-methyl-3-(N-propionylaminosulfonyl)phenyl]-N4-[4-(2-propynyloxy)phenyl]-2,4-pyrimidinediamine Potassium Salt ...
    RX3     V: 5-Fluoro-N2-[4-methyl-3-(N-propionylaminosulfonyl)phenyl]-N4-[4-(2-propynyloxy)phenyl]-2,4-pyrimidinediamine Calcium Salt ...



Figure 1: Abbreviated examples of reaction references: (a) reference with the example ID [29], (b) reference with the compound label
[26], (c) reference with no direct referential expression [22].
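To make the task notation of Section 2 concrete, the sketch below encodes a toy version of example (a) as a document d = (P, R) with parent links. This is only an illustrative data structure under stated assumptions: the class name `PatentDocument`, its fields, and the way the example is split into four paragraphs are hypothetical and not part of the paper's dataset format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class PatentDocument:
    paragraphs: List[str]             # P = (p_1, ..., p_|P|)
    spans: List[Tuple[int, int]]      # R: (start, end) paragraph indices, 1-based, inclusive
    # Maps a child span index i to the indices j < i of its parent spans.
    parents: Dict[int, Set[int]] = field(default_factory=dict)

    def validate(self) -> None:
        """Check that spans lie inside the document and parents precede children."""
        for start, end in self.spans:
            if not (1 <= start <= end <= len(self.paragraphs)):
                raise ValueError(f"span ({start}, {end}) is outside the document")
        for i, js in self.parents.items():
            if not (1 <= i <= len(self.spans)):
                raise ValueError(f"unknown child span index {i}")
            if any(not (1 <= j < i) for j in js):
                raise ValueError(f"parents of span {i} must precede it")

    def is_child(self, i: int) -> bool:
        """Subtask (1): does span r_i refer to at least one preceding span?"""
        return bool(self.parents.get(i))


# Toy encoding of Figure 1(a): RX2 refers back to RX1 ("Example 26").
doc = PatentDocument(
    paragraphs=[
        "Example 26 ...",
        "... (Example 18; 100mg, 0.152mmol) was dissolved in ...",
        "Example 27 ...",
        "The title compound was synthesized by the procedure described in Example 26 starting from ...",
    ],
    spans=[(1, 2), (3, 4)],
    parents={2: {1}},
)
doc.validate()
child_flags = [doc.is_child(i) for i in range(1, len(doc.spans) + 1)]
```

Storing parents as a set per child keeps the structure compatible with the case where a child reaction has more than one parent.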


          Figure 2: Illustration of our model architecture.

To evaluate the difficulty of these individual subtasks, we propose several model variants that differ in training objectives and decoding methods. Specifically, we train our model using either a binary classification objective that only models whether a reaction span is a child or not, or an end-to-end objective that also models which span the child refers to. In decoding, our model uses either a heuristic method or the learned end-to-end model to identify the parent reaction for each child reaction span.
   Binary classification objective. The notation $y_i = 0$ indicates the special case where the span $r_i$ does not refer to any preceding spans. The binary classification objective models whether or not a reaction span is a child of at least one preceding reaction span:
        $$\pi_{\mathrm{NN}}(i) := \Pr(y_i \neq 0 \mid d) = \sigma(f(i)), \tag{9}$$
where $\sigma$ denotes the sigmoid operator, and $f(i)$ is a span scoring function that computes the likelihood of the $i$-th span being a child, using a feed-forward network:
        $$f(i) = \boldsymbol{w}_{\mathrm{C\text{-}NN}}^{\top} \mathrm{FFNN}(\boldsymbol{g}_i) + b_{\mathrm{C\text{-}NN}}. \tag{10}$$
   Given the training dataset $D = (d_1, \ldots, d_{|D|})$, the model is trained to minimize the following binary cross-entropy loss:
        $$\mathcal{L}_{\mathrm{BIN}} = -\sum_{d=(P,R) \in D} \sum_{i=1}^{|R|} \Big( \mathbf{1}_{y_i \neq 0} \log \pi_{\mathrm{NN}}(i) + \mathbf{1}_{y_i = 0} \log\big(1 - \pi_{\mathrm{NN}}(i)\big) \Big). \tag{11}$$
   End-to-end objective. The end-to-end objective models not only the likelihood of a span being a child, but also which span the child span refers to. Analogous to Lee et al. [20], we aim to model the conditional probability distribution for document $d$ as a product of multinomials for individual spans:
        $$\Pr(y_1, \ldots, y_{|R|} \mid d) = \prod_{i=1}^{|R|} \Pr(y_i \mid d). \tag{12}$$
The probability of each preceding span being the parent of the given span is computed as follows:
        $$\pi_{\mathrm{NN}}(j, i) := \Pr(y_i = j \mid d) = \exp(s(j, i)) \Big/ \sum_{j'=0}^{i-1} \exp(s(j', i)), \tag{13}$$
where $s(j, i)$ denotes the score of the likelihood that span $r_i$ refers to preceding span $r_j$. The score is computed using the pair of corresponding span representations:
        $$s(j, i) = \boldsymbol{w}_{\mathrm{P\text{-}NN}}^{\top} \boldsymbol{\phi}_{j,i} + b_{\mathrm{P\text{-}NN}}, \tag{14}$$
        $$\boldsymbol{\phi}_{j,i} = \mathrm{FFNN}(\boldsymbol{g}_j \oplus \boldsymbol{g}_i \oplus (\boldsymbol{g}_j \circ \boldsymbol{g}_i) \oplus \mathrm{Dist}(j, i)) \tag{15}$$
for $j > 0$, where $\circ$ denotes element-wise multiplication. $\mathrm{Dist}(j, i)$ denotes the distance embedding, whose definition follows that of Lee et al. [20]. We define $s(0, i) = 0$ for any $i$, corresponding to


the case where $r_i$ has no reference. As we assume that the reaction spans are provided as part of the input, our model does not require span likelihood scores as used in the original co-reference resolution formulation. In another departure from the co-reference resolution setting, in our case the number of span pairs to be considered is tractable,2 meaning we do not need to perform candidate pruning.
   Given the training dataset $D = (d_1, \ldots, d_{|D|})$, the model is trained to minimize the negative log-likelihood loss against the true parent spans, where $Y_i^*$ denotes the set of all correct parent span indices:
        $$\mathcal{L}_{\mathrm{E2E}} = -\sum_{d=(P,R) \in D} \sum_{i=1}^{|R|} \log \sum_{j \in Y_i^*} \Pr(y_i = j \mid d). \tag{16}$$

                     # Documents                          143
                     # Paragraphs                      39,437
                     # Reaction spans                   3,072
                     # Child reaction spans               549
                     # Tokens / Paragraph                73.7
Table 1: Evaluation dataset statistics. Tokenization is based on OSCAR4 [11].

                                                                                             the location information and then use the reference links to label
                                                                                             corresponding pairs of reaction spans. There are some limitations
   Note that our model is trained to predict only one parent for each                        due to the fact that the database is not originally intended to be used
child, although a child can have multiple parent spans. We observed                          for document processing. For example, the link information is not
that more than 90% of the child reaction spans in our dataset have                           always complete, especially when multiple references are involved.3
only a single parent, and this simplification has very limited effect                        It is for this reason that we consider the annotations a silver-standard,
on performance. We can easily extend our model to allow multiple                             with moderate recall and high precision.
parent spans by modeling the link probability for each candidate                                 We follow the work of Yoshikawa et al. [32] for patent document
parent span independently. Specifically, we can replace the softmax                          selection and obtaining reaction span information. We use the same
operation in Equation (13) with the sigmoid activation, and the loss                         set of patents, and extract the text from the description section of
function Equation (16) with the binary cross-entropy loss.                                   each patent. We split the dataset into five partitions, and perform
                                                                                             five-fold cross validation, using three partitions for training, one for
   Decoding. To decode with the model trained with the end-to-                               validation, and one for testing in each fold. We report the average
end objective, we use the highest-probability index (including 0, i.e. no                    performance across the five folds. An example and the statistics
reference) as the predicted parent index of a reaction span; we refer                        of the dataset are presented in Figure 3 and Table 1, respectively.
to this method as “E2E”. For the model trained with the binary                               We also sampled a small portion of our dataset and classified the
classification objective (Equation (11)), we use a simple heuristic                          reaction references based on the three types described in Figure 1.
to predict the parent reaction spans, called “ImmPrev”: reaction                             Among the sampled reaction references, 72% are references with
spans that are classified as child reaction spans are linked to the                          the example ID, 20% are references with the compound label, and
immediately preceding span that is not classified as a child                                 8% are references with no direct referential expression.
span.
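
The two decoding strategies described above, together with the multi-parent extension, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the list-based score layout, the 0.5 threshold, and the choice to return "no reference" when a child has no preceding non-child span are all assumptions.

```python
from typing import List, Sequence

NO_REFERENCE = 0  # index 0 stands for "no parent", as in the text


def decode_e2e(link_scores: Sequence[Sequence[float]]) -> List[int]:
    """E2E decoding: pick the maximum-scoring index for each span.

    link_scores[t - 1][j] is the score of candidate j for span t (1-based);
    j = 0 means "no reference", j = 1..t-1 are the preceding spans.
    """
    return [max(range(len(row)), key=lambda j: row[j]) for row in link_scores]


def decode_imm_prev(is_child: Sequence[bool]) -> List[int]:
    """ImmPrev decoding: link each predicted child span to the closest
    preceding span that is not itself predicted to be a child."""
    parents: List[int] = []
    last_non_child = NO_REFERENCE  # assumption: no candidate yet -> no reference
    for t, child in enumerate(is_child, start=1):
        if child:
            parents.append(last_non_child)
        else:
            parents.append(NO_REFERENCE)
            last_non_child = t
    return parents


def decode_multi_parent(link_probs: Sequence[Sequence[float]],
                        threshold: float = 0.5) -> List[List[int]]:
    """Multi-parent variant: link_probs[t - 1][j - 1] is an independent
    (sigmoid) probability that span j is a parent of span t. Every candidate
    above the (assumed) threshold is kept."""
    return [[j for j, p in enumerate(row, start=1) if p > threshold]
            for row in link_probs]
```

For example, `decode_imm_prev([False, True, True, False, True])` links spans 2 and 3 to span 1 and span 5 to span 4.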
4 SILVER-STANDARD DATASET
To the best of our knowledge, there is no publicly available dataset for reaction reference.
Manually annotating a dataset of chemical reaction reference requires expert knowledge of chemistry
and involves document-wise annotation of patents, with a single document often containing hundreds
of paragraphs. As a more viable alternative, we create a silver-standard dataset from Reaxys®, a
large-scale commercial chemical reaction database which contains a large quantity of reaction
information associated at the document level with patents. While our dataset relies on a proprietary
database, we believe that our methodology can generalize to similar patent silver standards
generated using comparable reaction databases.
    Reaxys® contains information on chemical reactions together with their location information,
i.e., the patents and the paragraphs where their reaction processes are described. To avoid
exhaustive manual extraction of the full reaction details for all compounds described in a patent,
the construction process of the Reaxys® database is streamlined through reuse of the conditions of
a reaction for similar reactions described in the same patent. For this reason, the database
includes internal reference links between similar reactions that share reaction conditions. As such,
the mapping process from the database to our silver-standard dataset is straightforward: we simply
map each reaction into one or more paragraph sequences (reaction spans) using

5 EXPERIMENTAL DETAILS
5.1 Baseline Methods
In addition to the neural model described in Section 3, we introduce additional baselines, including
heuristics and a traditional machine learning method using bag-of-words features.
   Pattern matching baseline. We observe that there are high-precision patterns in child reactions
that indicate their parent reaction spans. Namely, the parent reaction is often mentioned by its
“example ID” such as Example 1, Preparation 1, or Step 1. An example is given in Figure 1(a). Based
on this observation, we develop a rule-based baseline using regular expressions to detect such
example ID patterns. For each reaction span that contains such a pattern, we search the preceding
spans in reverse order until we find a span that contains the example label in its heading or at the
beginning of the text body.
   Immediate previous baseline. Another naïve baseline is to link all reaction spans, except the
first, to the immediately preceding span, i.e. $y_t = t - 1$. We label this method “Naive-ImmPrev”.
We also include an oracle variant, “Oracle-ImmPrev”, which uses the true labels for the subtask of
child span detection (i.e. the oracle provides a perfect decision as to whether a reaction span is a
child of other reaction spans or not), and performs parent–child linking by linking

2 $O(|R|^2)$, where $|R| < 50$ for more than 90% of the dataset.
3 Reactions that refer to multiple reaction steps usually have a link to only the final step of the
reaction sequence.
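
The pattern matching baseline of Section 5.1 can be sketched as below. The paper does not list its regular expressions, so the pattern `EXAMPLE_ID`, the helper names, and the `head_chars` cut-off used to approximate "heading or beginning of the text body" are hypothetical choices made for illustration only.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: the actual expressions used in the paper are not given.
EXAMPLE_ID = re.compile(r"\b(Example|Preparation|Step)\s+(\d+[A-Za-z]?)\b")

Label = Tuple[str, str]


def example_labels(text: str) -> List[Label]:
    """Return (keyword, number) pairs such as ("example", "26") found in text."""
    return [(m.group(1).lower(), m.group(2)) for m in EXAMPLE_ID.finditer(text)]


def find_parent(spans: List[str], t: int, head_chars: int = 40) -> Optional[int]:
    """Return the 0-based index of the predicted parent of spans[t], or None.

    Labels in the first head_chars characters of a span are treated as that
    span's own heading; any other label mentioned in spans[t] is looked up in
    the headings of the preceding spans, nearest span first.
    """
    own = set(example_labels(spans[t][:head_chars]))
    referenced = [lab for lab in example_labels(spans[t]) if lab not in own]
    for lab in referenced:
        for j in range(t - 1, -1, -1):  # walk backwards through the document
            if lab in example_labels(spans[j][:head_chars]):
                return j
    return None


spans = [
    "Example 24 A mixture of X and Y was stirred overnight.",
    "Example 25 Compound Z was prepared from W.",
    "Example 26 The ester was hydrolysed with NaOH.",
    "Example 27 The title compound was synthesized by the procedure "
    "described in Example 26 starting from V.",
]
parent_of_last = find_parent(spans, 3)   # index of the "Example 26" span
parent_of_second = find_parent(spans, 1)  # no example ID mentioned in the body
```

A span whose body mentions no other example label gets no prediction, which matches the high-precision, moderate-recall behaviour expected from such a rule.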
PatentSemTech, July 15th, 2021, online                                                                                                                             Yoshikawa et al.


          par_id   RX   parent_id   text
          117       9      -1       General procedure for the synthesis of aryl N-(guanidino)imines N’-arylhydrazones 7Aa-Ag and 8Aa. ...
          118      -1               Characterization of compounds 7Aa-Ag and 8Aa:
          119      10      -1       (E)-2-((4’-((E and Z)-(2-(2-Chlorophenyl)hydrazono)methyl)-[1,1’-biphenyl]-4-yl)methylene)hydrazine- ...
          120      10      -1       Yield: 70%; mp 328-330°C. 1H NMR, (400 MHz, DMSO-d6) 𝛿 6.81 (dt, J=8.4 and 1.6 Hz, 1H, H-4”), ...
          121      11      10       (E)-2-((4’-((E and Z)-(2-(2-Bromophenyl)hydrazono)methyl)-[1,1’-biphenyl]-4-yl)methylene)hydrazine- ...
          122      11      10       Yield: 64%; mp >380°C. 1H NMR (400 MHz, DMSO-d6) 𝛿 6.75 (dt, J=8.4 and 1.6 Hz, 1H, H-4”), ...
          123      12      10       (E)-2-((4’-((E)-(2-(4-Fluorophenyl)hydrazono)methyl)-[1,1’-biphenyl]-4-yl)methylene)hydrazine- ...
          124      12      10       Yield: 88%; mp >380°C. 1H NMR (400 MHz, DMSO-d6) 𝛿 7.07 (d, J=7.6 Hz, 2H, H-2”), ...


Figure 3: An example of our silver-standard dataset. Only a fraction of the patent document is shown due to space limitations. The
columns represent paragraph ID (par_id), reaction ID (RX), parent reaction ID (parent_id), and paragraph text (text). “-1”
in the reaction ID means that the paragraph is not a part of a reaction description. “-1” in the parent reaction ID means that the
reaction span does not have a parent reaction span.
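
To make the format in Figure 3 concrete, the sketch below groups paragraph rows into reaction spans and extracts the child-to-parent links. It assumes one `(par_id, RX, parent_id)` tuple per paragraph, with the RX and parent values repeated on every paragraph of a span and `-1` used as a placeholder where the figure leaves the parent cell empty. The class and function names are illustrative and are not part of the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

NOT_A_REACTION = -1  # RX value of paragraphs outside any reaction description
NO_PARENT = -1       # parent_id value of spans without a parent


@dataclass
class ReactionSpan:
    rx: int
    parent_id: int
    par_ids: List[int] = field(default_factory=list)


def group_spans(rows: Iterable[Tuple[int, int, int]]) -> Dict[int, ReactionSpan]:
    """Group (par_id, rx, parent_id) rows into spans keyed by reaction ID,
    keeping document order (dicts preserve insertion order)."""
    spans: Dict[int, ReactionSpan] = {}
    for par_id, rx, parent_id in rows:
        if rx == NOT_A_REACTION:
            continue  # paragraph is not part of a reaction description
        span = spans.setdefault(rx, ReactionSpan(rx, parent_id))
        span.par_ids.append(par_id)
    return spans


# Rows transcribed from Figure 3.
rows = [
    (117, 9, -1), (118, -1, -1),
    (119, 10, -1), (120, 10, -1),
    (121, 11, 10), (122, 11, 10),
    (123, 12, 10), (124, 12, 10),
]
spans = group_spans(rows)
links = [(s.rx, s.parent_id) for s in spans.values() if s.parent_id != NO_PARENT]
```

Here `links` contains the two references shown in the figure: reactions 11 and 12 both point to reaction 10.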


each given child span to its immediately preceding non-child span, similarly to the ImmPrev decoding
method from Section 3.2. By assuming the annotations for child spans as given, we can measure the
complexity of the parent–child linking task separately from the child span detection task.

             Method               P       R       F1
             Naive-ImmPrev       .189    1.00    .315
             Pattern matching    .876    .545    .660
             Bag-of-words        .882    .617    .688
             RRefBinary          .926    .866    .893
             RRefE2E             .897    .791    .839

Table 2: Results of the baseline methods on child reaction span detection in terms of precision (P),
recall (R), and F-score (F1). The scores are the average of five-fold cross validation. “RRefBinary”
and “RRefE2E” indicate our neural model with the binary classification objective and the end-to-end
objective, respectively.

   Bag-of-words baseline. We also implement a simple bag-of-words classifier. For each reaction span
$r_i$, we construct a vector $\boldsymbol{x}_i$ of vocabulary size $|V|$, where each element is the
frequency of the corresponding term in the paragraph. We calculate the score of each span being a
child as follows:

$$\pi_{\mathrm{BOW}}(i) := \Pr(y_i \neq 0 \mid d) = \sigma(\boldsymbol{w}_{\mathrm{C\text{-}BOW}}^{\top} \boldsymbol{x}_i + b_{\mathrm{C\text{-}BOW}}), \tag{17}$$

where $\sigma$ denotes the sigmoid function. For each span $r_i$ that is classified as a child, we
calculate the score of every preceding span $r_j$ being the parent of $r_i$ as:
$$\pi_{\mathrm{BOW}}(j, i) := \Pr(y_i = j \mid y_i \neq 0, d) = \frac{\exp(s_{\mathrm{BOW}}(j, i))}{\sum_{j'=1}^{i-1} \exp(s_{\mathrm{BOW}}(j', i))}, \tag{18}$$

$$s_{\mathrm{BOW}}(j, i) = \boldsymbol{w}_{\mathrm{P\text{-}BOW}}^{\top} (\boldsymbol{x}_i \oplus \boldsymbol{x}_j \oplus \boldsymbol{x}_{j,i}) + b_{\mathrm{P\text{-}BOW}}, \tag{19}$$

where $\boldsymbol{x}_{j,i}$ is a one-hot vector of the words appearing in both $r_j$ and $r_i$. The
model is trained to optimize the following joint loss function, with an L2 regularization term:

$$\mathcal{L}_{\mathrm{BOW}} = \sum_{(P,R) \in D} \big( L_{\mathrm{bin}}(P, R) + L_{\mathrm{link}}(P, R) \big), \tag{20}$$

$$L_{\mathrm{bin}}(P, R) = -\sum_{i=1}^{|R|} \Big( \mathbb{1}_{y_i \neq 0}\, \pi_{\mathrm{BOW}}(i) + \mathbb{1}_{y_i = 0}\, (1 - \pi_{\mathrm{BOW}}(i)) \Big), \tag{21}$$

$$L_{\mathrm{link}}(P, R) = -\sum_{i:\, y_i \neq 0} \log \sum_{j \in Y_i^{*}} \pi_{\mathrm{BOW}}(j, i). \tag{22}$$

5.2 Implementation Details and Optimization
Each paragraph is tokenized using the OSCAR4 tokenizer [11], which was developed for the chemical
domain. We use embeddings from word2vec skip-gram [23] ($d_{\mathrm{WE}} = 200$) and ELMo [25]
($d_{\mathrm{ELMo}} = 1024$) for the WE and ELMo embeddings in Equation (1), respectively, both
trained on chemical patent documents [33]. For the character-level word encodings, we apply a
character CNN with 30 trigram filters to generate 25d character embeddings. For the NER features, we
use a named entity recognizer based on the Reaxys® Gold chemical tag set [33], and assign 10d
trainable embeddings to the token-level labels. We use 5d trainable embeddings for the distance
embeddings in $s(j, i)$. In the FFNN layer in $f(i)$ and $s(j, i)$, we use a 2-layer feed-forward
network with 150d hidden layers and a ReLU activation. We optimized the number of LSTM layers and
hidden state dimensionalities for the two bidirectional models via grid search on the validation set.
   The models are trained using the Adam optimizer [14], with dropout and gradient clipping ≤ 5.0.
The dropout rate, applied evenly across the model, is selected from {0.3, 0.5} based on the
validation data. The minibatch size is 1, and the model is trained for up to 150 epochs, with early
stopping.

6 RESULTS AND DISCUSSION
6.1 Main Results
   Child span detection. Table 2 shows the results of the baseline methods on child span detection,
which is the performance in terms of binary classification predicting whether each reaction span is
a child of another reaction span or not. The neural methods obtain high F-scores over 0.8, whereas
the bag-of-words classifier shows only a small gain over the rule-based baseline. This suggests that this


subtask is ML-feasible, noting that rich document representations are needed to achieve reasonable
performance.
   End-to-end performance. Table 3 shows the performance on the end-to-end task. As our methods
assume a single parent for each child span, we evaluate the performance with two different measures.
As the primary measure, we calculate the precision (P), recall (R), and F-scores (F1) by regarding
the prediction as correct if the model predicts one of the true parents of the target reaction span.
We also report the recall (R^m) and F-score (F1^m) based on the multiple reference scenario, in
which, if the target reaction span has multiple true parents, the recall is calculated based on the
total number of the parents in the denominator.4

             Method                  P      R (R^m)        F1 (F1^m)
             Naive-ImmPrev          .026   .145 (.133)    .044±.011 (.043)
             Pattern matching       .332   .207 (.188)    .252±.096 (.237)
             Bag-of-words           .302   .236 (.217)    .258±.173 (.244)
             RRefBinary-ImmPrev     .595   .560 (.506)    .576±.130 (.545)
             RRefE2E                .390   .347 (.316)    .367±.155 (.349)

             Oracle-ImmPrev         .706   .706 (.643)    .706±.088 (.672)

Table 3: End-to-end performance of the baseline methods in terms of precision (P), recall (R), and
F-score (F1). R^m and F1^m are calculated taking multiple references into account. The scores are
the average performance of five-fold cross validation. The standard deviations are also displayed
for F1. The “RRefBinary-ImmPrev” model is trained on the binary classification only and performs
ImmPrev decoding. The “RRefE2E” model is trained with the end-to-end objective. Note that
Oracle-ImmPrev uses the true labels for detecting the child spans and thus the evaluation is not
end-to-end in a strict sense.

   For the end-to-end task, RRefBinary-ImmPrev performs better than the RRefE2E model, even though
it is trained only with the binary classification objective and not trained on parent–child linking.
The RRefE2E model struggles to detect parent–child links effectively, and achieves lower performance
than the simple ImmPrev decoding. Considering the fact that the RRefE2E model performs comparably
with RRefBinary at child span detection, we hypothesize that even if the model is trained with the
end-to-end objective, it primarily learns to distinguish child reaction spans from non-child ones,
largely ignoring relations between other reaction spans.
   These results reveal that it is the parent–child linking step rather than the child detection
step that makes reaction reference resolution difficult. ImmPrev decoding is a strong baseline
(0.706 F-score given oracle child reaction spans), but there is significant room for improvement.
Despite its success in the traditional co-reference resolution task, the RRefE2E method struggles to
learn the pairwise relations between reaction spans. One of the main differences between the
traditional co-reference resolution task and the reaction reference resolution task is text length:
the target spans are paragraph sequences in a long document rather than short word sequences within
a few sentences. A model that can more effectively learn to capture relations between long text
spans is left for future work.

6.2 Error Analysis
Examples of the system output are shown in Figure 4, which correspond to the typical reference cases
introduced in Figure 1, namely (a) reference with the example ID, (b) reference with the compound
label, and (c) reference with no direct referential expression. Obviously, the pattern matching
baseline fails when the target span does not mention the example label of its parent span. While the
performance of Oracle-ImmPrev and RRefE2E is less tied to specific reference cases, they can fail in
many “easy” cases where there is an explicit mention of the parent example ID, indicating that it is
difficult to learn such direct references without an explicit training signal that forces the model
to attend to particular patterns. The example in Figure 4 (c) is a special case that involves
general conditions: the first paragraphs describe the general procedure to synthesize multiple
similar compounds, and the examples of specific compound names are listed in subsequent paragraphs.
We can see that RRefE2E fails to detect references to general conditions that Oracle-ImmPrev detects
successfully. In some cases, RRefE2E even fails to recognize them as a child reaction, which might
be because reference to general conditions is likely to be implicit, i.e. the child description does
not explicitly mention the example label of its parent description.

7 RELATED WORK
Krallinger et al. [17] provide an extensive survey of information retrieval and text mining for
chemical literature, including extraction of chemical reaction information from patents. Several
systems have been proposed that specifically target text mining from chemical patents [9, 19, 28].
Previous work has mainly focused on extracting chemical concepts and relations from individual
paragraphs. In the ChEMU 2020 evaluation lab [8], state-of-the-art methods for chemical named entity
recognition and event extraction were evaluated on patent documents. Jessop et al. [10] propose a
broader-scope, multi-stage system called PatentEye, which first detects experimental sections in the
description part of a patent, and then passes individual sections into downstream modules that
extract reaction details (e.g. title compound, reagents, and analytical data) from the input text.
Although they mention the presence of references between reaction descriptions, they do not present
specific ways to resolve them.
    To our knowledge, the first study to take into account references in reaction descriptions was
by Ai et al. [1]. Their method automatically extracts chemical reactions from the experimental
section of papers in chemistry journals, representing a reaction as a synthesis frame containing
arguments such as Product, Yield, Role, and Substance. They employ template matching against parsed
sentences to detect phrases related to predefined reaction events. To facilitate copying of relevant
information between synthesis frames, they define two types of inter-paragraph references, namely
general procedures and analogous syntheses. The paragraphs are linked based on certain referring
expressions such as example IDs and compound labels. Overall, their system solely relies on
handcrafted patterns and rules, and thus the cases covered by the system are limited. Although their

4 P = #(correct refs) / #(predicted refs),
R = #(correct refs) / #(spans with at least one true ref),
F1 = 2 P R / (P + R),
R^m = #(correct refs) / #(all true refs),
F1^m = 2 P R^m / (P + R^m)
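
The measures defined in footnote 4 can be computed as in the following sketch. The dictionary-based input format and the function names are assumptions made for illustration: `pred` maps each span predicted as a child to its single predicted parent, and `gold` maps each true child span to its set of true parents.

```python
from typing import Dict, Set


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def _f1(p: float, r: float) -> float:
    return _ratio(2 * p * r, p + r)


def evaluate(pred: Dict[int, int], gold: Dict[int, Set[int]]) -> Dict[str, float]:
    """Compute P, R, F1 and the multi-reference variants R^m and F1^m.

    A prediction counts as correct if it matches any one of the true parents.
    """
    correct = sum(1 for child, parent in pred.items()
                  if parent in gold.get(child, set()))
    p = _ratio(correct, len(pred))                  # / #(predicted refs)
    r = _ratio(correct, len(gold))                  # / #(spans with >= 1 true ref)
    r_m = _ratio(correct, sum(len(ps) for ps in gold.values()))  # / #(all true refs)
    return {"P": p, "R": r, "F1": _f1(p, r), "Rm": r_m, "F1m": _f1(p, r_m)}


scores = evaluate(
    pred={2: 1, 3: 2, 4: 1},
    gold={2: {1}, 3: {1, 2}, 5: {4}},
)
```

Because a model that predicts a single parent per child can recover at most one reference per span, R^m is never higher than R, which is consistent with the bracketed values in Table 3.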


     (a) Example ID
     RX      Text
     23      Example 24 ...
             ...
     25      Example 26 ...
     26      Example 27 ...
             The title compound was synthesized by the procedure described in Example 26 starting from ...

     True                   26 → 25
     Pattern matching       26 → 25
     Oracle-ImmPrev         26 → 25
     RRefE2E                26 → 23

     (b) Compound label
     RX      Text
     18      A mixture of the obtained ester, ... Column chromatography of the residue (silica
             gel-hexane/ethyl acetate, 9:1) gave Compound B11, ...
             ...
     20      Using 2-ethoxyethanol and following the procedure for Compound B11 gave Compound B13, ...

     True                   20 → 18
     Pattern matching       no reference
     Oracle-ImmPrev         20 → 18
     RRefE2E                20 → 18

     (c) No direct referential expression
     RX      Text
     9       General procedure for the synthesis of aryl N-(guanidino)imines N’-arylhydrazones 7Aa-Ag and 8Aa.
             ...
             Characterization of compounds 7Aa-Ag and 8Aa:
     10      (E)-2-((4’-((E and Z)-(2-(2-Chlorophenyl) hydrazono) methyl)- ... (7Aa)
             Yield: 70%; mp 328-330°C. 1H NMR, ...
     11      (E)-2-((4’-((E and Z)-(2-(2-Bromophenyl) hydrazono) methyl)- ... (7Ab)
             Yield: 64%; mp > 380°C. 1H NMR ...

     True                   11 → 10
     Pattern matching       no reference
     Oracle-ImmPrev         11 → 10
     RRefE2E                11 → 7

Figure 4: Abbreviated system output examples. Only parts of the patent documents are shown due to space limitations. “RX” is
the reaction ID, and “True” is the label in the silver set. “X → Y” indicates that the child reaction X refers to the parent reaction Y.


system is reported to produce results in the range of 80–90% for simple synthesis paragraphs, and
60–70% for complex paragraphs, details of their data set and evaluation procedures are not provided.
It is unclear whether these results would generalise to larger data sets with more variability, and
we cannot directly compare their approach with ours.
   Finally, we compare our task formulation with the text alignment task. The text alignment task
[34] is arguably one of the most relevant tasks to reaction reference resolution, as it requires
understanding of long documents and has an application to citation recommendation [4, 12]. A key
difference between the text alignment task and our reaction reference task is that the text
alignment task mainly focuses on alignment of text portions in different documents, whereas our
reaction reference task aims to find intra-document references between reaction descriptions. Our
task requires not only understanding of the content of each reaction description based on its
contextual information but also considering the document structure such as the relative position of
reaction description pairs, as exemplified in Figure 1 (c). A child reaction often omits a large
part of the reaction description (sometimes it is just the name of the target compound), unlike
typical inter-document references where the referrer shares some common context with the source
document it refers to. Applying core ideas from existing methods for the text alignment task to the
reaction reference task would be an interesting direction for future work.

8 CONCLUSION
We have introduced the chemical reaction reference resolution problem, a largely unexplored yet
critical step in information extraction of complete reaction details presented in chemical patents.
We provided a formal specification of this task, built an evaluation corpus, and adapted a neural
co-reference architecture to this task, as well as introducing several other baseline methods. Our
neural method achieved promising results for detecting (child) reaction descriptions containing
references, but identifying the precise (parent) reference of child spans is challenging for the
model. Key issues for future work are more effective learning of reference patterns in reaction
descriptions, and better handling of long text spans, as well as long-distance relations.
   Some aspects of reaction reference are not well explored in this paper. For example, we did not
observe many examples involving multiple references and references to general conditions in our
silver-standard dataset. This is partly because the database used to construct our corpus is not
designed for this use. Very recently, ChEMU [7], a shared task on chemical information extraction,
was held, and the reaction reference resolution task was included as one of the key tasks. As a part
of the shared task, a dataset with gold annotation of reaction reference was made publicly
available.5 We will evaluate the performance of our baselines on the gold-standard dataset. In
addition, although we have cast the problem as a text processing task, a more complete description
of the reaction would involve the combination of the textual body, figures, and tables in the
patent. Thus, a multi-modal version of the reaction reference task is a possible direction to
explore in the future. Another interesting extension of this task would be considering
cross-document references of chemical reactions.

5 http://chemu2021.eng.unimelb.edu.au/

ACKNOWLEDGMENTS
This project is supported with funding to the ChEMU (Cheminformatics Elsevier Melbourne University)
project from the Australian Research Council Linkage Project LP160101469, with partner contributions
from Elsevier. Some of the computations in this paper were performed using the Spartan HPC-Cloud
Hybrid [18] at the University of Melbourne.

REFERENCES
 [1] C. S. Ai, Paul E. Blower, and R. H. Ledwith. 1990. Extraction of Chemical Reaction Information
     from Primary Journal Text. Journal of Chemical Information and Computer Sciences 30, 2 (1990),
     163–169. https://doi.org/10.1021/ci00066a012
 [2] Saber A. Akhondi, Hinnerk Rey, Markus Schwörer, Michael Maier, John Toomey, Heike Nau, Gabriele
     Ilchmann, Mark Sheehan, Matthias Irmer, Claudia Bobach, Marius Doornenbal, Michelle Gregory, and
     Jan A. Kors. 2019. Automatic Identification of Relevant Chemical Compounds from Patents.
     Database 2019 (2019). https://doi.org/10.1093/database/baz001



[3] Cecilia Arighi, Lynette Hirschman, Thomas Lemberger, Samuel Bayer, Robin Liechti, Donald Comeau, and Cathy Wu. 2017. Bio-ID Track Overview. In Proceedings of the BioCreative VI Workshop. 14–19.
[4] Chandra Bhagavatula, Sergey Feldman, Russell Power, and Waleed Ammar. 2018. Content-Based Citation Recommendation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, 238–251. https://doi.org/10.18653/v1/N18-1022
[5] James E. Blake and Robert C. Dana. 1990. CASREACT: More than a Million Reactions. Journal of Chemical Information and Modeling 30, 4 (1990), 394–399. https://doi.org/10.1021/ci00068a008
[6] Peter Corbett, Colin Batchelor, and Simone Teufel. 2007. Annotation of Chemical Named Entities. In Biological, Translational, and Clinical Language Processing. Association for Computational Linguistics, 57–64.
[7] Jiayuan He, Biaoyan Fang, Hiyori Yoshikawa, Yuan Li, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Zubair Afzal, Zenan Zhai, Lawrence Cavedon, Trevor Cohn, Timothy Baldwin, and Karin Verspoor. 2021. ChEMU 2021: Reaction Reference Resolution and Anaphora Resolution in Chemical Patents. In Advances in Information Retrieval, Djoerd Hiemstra, Marie-Francine Moens, Josiane Mothe, Raffaele Perego, Martin Potthast, and Fabrizio Sebastiani (Eds.). Springer International Publishing, 608–615.
[8] Jiayuan He, Dat Quoc Nguyen, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Zubair Afzal, Zenan Zhai, Biaoyan Fang, Hiyori Yoshikawa, Ameer Albahem, Jingqi Wang, Yuankai Ren, Zhi Zhang, Yaoyun Zhang, Mai Hoang Dao, Pedro Ruas, Andre Lamurias, Francisco M. Couto, Jenny Copara, Nona Naderi, Julien Knafou, Patrick Ruch, Douglas Teodoro, Daniel Lowe, John Mayfield, Abdullatif Köksal, Hilal Dönmez, Elif Özkirimli, Arzucan Özgür, Darshini Mahendran, Gabrielle Gurdin, Nastassja Lewinski, Christina Tang, Bridget T. McInness, C.S. Malarkodi, Pattabhi Rk Rao, Sobha Lalitha Devi, Lawrence Cavedon, Trevor Cohn, Timothy Baldwin, and Karin Verspoor. 2020. An Extended Overview of the CLEF 2020 ChEMU Lab: Information Extraction of Chemical Reactions from Patents. Proceedings of the CLEF 2020 conference (2020).
[9] Matthias Irmer, Lutz Weber, Timo Böhme, Anett Püschel, Claudia Bobach, and Ulf Laube. 2015. OCMiner for Patents. Extracting Chemical Information from Patent Texts. In Proceedings of the Fifth BioCreative Challenge Evaluation Workshop. 119–123.
[10] David M. Jessop, Sam E. Adams, and Peter Murray-Rust. 2011. Mining Chemical Information from Open Patents. Journal of Cheminformatics 3, 1 (2011), 40. https://doi.org/10.1186/1758-2946-3-40
[11] David M. Jessop, Sam E. Adams, Egon L. Willighagen, Lezan Hawizy, and Peter Murray-Rust. 2011. OSCAR4: A Flexible Architecture for Chemical Text-Mining. Journal of Cheminformatics 3, 1 (2011), 41. https://doi.org/10.1186/1758-2946-3-41
[12] Jyun-Yu Jiang, Mingyang Zhang, Cheng Li, Michael Bendersky, Nadav Golbandi, and Marc Najork. 2019. Semantic Text Matching for Long-Form Documents. In The World Wide Web Conference (WWW ’19). Association for Computing Machinery, 795–806. https://doi.org/10.1145/3308558.3313707
[13] J.-D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii. 2003. GENIA Corpus—a Semantically Annotated Corpus for Bio-Textmining. Bioinformatics 19, suppl_1 (2003), i180–i182. https://doi.org/10.1093/bioinformatics/btg1023
[14] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.).
[15] Martin Krallinger, Obdulia Rabal, Saber A Akhondi, et al. 2017. Overview of the BioCreative VI Chemical-Protein Interaction Track. In Proceedings of the Sixth BioCreative Challenge Evaluation Workshop, Vol. 1. 141–146.
[16] Martin Krallinger, Obdulia Rabal, Florian Leitner, Miguel Vazquez, David Salgado, Zhiyong Lu, Robert Leaman, Yanan Lu, Donghong Ji, Daniel M. Lowe, Roger A. Sayle, Riza Theresa Batista-Navarro, Rafal Rak, Torsten Huber, Tim Rocktäschel, Sérgio Matos, David Campos, Buzhou Tang, Hua Xu, Tsendsuren Munkhdalai, Keun Ho Ryu, SV Ramanan, Senthil Nathan, Slavko Žitnik, Marko Bajec, Lutz Weber, Matthias Irmer, Saber A. Akhondi, Jan A. Kors, Shuo Xu, Xin An, Utpal Kumar Sikdar, Asif Ekbal, Masaharu Yoshioka, Thaer M. Dieb, Miji Choi, Karin Verspoor, Madian Khabsa, C. Lee Giles, Hongfang Liu, Komandur Elayavilli Ravikumar, Andre Lamurias, Francisco M. Couto, Hong-Jie Dai, Richard Tzong-Han Tsai, Caglar Ata, Tolga Can, Anabel Usié, Rui Alves, Isabel Segura-Bedmar, Paloma Martínez, Julen Oyarzabal, and Alfonso Valencia. 2015. The CHEMDNER Corpus of Chemicals and Drugs and Its Annotation Principles. Journal of Cheminformatics 7, 1 (2015), S2.
[17] Martin Krallinger, Obdulia Rabal, Anália Lourenço, Julen Oyarzabal, and Alfonso Valencia. 2017. Information Retrieval and Text Mining Technologies for Chemistry. Chemical Reviews 117, 12 (2017), 7673–7761. https://doi.org/10.1021/acs.chemrev.6b00851
[18] Lev Lafayette, Greg Sauter, Linh Vu, and Bernard Meade. 2017. Spartan HPC-Cloud Hybrid: Delivering Performance and Flexibility. https://doi.org/10.4225/49/58ead90dceaaa
[19] Alexander Johnston Lawson, Stefan Roller, Helmut Grotz, Janusz L Wisniewski, and Libuse Goebels. 2011. Method and Software for Extracting Chemical Data. German patent no. DE102005020083A1.
[20] Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. 2017. End-to-End Neural Coreference Resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 188–197. https://doi.org/10.18653/v1/D17-1018
[21] Daniel M. Lowe. 2012. Extraction of Chemical Structures and Reactions from the Literature. Ph.D. Dissertation. University of Cambridge.
[22] Daniel Magilavy and Polly Pine. 2018. 2,4 Substituted Pyrimidinediamines for Use in Discoid Lupus. European patent no. EP2683385B1.
[23] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.). 3111–3119.
[24] Dat Quoc Nguyen, Zenan Zhai, Hiyori Yoshikawa, Biaoyan Fang, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Saber A Akhondi, Trevor Cohn, Timothy Baldwin, et al. 2020. ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents. In European Conference on Information Retrieval. Springer, 572–579.
[25] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, 2227–2237. https://doi.org/10.18653/v1/N18-1202
[26] Stanislaw Rachwal and Yufen Hu. 2018. Highly Photo-Stable Bis-Triazole Fluorophores. US patent no. US20180002337A1.
[27] Pedro Ruas, Andre Lamurias, and Francisco M. Couto. 2020. Linking Chemical and Disease Entities to Ontologies by Integrating PageRank with Extracted Relations from Literature. Journal of Cheminformatics 12, 1 (2020), 57. https://doi.org/10.1186/s13321-020-00461-4
[28] Nadine Schneider, Daniel M Lowe, Roger A Sayle, Michael A Tarselli, and Gregory A Landrum. 2016. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. Journal of Medicinal Chemistry 59, 9 (2016), 4385–4402.
[29] Ingemar Starke, Mikael Ulf Johan Dahlstrom, David Blomberg, Suzanne Alenfalk, Tore Skjaret, and Malin Lemurell. 2017. Benzothiazepine and Benzothiadiazepine Derivatives with Ileal Bile Acid Transport (Ibat) Inhibitory Activity for the Treatment Hyperlipidaemia. European patent no. EP1427423B9.
[30] Jorge A. Vanegas, Sérgio Matos, Fabio González, and José L. Oliveira. 2015. An Overview of Biomolecular Event Extraction from Scientific Documents. Computational and Mathematical Methods in Medicine 2015 (2015), 571381. https://doi.org/10.1155/2015/571381
[31] Chih-Hsuan Wei, Yifan Peng, Robert Leaman, Allan Peter Davis, Carolyn J Mattingly, Jiao Li, Thomas C Wiegers, and Zhiyong Lu. 2015. Overview of the BioCreative V Chemical Disease Relation (CDR) Task. In Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, Vol. 14.
[32] Hiyori Yoshikawa, Dat Quoc Nguyen, Zenan Zhai, Christian Druckenbrodt, Camilo Thorne, Saber A. Akhondi, Timothy Baldwin, and Karin Verspoor. 2019. Detecting Chemical Reactions in Patents. In Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association. Australasian Language Technology Association, 100–110.
[33] Zenan Zhai, Dat Quoc Nguyen, Saber Akhondi, Camilo Thorne, Christian Druckenbrodt, Trevor Cohn, Michelle Gregory, and Karin Verspoor. 2019. Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings. In Proceedings of the 18th BioNLP Workshop and Shared Task. Association for Computational Linguistics, 328–338.
[34] Xuhui Zhou, Nikolaos Pappas, and Noah A. Smith. 2020. Multilevel Text Alignment with Cross-Document Attention. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 5012–5025. https://doi.org/10.18653/v1/2020.emnlp-main.407
