=Paper= {{Paper |id=Vol-2909/paper1 |storemode=property |title=Chemical Reaction Reference Resolution in Patents |pdfUrl=https://ceur-ws.org/Vol-2909/paper1.pdf |volume=Vol-2909 |authors=Hiyori Yoshikawa,Saber Akhondi,Camilo Thorne,Christian Druckenbrodt,Ralph Hoessel,Zenan Zhai,Jiayuan He,Timothy Baldwin,Karin Verspoor }} ==Chemical Reaction Reference Resolution in Patents== https://ceur-ws.org/Vol-2909/paper1.pdf

Chemical Reaction Reference Resolution in Patents
Hiyori Yoshikawa*, Fujitsu Limited, Tokyo, Japan, y.hiyori@fujitsu.com
Saber Akhondi, Elsevier BV, Amsterdam, Netherlands, s.akhondi@elsevier.com
Camilo Thorne, Elsevier Information Systems GmbH, Frankfurt, Germany, c.thorne.1@elsevier.com
Christian Druckenbrodt, Elsevier Information Systems GmbH, Frankfurt, Germany, c.druckenbrodt@elsevier.com
Ralph Hoessel, Elsevier Information Systems GmbH, Frankfurt, Germany, r.hoessel@elsevier.com
Zenan Zhai, The University of Melbourne, Melbourne, Australia, zenan.zhai@student.unimelb.edu.au
Jiayuan He†, RMIT University, Melbourne, Australia, estrid.he@rmit.edu.au
Timothy Baldwin, The University of Melbourne, Melbourne, Australia, tbaldwin@unimelb.edu.au
Karin Verspoor‡, RMIT University, Melbourne, Australia, karin.verspoor@unimelb.edu.au

ABSTRACT
Many new chemical compounds are reported each year in patent documents, leading to increasing demand for methods for automatic information extraction of chemical compounds and reactions from patents. Chemical patents often detail a number of similar compounds that have a common substructure and can be synthesized in analogous ways, and therefore contain many references connecting descriptions of similar chemical reactions, to avoid redundancy in describing common reaction conditions. This leads to the problem of reaction reference resolution, where, given a reaction description, we need to identify links to other reaction descriptions it refers to. In this paper, we formally introduce the task and propose baseline methods to address it in analogy with co-reference resolution. To evaluate the performance, we create a large-scale silver-standard dataset based on a commercial database of chemical reactions. The experimental results show that the approach based on a state-of-the-art co-reference resolution method struggles to outperform a simple heuristic in detecting reference links, demonstrating the difficulty of the proposed task and its fundamentally different nature to co-reference resolution.

CCS CONCEPTS
• Computing methodologies → Information extraction.

KEYWORDS
information extraction, reaction reference resolution, natural language processing

* Also with The University of Melbourne.
† Also with The University of Melbourne.
‡ Also with The University of Melbourne.

PatentSemTech, July 15th, 2021, online
© 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

¹ https://www.reaxys.com. Copyright ® 2021 Elsevier Life Sciences IP Limited. Reaxys® is a trademark of Elsevier Life Sciences IP Limited, used under license.

1 INTRODUCTION
Patents represent a critical source of information about new chemical compounds, and lead journal publications in both volume and time [2]. Patents provide not only the name and characteristics of new chemical compounds, but also the reaction details for their synthesis, including starting materials, product, reagents, catalysts, solvents, and the conditions of the reactions such as temperature and time. Databases such as Reaxys®¹ and CASREACT [5] store a large volume of such chemical reaction information. Given the commercial and research value of the information in patents, as well as the large numbers of available patents, developing methods for automatic information extraction from chemical patents has been a focus of recent research [17, 24].

A text mining system for chemical reaction information extraction should aim to obtain details of each reaction, requiring: (1) the identification of specific spans of text describing reactions; and (2) the extraction of the compounds participating in, and the conditions of, a reaction. The second step involves named entity recognition [6, 8, 13, 16, 24], relation extraction [15, 31], entity linking [3, 27], and event extraction [8, 24, 30], which have been explored intensively in previous studies. A relatively small number of studies have addressed the chemical reaction extraction task as a whole [1, 10, 19, 21].

The first step of locating all information relevant to a chemical reaction has received relatively little attention. This step is a challenge as it requires processing very long and semantically unstructured documents, in which a number of similar chemical reactions are presented. Yoshikawa et al. [32] proposed the chemical reaction detection task to locate descriptions of individual chemical reactions in patents, and achieved promising results using a contextualized document modeling method. Jessop et al. [10] also addressed the detection of chemical reaction descriptions in a heuristic way. However, previous work has not addressed the important step of resolving references between reaction descriptions. In general, a chemical patent reports a number of similar compounds that have a common chemical substructure and can be synthesized in analogous ways. Synthesis steps that are common across similar reactions may be described
only once, and the paragraphs describing individual reactions refer back to those descriptions to provide specific details without repeating information. In our dataset described below, approximately 17% of reaction descriptions contain references to other reaction descriptions.

To fill this gap in previous work and take a step towards an end-to-end method for chemical reaction information extraction, this paper focuses on the task of reaction reference resolution. Our contributions are three-fold:
• We define the new task of reaction reference resolution, where, given a reaction description, we aim to identify other reaction descriptions it refers to.
• In the absence of a large-scale gold-standard dataset, we leverage Reaxys®, a large commercial chemical reaction database, to create a silver-standard dataset, of a size that supports the development of not only traditional rule-based methods but also neural network models.
• We propose several baseline methods for the task and evaluate them on our silver-standard dataset. The baselines include an extension of the state-of-the-art end-to-end neural co-reference resolution method of Lee et al. [20]. Experimental results show that, while it successfully detects reaction descriptions that refer to other textual spans, it struggles to relate them correctly.

2 TASK DESCRIPTION
In this section, we formally introduce the reaction reference resolution task. We assume a pipeline system where, given the Description section of a patent, text spans corresponding to individual reaction descriptions are extracted in advance (following the methods of Jessop et al. [10] or Yoshikawa et al. [32]). Then, in a second step, all relevant relationships between reaction descriptions are identified to fully locate the details of each reaction. The output of this task could then be passed to a targeted information extraction system to pull out individual reaction details.

An input document is represented by a tuple $d = (P, R)$, where $P$ is a sequence of paragraphs $P = (p_1, \ldots, p_{|P|})$ and $R$ is a sequence of all textual spans of reaction descriptions $R = (r_1, \ldots, r_{|R|})$, dubbed reaction spans. A reaction span $r_i$ is a sequence of consecutive paragraphs, represented by the first and last paragraph indices of the span, $r_i = (\mathrm{start}(i), \mathrm{end}(i))$. For each reaction span $r_i$, the task is to: (1) predict whether the reaction refers to one or more previous reaction descriptions; and if so, (2) detect the reaction span(s) $r_j$ (for $j < i$) that $r_i$ refers to. Hereinafter, we call $r_j$ the parent reaction and $r_i$ the child reaction.

Figure 1 shows typical examples of reaction references. A reference relation is often indicated by an example ID, as in example (a), or the compound label used in the previous reaction description, as in example (b). However, it can also be the case that there is no direct referential expression between reaction descriptions, as in example (c). Therefore, we formulate the task as reference resolution between reaction descriptions (paragraphs) rather than between a referring expression such as Example 1 and its referent, as in traditional co-reference resolution tasks. Subtasks (1) and (2) are analogous to the mention detection and mention clustering subtasks of co-reference resolution.

3 METHODS
In this section, we introduce a baseline neural model that captures the context and detects reference relations between reaction descriptions. As illustrated in Figure 2, our model is based on the end-to-end neural co-reference model of Lee et al. [20]. The original model takes a contextualized representation for each entity mention and computes the likelihood that a pair of mentions is co-referential, whereas our model extends it to a hierarchical architecture that first encodes individual paragraphs and then feeds them to a document-level encoder, so that the model can capture references between longer text spans (i.e. multiple paragraphs).

3.1 Paragraph and Span Representation
Our model has a hierarchical architecture that first encodes each paragraph and then encodes the entire document to obtain a contextualized representation of each paragraph. We first encode each input paragraph to obtain a context-independent paragraph encoding. A paragraph $p_t$ consists of a sequence of words $p_t = \{w_1^{(t)}, \ldots, w_{|p_t|}^{(t)}\}$. Each word is represented by:

$$\boldsymbol{x}_{t,k} = \mathrm{WE}(w_k^{(t)}) \oplus \mathrm{ELMo}(p_t)_k \oplus \mathrm{CC}(w_k^{(t)}) \oplus \mathrm{NER}(p_t)_k, \tag{1}$$

where $\mathrm{WE}(w_k^{(t)}) \in \mathbb{R}^{d_{\mathrm{WE}}}$ is a pretrained word embedding, $\mathrm{ELMo}(p_t)_k \in \mathbb{R}^{d_{\mathrm{ELMo}}}$ is a contextualized word embedding [25], $\mathrm{CC}(w_k^{(t)}) \in \mathbb{R}^{d_{\mathrm{CC}}}$ is a CNN-based character-level word encoding, and $\mathrm{NER}(p_t)_k \in \mathbb{R}^{d_{\mathrm{NER}}}$ is a trainable embedding of the named entity label (here, chemical compound type). The details of the embeddings and named entity labels are provided in Section 5. We then encode $p_t$ into a vector using a bidirectional LSTM:

$$\boldsymbol{h}_t = \overrightarrow{\boldsymbol{h}}_{t,|p_t|} \oplus \overleftarrow{\boldsymbol{h}}_{t,1}, \tag{2}$$
$$\overrightarrow{\boldsymbol{h}}_{t,k} = \mathrm{LSTM}(\boldsymbol{x}_{t,k}, \overrightarrow{\boldsymbol{h}}_{t,k-1}; \theta_{\mathrm{PF}}), \tag{3}$$
$$\overleftarrow{\boldsymbol{h}}_{t,k} = \mathrm{LSTM}(\boldsymbol{x}_{t,k}, \overleftarrow{\boldsymbol{h}}_{t,k+1}; \theta_{\mathrm{PB}}), \tag{4}$$

where $\theta_{\mathrm{PF}}$ and $\theta_{\mathrm{PB}}$ denote model parameters of the forward and the backward LSTMs, respectively.

To obtain the representation of each reaction span, we first obtain a contextualized representation of each paragraph using a document-level bidirectional LSTM:

$$\boldsymbol{h}_t^* = \overrightarrow{\boldsymbol{h}}_t^* \oplus \overleftarrow{\boldsymbol{h}}_t^*, \tag{5}$$
$$\overrightarrow{\boldsymbol{h}}_t^* = \mathrm{LSTM}(\boldsymbol{h}_t, \overrightarrow{\boldsymbol{h}}_{t-1}^*; \theta_{\mathrm{DF}}), \tag{6}$$
$$\overleftarrow{\boldsymbol{h}}_t^* = \mathrm{LSTM}(\boldsymbol{h}_t, \overleftarrow{\boldsymbol{h}}_{t+1}^*; \theta_{\mathrm{DB}}), \tag{7}$$

where $\theta_{\mathrm{DF}}$ and $\theta_{\mathrm{DB}}$ denote model parameters of the forward and the backward LSTMs, respectively. These paragraph representations are combined to obtain each reaction span representation:

$$\boldsymbol{g}_i = \boldsymbol{h}_{\mathrm{start}(i)}^* \oplus \boldsymbol{h}_{\mathrm{end}(i)}^*. \tag{8}$$

3.2 Training Objectives
Our goal is to identify the parent reaction span for each child reaction span. As described in Section 2, this can be further divided into two subtasks: predicting whether the given span is a child of some reaction span, and linking the child spans to their parent spans.
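
The input and labels defined in Section 2 can be sketched as plain data structures, which makes the split into the two subtasks concrete. This is a minimal illustration, not the authors' code: the names `Document` and `subtask_targets`, the 1-based indexing, the single-parent labels, and the toy document are all assumptions made here for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Document:
    """A document d = (P, R) with one parent label y_i per reaction span."""
    paragraphs: List[str]          # P = (p_1, ..., p_|P|)
    spans: List[Tuple[int, int]]   # r_i = (start(i), end(i)), 1-based paragraph indices
    parents: List[int]             # y_i = j (1-based parent span index), 0 = no parent

    def __post_init__(self):
        assert len(self.spans) == len(self.parents)
        for i, y in enumerate(self.parents, start=1):
            # A parent must be a preceding span (j < i); 0 means "no reference".
            assert 0 <= y < i


def subtask_targets(doc: Document):
    """Derive the targets of subtask (1), child detection, and subtask (2), linking."""
    is_child = [y != 0 for y in doc.parents]
    links = [(y, i) for i, y in enumerate(doc.parents, start=1) if y != 0]
    return is_child, links


# Hypothetical toy document: span 1 is a general procedure, spans 2 and 3 refer to it.
toy = Document(
    paragraphs=["general procedure", "work-up", "compound A", "compound B"],
    spans=[(1, 2), (3, 3), (4, 4)],
    parents=[0, 1, 1],
)
print(subtask_targets(toy))
```

Here the first list is what a child-detection classifier would be trained on, and the (parent, child) pairs are the links that the second subtask has to recover.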

ID Text
(a) Reference with the example ID
RX1 Example 26 11-Dioxo-3,3-dibutyl-5-phenyl-7-methylthio-8-(N-{(R)-𝛼 -[N-((S)-1-carboxy-(R)-hydroxypropyl)carbamoyl]-4-...
1,1-Dioxo-3,3-dibutyl-5-phenyl-7-methylthio-8-[N-((R)-𝛼 -carboxy-4-hydroxybenzyl) carbamoylmethoxy]- 2,3,4,5-tetrahydro-1,2,5-benzothiadiazepine (Example 18; 100mg, 0.152mmol)
was dissolved in ...
RX2 Example 27 1,1-Dioxo-3,3-dibutyl-5-phenyl-7-methylthio-8-(N-{(R)-𝛼 -[N-((S)-1-carboxy-2-, methylpropyl)carbamoyl]-4-
The title compound was synthesized by the procedure described in Example 26 starting from ...
(b) Reference with the compound label
RX1 A mixture of the obtained ester, ... was stirred under argon and heated at 110◦ C. for 24 h. ... Column chromatography of the residue (silica gel-hexane/ethyl acetate, 9:1) gave Compound
B11, ...
...
RX2 Using 2-ethoxyethanol and following the procedure for Compound B11 gave Compound B13, bis(2-ethoxyethyl) 3,3’-((2-(bromomethyl)-2-((3-((2-
ethoxyethoxy)carbonyl)phenoxy)methyl)propane-1,3-diyl)bis(oxy))dibenzoate, (3.82 g, 35% yield). ...
(c) Reference with no direct referential expression
RX1 III: 5-fluoro-N2-(4-methyl-3-propionylaminosulfonylphenyl)-N4-[4-(prop-2-ynyloxy)phenyl]-2,4- pyrimidinediamine mono-sodium salt
5-Fluoro-N2-(4-methyl-3-propionylaminosulfonylphenyl)-N4-[4-(prop-2-ynyloxy)phenyl]-2,4-pyrimidinediamine, II, (0.125 g, 0.258 mmol) was suspended in ...
The following compounds were made in a similar fashion to those above.
RX2 IV: 5-Fluoro-N2-[4-methyl-3-(N-propionylaminosulfonyl)phenyl]-N4-[4-(2-propynyloxy)phenyl]-2,4-pyrimidinediamine Potassium Salt ...
RX3 V: 5-Fluoro-N2-[4-methyl-3-(N-propionylaminosulfonyl)phenyl]-N4-[4-(2-propynyloxy)phenyl]-2,4-pyrimidinediamine Calcium Salt ...

Figure 1: Abbreviated examples of reaction references: (a) reference with the example ID [29], (b) reference with the compound label
[26], (c) reference with no direct referential expression [22].
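
Reference type (a) in Figure 1 is the most mechanical of the three: the child names the example ID that heads its parent. A rough sketch of resolving such references with a regular expression is given below. The pattern, the label words it accepts, and the helper names are assumptions chosen for illustration; the rules actually used for the pattern matching baseline of Section 5.1 are not reproduced here.

```python
import re

# Illustrative pattern only: the accepted label words and ID format are assumptions.
EXAMPLE_ID = re.compile(r"\b(Example|Preparation|Step)\s+(\d+[A-Za-z]?)\b")


def label_of(match):
    """Normalise a regex match to a label such as 'Example 26'."""
    return f"{match.group(1)} {match.group(2)}"


def link_by_example_id(span_texts):
    """For each span, return the 0-based index of the nearest preceding span
    whose text starts with an example ID mentioned in the span, else None."""
    parents = []
    for i, text in enumerate(span_texts):
        heading = EXAMPLE_ID.match(text)
        own = label_of(heading) if heading else None
        parent = None
        for mention in EXAMPLE_ID.finditer(text):
            label = label_of(mention)
            if label == own:
                continue  # the span's own heading is not a reference
            # Search preceding spans from nearest to farthest.
            for j in range(i - 1, -1, -1):
                head = EXAMPLE_ID.match(span_texts[j])
                if head and label_of(head) == label:
                    parent = j
                    break
            if parent is not None:
                break
        parents.append(parent)
    return parents


# Shortened from Figure 1(a).
SPANS = [
    "Example 26 ... (Example 18; 100mg, 0.152mmol) was dissolved in ...",
    "Example 27 ... synthesized by the procedure described in Example 26 starting from ...",
]
print(link_by_example_id(SPANS))  # prints [None, 0]
```

The first span mentions Example 18, but no preceding span in this toy list is headed by that label, so it stays unlinked. References of types (b) and (c) carry no such ID and are out of reach for this kind of rule.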

Figure 2: Illustration of our model architecture.

To evaluate the difficulty of these individual subtasks, we propose several model variants that differ in training objectives and decoding methods. Specifically, we train our model using either a binary classification objective that only models whether a reaction span is a child or not, or an end-to-end objective that also models which span the child refers to. In decoding, our model uses either a heuristic method or the learned end-to-end model to identify the parent reaction for each child reaction span.

Binary classification objective. The notation $y_i = 0$ indicates the special case where the span $r_i$ does not refer to any preceding spans. The binary classification objective models whether or not a reaction span is a child of at least one preceding reaction span:

$$\pi_{\mathrm{NN}}(i) := \Pr(y_i \neq 0 \mid d) = \sigma(f(i)), \tag{9}$$

where $\sigma$ denotes the sigmoid operator, and $f(i)$ is a span scoring function that computes the likelihood of the $i$-th span being a child, using a feed-forward network:

$$f(i) = \boldsymbol{w}_{\mathrm{C\text{-}NN}}^{\top} \mathrm{FFNN}(\boldsymbol{g}_i) + b_{\mathrm{C\text{-}NN}}. \tag{10}$$

Given the training dataset $D = (d_1, \ldots, d_{|D|})$, the model is trained to minimize the following binary cross-entropy loss:

$$\mathcal{L}_{\mathrm{BIN}} = -\sum_{d=(P,R) \in D} \sum_{i=1}^{|R|} \Big( \mathbb{1}_{y_i \neq 0} \log \pi_{\mathrm{NN}}(i) + \mathbb{1}_{y_i = 0} \log\big(1 - \pi_{\mathrm{NN}}(i)\big) \Big). \tag{11}$$

End-to-end objective. The end-to-end objective models not only the likelihood of a span being a child, but also which span the child span refers to. Analogous to Lee et al. [20], we aim to model the conditional probability distribution for document $d$ as a product of multinomials for individual spans:

$$\Pr(y_1, \ldots, y_{|R|} \mid d) = \prod_{i=1}^{|R|} \Pr(y_i \mid d). \tag{12}$$

The probability of each preceding span being the parent of the given span is computed as follows:

$$\pi_{\mathrm{NN}}(j, i) := \Pr(y_i = j \mid d) = \frac{\exp(s(j, i))}{\sum_{j'=0}^{i-1} \exp(s(j', i))}, \tag{13}$$

where $s(j, i)$ denotes the score of the likelihood that span $r_i$ refers to preceding span $r_j$. The score is computed using the pair of corresponding span representations:

$$s(j, i) = \boldsymbol{w}_{\mathrm{P\text{-}NN}}^{\top} \boldsymbol{\phi}_{j,i} + b_{\mathrm{P\text{-}NN}}, \tag{14}$$
$$\boldsymbol{\phi}_{j,i} = \mathrm{FFNN}(\boldsymbol{g}_j \oplus \boldsymbol{g}_i \oplus (\boldsymbol{g}_j \circ \boldsymbol{g}_i) \oplus \mathrm{Dist}(j, i)) \tag{15}$$

for $j > 0$, where $\circ$ denotes element-wise multiplication. $\mathrm{Dist}(j, i)$ denotes the distance embedding, whose definition follows that of Lee et al. [20]. We define $s(0, i) = 0$ for any $i$, corresponding to
the case where $r_i$ has no reference. As we assume that the reaction spans are provided as part of the input, our model does not require span likelihood scores as used in the original co-reference resolution formulation. In another departure from the co-reference resolution setting, in our case the number of span pairs to be considered is tractable,² meaning we do not need to perform candidate pruning.

² $O(|R|^2)$, where $|R| < 50$ for more than 90% of the dataset.

Given the training dataset $D = (d_1, \ldots, d_{|D|})$, the model is trained to minimize the negative log-likelihood loss against the true parent spans, where $Y_i^*$ denotes the set of all correct parent span indices:

$$\mathcal{L}_{\mathrm{E2E}} = -\sum_{d=(P,R) \in D} \sum_{i=1}^{|R|} \sum_{j \in Y_i^*} \log \Pr(y_i = j \mid d). \tag{16}$$

Note that our model is trained to predict only one parent for each child, although a child can have multiple parent spans. We observed that more than 90% of the child reaction spans in our dataset have only a single parent, and this simplification has very limited effect on performance. We can easily extend our model to allow multiple parent spans by modeling the link probability for each candidate parent span independently. Specifically, we can replace the softmax operation in Equation (13) with the sigmoid activation, and the loss function Equation (16) with the binary cross-entropy loss.

Decoding. To decode with the model trained with the end-to-end objective, we use the index with the maximum probability (including 0, i.e. no reference) as the predicted parent index of a reaction span; we refer to this method as “E2E”. For the model trained with the binary classification objective (Equation (11)), we use a simple heuristic to predict the parent reaction spans, called “ImmPrev”: reaction spans that are classified as a child reaction span are linked to the immediately preceding span that is not classified as a child span.

4 SILVER-STANDARD DATASET
To the best of our knowledge, there is no publicly available dataset for reaction reference. Manually annotating a dataset of chemical reaction reference requires expert knowledge of chemistry and involves document-wise annotation of patents, with a single document often containing hundreds of paragraphs. As a more viable alternative, we create a silver-standard dataset from Reaxys®, a large-scale commercial chemical reaction database which contains a large quantity of reaction information associated at the document level with patents. While our dataset relies on a proprietary database, we believe that our methodology can generalize to similar patent silver standards generated using comparable reaction databases.

Reaxys® contains information on chemical reactions associated with their location information, i.e., the patents and the paragraphs where their reaction processes are described. To avoid exhaustive manual extraction of the full reaction details for all compounds described in a patent, the construction process of the Reaxys® database is streamlined through reuse of the conditions of a reaction for similar reactions described in the same patent. For this reason, the database includes internal reference links between similar reactions that share reaction conditions. As such, the mapping process from the database to our silver-standard dataset is straightforward: we simply map each reaction into one or more paragraph sequences (reaction spans) using the location information and then use the reference links to label corresponding pairs of reaction spans. There are some limitations due to the fact that the database is not originally intended to be used for document processing. For example, the link information is not always complete, especially when multiple references are involved.³ It is for this reason that we consider the annotations a silver-standard, with moderate recall and high precision.

³ Reactions that refer to multiple reaction steps usually have a link to only the final step of the reaction sequence.

We follow the work of Yoshikawa et al. [32] for patent document selection and obtaining reaction span information. We use the same set of patents, and extract the text from the description section of each patent. We split the dataset into five partitions, and perform five-fold cross validation, using three partitions for training, one for validation, and one for testing in each fold. We report the average performance across the five folds. An example and the statistics of the dataset are presented in Figure 3 and Table 1, respectively. We also sampled a small portion of our dataset and classified the reaction references based on the three types described in Figure 1. Among the sampled reaction references, 72% are references with the example ID, 20% are references with the compound label, and 8% are references with no direct referential expression.

# Documents              143
# Paragraphs             39,437
# Reaction spans         3,072
# Child reaction spans   549
# Tokens / Paragraph     73.7

Table 1: Evaluation dataset statistics. Tokenization is based on OSCAR4 [11].

5 EXPERIMENTAL DETAILS

5.1 Baseline Methods
In addition to the neural model described in Section 3, we introduce additional baselines, including heuristics and a traditional machine learning method using bag-of-words features.

Pattern matching baseline. We observe that there are high-precision patterns in child reactions that indicate their parent reaction spans. Namely, the parent reaction is often mentioned by its “example ID” such as Example 1, Preparation 1, or Step 1. An example is given in Figure 1(a). Based on this observation, we develop a rule-based baseline using regular expressions to detect such example ID patterns. For each reaction span that contains such a pattern, we search the preceding spans in reverse chronological order until we find a span that contains the example label in its heading or at the beginning of the text body.

Immediate previous baseline. Another naïve baseline is to link all reaction spans except the first, to the immediately previous span, i.e. $y_t = t - 1$. We label this method “Naive-ImmPrev”. We also include an oracle variant, “Oracle-ImmPrev”, which uses the true labels for the subtask of child span detection (i.e. the oracle provides a perfect decision as to whether a reaction span is a child of other reaction spans or not), and performs parent–child linking by linking

par_id  RX  parent_id  text
117     9   -1         General procedure for the synthesis of aryl N-(guanidino)imines N’-arylhydrazones 7Aa-Ag and 8Aa. ...
118     -1             Characterization of compounds 7Aa-Ag and 8Aa:
119     10  -1         E)-2-((4’-((E and Z)-(2-(2-Chlorophenyl)hydrazono)methyl)-[1,1’-biphenyl]-4-yl)methylene)hydrazine- ...
120     10  -1         Yield: 70%; mp 328-330°C. 1H NMR, (400 MHz, DMSO-d6) 𝛿 6.81 (dt, J=8.4 and 1.6 Hz, 1H, H-4”), ...
121     11  10         (E)-2-((4’-((E and Z)-(2-(2-Bromophenyl)hydrazono)methyl)-[1,1’-biphenyl]-4-yl)methylene)hydrazine- ...
122     11  10         Yield: 64%; mp >380°C. 1H NMR (400 MHz, DMSO-d6) 𝛿 6.75 (dt, J=8.4 and 1.6 Hz, 1H, H-4”), ...
123     12  10         (E)-2-((4’-((E)-(2-(4-Fluorophenyl)hydrazono)methyl)-[1,1’-biphenyl]-4-yl)methylene)hydrazine- ...
124     12  10         Yield: 88%; mp >380°C. 1H NMR (400 MHz, DMSO-d6) 𝛿 7.07 (d, J=7.6 Hz, 2H, H-4”), ...

Figure 3: An example of our silver-standard dataset. Only a fraction of the patent document is shown due to the space limitation. The
columns represent paragraph ID (par_id), reaction ID (RX), parent reaction ID (parent_id), and paragraph text (text). “-1”
in the reaction ID means that the paragraph is not a part of a reaction description. “-1” in the parent reaction ID means that the
reaction span does not have a parent reaction span.
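
As a rough illustration of how rows in the format of Figure 3 map onto reaction spans and parent links, the sketch below groups paragraphs by reaction ID and then applies the ImmPrev decoding rule from Section 3.2 using the true child labels. The row values follow Figure 3, with the RX and parent_id values repeated for every paragraph of a span (our reading of the figure's merged cells); `build_spans` and `imm_prev` are hypothetical helper names, not part of any released code.

```python
# (par_id, RX, parent_id); RX == -1 marks a paragraph outside any reaction span.
ROWS = [
    (117, 9, -1), (118, -1, None),
    (119, 10, -1), (120, 10, -1),
    (121, 11, 10), (122, 11, 10),
    (123, 12, 10), (124, 12, 10),
]


def build_spans(rows):
    """Group paragraphs by reaction ID into spans, keeping document order."""
    spans = {}
    for par_id, rx, parent in rows:
        if rx == -1:
            continue  # not part of a reaction description
        if rx not in spans:
            spans[rx] = {"start": par_id, "end": par_id, "parent": parent}
        else:
            spans[rx]["end"] = par_id
    return spans


def imm_prev(rx_ids, is_child):
    """Link every child span to the nearest preceding span that is not a child."""
    links = {}
    last_non_child = None
    for rx, child in zip(rx_ids, is_child):
        if child:
            if last_non_child is not None:
                links[rx] = last_non_child
        else:
            last_non_child = rx
    return links


spans = build_spans(ROWS)
rx_ids = list(spans)                                    # spans in document order
is_child = [spans[rx]["parent"] != -1 for rx in rx_ids]  # true (oracle) child labels
print(imm_prev(rx_ids, is_child))
```

On this excerpt the heuristic links spans 11 and 12 to span 10, which matches the silver-standard parent IDs in the figure.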

each given child span to its immediately preceding non-child span, similarly to the ImmPrev decoding method from Section 3.2. By assuming the annotations for child spans as given, we can measure the complexity of the parent–child linking task separately from the child span detection task.

Bag-of-words baseline. We also implement a simple bag-of-words classifier. For each reaction span $r_i$, we construct a vector $\boldsymbol{x}_i$ of vocabulary size $|V|$, where each element is the frequency of the corresponding term in the paragraph. We calculate the score of each span being a child as follows:

$$\pi_{\mathrm{BOW}}(i) := \Pr(y_i \neq 0 \mid d) = \sigma(\boldsymbol{w}_{\mathrm{C\text{-}BOW}}^{\top} \boldsymbol{x}_i + b_{\mathrm{C\text{-}BOW}}), \tag{17}$$

where $\sigma$ denotes the sigmoid function. For each span $r_i$ that is classified as a child, we calculate the score of every preceding span $r_j$ being the parent of $r_i$ as:

$$\pi_{\mathrm{BOW}}(j, i) := \Pr(y_i = j \mid y_i \neq 0, d) = \frac{\exp(s_{\mathrm{BOW}}(j, i))}{\sum_{j'=1}^{i-1} \exp(s_{\mathrm{BOW}}(j', i))}, \tag{18}$$
$$s_{\mathrm{BOW}}(j, i) = \boldsymbol{w}_{\mathrm{P\text{-}BOW}}^{\top} (\boldsymbol{x}_i \oplus \boldsymbol{x}_j \oplus \boldsymbol{x}_{j,i}) + b_{\mathrm{P\text{-}BOW}}, \tag{19}$$

where $\boldsymbol{x}_{j,i}$ is a one-hot vector of the words appearing in both $r_j$ and $r_i$. The model is trained to optimize the following joint loss function, with an L2 regularization term:

$$\mathcal{L}_{\mathrm{BOW}} = \sum_{(P,R) \in D} \big( L_{\mathrm{bin}}(P, R) + L_{\mathrm{link}}(P, R) \big), \tag{20}$$
$$L_{\mathrm{bin}}(P, R) = -\sum_{i=1}^{|R|} \Big( \mathbb{1}_{y_i \neq 0}\, \pi_{\mathrm{BOW}}(i) + \mathbb{1}_{y_i = 0} \big(1 - \pi_{\mathrm{BOW}}(i)\big) \Big), \tag{21}$$
$$L_{\mathrm{link}}(P, R) = -\sum_{i: y_i \neq 0} \sum_{j \in Y_i^*} \log \pi_{\mathrm{BOW}}(j, i). \tag{22}$$

5.2 Implementation Details and Optimization
Each paragraph is tokenized using the OSCAR4 tokenizer [11], which was developed for the chemical domain. We use embeddings from word2vec skip-gram [23] ($d_{\mathrm{WE}} = 200$) and ELMo [25] ($d_{\mathrm{ELMo}} = 1024$) for the WE and ELMo embeddings in Equation (1), respectively, both trained on chemical patent documents [33]. For the character-level word encodings, we apply a character CNN with 30 trigram filters to generate 25d character embeddings. For the NER features, we use a named entity recognizer based on the Reaxys® Gold chemical tag set [33], and assign 10d trainable embeddings to the token-level labels. We use 5d trainable embeddings for the distance embeddings in $s(j, i)$. In the FFNN layer in $f(i)$ and $s(j, i)$, we use a 2-layer feed-forward network with 150d hidden layers and a ReLU activation. We optimized the number of LSTM layers and hidden state dimensionalities for the two bidirectional models via grid search on the validation set.

The models are trained using the Adam optimizer [14], with dropout and gradient clipping ≤ 5.0. The dropout rate, applied evenly across the model, is selected from {0.3, 0.5} based on the validation data. The minibatch size is 1, and the model is trained for up to 150 epochs, with early stopping.

6 RESULTS AND DISCUSSION

6.1 Main Results

Method            P     R     F1
Naive-ImmPrev     .189  1.00  .315
Pattern matching  .876  .545  .660
Bag-of-words      .882  .617  .688
RRefBinary        .926  .866  .893
RRefE2E           .897  .791  .839

Table 2: Results of the baseline methods on child reaction span detection in terms of precision (P), recall (R), and F-score (F1). The scores are the average of five-fold cross validation. “RRefBinary” and “RRefE2E” indicate our neural model with the binary classification objective and the end-to-end objective, respectively.

Child span detection. Table 2 shows the results of the baseline methods on child span detection, i.e. binary classification of whether each reaction span is a child of another reaction span or not. The neural methods obtain high F-scores over 0.8, whereas the bag-of-words classifier shows only a small gain over the rule-based baseline. This suggests that this
Method P R (R m ) F1 (F1m ) sequences within a few sentences. A model that can more effectively
NAIVE -I MM P REV .026 .145 (.133) .044±.011 (.043) learn to capture relations between long text spans is left for future
Pattern matching .332 .207 (.188) .252±.096 (.237) work.
Bag-of-words .302 .236 (.217) .258±.173 (.244)
RR EF B INARY -I MM P REV .595 .560 (.506) .576±.130 (.545) 6.2 Error Analysis
RR EF E2E .390 .347 (.316) .367±.155 (.349)
Examples of the system output are shown in Figure 4, which corre-
O RACLE -I MM P REV .706 .706 (.643) .706±.088 (.672)
sponds to the typical reference cases introduced in Figure 1, namely
Table 3: End-to-end performance of the baseline methods in (a) reference with the example ID, (b) reference with the compound
terms of precision (P), recall (R), and F-score (F1 ). R m and label, and (c) reference with no direct referential expression. Ob-
F1m are calculated taking multiple references into account. viously, the pattern matching baseline fails when the target span
The scores are the average performance of five-fold cross val- does not mention the example label of its parent span. While the
idation. The standard deviations are also displayed for F1 . performance of O RACLE -I MM P REV and RR EF E2E is less tied to
“RR EF B INARY -I MM P REV ” model is trained on the binary classi- specific reference cases, they can fail in many “easy” cases where
fication only and performs I MM P REV decoding. “RR EF E2E ” there is an explicit mention of the parent example ID, indicating
model is trained with the end-to-end objective. Note that that it is difficult to learn such direct references without explicit
O RACLE -I MM P REV uses the true labels for detecting the child training signal that forces the model to attend to particular patterns.
spans and thus the evaluation is not end-to-end in a strict sense. The example in Figure 4 (c) is a special case that involves general
conditions: the first paragraphs describe the general procedure to
synthesize multiple similar compounds, and the examples of specific
compound names are listed in subsequent paragraphs. We can see
subtask is ML-feasible, noting that rich document representations
that RR EF E2E fails to detect references to general conditions that
are needed to achieve reasonable performance.
O RACLE -I MM P REV detects successfully. In some cases, RR EF E2E
End-to-end performance. Table 3 shows the performance on the even fails to recognize them as a child reaction, which might be
end-to-end task. As our methods assume a single parent for each because reference to general conditions is likely to be implicit, i.e.
child span, we evaluate the performance with two different measures. the child description does not explicitly mention the example label
As the primary measure, we calculate the precision (P), recall (R), of its parent description.
and F-scores (F1) by regarding the prediction as correct if the model predicts one of the true parents of the target reaction span. We also report the recall (R^m) and F-score (F1^m) based on the multiple reference scenario, in which, if the target reaction span has multiple true parents, the recall is calculated with the total number of parents in the denominator.⁴
For the end-to-end task, RRefBinary-ImmPrev performs better than the RRefE2E model, even though it is trained only with the binary classification objective and not on parent–child linking. The RRefE2E model struggles to detect parent–child links effectively, and achieves lower performance than the simple ImmPrev decoding. Considering that the RRefE2E model performs comparably with RRefBinary at child span detection, we hypothesize that even if the model is trained with the end-to-end objective, it primarily learns to distinguish child reaction spans from non-child ones, largely ignoring relations between other reaction spans.
These results reveal that it is the parent–child linking step rather than the child detection step that makes reaction reference resolution difficult. ImmPrev decoding is a strong baseline (0.706 F-score given oracle child reaction spans), but there is significant room for improvement. Despite its success in the traditional co-reference resolution task, the RRefE2E method struggles to learn the pairwise relations between reaction spans. One of the main differences between the traditional co-reference resolution task and the reaction reference resolution task is text length: the target spans are paragraph sequences in a long document rather than short word

7 RELATED WORK
Krallinger et al. [17] provide an extensive survey of information retrieval and text mining for chemical literature, including extraction of chemical reaction information from patents. Several systems have been proposed that specifically target text mining from chemical patents [9, 19, 28]. Previous work has mainly focused on extracting chemical concepts and relations from individual paragraphs. In the ChEMU 2020 evaluation lab [8], state-of-the-art methods for chemical named entity recognition and event extraction were evaluated on patent documents. Jessop et al. [10] propose a broader-scope, multi-stage system called PatentEye, which first detects experimental sections in the description part of a patent, and then passes individual sections into downstream modules that extract reaction details (e.g. title compound, reagents, and analytical data) from the input text. Although they mention the presence of references between reaction descriptions, they do not present specific ways to resolve them.
To our knowledge, the first study to take into account references in reaction descriptions was by Ai et al. [1]. Their method automatically extracts chemical reactions from the experimental section of papers in chemistry journals, representing a reaction as a synthesis frame containing arguments such as Product, Yield, Role, and Substance. They employ template matching against parsed sentences to detect phrases related to predefined reaction events. To facilitate copying of relevant information between synthesis frames, they define two types of inter-paragraph references, namely general procedures and analogous syntheses. The paragraphs are linked based on certain referring expressions such as example IDs and compound labels. Overall, their system relies solely on handcrafted patterns and rules, and thus the cases covered by the system are limited. Although their

⁴ P = #(correct refs) / #(predicted refs),
R = #(correct refs) / #(spans with at least one true ref),
F1 = 2PR / (P + R),
R^m = #(correct refs) / #(all true refs),
F1^m = 2PR^m / (P + R^m).
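The definitions in footnote 4 can be made concrete with a short sketch. It assumes that each span receives at most one predicted parent and that the gold labels are given as a set of true parents per child span; the function name `reference_scores` and the dictionary layout are illustrative choices, not part of the evaluation code used in the paper.

```python
from typing import Dict, Optional, Set


def reference_scores(
    gold: Dict[int, Set[int]],
    pred: Dict[int, Optional[int]],
) -> Dict[str, float]:
    """Compute P, R, F1, R^m and F1^m as defined in footnote 4.

    gold: child span id -> set of true parent span ids (hypothetical layout)
    pred: span id -> predicted parent span id, or None for "no reference"
    """
    predicted = {child: parent for child, parent in pred.items() if parent is not None}
    # A prediction is correct if it hits any one of the true parents.
    correct = sum(1 for child, parent in predicted.items() if parent in gold.get(child, set()))
    n_predicted = len(predicted)
    n_children = sum(1 for parents in gold.values() if parents)
    n_true_refs = sum(len(parents) for parents in gold.values())

    def ratio(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator else 0.0

    def f_score(p: float, r: float) -> float:
        return 2 * p * r / (p + r) if p + r else 0.0

    p = ratio(correct, n_predicted)
    r = ratio(correct, n_children)
    r_m = ratio(correct, n_true_refs)  # multiple-reference recall
    return {"P": p, "R": r, "F1": f_score(p, r), "R_m": r_m, "F1_m": f_score(p, r_m)}


# Toy data: span 30 has two true parents, so R^m is lower than R.
gold = {20: {18}, 26: {25}, 30: {28, 29}}
pred = {20: 18, 26: 23, 30: 29}
scores = reference_scores(gold, pred)
print(scores)
```

With this toy input two of three predictions are correct, so P = R = F1 = 2/3, while the four true references give R^m = 0.5 and F1^m = 4/7.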
PatentSemTech, July 15th, 2021, online Yoshikawa et al.

(a) Example ID
RX  Text
23  Example 24 ...
...
25  Example 26 ...
26  Example 27 ... The title compound was synthesized by the procedure described in Example 26 starting from ...
True: 26 → 25
Pattern matching: 26 → 25
Oracle-ImmPrev: 26 → 25
RRefE2E: 26 → 23

(b) Compound label
RX  Text
18  A mixture of the obtained ester, ... Column chromatography of the residue (silica gel-hexane/ethyl acetate, 9:1) gave Compound B11, ...
...
20  Using 2-ethoxyethanol and following the procedure for Compound B11 gave Compound B13, ...
True: 20 → 18
Pattern matching: no reference
Oracle-ImmPrev: 20 → 18
RRefE2E: 20 → 18

(c) No direct referential expression
RX  Text
9   General procedure for the synthesis of aryl N-(guanidino)imines N’-arylhydrazones 7Aa-Ag and 8Aa. ...
    Characterization of compounds 7Aa-Ag and 8Aa:
10  (E)-2-((4’-((E and Z)-(2-(2-Chlorophenyl) hydrazono) methyl)-... (7Aa) Yield: 70%; mp 328-330 °C. 1H NMR, ...
11  (E)-2-((4’-((E and Z)-(2-(2-Bromophenyl) hydrazono) methyl)-... (7Ab) Yield: 64%; mp > 380 °C. 1H NMR ...
True: 11 → 10
Pattern matching: no reference
Oracle-ImmPrev: 11 → 10
RRefE2E: 11 → 7

Figure 4: Abbreviated system output examples. Only parts of the patent documents are shown due to the space limitation. “RX” is
the reaction ID, and “True” is the label in the silver set. “X → Y” indicates that the child reaction X refers to the parent reaction Y.
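The pattern-matching and Oracle-ImmPrev rows in Figure 4 can be illustrated with a minimal sketch. It rests on two assumptions that are not spelled out in this section: that pattern matching links a span to an earlier span whose heading carries the mentioned example ID, and that ImmPrev links each child span to the nearest preceding span that is not itself a child (one reading of the name that is consistent with the outputs in Figure 4). The regular expressions, function names, and toy spans are hypothetical.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

HEADING = re.compile(r"^Example\s+(\d+)")   # a span opening with its own example ID
MENTION = re.compile(r"\bExample\s+(\d+)")  # any example ID mentioned in the text


def pattern_match(spans: List[Tuple[int, str]]) -> Dict[int, int]:
    """Link a span to an earlier span whose heading matches a mentioned example ID."""
    heading_to_rx: Dict[str, int] = {}
    links: Dict[int, int] = {}
    for rx, text in spans:
        head = HEADING.match(text)
        own = head.group(1) if head else None
        for mention in MENTION.finditer(text):
            example_id = mention.group(1)
            # Skip the span's own heading; only earlier headings are known here.
            if example_id != own and example_id in heading_to_rx:
                links[rx] = heading_to_rx[example_id]
                break
        if own is not None:
            heading_to_rx[own] = rx
    return links


def imm_prev(order: List[int], children: Set[int]) -> Dict[int, Optional[int]]:
    """Assumed ImmPrev rule: link each child to the closest preceding non-child span."""
    links: Dict[int, Optional[int]] = {}
    last_non_child: Optional[int] = None
    for rx in order:
        if rx in children:
            links[rx] = last_non_child
        else:
            last_non_child = rx
    return links


# Abbreviated spans from Figure 4 (a) and (c).
panel_a = [
    (23, "Example 24 ..."),
    (25, "Example 26 ..."),
    (26, "Example 27 ... The title compound was synthesized by the "
         "procedure described in Example 26 starting from ..."),
]
panel_c = [
    (9, "General procedure for the synthesis of ... 7Aa-Ag and 8Aa. ..."),
    (10, "... (7Aa) Yield: 70%; ..."),
    (11, "... (7Ab) Yield: 64%; ..."),
]

print(pattern_match(panel_a))        # {26: 25}
print(pattern_match(panel_c))        # {} (no referring expression to match)
print(imm_prev([23, 25, 26], {26}))  # {26: 25}
print(imm_prev([9, 10, 11], {11}))   # {11: 10}
```

Under the assumed ImmPrev rule, the 20 → 18 link in panel (b) would follow if the elided span 19 were also a child, since `imm_prev([18, 19, 20], {19, 20})` maps both children to 18. Panel (c) shows why position-based decoding helps where surface patterns find nothing to match.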

system is reported to produce results in the range of 80–90% for simple synthesis paragraphs, and 60–70% for complex paragraphs, details of their data set and evaluation procedures are not provided. It is unclear whether these results would generalise to larger data sets with more variability, and we cannot directly compare their approach with ours.
Finally, we compare our task formulation with the text alignment task. Text alignment [34] is arguably one of the tasks most relevant to reaction reference resolution, as it requires understanding of long documents and has an application to citation recommendation [4, 12]. A key difference between the text alignment task and our reaction reference task is that text alignment mainly focuses on aligning text portions in different documents, whereas our reaction reference task aims to find intra-document references between reaction descriptions. Our task requires not only understanding the content of each reaction description based on its contextual information but also considering the document structure, such as the relative position of reaction description pairs, as exemplified in Figure 1 (c). A child reaction often omits a large part of the reaction description (sometimes it is just the name of the target compound), unlike typical inter-document references, where the referrer shares some common context with the source document it refers to. Applying core ideas from existing methods for the text alignment task to the reaction reference task would be an interesting direction for future work.

8 CONCLUSION
We have introduced the chemical reaction reference resolution problem, a largely unexplored yet critical step in information extraction of complete reaction details presented in chemical patents. We provided a formal specification of this task, built an evaluation corpus, and adapted a neural co-reference architecture to this task, as well as introducing several other baseline methods. Our neural method achieved promising results for detecting (child) reaction descriptions containing references, but identifying the precise (parent) reference of child spans is challenging for the model. Key issues for future work are more effective learning of reference patterns in reaction descriptions, and better handling of long text spans, as well as long-distance relations.
Some aspects of reaction reference are not well explored in this paper. For example, we did not observe many examples involving multiple references and references to general conditions in our silver-standard dataset. This is partly because the database used to construct our corpus is not designed for this use. Very recently, ChEMU [7], a shared task on chemical information extraction, was held, and the reaction reference resolution task was included as one of the key tasks. As a part of the shared task, a dataset with gold annotation of reaction reference was made publicly available.⁵ We will evaluate the performance of our baselines on the gold-standard dataset. In addition, although we have cast the problem as a text processing task, a more complete description of the reaction would involve the combination of the textual body, figures, and tables in the patent. Thus, a multi-modal version of the reaction reference task is a possible direction to explore in the future. Another interesting extension of this task would be considering cross-document references of chemical reactions.

⁵ http://chemu2021.eng.unimelb.edu.au/

ACKNOWLEDGMENTS
This project is supported with funding to the ChEMU (Cheminformatics Elsevier Melbourne University) project from the Australian Research Council Linkage Project LP160101469, with partner contributions from Elsevier. Some of the computations in this paper were performed using the Spartan HPC-Cloud Hybrid [18] at the University of Melbourne.

REFERENCES
[1] C. S. Ai, Paul E. Blower, and R. H. Ledwith. 1990. Extraction of Chemical Reaction Information from Primary Journal Text. Journal of Chemical Information and Computer Sciences 30, 2 (1990), 163–169. https://doi.org/10.1021/ci00066a012
[2] Saber A. Akhondi, Hinnerk Rey, Markus Schwörer, Michael Maier, John Toomey, Heike Nau, Gabriele Ilchmann, Mark Sheehan, Matthias Irmer, Claudia Bobach, Marius Doornenbal, Michelle Gregory, and Jan A. Kors. 2019. Automatic Identification of Relevant Chemical Compounds from Patents. Database 2019 (2019). https://doi.org/10.1093/database/baz001


[3] Cecilia Arighi, Lynette Hirschman, Thomas Lemberger, Samuel Bayer, Robin Liechti, Donald Comeau, and Cathy Wu. 2017. Bio-ID Track Overview. In Proceedings of the BioCreative VI Workshop. 14–19.
[4] Chandra Bhagavatula, Sergey Feldman, Russell Power, and Waleed Ammar. 2018. Content-Based Citation Recommendation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, 238–251. https://doi.org/10.18653/v1/N18-1022
[5] James E. Blake and Robert C. Dana. 1990. CASREACT: More than a Million Reactions. Journal of Chemical Information and Modeling 30, 4 (1990), 394–399. https://doi.org/10.1021/ci00068a008
[6] Peter Corbett, Colin Batchelor, and Simone Teufel. 2007. Annotation of Chemical Named Entities. In Biological, Translational, and Clinical Language Processing. Association for Computational Linguistics, 57–64.
[7] Jiayuan He, Biaoyan Fang, Hiyori Yoshikawa, Yuan Li, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Zubair Afzal, Zenan Zhai, Lawrence Cavedon, Trevor Cohn, Timothy Baldwin, and Karin Verspoor. 2021. ChEMU 2021: Reaction Reference Resolution and Anaphora Resolution in Chemical Patents. In Advances in Information Retrieval, Djoerd Hiemstra, Marie-Francine Moens, Josiane Mothe, Raffaele Perego, Martin Potthast, and Fabrizio Sebastiani (Eds.). Springer International Publishing, 608–615.
[8] Jiayuan He, Dat Quoc Nguyen, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Zubair Afzal, Zenan Zhai, Biaoyan Fang, Hiyori Yoshikawa, Ameer Albahem, Jingqi Wang, Yuankai Ren, Zhi Zhang, Yaoyun Zhang, Mai Hoang Dao, Pedro Ruas, Andre Lamurias, Francisco M. Couto, Jenny Copara, Nona Naderi, Julien Knafou, Patrick Ruch, Douglas Teodoro, Daniel Lowe, John Mayfield, Abdullatif Köksal, Hilal Dönmez, Elif Özkirimli, Arzucan Özgür, Darshini Mahendran, Gabrielle Gurdin, Nastassja Lewinski, Christina Tang, Bridget T. McInness, C.S. Malarkodi, Pattabhi Rk Rao, Sobha Lalitha Devi, Lawrence Cavedon, Trevor Cohn, Timothy Baldwin, and Karin Verspoor. 2020. An Extended Overview of the CLEF 2020 ChEMU Lab: Information Extraction of Chemical Reactions from Patents. Proceedings of the CLEF 2020 conference (2020).
[9] Matthias Irmer, Lutz Weber, Timo Böhme, Anett Püschel, Claudia Bobach, and Ulf Laube. 2015. OCMiner for Patents. Extracting Chemical Information from Patent Texts. In Proceedings of the Fifth BioCreative Challenge Evaluation Workshop. 119–123.
[10] David M. Jessop, Sam E. Adams, and Peter Murray-Rust. 2011. Mining Chemical Information from Open Patents. Journal of Cheminformatics 3, 1 (2011), 40. https://doi.org/10.1186/1758-2946-3-40
[11] David M. Jessop, Sam E. Adams, Egon L. Willighagen, Lezan Hawizy, and Peter Murray-Rust. 2011. OSCAR4: A Flexible Architecture for Chemical Text-Mining. Journal of Cheminformatics 3, 1 (2011), 41. https://doi.org/10.1186/1758-2946-3-41
[12] Jyun-Yu Jiang, Mingyang Zhang, Cheng Li, Michael Bendersky, Nadav Golbandi, and Marc Najork. 2019. Semantic Text Matching for Long-Form Documents. In The World Wide Web Conference (WWW ’19). Association for Computing Machinery, 795–806. https://doi.org/10.1145/3308558.3313707
[13] J.-D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii. 2003. GENIA Corpus—a Semantically Annotated Corpus for Bio-Textmining. Bioinformatics 19, suppl_1 (2003), i180–i182. https://doi.org/10.1093/bioinformatics/btg1023
[14] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.).
[15] Martin Krallinger, Obdulia Rabal, Saber A Akhondi, et al. 2017. Overview of the BioCreative VI Chemical-Protein Interaction Track. In Proceedings of the Sixth BioCreative Challenge Evaluation Workshop, Vol. 1. 141–146.
[16] Martin Krallinger, Obdulia Rabal, Florian Leitner, Miguel Vazquez, David Salgado, Zhiyong Lu, Robert Leaman, Yanan Lu, Donghong Ji, Daniel M. Lowe, Roger A. Sayle, Riza Theresa Batista-Navarro, Rafal Rak, Torsten Huber, Tim Rocktäschel, Sérgio Matos, David Campos, Buzhou Tang, Hua Xu, Tsendsuren Munkhdalai, Keun Ho Ryu, SV Ramanan, Senthil Nathan, Slavko Žitnik, Marko Bajec, Lutz Weber, Matthias Irmer, Saber A. Akhondi, Jan A. Kors, Shuo Xu, Xin An, Utpal Kumar Sikdar, Asif Ekbal, Masaharu Yoshioka, Thaer M. Dieb, Miji Choi, Karin Verspoor, Madian Khabsa, C. Lee Giles, Hongfang Liu, Komandur Elayavilli Ravikumar, Andre Lamurias, Francisco M. Couto, Hong-Jie Dai, Richard Tzong-Han Tsai, Caglar Ata, Tolga Can, Anabel Usié, Rui Alves, Isabel Segura-Bedmar, Paloma Martínez, Julen Oyarzabal, and Alfonso Valencia. 2015. The CHEMDNER Corpus of Chemicals and Drugs and Its Annotation Principles. Journal of Cheminformatics 7, 1 (2015), S2.
[17] Martin Krallinger, Obdulia Rabal, Anália Lourenço, Julen Oyarzabal, and Alfonso Valencia. 2017. Information Retrieval and Text Mining Technologies for Chemistry. Chemical Reviews 117, 12 (2017), 7673–7761. https://doi.org/10.1021/acs.chemrev.6b00851
[18] Lev Lafayette, Greg Sauter, Linh Vu, and Bernard Meade. 2017. Spartan HPC-Cloud Hybrid: Delivering Performance and Flexibility. https://doi.org/10.4225/49/58ead90dceaaa
[19] Alexander Johnston Lawson, Stefan Roller, Helmut Grotz, Janusz L Wisniewski, and Libuse Goebels. 2011. Method and Software for Extracting Chemical Data. German patent no. DE102005020083A1.
[20] Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. 2017. End-to-End Neural Coreference Resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 188–197. https://doi.org/10.18653/v1/D17-1018
[21] Daniel M. Lowe. 2012. Extraction of Chemical Structures and Reactions from the Literature. Ph.D. Dissertation. University of Cambridge.
[22] Daniel Magilavy and Polly Pine. 2018. 2,4 Substituted Pyrimidinediamines for Use in Discoid Lupus. European patent no. EP2683385B1.
[23] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.). 3111–3119.
[24] Dat Quoc Nguyen, Zenan Zhai, Hiyori Yoshikawa, Biaoyan Fang, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Saber A Akhondi, Trevor Cohn, Timothy Baldwin, et al. 2020. ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents. In European Conference on Information Retrieval. Springer, 572–579.
[25] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, 2227–2237. https://doi.org/10.18653/v1/N18-1202
[26] Stanislaw Rachwal and Yufen Hu. 2018. Highly Photo-Stable Bis-Triazole Fluorophores. US patent no. US20180002337A1.
[27] Pedro Ruas, Andre Lamurias, and Francisco M. Couto. 2020. Linking Chemical and Disease Entities to Ontologies by Integrating PageRank with Extracted Relations from Literature. Journal of Cheminformatics 12, 1 (2020), 57. https://doi.org/10.1186/s13321-020-00461-4
[28] Nadine Schneider, Daniel M Lowe, Roger A Sayle, Michael A Tarselli, and Gregory A Landrum. 2016. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. Journal of Medicinal Chemistry 59, 9 (2016), 4385–4402.
[29] Ingemar Starke, Mikael Ulf Johan Dahlstrom, David Blomberg, Suzanne Alenfalk, Tore Skjaret, and Malin Lemurell. 2017. Benzothiazepine and Benzothiadiazepine Derivatives with Ileal Bile Acid Transport (Ibat) Inhibitory Activity for the Treatment Hyperlipidaemia. European patent no. EP1427423B9.
[30] Jorge A. Vanegas, Sérgio Matos, Fabio González, and José L. Oliveira. 2015. An Overview of Biomolecular Event Extraction from Scientific Documents. Computational and Mathematical Methods in Medicine 2015 (2015), 571381. https://doi.org/10.1155/2015/571381
[31] Chih-Hsuan Wei, Yifan Peng, Robert Leaman, Allan Peter Davis, Carolyn J Mattingly, Jiao Li, Thomas C Wiegers, and Zhiyong Lu. 2015. Overview of the BioCreative V Chemical Disease Relation (CDR) Task. In Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, Vol. 14.
[32] Hiyori Yoshikawa, Dat Quoc Nguyen, Zenan Zhai, Christian Druckenbrodt, Camilo Thorne, Saber A. Akhondi, Timothy Baldwin, and Karin Verspoor. 2019. Detecting Chemical Reactions in Patents. In Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association. Australasian Language Technology Association, 100–110.
[33] Zenan Zhai, Dat Quoc Nguyen, Saber Akhondi, Camilo Thorne, Christian Druckenbrodt, Trevor Cohn, Michelle Gregory, and Karin Verspoor. 2019. Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings. In Proceedings of the 18th BioNLP Workshop and Shared Task. Association for Computational Linguistics, 328–338.
[34] Xuhui Zhou, Nikolaos Pappas, and Noah A. Smith. 2020. Multilevel Text Alignment with Cross-Document Attention. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 5012–5025. https://doi.org/10.18653/v1/2020.emnlp-main.407