<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NLPatVCU CLEF 2020 ChEMU Shared Task System Description</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Darshini Mahendran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabrielle Gurdin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nastassja Lewinski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christina Tang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bridget T. McInnes</string-name>
          <email>btmcinnesg@vcu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Virginia Commonwealth University</institution>
          ,
          <addr-line>Richmond VA 23220</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our team's participation in Tracks 1 and 2 of the Conference and Labs of the Evaluation Forum (CLEF 2020) ChEMU challenge, organized by Cheminformatics Elsevier Melbourne University, for extracting information about chemical reactions from patents. We discuss our systems: MedaCy, a Python-based supervised multi-class entity recognition system, and RelEx, a Python-based relation extraction system that includes rule-based and supervised learning pipelines. Our best model for Task 1 obtained an overall relaxed precision of 0.95 and exact precision of 0.87; relaxed recall of 0.99 and exact recall of 0.86; and relaxed F1 score of 0.97 and exact F1 score of 0.87. Our best model for Task 2 obtained an overall precision of 0.80; recall of 0.54; and F1 score of 0.65.</p>
      </abstract>
      <kwd-group>
        <kwd>Named Entity Recognition (NER)</kwd>
        <kwd>Relation Extraction (RE)</kwd>
        <kwd>Event Extraction (EE)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Chemical patents are a primary source of information about novel chemicals
and chemical reactions. With the increasing volume of such patents, the
dissemination of information about these chemicals and chemical reactions has become
even more labor- and time-intensive. This information can be used to discover
new chemicals and synthetic pathways [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Therefore, informatics tools for
automatically extracting information from these documents are more important
than ever.
      </p>
      <p>
        The process of extracting relevant information from chemical patents has
been referred to as chemical reaction detection [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and two of the main steps in
this process are identifying the different parts of a chemical reaction within these
documents and then identifying the relationships between them. This can be
accomplished with Named Entity Recognition (NER), the automatic labeling of
spans within text with specific labels, and Event Extraction (EE), the automatic
classification and linking of entities based on their relationships to each other.
      </p>
      <p>
        The CLEF 2020 ChEMU [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] Task 1 aims to create systems to perform NER
over chemical patents as the first step in chemical reaction detection. Specifically,
the goal of this task is to automatically identify chemical compounds based on
the role they play in a reaction, as well as other relevant information such as
yield and temperature. The CLEF 2020 ChEMU Task 2 aims to create systems
to perform EE over the entities to identify the individual steps in the reaction.
      </p>
      <p>In this paper, we describe our participation in the CLEF 2020 ChEMU Task 1
and Task 2 Challenge. For this challenge, we used our Python framework MedaCy
1 to automatically identify the experimental parameters associated with the
reaction, including the trigger words used to link the parameters; and RelEx 2 to
automatically link the trigger words with the experimental parameters to
provide the sequence of steps within the reaction. MedaCy contains a number of
supervised multi-label sequence classification algorithms for NER. RelEx
contains rule-based and supervised learning-based algorithms to identify relations
between entities. Our best models for Task 1 obtained an overall relaxed
precision of 0.95 and exact precision of 0.87; relaxed recall of 0.99 and exact recall of
0.86; and relaxed F1 score of 0.97 and exact F1 score of 0.87. Our best model for
Task 2 obtained an overall precision of 0.80; recall of 0.54; and F1 score of 0.65.
1 https://github.com/NLPatVCU/medaCy/
2 https://github.com/NLPatVCU/RelEx/tree/CLEF 2020</p>
      <p>The dataset includes 10 different entity labels, as shown in Table 1. The ARG1
event label corresponds to relations between a trigger word (REACTION STEP,
WORKUP) and chemical compound entities. The ARGM event label corresponds to
relations between a trigger word and temperature, time, or yield entities. Table 2
shows the event statistics of the training dataset.</p>
      <p>
To identify the experimental parameters and triggers from the data, we use
MedaCy's bidirectional Long Short-Term Memory (LSTM) network with a
Conditional Random Field (CRF) output layer implemented in PyTorch [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. LSTMs
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are a type of recurrent neural network. They take the current input
example as well as what they have seen in the past as their input. Hence, they have
two sources of input: their current state and their past states. This allows them
to connect previous observations, such as words in a sentence, and learn
dependencies between these words over arbitrarily long distances. They incorporate
gating functionality to identify what information should be passed to the next
component and what should not, allowing only relevant information
to be passed on. For bi-directional LSTMs (biLSTMs), data are processed in
both directions with two separate hidden layers, which are then fed forward into
the same output layer. This allows the system to exploit context in both
directions. A linear-chain CRF is used to assign the final class probability. CRFs are
sequence learning models that incorporate the interdependence between
labels into model induction and prediction. Therefore, using a CRF output layer
allows the model to use the preceding label predictions to inform which labels are
most likely to follow or to occur close together.
      </p>
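      <p>To illustrate the CRF decoding step described above, the following pure-Python Viterbi sketch shows how per-token emission scores from the biLSTM are combined with label-transition scores to pick the highest-scoring label sequence. The scores, labels, and function name are toy examples of our own, not MedaCy's actual implementation.</p>

```python
def viterbi_decode(emissions, transitions, labels):
    """Find the highest-scoring label sequence for one sentence.

    emissions[t][j]   = score of label j at token t (from the biLSTM)
    transitions[i][j] = score of moving from label i to label j
    """
    n_labels = len(labels)
    # score[j] = best score of any path ending in label j at the current token
    score = list(emissions[0])
    back = []  # back[t][j] = best previous label for label j at token t
    for t in range(1, len(emissions)):
        prev = score
        score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: prev[i] + transitions[i][j])
            score.append(prev[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        back.append(pointers)
    # Trace the best path backwards from the final token.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(back):
        best = pointers[best]
        path.append(best)
    return [labels[i] for i in reversed(path)]

# Toy example: transitions discourage switching between TEMPERATURE and TIME.
labels = ["O", "TEMPERATURE", "TIME"]
transitions = [[0.0, 0.0, 0.0],
               [0.0, 1.0, -2.0],
               [0.0, -2.0, 1.0]]
emissions = [[0.1, 2.0, 1.9],   # token "25"
             [0.1, 2.0, 1.9],   # token "C"
             [3.0, 0.0, 0.0]]   # token ","
print(viterbi_decode(emissions, transitions, labels))
```

      <p>Because the transition scores penalize label switches, the second token keeps the TEMPERATURE label even though its TIME emission score is nearly as high; this is exactly the "labels inform each other" behavior the CRF layer adds.</p>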
      <p>
        The input to our biLSTM+CRF model in this work is pre-trained word
embeddings [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] in combination with character embeddings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These embeddings
are concatenated and then passed through the network.
      </p>
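      <p>A minimal sketch of this input construction follows. The dimensions, vectors, and the stand-in for the character biLSTM are illustrative inventions of ours, not MedaCy's code; the point is only that each token's vector is a word embedding concatenated with a character-derived embedding, so out-of-vocabulary tokens still receive a useful input.</p>

```python
# Each token's input vector is its pre-trained word embedding
# concatenated with a character-derived embedding (illustrative sizes).
word_emb = {"mice": [0.2, -0.1, 0.7]}          # e.g. word2vec, dim 3

def char_emb(token):
    # Stand-in for the character biLSTM: a fixed-size vector derived
    # from the characters, so even OOV tokens get a non-zero signal.
    return [len(token) / 10.0, ord(token[0]) / 1000.0]

def token_vector(token):
    wv = word_emb.get(token, [0.0, 0.0, 0.0])  # zeros for OOV words
    return wv + char_emb(token)                # concatenation, dim 5

print(token_vector("mice"))  # [0.2, -0.1, 0.7, 0.4, 0.109]
```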
      <p>
        The word2vec [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] embeddings are derived from a neural network that learns
a representation of a word-word co-occurrence matrix. The character
embeddings are learned using a biLSTM and concatenated onto the word2vec
embeddings. Fig 1 shows a simple example for the term mice. This network is
especially valuable for providing input in the case of out-of-vocabulary words.
In the case of chemical patents, many tokens are long chemical names that do
not appear in the dataset used to train the word embeddings, such as the
reaction product
3-Isobutyl-5-methyl-1-(oxetan-2-ylmethyl)-6-[(2-oxoimidazolidin-1yl)methyl]thieno[2,3-d]pyrimidine-2,4(1H,3H)-dione.
      </p>
      <p>
To identify the trigger words, we use our NER system MedaCy as described
above. To identify the chemical arguments between the trigger words and the
entities, we use RelEx, a Python-based relation extraction framework developed
to identify relations between two entities. The framework contains two main
components: 1) a rule-based method and 2) a Convolutional Neural Network
(CNN)-based method. In this section, we provide a brief overview of each component.
      </p>
      <p>
Rule-based Method. RelEx's rule-based method uses the co-location
information of the trigger words to determine, with respect to an entity, whether
a word refers to the trigger word or not. We use a breadth-first search
algorithm to find the closest occurrence of the trigger word on either side of the
entity and all the closest occurrences of the trigger words within a sentence. For
each entity in the dataset, we traverse both sides until the closest occurrence
of the trigger word is found, using the provided span values of the entities. We
apply different traversal techniques and determine the best among them:
traverse left-only, traverse right-only, traverse left-first-then-right, and vice versa.
In this work, we use left-only traversal, where we traverse to the left side of the
entity mention to find the closest occurrence of the trigger words.
      </p>
      <p>
CNN-based Method. RelEx's CNN-based method automatically extracts and
classifies the events. CNNs are a form of deep neural network and mostly consist
of four main layers [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]: embedding, convolution, pooling, and feed-forward layers.
CNNs allow word embeddings to be trained on the input text itself or to use
pre-trained word vectors obtained from an external resource. The convolution
layer, which acts as a filter, learns using the backpropagation algorithm and
extracts features from the input. The max-pooling layer then uses position
information to extract the most significant features from the output of the
convolution filter. Finally, the feed-forward layer uses a softmax classifier that
performs the classification.
      </p>
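      <p>The left-only traversal of the rule-based method can be sketched as follows, assuming entities and trigger words are given as character-offset spans. The function name and toy offsets are our own illustration, not RelEx's actual code.</p>

```python
def closest_trigger_left(entity_span, triggers):
    """Left-only traversal: return the trigger whose span ends closest
    to the left of the entity, or None if no trigger precedes it.

    entity_span: (start, end) character offsets of the entity mention
    triggers:    list of (start, end, label) spans for trigger words
    """
    entity_start = entity_span[0]
    left = [t for t in triggers if t[1] <= entity_start]
    if not left:
        return None
    # The closest occurrence is the one ending nearest the entity.
    return max(left, key=lambda t: t[1])

# Toy spans: a REACTION_STEP trigger at the sentence start, a WORKUP
# trigger just before the entity of interest.
triggers = [(0, 7, "REACTION_STEP"), (24, 36, "WORKUP")]
entity = (40, 47)  # e.g. an OTHER_COMPOUND mention
print(closest_trigger_left(entity, triggers))  # (24, 36, 'WORKUP')
```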
      <p>In this work, for each trigger word-entity pair we perform a binary
classification to identify whether there is a relation between the trigger word and
the entity. First, we identify and extract the sentence where a trigger
word-entity pair lies and, based on where the text spans are located in the
sentence, we divide the sentence into segments as follows:
- preceding: tokenized words before the first concept
- concept 1: tokenized words in the first concept
- middle: tokenized words between the two concepts
- concept 2: tokenized words in the second concept
- succeeding: tokenized words after the second concept</p>
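      <p>The five-segment split can be sketched as follows, assuming character-offset concept spans and whitespace tokenization; the sentence and offsets are a toy example of ours, not taken from the dataset.</p>

```python
def split_into_segments(sentence, span1, span2):
    """Split a sentence into the five segments used as CNN input.

    span1, span2: (start, end) character offsets of the two concepts,
    with span1 occurring before span2.
    """
    return {
        "preceding":  sentence[:span1[0]].split(),
        "concept1":   sentence[span1[0]:span1[1]].split(),
        "middle":     sentence[span1[1]:span2[0]].split(),
        "concept2":   sentence[span2[0]:span2[1]].split(),
        "succeeding": sentence[span2[1]:].split(),
    }

sentence = "The mixture was stirred at 25 C for 2 hours"
trigger = (16, 23)   # "stirred" (a REACTION_STEP trigger)
entity = (27, 31)    # "25 C" (a TEMPERATURE entity)
segments = split_into_segments(sentence, trigger, entity)
print(segments["middle"])  # ['at']
```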
      <p>Figure 2 shows an abstract view of the construction of the CNN-based model.
A segment is represented by a matrix of k x N, where k is the dimension of the
word embeddings and N is the number of words in the segment. In this work,
we use ChemPatent pre-trained word embeddings. We construct separate
convolution units for each segment and concatenate their outputs before the
fixed-length vector is fed to the dense layer that performs the classification. Each
convolution unit applies a sliding window that processes the segment and feeds the
output to the max-pooling layer to extract important features independent of their
location. The output features of the max-pooling layer of each segment are then
flattened and concatenated into a vector before being fed into the fully connected
feed-forward layer. The vector is finally fed into a softmax layer to perform the
binary classification of whether the relationship exists or not.</p>
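      <p>The per-segment convolution and max-pooling can be sketched in miniature as follows, where one number stands in for each word's k-dimensional embedding and the filter values are arbitrary. This illustrates the wiring (one convolution unit per segment, max-pooled, then concatenated), not the actual Keras model.</p>

```python
def conv1d(seq, filt):
    """Valid 1-D convolution (cross-correlation) over a list of numbers."""
    w = len(filt)
    return [sum(f * x for f, x in zip(filt, seq[i:i + w]))
            for i in range(len(seq) - w + 1)]

def segment_feature(seq, filt):
    """One convolution unit: slide the filter over the segment, then max-pool."""
    return max(conv1d(seq, filt))

# One number per word stands in for that word's embedding vector.
segments = {"preceding": [1, 3], "middle": [9, 2, 4], "succeeding": [0, 5]}
filt = [2, -1]
# Separate convolution unit per segment; pooled features are concatenated
# into the fixed-length vector fed to the dense layer.
features = [segment_feature(seq, filt) for seq in segments.values()]
print(features)  # [-1, 16, -5]
```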
    </sec>
    <sec id="sec-2">
      <title>Experimental Details</title>
      <p>
        Word Embeddings. We explore two pre-trained word embeddings: 1) ChemPatent
embeddings [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] trained over a collection of 84,076 full patent documents (1B
tokens); and 2) WikiPubmed embeddings [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] in our methods.
      </p>
      <p>
        MedaCy. We used PyTorch [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for the implementation of the BiLSTM+CRF
model. Models were trained for 40 epochs and optimized using stochastic
gradient descent. A window size of 0 generated the best results. Tokenization was
conducted using the SpaCy tokenizer. The labels are strictly the entity types.
RelEx. We used Keras [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for the implementation of the CNN architecture.
We experimented with different sliding window sizes, filter sizes, and loss functions
for fine-tuning; in this work, small filter sizes generated the best results. We
applied the dropout technique on the output of the convolution layer to
regularize the model. We used the Adam and RMSprop optimizers to
minimize our loss function. We trained the models for 5-10 epochs to avoid
over-fitting.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>For Tasks 1 and 2, we report the precision, recall, and F1 scores. Precision is
the ratio between correctly predicted mentions over the total set of predicted
mentions for a specific entity; recall is the ratio of correctly predicted mentions
over the actual number of mentions, and F1 is the harmonic mean between
precision and recall. For Task 1, we report both the exact and relaxed results for
each entity category. In exact evaluation, two annotations are equal only if they
have the same tag with exactly matching spans. With the relaxed evaluation,
two annotations are equal if they share the same tag and their spans overlap
with each other.</p>
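        <p>The two matching criteria can be sketched as follows, with annotations represented as (start, end, tag) character spans. This is a simplified illustration of the matching rules, not the official evaluation script.</p>

```python
def exact_match(gold, pred):
    """Exact: same tag and identical character spans."""
    return gold == pred

def relaxed_match(gold, pred):
    """Relaxed: same tag and overlapping spans."""
    (gs, ge, gtag), (ps, pe, ptag) = gold, pred
    return gtag == ptag and gs < pe and ps < ge

gold = (10, 15, "TEMPERATURE")   # e.g. "25 C", including the number
pred = (13, 15, "TEMPERATURE")   # model labeled only the "C"
print(exact_match(gold, pred))    # False
print(relaxed_match(gold, pred))  # True
```

        <p>A prediction covering only part of an entity span, as in this example, therefore counts as correct under relaxed evaluation but as an error under exact evaluation.</p>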
      <sec id="sec-3-1">
        <title>Results and Discussion</title>
        <p>In this section, we discuss the results for Tasks 1 and 2.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Task 1: Named Entity Recognition</title>
      <p>
        Results. Tables 3 - 5 show the exact and relaxed precision, recall, and F1
scores obtained over the testing set for identifying the named entities in each of
our three runs. The run 1 model was trained over the training data using the
biLSTM+CRF with the ChemPatent embeddings; the run 2 model was trained over
the training data using the biLSTM+CRF with the WikiPubmed embeddings;
and run 3 model was trained over the training and development data combined
with the biLSTM+CRF using the WikiPubmed embeddings. Table 6 shows the
baseline results using the CRF-based NER system BANNER [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] provided by the
organizers and the overall results of each of our runs.
Overall, the biLSTM+CRF model trained using patent embeddings returned
the best results, obtaining a 96.78% system-wide relaxed F1 score. This model
performed better than baseline for all entity labels except EXAMPLE LABEL,
for which it performed almost identically. This model's performance is likely
due to the domain-relevant information contained within the embeddings. The
best performance for exact evaluation resulted from the model trained over a
combination of the training and development sets. However, this model's overall
performance was worse than the baseline model. Still, we believe this model's
better performance compared to the other models may be due to the increase of
volume of data used to train by the addition of the development set.
      </p>
      <p>Although the exact results for the models performed slightly worse than
baseline, each of the models performed better on the relaxed results, with the
model trained over patent embeddings performing best. This discrepancy may be
due to the way that MedaCy handles entity classification. Within MedaCy, each
individual token is given its own label ('O' for unlabelled entities), so for entities
with spans longer than one token, the entity may have been only partially labelled.
For instance, in many cases of the TEMPERATURE label, MedaCy labeled 'C'
or ' C', excluding the number preceding the temperature symbol. This may also
account for why each model performed poorly for the TEMPERATURE label
when evaluating in exact mode, but performed well when evaluating in relaxed
mode.</p>
      <p>Error Analysis. Confusion matrices for the three runs over the testing dataset
are shown in Figures 3 - 5. Rows in the matrix represent annotated entities
and columns represent predicted entities. For instance, in Figure 3, YIELD OTHER
(Y.O.) was misidentified as YIELD PERCENT (Y.P.) 28 times. Table 7 shows
the acronym of each of the labels used in the confusion matrices. The colors
in the matrix indicate the density of the entities and the system annotations.
The bottom right corner of each matrix is darker because of the large number
of OTHER COMPOUND (O.C.) entities in the dataset.</p>
      <p>The majority of mislabeling occurred when more specific entity labels, such
as STARTING MATERIAL (S.M.), REAGENT CATALYST (R.C.), or
REACTION PRODUCT (R.P.), were predicted to be OTHER COMPOUND (O.C.).
This may be because the models were able to predict that certain spans
contained chemical names, but were too general and unable to predict the specific
label. Additionally, spans annotated as OTHER COMPOUND (O.C.) were
consistently predicted to be more specific types of compounds. It seems that while
the models are able to predict which spans contain chemical compounds, they
are less able to distinguish between the types of compounds.</p>
    </sec>
    <sec id="sec-5">
      <title>Task 2: Event Extraction</title>
      <p>Results. Tables 8 - 10 show the exact match precision, recall, and F1 scores
obtained over the testing set for each of our three runs. Run 1 used RelEx's
CNN-based system trained over the ChemPatent embeddings, with the trigger
words identified using medaCy's biLSTM+CRF trained over the ChemPatent
embeddings. Run 2 used RelEx's rule-based system with the trigger words
identified using medaCy's biLSTM+CRF trained with ChemPatent embeddings.
Run 3 used our rule-based system with the trigger words identified using medaCy's
biLSTM+CRF trained with WikiPubmed embeddings. Table 11 shows the
comparison with the co-occurrence baseline provided by the organizers of the ChEMU
challenge and the overall results from each of our runs.</p>
      <p>The overall results show that all three runs obtain a higher precision and
F1 score than the baseline, but not a higher recall. The system results show that the
CNN-based (Run 1) model obtains a higher overall F1 score than both the
rule-based (Run 2 &amp; 3) models. When training with the CNN, the overall precision of
the predictions is high but the recall is low; this shows that the CNN failed to
classify all instances but classified most of the predicted instances
correctly. Also, we can see the performance of each event class (trigger
word-entity pair) in Run 1 is proportional to the number of instances in the
training set. For example, the event classes REACTION STEP-REAGENT CATALYST
and REACTION STEP-STARTING MATERIAL have more training instances
and obtain a high F1 score, whereas the event classes WORKUP-SOLVENT
and WORKUP-STARTING MATERIAL have very few instances and obtain
an F1 score of zero.</p>
      <p>The rule-based models (Run 2 &amp; 3) obtain comparatively high recall and
low precision. The rule-based method predicts all the closest occurrences of
the trigger words of the entity compounds in the traversal area; however, many
predictions are false positives. Since the number of instances in the training set
does not affect the rule-based methods, the event classes that
have few instances perform better. For example, the event classes
WORKUP-TIME and REACTION STEP-OTHER COMPOUND obtained an F1 score of zero
with the CNN-based model but performed better with the rule-based models,
obtaining F1 scores of 0.43 and 0.88, respectively.</p>
      <p>Table 12 shows the arithmetic mean and weighted arithmetic mean of the
precision, recall, and F1 score for both trigger word classes for each run. Bold
terms indicate the best performance for each trigger word. We can see the
CNN-based method (Run 1) performs well with the REACTION STEP classes and
poorly with the WORKUP classes. This is because most of the REACTION STEP
classes have more instances for the CNN to train on, but most of the WORKUP
classes have few instances. This is the same reason the rule-based methods (Run
2 &amp; 3) perform better with those classes. The weighted arithmetic mean results
contradict the arithmetic mean results, as we can see a notable difference in
the F1 score when comparing the classes of REACTION STEP and WORKUP.
The WORKUP event class obtains a better performance due to the significant
imbalance between the individual event classes. The weighted arithmetic mean
allocates more weight to the classes that have more instances and vice versa, and
we see an improvement in the performance of both classes.</p>
      <p>Error Analysis. Tables 13 and 14 show a detailed error analysis of the
CNN-based (Run 1) and the rule-based (Run 2) methods, respectively, where the trigger
words are trained with ChemPatent embeddings. Here we report the number of
true positives (tp), false positives (fp), and false negatives (fn), as well as "fpm"
and "fnm", two metrics that represent the number of false positives and false
negatives for which the corresponding entities are missing.</p>
      <p>The results are consistent with the previous observations from Tables
8, 9, and 10. We can see the REACTION STEP classes performed better than the
WORKUP classes. It is safe to say that class imbalance plays a significant role
in the mis-annotation of the instances. The results also show that the
rule-based model significantly over-annotates, given the number of false positives. For
example, the rule-based model (Run 2) identified 379 instances of the
WORKUP-REACTION PRODUCT event class with only four being true positives.</p>
      <sec id="sec-5-1">
        <title>Conclusion</title>
        <p>We trained three biLSTM+CRF models over different pre-trained word
embeddings, as well as differently sized datasets. Results show that while these models
did not outperform the baseline model when evaluating exact span matches, the
models outperformed the baseline when evaluating in relaxed mode. The model
trained using word embeddings trained over chemical patents performed best
when evaluating in relaxed mode, while the model trained using biomedical word
embeddings and a combination of the training and development datasets
performed best when evaluated on exact span matches. Errors primarily occurred
because of issues with the models distinguishing between different entity labels,
such as mislabeling entities annotated as OTHER COMPOUND with
more specific labels, like REACTION PRODUCT or STARTING MATERIAL.
Additionally, the way that MedaCy predicts entity labels may have contributed
to errors with labeling entity spans fully. Future work will focus on better
distinguishing between different types of chemical compounds, as well as exploring
approaches based on language models.</p>
        <p>We used one CNN-based model and two rule-based models to extract events,
and according to the results, all three models outperformed the baseline model.
Results show that the CNN-based method outperforms the rule-based
methods, especially with the REACTION STEP classes, as those classes have more
instances to train on. Meanwhile, as the rule-based methods do not require
training instances, they perform better with the WORKUP classes. In the future,
we plan to explore building a hybrid model combining the CNN and rule-based
methods to increase performance.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bort</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baskin</surname>
            ,
            <given-names>I.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horvath</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madzhidov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varnek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimadiev</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nugmanov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukanov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Discovery of novel chemical reactions by deep generative recurrent neural network (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Charles</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Project title</article-title>
          . https://github.com/charlespwd/project-title (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gridach</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Character-level neural network for biomedical named entity recognition</article-title>
          .
          <source>Journal of biomedical informatics 70</source>
          ,
          <volume>85</volume>
          -
          <fpage>91</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bidirectional lstm-crf models for sequence tagging</article-title>
          .
          <source>arXiv preprint arXiv:1508.01991</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Leaman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez</surname>
          </string-name>
          , G.:
          <article-title>Banner: an executable survey of advances in biomedical named entity recognition</article-title>
          .
          <source>In: Biocomputing</source>
          <year>2008</year>
          , pp.
          <volume>652</volume>
          -
          <fpage>663</fpage>
          . World Scientific (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <volume>3111</volume>
          -
          <issue>3119</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshikawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Druckenbrodt</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorne</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoessel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , et al.:
          <article-title>ChEMU: Named entity recognition and event extraction of chemical reactions from patents</article-title>
          .
          <source>In: European Conference on Information Retrieval</source>
          . pp.
          <fpage>572</fpage>
          –
          <lpage>579</lpage>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grishman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Relation extraction: Perspective from convolutional neural networks</article-title>
          .
          <source>In: Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing</source>
          . pp.
          <fpage>39</fpage>
          –
          <lpage>48</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Paszke</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gross</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Massa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lerer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bradbury</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chanan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Killeen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimelshein</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antiga</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desmaison</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kopf</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeVito</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raison</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tejani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chilamkurthy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steiner</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chintala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>PyTorch: An imperative style, high-performance deep learning library</article-title>
          . In:
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beygelzimer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>d'Alché-Buc</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          , pp.
          <fpage>8024</fpage>
          –
          <lpage>8035</lpage>
          . Curran Associates, Inc. (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pyysalo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ginter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakoski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ananiadou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Distributional semantics resources for biomedical text processing</article-title>
          .
          <source>In: The 5th International Symposium on Languages in Biology and Medicine</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Construction of a generic reaction knowledge base by reaction data mining</article-title>
          .
          <source>Journal of Molecular Graphics and Modelling</source>
          <volume>19</volume>
          (
          <issue>5</issue>
          ),
          <fpage>427</fpage>
          –
          <lpage>433</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Yoshikawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Druckenbrodt</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorne</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verspoor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Detecting chemical reactions in patents</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>