<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BOUN-REX at CLEF-2020 ChEMU Task 2: Evaluating Pretrained Transformers for Event Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hilal Donmez</string-name>
          <email>hilal.donmez@boun.edu.tr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abdullatif Koksal</string-name>
          <email>abdullatif.koksal@boun.edu.tr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elif Ozkirimli</string-name>
          <email>elif.ozkirimli@boun.edu.tr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arzucan Özgür</string-name>
          <email>arzucan.ozgur@boun.edu.tr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Chemical Engineering, Bogazici University</institution>
          ,
          <country country="TR">Turkey</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Engineering, Bogazici University</institution>
          ,
          <country country="TR">Turkey</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Pharma International Informatics Data and Analytics Chapter</institution>
          ,
          <addr-line>F. Hoffmann-La Roche AG</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe our models and results for CLEF-2020 ChEMU Task 2 [4], event extraction from chemical patent documents. We make use of recent advances in pretrained transformer architectures such as BERT and BioBERT, and we compare several transformers with different settings in order to improve performance. Our best performing model, with the BioBERT transformer architecture and the AdamW optimizer, achieves a 0.7234 exact F1 score on the test dataset.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Chemical information in patents is an essential resource for researchers working
on chemical exploration and reactions. As the number of patents grows rapidly,
Natural Language Processing (NLP) approaches are widely used to extract
chemical information from patents so as to reduce the time and effort spent. Most
previous studies on chemical information extraction focus on chemical named
entity recognition (NER) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] thanks to publicly available annotated corpora. On
the other hand, there is a limited number of studies on chemical event extraction
from patents.
      </p>
      <p>Event extraction from patents involves detecting the event trigger word, the event
trigger type, and the event type. Figure 1 illustrates an example sentence for the event
extraction task in the dataset released by Cheminformatics Elsevier Melbourne
University (ChEMU). In this example, room temperature and 30 minutes are
given as entities with their corresponding types: TEMPERATURE and TIME.
After stirred is detected as a trigger word for both entities, two event types (both
of type ARGM) are determined separately according to the relevant entity type.</p>
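      <p>The two events in the example above can be written, purely illustratively, as (trigger, role, argument, argument type) tuples; this is a sketch of the task output, not the dataset's actual annotation format.</p>

```python
# The Figure 1 example as (trigger, role, argument, argument_type) tuples --
# an illustrative representation only, not the corpus annotation format.
events = [
    ("stirred", "ARGM", "room temperature", "TEMPERATURE"),
    ("stirred", "ARGM", "30 minutes", "TIME"),
]

# Both events share the trigger word but differ in the argument entity.
for trigger, role, argument, arg_type in events:
    print(trigger, role, argument, arg_type)
```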
      <p>
        In this work, we investigated the impact of various transformer architectures
with different parameters on event extraction from patents by conducting
several experiments. We also explored the effects of the pretraining corpus of
transformers by comparing BERT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and BioBERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In addition, we investigated the
significance of different optimizers, namely Adam, AdamW, and SGD, for the
finetuning of transformers for this task.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Determining the semantic relation between entities is an important scientific
problem in various domains such as biomedical text, digital text, and
governmental documents. Recently, deep neural networks have been widely used to
identify the relations between entities. Previous research studies that use deep
learning for relation extraction make use of CNN [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and RNN [20] models by
taking sentence representations with word vectors such as Word2Vec [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and
GloVe [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] in order to extract features automatically instead of hand-crafted
features [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Recent studies on relation extraction have been based on the
transformer architecture [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] trained on large amounts of unlabeled data to improve
the state-of-the-art on several natural language processing tasks. In [19], a
pretrained transformer model is utilized to extract efficient relation representations
from text.
      </p>
      <p>
        In event extraction, earlier neural network models enhanced CNN [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and
RNN [18] with different kinds of word representations to determine the locations
and types of trigger words. In addition, structured information benefiting from
dependency trees [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and knowledge bases [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is exploited by neural networks
to improve event extraction performance. Lately, pretrained transformer based
models have gained popularity for event extraction. In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], trigger and
argument extractor models obtain feature representations using BERT, a pretrained
transformer model.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methodology and Data</title>
      <sec id="sec-3-1">
        <title>Data</title>
        <p>We use the dataset released for the ChEMU tasks on information extraction
from chemical patents. The dataset contains chemical patent documents with
annotation files for training, development, and test sets. Entities with their types
and relations between these entities are included in the annotation files. There
are 10 different types of entity annotations for the Event Extraction Task in
ChEMU. Table 1 shows the annotated types of entities in the dataset.</p>
        <p>The event extraction problem focuses on event trigger word detection,
trigger type detection, and event type prediction. Event trigger words whose types
are REACTION_STEP or WORKUP are identified, and the chemical entity
arguments of the events are determined. The relation between an argument and
a trigger word is labeled with a semantic argument role label, which is Arg1 or
ArgM. The relation between a trigger word and a temperature, time, or yield
entity is labeled as ArgM, whereas the relation between a trigger word and an
entity having one of the other entity types is labeled as Arg1. Table 2 contains
the statistics of the ChEMU dataset. (We were able to use 713 out of the 900
documents in the train set due to a problem during the downloading process.)</p>
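        <p>The role-labeling rule described above reduces to a small lookup. A minimal sketch follows; the exact yield entity type names (YIELD_PERCENT, YIELD_OTHER) are assumptions about the dataset's annotation labels.</p>

```python
# Sketch of the argument role rule: relations between a trigger word and a
# temperature, time, or yield entity are ArgM; all other entity types give
# Arg1. The YIELD_* label names are assumptions, not taken from this paper.
ARGM_ENTITY_TYPES = {"TEMPERATURE", "TIME", "YIELD_PERCENT", "YIELD_OTHER"}

def argument_role(entity_type: str) -> str:
    """Return the semantic argument role label for a trigger-entity relation."""
    return "ArgM" if entity_type in ARGM_ENTITY_TYPES else "Arg1"

print(argument_role("TEMPERATURE"))  # ArgM
print(argument_role("SOLVENT"))      # Arg1
```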
      </sec>
      <sec id="sec-3-2">
        <title>Preprocessing</title>
        <p>
          Our preprocessing steps involve sentence splitting and adding entity markers.
For simplicity, we consider the relations that are present in single sentences, and
we split the documents into sentences via the GENIA Sentence Splitter [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. For
each entity in a sentence, we construct sentence-entity pairs and predict events
and trigger words from these pairs. However, 121 entities in our training set
have relations with more than one trigger word. We ignore such entities for
event trigger word detection.
        </p>
        <p>
          We need to explicitly identify an entity to find the corresponding relation
and trigger word in a sentence. Therefore, we add specific markers, &lt;E&gt;
and &lt;/E&gt;, before and after the entities so that the model can identify them,
following the discussion in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Moreover, we create a different representation
for each entity by applying the marker method to sentences containing more
than one entity. Hence, the sentence representation is distinct for each entity
in the same sentence. The following examples show the two different
representations for hexanes and silica, which are located in the same sentence.
        </p>
        <p>- The solvent was removed in vacuo, and the crude product was
purified by flash chromatography (silica, 100% &lt;E&gt; hexanes &lt;/E&gt; to
9:1 hexanes/EtOAc) to give a pale-yellow viscous oil (3.83 g, 86%).
- The solvent was removed in vacuo, and the crude product was
purified by flash chromatography (&lt;E&gt; silica &lt;/E&gt;, 100% hexanes to
9:1 hexanes/EtOAc) to give a pale-yellow viscous oil (3.83 g, 86%).</p>
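        <p>The marker insertion step can be sketched as a small string operation; this is a hypothetical helper, and in practice the character offsets of each entity come from the annotation files.</p>

```python
def add_entity_markers(sentence: str, start: int, end: int) -> str:
    """Wrap the character span [start, end) of the target entity with the
    <E> ... </E> markers, yielding one sentence representation per entity."""
    return sentence[:start] + "<E> " + sentence[start:end] + " </E>" + sentence[end:]

s = "the crude product was purified by flash chromatography (silica, 100% hexanes)"
start = s.index("silica")
print(add_entity_markers(s, start, start + len("silica")))
```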
      </sec>
      <sec id="sec-3-3">
        <title>Model</title>
        <p>Problem Definition: For a given sentence S with an entity et of type t, the
objectives are to find the trigger word in S, its type (including None), and the
relation between the trigger word and et from a set of predefined event types. As
event types are determined according to entity types, we do not build a separate
model for event type detection. Instead, we focus on trigger type and trigger word
detection. If there is a trigger word for an entity in a given sentence, the event type
is found by simple rules.</p>
        <p>Two objectives are defined to address this problem. Our base model is a
transformer-based pretrained architecture, which extracts a fixed-length
sentence representation and token representations from an input sentence with
entity markers indicating the entity's location. The fixed-length sentence
representation is used to detect the type of the trigger word in the sentence with
a given annotated entity. If there is a trigger word in the sentence, the event
type is determined by the type of the given entity via a simple lookup table
shown in Table 3.</p>
        <p>
          We propose an approach similar to question answering methods [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] to find
the span of the trigger word. Our trigger word span model predicts probabilities
of start and end tags with the token representations which are produced by the
transformer-based pretrained architecture. The trigger word span is the sequence
between the tokens with the highest start and end probabilities.
        </p>
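        <p>The span selection step can be sketched as follows. This is a framework-free sketch with toy probabilities standing in for the model's start and end outputs; it is not our training code.</p>

```python
def extract_trigger_span(tokens, start_probs, end_probs):
    """Select the trigger word span as the token sequence between the
    positions with the highest start and end probabilities."""
    start = max(range(len(tokens)), key=lambda i: start_probs[i])
    end = max(range(len(tokens)), key=lambda i: end_probs[i])
    return tokens[start:end + 1]

tokens = ["The", "mixture", "was", "stirred", "at", "room", "temperature"]
start_probs = [0.01, 0.02, 0.05, 0.85, 0.03, 0.02, 0.02]  # peak at "stirred"
end_probs   = [0.01, 0.02, 0.04, 0.86, 0.03, 0.02, 0.02]
print(extract_trigger_span(tokens, start_probs, end_probs))  # ['stirred']
```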
        <p>Our proposed architecture is jointly trained, as shown in Figure 2. Different
pretrained transformers with several optimizers, learning rates, and weight
decays are evaluated on the development set by exact F1 score. The considered
settings are summarized below. The configuration of our best model is shown
in bold.</p>
        <p>- Transformer Architectures: <bold>BioBERT</bold> (https://huggingface.co/monologg/biobert_v1.1_pubmed), BERTLarge (https://github.com/google-research/bert), BioBERTLarge (https://huggingface.co/trisongz/biobert_large_cased)
- Optimizer: <bold>AdamW</bold>, Adam, SGD
- Learning Rate: 1e-5, 1e-6, 1e-4, 1e-3
- Weight Decay: 0, 0.1, 0.01</p>
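        <p>The property that separates AdamW from Adam in this comparison is decoupled weight decay [10]. A minimal scalar sketch of one AdamW update follows; the hyperparameter defaults are illustrative assumptions, not the values used in our experiments.</p>

```python
import math

def adamw_step(w, grad, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter w. The weight-decay term
    lr * weight_decay * w is applied directly to the weight, decoupled
    from the adaptive gradient step (unlike Adam's L2 regularization)."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

w, m, v = adamw_step(1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(w)  # slightly below 1.0
```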
        <p>[Figure 2: joint model architecture. The input representation ([CLS], w1 ... &lt;E&gt; wi &lt;/E&gt; ... wn, [SEP]) feeds the pretrained transformer; the sentence representation drives trigger type detection (e.g., Workup) and the token representations produce the start and end distributions for the trigger word span.]</p>
        <p>
          As shown in Table 4, the best performing pretrained transformer model is
BioBERT with AdamW optimizer, even though the complexities of BERTLarge
and BioBERTLarge are higher than BioBERT. BERTLarge and BioBERTLarge
have 24 layers, 16 heads, and 340 million parameters while BioBERT has 12
layers, 12 heads, and 110 million parameters. Besides, while BERTLarge is
pretrained on English Wikipedia and Book Corpus, BioBERT and BioBERTLarge
are pretrained on additional resources, i.e., PubMed abstracts and PMC full-text
articles. Table 4 shows that BioBERT and BioBERTLarge perform better than
BERTLarge. Our results suggest that the domain similarity between chemical
patent documents and the pretraining corpus of BioBERT and BioBERTLarge
leads to better performance. In [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], it is shown that the generalization
capability of the AdamW optimizer is better than that of the Adam and SGD
optimizers, and our results support this claim.
        </p>
        <p>There are two different objectives, namely trigger type detection and trigger
word span detection, in our final architecture. Table 5 reports the results of
the two objectives separately on the development set. The trigger type detection
model achieves a 0.9848 F1 score, whereas the accuracy of our trigger word span
model is 0.9524.</p>
        <p>Our final model's performance is summarized in Table 6 for all objectives:
trigger word, trigger type, and event type detection. It achieves 0.7407 and
0.7234 in the main metric (exact F1) on the development and test sets,
respectively.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we introduce a transformer based approach for event extraction in
chemical patent documents. We compare several pretrained transformer models
with different settings and show that BioBERT's performance with the AdamW
optimizer is better than both BERTLarge and BioBERTLarge for this task. Finally,
we report our best model's performance separately on the trigger type and trigger
word span detection tasks. Our best model, BioBERT, achieves 0.7234 exact F1
score on the test set.</p>
      <p>As future work, we plan to extend our study to enable the detection of
multiple trigger words in a sentence by using a sequence labeling setup with
the BIO encoding. Thus, we will be able to consider entities having relations
with more than one trigger word. In addition, we will design a two-stage model
that first detects the trigger word span and then classifies the trigger type, as
an alternative to our jointly trained model.</p>
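      <p>The BIO encoding mentioned above could look like the following sketch, which is illustrative rather than an implemented model; the example spans and their labels are assumptions for demonstration.</p>

```python
def spans_to_bio(tokens, trigger_spans):
    """Encode trigger word spans as BIO tags, which allows several
    triggers (and therefore several related entities) per sentence."""
    tags = ["O"] * len(tokens)
    for start, end, label in trigger_spans:  # inclusive token indices
        tags[start] = "B-" + label
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + label
    return tags

tokens = ["The", "mixture", "was", "stirred", "and", "then", "filtered"]
spans = [(3, 3, "REACTION_STEP"), (6, 6, "WORKUP")]  # illustrative labels
print(spans_to_bio(tokens, spans))
# ['O', 'O', 'O', 'B-REACTION_STEP', 'O', 'O', 'B-WORKUP']
```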
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>GEBIP Award of the Turkish Academy of Sciences (to A.O.) is gratefully
acknowledged.</p>
      <p>18. Zhang, W., Ding, X., Liu, T.: Learning target-dependent sentence representations
for Chinese event detection. In: China Conference on Information Retrieval. pp.
251–262. Springer (2018)
19. Zhao, Y., Wan, H., Gao, J., Lin, Y.: Improving relation classification by entity pair
graph. In: Asian Conference on Machine Learning. pp. 1156–1171 (2019)
20. Zhou, P., Shi, W., Tian, J., Qi, Z., Li, B., Hao, H., Xu, B.: Attention-based
bidirectional long short-term memory networks for relation classification. In: Proceedings
of the 54th Annual Meeting of the Association for Computational Linguistics
(Volume 2: Short Papers). pp. 207–212 (2016)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Baldini</given-names>
            <surname>Soares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>FitzGerald</surname>
          </string-name>
          , N.,
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwiatkowski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Matching the blanks: Distributional similarity for relation learning</article-title>
          .
          <source>In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <volume>2895</volume>
          –
          <fpage>2905</fpage>
          . Association for Computational Linguistics, Florence,
          <source>Italy (Jul</source>
          <year>2019</year>
          ). https://doi.org/10.18653/v1/
          <fpage>P19</fpage>
          -1279, https://www.aclweb.org/ anthology/P19-1279
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <volume>4171</volume>
          –
          <fpage>4186</fpage>
          . Association for Computational Linguistics, Minneapolis,
          <source>Minnesota (Jun</source>
          <year>2019</year>
          ). https://doi.org/10.18653/v1/
          <fpage>N19</fpage>
          -1423, https://www. aclweb.org/anthology/N19-1423
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dodge</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ilharco</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farhadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajishirzi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Finetuning pretrained language models: Weight initializations, data orders, and early stopping</article-title>
          . arXiv preprint arXiv:2002.06305 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>D.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Druckenbrodt</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorne</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoessel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Afzal</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshikawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Albahem</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cavedon</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verspoor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Overview of ChEMU 2020: named entity recognition and event extraction of chemical reactions from patents</article-title>
          . In: Arampatzis,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Tsikrika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Vrochidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Lioma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Eickhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Neveol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          , N. (eds.)
          <article-title>Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>Proceedings of the Eleventh International Conference of the CLEF Association (CLEF</source>
          <year>2020</year>
          ), vol.
          <volume>12260</volume>
          . Lecture Notes in Computer Science (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kambhatla</surname>
          </string-name>
          , N.:
          <article-title>Combining lexical, syntactic, and semantic features with maximum entropy models for information extraction</article-title>
          .
          <source>In: Proceedings of the ACL Interactive Poster and Demonstration Sessions</source>
          . pp.
          <volume>178</volume>
          –
          <issue>181</issue>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rabal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leitner</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vazquez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salgado</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leaman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          , et al.:
          <article-title>The CHEMDNER corpus of chemicals and drugs and its annotation principles</article-title>
          .
          <source>Journal of cheminformatics 7(1)</source>
          ,
          <volume>1</volume>
          –
          <fpage>17</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoon</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>So</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
          </string-name>
          , J.:
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          .
          <source>Bioinformatics</source>
          <volume>36</volume>
          (
          <issue>4</issue>
          ),
          <volume>1234</volume>
          –
          <fpage>1240</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            , H., Han,
            <given-names>J</given-names>
          </string-name>
          .:
          <article-title>Biomedical event extraction based on knowledge-driven tree-LSTM</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <volume>1421</volume>
          –
          <issue>1430</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Leveraging framenet to improve automatic event detection</article-title>
          .
          <source>In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . pp.
          <volume>2134</volume>
          –
          <issue>2143</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Loshchilov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Decoupled weight decay regularization</article-title>
          .
          <source>In: International Conference on Learning Representations</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grishman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Event detection and domain adaptation with convolutional neural networks</article-title>
          .
          <source>In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)</source>
          . pp.
          <fpage>365</fpage>
          –
          <lpage>371</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          (
          <year>2014</year>
          ), http://www.aclweb.org/anthology/D14-1162
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Sætre</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshida</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yakushiji</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miyao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matsubayashi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohta</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Akane system: protein-protein interaction pairs in biocreative2 challenge, ppi-ips subtask</article-title>
          .
          <source>In: Proceedings of the second biocreative challenge workshop</source>
          . vol.
          <volume>209</volume>
          , p.
          <fpage>212</fpage>
          .
          Madrid
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          .
          <source>arXiv preprint arXiv:1706.03762</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Exploring pre-trained language models for event extraction and generation</article-title>
          .
          <source>In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          . pp.
          <fpage>5284</fpage>
          –
          <lpage>5294</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Relation classification via convolutional deep neural network</article-title>
          .
          <source>In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</source>
          . pp.
          <fpage>2335</fpage>
          –
          <lpage>2344</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>