<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transfer Learning for Scientific Data Chain Extraction in Small Chemical Corpus with joint BERT-CRF Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Na Pang</string-name>
          <email>pangna@mail.las.ac.cn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Li Qian</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weimin Lyu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jin-Dong Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center of Basic Molecular Science (CBMS), Department of Chemistry, Tsinghua University</institution>
          ,
          <addr-line>Beijing, 100084</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>City University of New York</institution>
          ,
          <addr-line>New York</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Library, Information and Archives Management, University of Chinese Academy of Science</institution>
          ,
          <addr-line>Beijing 100190</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>National Science Library, Chinese Academy of Science</institution>
          ,
          <addr-line>Beijing 100190</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Computational chemistry has developed rapidly in recent years thanks to the rapid growth of and breakthroughs in AI. Owing to progress in natural language processing, researchers can extract more fine-grained knowledge from publications to stimulate the development of computational chemistry. Because existing work and corpora for chemical entity extraction are restricted to the biomedical and life science fields rather than chemistry, we build a new corpus in the chemical bond field annotated with 7 types of entities: compound, solvent, method, bond, reaction, pKa and pKa value. This paper uses a combined BERT-CRF model to build scientific chemical data chains by extracting these 7 chemical entities and their relations from publications, and we propose a joint model that extracts the entities and relations simultaneously. Experimental results on our Chemical Special Corpus demonstrate that we achieve state-of-the-art and competitive NER performance.</p>
      </abstract>
      <kwd-group>
        <kwd>transfer learning</kwd>
        <kwd>pre-training</kwd>
        <kwd>fine-tuning</kwd>
        <kwd>entity extraction</kwd>
        <kwd>relation extraction</kwd>
        <kwd>scientific data chain extraction</kwd>
        <kwd>BERT-CRF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recently, AI has stimulated applications in many fields of chemistry, such
as computational chemistry and synthetic chemistry. Several tasks have
highlighted the significance of AI's role in chemistry. Scientists have used deep
neural networks and Monte Carlo tree search to plan chemical syntheses and discover
more retrosynthetic routes in a short time[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], proposed a machine learning method
to perform chemical reactions and analysis faster than they could be performed
manually and to predict the reactivity of possible reagent combinations[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and
borrowed word2vec from NLP to create Atom2Vec, an unsupervised machine that predicts
materials properties[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. There is no doubt that AI is revolutionizing our
understanding of chemistry. In chemistry, and especially in computational chemistry,
the chemical bond energy (pKa) is essential, yet most values reported in
scientific papers are extracted manually by experts, and no existing work
attempts to extract pKa with NLP methods.
      </p>
      <p>Our project is based on the construction of the iBond 3.0 databank (iBond:
Internet Bond-energy Databank, website: http://ibond.chem.tsinghua.edu.cn or
http://ibond.nankai.edu.cn/). To aid the construction of the iBond databank,
we consider automatically extracting scientific data chains to reduce the workload
of experts. Extracting scientific data chains, however, is not an easy task.
In particular, we consider three challenges in scientific data chain extraction:
(1) the existing corpora may not serve our task because they focus on general
chemicals or drugs; (2) popular chemical NER systems use machine learning or
deep learning methods, which require abundant training data; (3) unlike
state-of-the-art methods that extract triplets {E1, relation, E2}, our entities
are not confined to triplets: some of them are irrelevant to relation extraction,
and some of them participate not in 1:1 relations but in more complex 1:n or n:1
relations. These challenges make extracting scientific chemical data chains a
significantly tough task.</p>
      <p>
        The first challenge is caused by corpus accessibility. Currently, most
named entity extraction experiments and corpora are in the fields of biomedicine
or life science and focus on extracting chemical drugs. Moreover, some corpora may
not be accessible, such as the PubMed corpus and the Sciborg corpus [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Considering the
need to automatically extract chemical bond energy to promote the development of
computational chemistry, and to address semantic problems and numerous unknown
words, we create a new corpus of papers in the chemical bond field.
      </p>
      <p>The second challenge is caused by the capability of state-of-the-art deep
learning architectures. Deep learning methods usually require big data to train a
good model; however, existing corpora for data chain extraction are not only hard
to obtain but also small in scale. Worse, most corpora focus on fields other than
chemistry. Considering this situation, we also try to use transfer learning to
alleviate the challenge by pre-training on a large out-of-domain chemistry corpus
before training on the in-domain chemical bond corpus.</p>
      <p>The third challenge is caused by the aim of our project and the
characteristics of our corpus. In our project, we extract not only the entities
that participate in relations but also the unrelated entities, to help researchers
read and confirm the relations extracted by our system. In addition, multiple
entities in one relation are more complex than traditional triplets. For this
reason, we construct our own tagging scheme to cover more extensive entities and
use the combined BERT-CRF model to extract named entities and relations
simultaneously, avoiding the possible loss incurred when the two tasks are
performed separately.</p>
      <p>Our contributions: (1) We construct a domain-specific ChemBE corpus; (2) We
apply transfer learning by pre-training on a large relevant corpus so that we can
achieve a competitive result on our small dataset; (3) We use a BERT-CRF model,
which combines the BERT model and the CRF model, together with a joint tagging
scheme to extract entities and relations simultaneously and build our chemical
scientific data chain. The code and a data sample are available on GitHub
(https://github.com/quewentian/ChemBE-bert-CRF).</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>
        Entity extraction and relation extraction. Named entity extraction is a
main subtask of information extraction. Common NER methods are based
on rules, dictionaries, machine learning and deep learning, and numerous
experiments have been conducted in many fields[4-6]. Relation extraction is also a
crucial task of information extraction. There are 4 types of methods for extracting
relations: fully supervised learning methods[
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], distantly supervised learning
methods[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], tree-based methods[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and methods that learn entities and relations jointly[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. These 4 methods can be classified into 2 kinds of models: pipeline models
and joint models. The first three methods are pipeline models, which treat
entity extraction and relation extraction as two separate tasks, whereas the last one
regards them as one task[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        In this paper, we focus on the joint learning method to learn entities and
relations simultaneously. Joint learning models usually take one of two approaches:
parameter sharing[
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ] and tagging schemes[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. A parameter sharing model
shares the parameters of the bottom layers and performs the different tasks in
the upper layers. A tagging scheme model uses a new tagging method to convert
the two tasks into one, so that a single end-to-end model can solve both tasks at
the same time.
      </p>
      <p>
        Scientific data extraction. Beyond traditional entities, many
new attempts explore the possibility of extracting scientific data from
scientific papers to mine their latent potential, such as
using rules to extract measured information from text in the form of a numeric
value paired with a unit of measurement[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], or using a CRF to extract
numerical attributes from discharge summary records and an SVM to associate
attributes with the correct values[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        There is also some work concerning the chemistry field[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Most tasks
related to chemical entities are in the biomedical domain[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], since researchers
do not have rich annotated data to learn from in the field of chemistry. For example,
in the field of biomedicine, Xie J et al. proposed a Bi-LSTM
network to extract e-cigarette components[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. It was not until 2015 that BioCreative
put forward the CHEMDNER task, dedicated to learning chemical entities and chemical
formulas[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        Still, several problems remain in chemical entity extraction: (1)
the corpora[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] are mainly in the field of biomedicine; (2) as for
techniques, researchers in the chemistry field concentrate on machine learning,
and deep learning is applied only to the biomedical field in English chemical
corpora. With machine learning, researchers have to hand-craft all types of
features, so the generalization ability is not strong, and a large amount of data
is needed to train the model. Therefore, we need to establish our own specific
chemical corpus and apply suitable techniques to our small corpus.
      </p>
      <p>
        Transfer learning. Transfer learning can help achieve better results on
small datasets. Upstream unsupervised pre-training reduces the resources and
time needed for the downstream tasks. There are two ways to apply
pre-trained language representations to downstream tasks: the feature-based approach
(e.g., ELMo[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]) and the fine-tuning approach (e.g., GPT[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], GPT2[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], BERT[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]).
The feature-based approach adds pre-trained representations to the embeddings as
additional features. The fine-tuning approach fine-tunes the pre-trained
parameters on the specific downstream task. In our work, we use BERT upstream
for pre-training and a CRF downstream, fine-tuned with the task-specific
data.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <sec id="sec-3-1">
        <title>Problem Statement</title>
        <p>Our main task is to automatically extract chemical bond energy values from
publications in the chemistry field, since pKa values are crucial in computational
chemistry and well-built pKa values can pave the way for deeper research on
computational chemistry. More specifically, we need to extract 7 types of entities,
as well as bond energy data chains that contain many relations among them:
compound, solvent, reaction, method, chemical bond, bond energy (pKa) and bond
energy value (pKa value); see Figure 1. These 7 entities form a complete chemical
bond energy value chain: compound XX undergoes reaction A in solvent B to study
chemical bond C with method D, and its pKa is value E. Figure 2 shows the
architecture of our method.</p>
        <p>We constructed a corpus of chemistry papers annotated for the NER task with
BIO encoding. The original data come from several subdisciplines of chemistry,
such as physical chemistry and surface chemistry, and we draw on more than 20
mainstream academic journals in the related subdisciplines, such as the Journal
of the American Chemical Society and the Journal of Organic Chemistry. We use an
Adobe interface to convert the PDF files into XML. Our corpus has 7 types of
entities: compound, solvent, method, reaction, bond, bond energy and bond energy
value.</p>
        <p>
          We invited chemistry experts from the Department of Chemistry of Tsinghua
University and the National Science Library of the Chinese Academy of Sciences to
construct our own chemical bond knowledge base and corpus, the ChemBE (Chemical
Bond Energy) corpus. The corpus construction process is shown in Figure 3, and
Table 1 shows the statistics of our chemistry corpus. The ChemBE corpus is built from
1900 full papers in the chemical bond field, following the process of gold standard
corpus construction[
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. To ensure high quality, two groups of
experts annotated the data independently, and the inter-annotator agreement was
then measured. The inter-annotator agreement is measured by the
F1 score, defined as follows (X is group 1 and Y is group 2);
the final F1 score is 89.6%.
        </p>
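        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>\mathrm{Precision}(X, Y) = \frac{\text{number of identical annotation results in } X \text{ and } Y}{\text{number of annotation results in } Y}</tex-math>
          </disp-formula>
          <disp-formula>
            <label>(2)</label>
            <tex-math>\mathrm{Recall}(X, Y) = \frac{\text{number of identical annotation results in } X \text{ and } Y}{\text{number of annotation results in } X}</tex-math>
          </disp-formula>
          <disp-formula>
            <label>(3)</label>
            <tex-math>F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}</tex-math>
          </disp-formula>
        </p>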
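        <p>As a minimal sketch of the agreement computation defined above, the following Python function compares two sets of annotations. Representing an annotation as a (document, start, end, label) tuple, and the labels other than CMP, are assumptions made for illustration; the toy data are not taken from the ChemBE corpus.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def agreement_scores(x_annotations, y_annotations):
    """Precision, recall and F1 between two annotation sets.

    Each annotation is a hashable item such as (doc_id, start, end, label);
    this representation is an assumption made for illustration.
    """
    x_set, y_set = set(x_annotations), set(y_annotations)
    identical = len(x_set & y_set)  # results found by both groups
    precision = identical / len(y_set) if y_set else 0.0
    recall = identical / len(x_set) if x_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy annotations from two hypothetical expert groups.
group_x = {("doc1", 0, 7, "CMP"), ("doc1", 12, 16, "SOL"), ("doc1", 20, 23, "PKA")}
group_y = {("doc1", 0, 7, "CMP"), ("doc1", 12, 16, "SOL"),
           ("doc1", 30, 34, "VAL"), ("doc1", 40, 44, "CMP")}
precision, recall, f1 = agreement_scores(group_x, group_y)
```
]]></preformat>
        <p>Here two of the annotations are identical, so precision is 2/4, recall is 2/3 and F1 is 4/7.</p>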
        <p>The knowledge base includes dictionaries and rules, which are later used to
recognize compounds and bonds. The dictionaries include basic chemical
formulas and molecular formulas of compounds, roots and affixes, radicals,
substituents, solvents, etc. The rules comprise word indication rules, context
indication rules and logical indication rules.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Bond Energy Scientific Data Chain Concept Model</title>
        <p>Experts construct our bond energy scientific data chain model to assist our
work. They build a local model and a global model to define the entities we need
to extract. There are 7 entities: compound, solvent, pKa, pKa value, bond, reaction
and method. Among these, we define 3 global entities (bond, reaction
and method) and 4 local entities (compound, solvent, pKa and pKa value). We
only need to extract the relations among the 4 local entities, since the global
entities apply to the whole paper and we do not have to extract relations that
involve them.</p>
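        <p>One way to picture the local and global models is as a record in which the three global entities are shared by every chain extracted from a paper. The sketch below is illustrative only: the class name, field names and placeholder values (which mirror the "compound XX, reaction A, solvent B" template) are assumptions, not the authors' data model.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataChain:
    # Local entities: linked to each other by extracted relations.
    compound: str
    solvent: str
    pka: str
    pka_value: str
    # Global entities: shared by every chain in the same paper.
    bond: str
    reaction: str
    method: str


def build_chains(global_entities, local_records):
    """Attach the paper-level global entities to each local record."""
    return [DataChain(**record, **global_entities) for record in local_records]


# Placeholder values only; none of these are real measurements.
paper_globals = {"bond": "bond C", "reaction": "reaction A", "method": "method D"}
local_records = [
    {"compound": "compound X1", "solvent": "solvent B", "pka": "pKa", "pka_value": "E1"},
    {"compound": "compound X2", "solvent": "solvent B", "pka": "pKa", "pka_value": "E2"},
]
chains = build_chains(paper_globals, local_records)
```
]]></preformat>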
      </sec>
      <sec id="sec-3-3">
        <title>Joint BERT-CRF Model</title>
        <p>In this part, we construct a joint BERT-CRF model to extract entities and
relations simultaneously.</p>
        <p>(1) Divide the 7 entities into 2 categories and apply a different method to
each category (see Table 2).</p>
        <p>First, we use the established dictionaries and rules to replace compound and
chemical bond entities with two marks: $CMP$ and $BOND$. This avoids the
unknown-word problem in the later deep learning process.</p>
        <p>(2) Build our tagging scheme.</p>
        <p>We build our own tagging scheme to extract both entities and relations at
the same time. In our tagging scheme, we focus on only one relation between
a pair of entities in our local models. Thus we define a minimal set of relations
between our local entities: the compound-energy (CE) relation, the solvent-energy
(SE) relation and the energy-energy value (EE) relation (see Figure 4). Among these
relations, CE means "attribute", SE means "measured in" and EE means "the
value of".</p>
        <p>We define our tagging scheme as follows (see Figure 5): &lt;position
information, entity information, relation information&gt;. An annotation example is
given in Figure 6. The position information has 2 options, B and I, which mean
"begin" and "inside", respectively. The entity information has three options:
compound, solvent and pKa value (the global entities and the pKa entity are not
included, because we only want to extract the relations of the other three local
entities with the pKa entity). The relation information has 4 options: CE
(compound-pKa), SE (solvent-pKa), EE (pKa-pKa value) and NR (we extract only one
relation for each pair of entities, so all other relations are ignored and given
the single tag &lt;NR&gt;, which means "no relation"). Other irrelevant words are
tagged as &lt;O&gt;.</p>
        <p>Thus, in our tagging scheme, when extracting entities, &lt;B-CMP-CE&gt; and
&lt;B-CMP-NR&gt; are equivalent, because entity extraction does not take the relation
into account; in other words, only the first two parts of the tag matter. If an
entity should be tagged as &lt;B-CMP-CE&gt;, we consider that &lt;B-CMP-NR&gt; extracts the
correct entity but the wrong relation.</p>
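        <p>The distinction between a correct entity and a correct relation can be sketched in a few lines of Python. The tags are written without angle brackets, and the helper names are hypothetical; only the three-part tag layout and the CMP, CE and NR labels come from the scheme described above.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def parse_tag(tag):
    """Split a joint tag into (position, entity, relation).

    "O" marks a token outside any entity and yields (None, None, None).
    """
    if tag == "O":
        return None, None, None
    position, entity, relation = tag.split("-")
    return position, entity, relation


def entity_correct(gold, predicted):
    """Entity match: only the position and entity parts are compared."""
    return parse_tag(gold)[:2] == parse_tag(predicted)[:2]


def full_tag_correct(gold, predicted):
    """Entity and relation match: all three parts must agree."""
    return parse_tag(gold) == parse_tag(predicted)


same_entity = entity_correct("B-CMP-CE", "B-CMP-NR")      # True
same_relation = full_tag_correct("B-CMP-CE", "B-CMP-NR")  # False
```
]]></preformat>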
        <p>(3) Re-pretrain BERT parameters with our large field data.</p>
        <p>BERT, which is built from a multilayer bidirectional Transformer, is a
contextualized word representation model trained with a masked language model
objective and a next sentence prediction task. We replace the unused words in the
vocabulary of BERT with some common chemical terms and re-pretrain the pre-trained
parameters of BERT base, which was originally trained on 800M words of BooksCorpus
and 2,500M words of English Wikipedia, with 700,000 abstracts in the field of
chemical bond energy.</p>
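        <p>The vocabulary step can be illustrated as follows. Released BERT vocabularies contain placeholder entries named like [unused0], and overwriting them keeps the vocabulary size and all other token indices unchanged. This is a sketch under assumptions: the function name, the toy vocabulary and the example terms are hypothetical, and the source does not specify the authors' exact procedure or term list.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def replace_unused_tokens(vocab, new_terms):
    """Overwrite "[unusedN]" vocabulary slots with domain terms.

    The vocabulary size and the index of every other token stay the same,
    so the pre-trained embedding matrix keeps its shape.
    """
    existing = set(vocab)
    # Drop duplicates and terms that the vocabulary already contains.
    pending = [t for t in dict.fromkeys(new_terms) if t not in existing]
    updated = []
    for token in vocab:
        if pending and token.startswith("[unused") and token.endswith("]"):
            updated.append(pending.pop(0))
        else:
            updated.append(token)
    if pending:
        raise ValueError(f"not enough unused slots for: {pending}")
    return updated


# Toy vocabulary and illustrative terms (not the authors' list).
vocab = ["[PAD]", "[unused0]", "[unused1]", "[unused2]", "[UNK]", "the", "acid"]
chem_terms = ["pka", "dmso", "acid"]
new_vocab = replace_unused_tokens(vocab, chem_terms)
```
]]></preformat>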
        <p>(4) Fine-tune with small task-specific data. In the downstream NER task,
we replace the original softmax layer with a CRF layer and obtain better
performance.</p>
        <p>
          First, we use the BERT built-in softmax layer[
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] to predict the labels. BERT
defines two vectors in the fine-tuning process: a start vector S and an end vector E.
During fine-tuning, we feed the final hidden representation
<inline-formula><tex-math>T_i \in \mathbb{R}^H</tex-math></inline-formula>
into the classification layer and obtain a K-dimensional vector; the probability
of the output vector belonging to category j is:
        </p>
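        <p>
          <disp-formula>
            <label>(4)</label>
            <tex-math>P_j(z) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}</tex-math>
          </disp-formula>
where z is the K-dimensional output of the classification layer. In the CRF score S(x, y), A is the transition score matrix and O is the output matrix of BERT.</p>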
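        <p>For readers less familiar with the softmax classification step, a minimal standard-library sketch is given below. It shows the generic softmax function rather than BERT's internal implementation, and the input scores are made-up numbers.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def softmax(logits):
    """Convert K classification-layer outputs into K probabilities."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for K = 3 labels; the largest score gets the largest probability.
probs = softmax([2.0, 1.0, 0.1])
predicted_label = probs.index(max(probs))
```
]]></preformat>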
        <p>We use our ChemBE corpus to train our BERT+CRF model (see Figure 7).</p>
      </sec>
      <sec id="sec-3-4">
        <title>Extract bond energy data chain</title>
        <p>(1) Extract data chain from table.</p>
        <p>Tables often contain crucial entity and relation data. Extracting
information from tables is comparatively easy, since tables hold
semi-structured data. We use dictionaries and rules to extract the entities and
relations from tables.</p>
        <p>(2) Extract data chain from free text.</p>
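        <p>Then, we add a CRF layer after the BERT model to perform the downstream NER
task. The CRF layer has a state transition matrix, which lets it use past and future
tags to predict the current tag, and it scores candidate tags to give the
probability of a tag sequence. Given an input sequence
<inline-formula><tex-math>x = \{x_1, x_2, \ldots, x_n\}</tex-math></inline-formula>
and a sequence of predictions
<inline-formula><tex-math>y = \{y_1, y_2, \ldots, y_n\}</tex-math></inline-formula>,
we define the score of the predictions as follows:
          <disp-formula>
            <label>(5)</label>
            <tex-math>S(x, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=0}^{n} O_{i, y_i}</tex-math>
          </disp-formula>
        </p>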
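        <p>The sequence score can be made concrete with a short sketch. It is a simplification under stated assumptions: positions are 0-based, the start and end boundary tags are omitted, and the matrices hold made-up numbers rather than trained values. The function name is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def crf_sequence_score(emissions, transitions, tags):
    """Score a tag sequence as the sum of transition and emission scores.

    emissions[i][t]   : score O of tag t at position i (the BERT output)
    transitions[a][b] : score A of moving from tag a to tag b
    Start and end boundary tags are omitted in this simplified sketch.
    """
    if len(emissions) != len(tags):
        raise ValueError("need one emission row per tag")
    emission_score = sum(emissions[i][t] for i, t in enumerate(tags))
    transition_score = sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return emission_score + transition_score


# Two tags (0 and 1) over a three-token sentence, with made-up scores.
emissions = [[1.0, 0.5], [0.2, 2.0], [0.3, 1.5]]
transitions = [[0.5, -1.0], [-0.5, 1.0]]
score = crf_sequence_score(emissions, transitions, [0, 1, 1])
alternative = crf_sequence_score(emissions, transitions, [0, 0, 0])
```
]]></preformat>
        <p>With these numbers the sequence [0, 1, 1] scores 4.5 and [0, 0, 0] scores 2.5, so a decoder that maximizes the score would prefer the first sequence.</p>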
        <p>We use our BERT+CRF model to predict the entities and relations in the
free text.</p>
        <p>(3) Complete the relations extracted from tables and free text.</p>
        <p>We use the entities and relations from the context and from the free text to
complete our scientific data chain of pKa.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>Entity extraction. We conduct 6 different experiments to extract
chemical entities. First, we use traditional pre-training methods with downstream
BiLSTM+CRF networks: GloVe+BiLSTM+CRF and ELMo+BiLSTM+CRF.
Then we use BERT as the pre-training method with two different downstream
networks: softmax and CRF. We also use two sets of parameters: the original BERT
pre-trained parameters and the parameters re-pretrained on our chemical
corpus. The results are shown in Table 3. We stress that for
compound and chemical bond entities we use the dictionaries and rules, not
the deep learning method. We also analyze the results per entity type
for the most competitive model, BERT+CRF (see Table 4).</p>
      <p>As Table 3 shows, our BERT+CRF model with re-pretrained
parameters significantly outperforms the other models. The BERT+CRF model gains a
3.72% improvement in F1 score without re-pretrained parameters and a 3.66%
improvement with re-pretrained parameters. With re-pretrained parameters, the
BERT+softmax model gains a slight improvement of 0.43% and the BERT+CRF
model gains a slight improvement of 0.26%.</p>
      <p>Relation extraction. These results also come from the previous 6
experiments, because we extract entities and relations simultaneously. Here, we
focus only on the results of relation extraction, which are shown in Table 5.</p>
      <p>Results for the different relation types with the BERT+CRF model are shown in
Table 6. In relation extraction, the BERT+CRF model also achieves a more
competitive result than the built-in softmax model. Without re-pretrained
parameters, the BERT+CRF model sees an improvement of 3.23% in F1 score. With
re-pretrained parameters, the BERT+CRF model improves the F1 score from 85.04%
to 87.07%. The precision and F1 score of the BERT+CRF model with re-pretrained
parameters are better than those of the other models. However, the recall of the
BERT+CRF model declines slightly with re-pretrained parameters compared with the
original parameters.</p>
      <p>As Table 6 shows, the CE relation is the hardest of the 3
relations. The reason is that the compound is the most frequent entity in our
corpus, but the proportion of compounds that take part in a CE relation is
relatively small, so recognizing them demands rich contextual semantic information.
In addition, experts themselves sometimes make mistakes during the annotation
process.</p>
      <p>Results presentation. We display our entity extraction and relation
extraction results in Figure 8. Each color represents one type of entity, and arrows
represent the relations between entities.</p>
      <p>We propose a joint BERT+CRF model to extract entities and relations
simultaneously. The contribution of our work is threefold: (1) We construct a new
chemical bond energy (pKa) corpus annotated with 7 types of entities and 3 types
of relations. (2) We construct a joint model that extracts a chemical
scientific data chain with multiple entities and relations simultaneously, where the
relations are not traditional 1:1 entity pairs but 1:n or n:1 entity pairs. (3)
We investigate the effect of adding another task-specific network to the
downstream tasks of BERT. The results show that adding a CRF to the downstream
NER task may outperform a simple softmax layer on our specific corpus.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This research is supported by the Special Foundation of Science and
Technology Resources Survey (No.2018FY201202). We would like to thank the
Center of Basic Molecular Science at Tsinghua University and the National
Science Library of the Chinese Academy of Sciences for their support. We thank
Huizhou Liu, Li Qian, Jinpei Cheng, Jin-Dong Yang and Sanzhong Luo for their
insightful suggestions and discussions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
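          <string-name>
            <surname>Segler</surname>
            <given-names>M H S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Preuss</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waller</surname>
            <given-names>M P</given-names>
          </string-name>
          .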
          <article-title>Planning chemical syntheses with deep neural networks and symbolic AI[J]</article-title>
          .
          <source>Nature</source>
          ,
          <year>2018</year>
          ,
          <volume>555</volume>
          (
          <issue>7698</issue>
          ):
          <fpage>604</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Granda</surname>
            <given-names>J M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donina</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dragone</surname>
            <given-names>V</given-names>
          </string-name>
          , et al.
          <article-title>Controlling an organic synthesis robot with machine learning to search for new reactivity[J]</article-title>
          .
          <source>Nature</source>
          ,
          <year>2018</year>
          ,
          <volume>559</volume>
          (
          <issue>7714</issue>
          ):
          <fpage>377</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Zhou</surname>
            <given-names>Q</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>S</given-names>
          </string-name>
          , et al.
          <article-title>Atom2Vec: learning atoms for materials discovery</article-title>
          [J].
          <source>arXiv preprint arXiv:1807.05617</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lin</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Disorder recognition in clinical texts using multi-label structured SVM[J]</article-title>
          .
          <source>BMC bioinformatics</source>
          ,
          <year>2017</year>
          ,
          <volume>18</volume>
          (
          <issue>1</issue>
          ):
          <fpage>75</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Wagstaff</surname>
            <given-names>K L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Francis</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gowda</surname>
            <given-names>T</given-names>
          </string-name>
          , et al.
          <article-title>Mars Target Encyclopedia: Rock and Soil Composition Extracted from the Literature</article-title>
          [J].
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Liu</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>X</given-names>
          </string-name>
          , et al.
          <article-title>De-identification of Clinical Notes via Recurrent Neural Network and Conditional Random Field[J]</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          ,
          <year>2017</year>
          , 75S:
          <fpage>S34</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Zeng</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lai</surname>
            <given-names>S</given-names>
          </string-name>
          , et al.
          <article-title>Relation classification via convolutional deep neural network</article-title>
          [J].
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zhou</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tian</surname>
            <given-names>J</given-names>
          </string-name>
          , et al.
          <article-title>Attention-based bidirectional long short-term memory networks for relation classification[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</article-title>
          .
          <year>2016</year>
          ,
          <volume>2</volume>
          :
          <fpage>207</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lin</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>Z</given-names>
          </string-name>
          , et al.
          <article-title>Neural relation extraction with selective attention over instances</article-title>
          [C]//
          <source>Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          .
          <year>2016</year>
          ,
          <volume>1</volume>
          :
          <fpage>2124</fpage>
          -
          <lpage>2133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Miwa</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bansal</surname>
            <given-names>M</given-names>
          </string-name>
          .
          <article-title>End-to-end relation extraction using LSTMs on sequences and tree structures</article-title>
          [J].
          <source>arXiv preprint arXiv:1601.00770</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Zheng</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bao</surname>
            <given-names>H</given-names>
          </string-name>
          , et al.
          <article-title>Joint extraction of entities and relations based on a novel tagging scheme</article-title>
          [J].
          <source>arXiv preprint arXiv:1706.05075</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Zheng</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hao</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            <given-names>D</given-names>
          </string-name>
          , et al.
          <article-title>Joint entity and relation extraction based on a hybrid neural network</article-title>
          [J].
          <source>Neurocomputing</source>
          ,
          <year>2017</year>
          ,
          <volume>257</volume>
          :
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Li</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            <given-names>G</given-names>
          </string-name>
          , et al.
          <article-title>A neural joint model for entity and relation extraction from biomedical text</article-title>
          [J].
          <source>BMC bioinformatics</source>
          ,
          <year>2017</year>
          ,
          <volume>18</volume>
          (
          <issue>1</issue>
          ):
          <fpage>198</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Maiya</surname>
            <given-names>A S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Visser</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wan</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Mining Measured Information from Text</article-title>
          [J].
          <year>2015</year>
          ,
          <volume>76</volume>
          (
          <issue>2</issue>
          ):
          <fpage>899</fpage>
          -
          <lpage>902</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>Sarath P R</string-name>
          ,
          <string-name>
            <given-names>Sunil</given-names>
            <surname>Mandhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yoshiki</given-names>
            <surname>Niwa</surname>
          </string-name>
          .
          <article-title>Numerical Attribute Extraction from Clinical Texts</article-title>
          [J].
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Xie</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>X</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            <given-names>D D</given-names>
          </string-name>
          .
          <article-title>Mining e-cigarette adverse events in social media using Bi-LSTM recurrent neural network with word embedding representation</article-title>
          [J].
          <source>Journal of the American Medical Informatics Association</source>
          ,
          <year>2017</year>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ):
          <fpage>72</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wei</surname>
            <given-names>C H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leaman</surname>
            <given-names>R</given-names>
          </string-name>
          , et al.
          <article-title>Overview of the BioCreative V chemical disease relation (CDR) task</article-title>
          [C]//
          <source>Proceedings of the fifth BioCreative challenge evaluation workshop</source>
          .
          <year>2015</year>
          :
          <fpage>154</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Rocktäschel</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weidlich</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leser</surname>
            <given-names>U</given-names>
          </string-name>
          .
          <article-title>ChemSpot: A Hybrid System for Chemical Named Entity Recognition</article-title>
          [J].
          <source>Bioinformatics</source>
          ,
          <year>2012</year>
          ,
          <volume>28</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1633</fpage>
          -
          <lpage>1640</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Peters</surname>
            <given-names>M E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            <given-names>M</given-names>
          </string-name>
          , et al.
          <article-title>Deep contextualized word representations</article-title>
          [J].
          <source>arXiv preprint arXiv:1802.05365</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Radford</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narasimhan</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salimans</surname>
            <given-names>T</given-names>
          </string-name>
          , et al.
          <article-title>Improving language understanding by generative pre-training</article-title>
          [J].
          <source>URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Radford</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Child</surname>
            <given-names>R</given-names>
          </string-name>
          , et al.
          <article-title>Language models are unsupervised multitask learners</article-title>
          [J].
          <source>OpenAI Blog</source>
          ,
          <year>2019</year>
          ,
          <volume>1</volume>
          :
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Devlin</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            <given-names>M W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            <given-names>K</given-names>
          </string-name>
          , et al.
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          [J].
          <source>arXiv preprint arXiv:1810.04805</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Wissler</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almashraee</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Díaz</surname>
            <given-names>D M</given-names>
          </string-name>
          , et al.
          <article-title>The Gold Standard in Corpus Annotation</article-title>
          [C]//
          <source>IEEE GSC</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>