<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transfer Learning for Biomedical Named Entity Recognition with BioBERT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anthi Symeonidou</string-name>
          <email>anthi.symeonidou@student.uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viachaslau Sazonau</string-name>
          <email>s.sazonau@elsevier.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Groth</string-name>
          <email>p.groth@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Elsevier</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We apply a transfer learning approach to biomedical named entity recognition and compare it with traditional approaches (dictionary-based, CRF, BiLSTM). Specifically, we build models for adverse drug reaction recognition on three datasets. We fine-tune a pre-trained transformer model, BioBERT, on these datasets and observe absolute F1-score improvements of 6.93, 10.46 and 13.31. This shows that, with a relatively small amount of annotated data, transfer learning can help in specialized information extraction tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Named entity recognition</kwd>
        <kwd>BioBERT</kwd>
        <kwd>Transfer learning</kwd>
        <kwd>Text mining</kwd>
        <kwd>BIO tagging</kwd>
        <kwd>Drug safety</kwd>
        <kwd>Adverse drug reaction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Biomedical knowledge graphs are becoming important for tasks such as
pharmacovigilance [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. These knowledge graphs are often constructed by performing
information extraction (IE) over unstructured and semi-structured sources such
as clinical records, electronic health records and biomedical literature [
        <xref ref-type="bibr" rid="ref13 ref2 ref9">2, 9, 13</xref>
        ].
Named entity recognition (NER) is one of the fundamental tasks of IE.
Particular entities of interest in this domain are adverse drug reactions (ADRs).
ADRs cause a significant number of deaths worldwide, and billions of dollars are
spent yearly to treat people who have experienced an ADR from a prescribed drug [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
ADR recognition is a challenging IE task because of the importance of context and
multi-token phenomena such as discontinuity and overlaps.
      </p>
      <p>
        Various approaches have been applied to biomedical NER. Dictionary-based
approaches [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], where string matching methods are used to identify entities
in text, are common [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Recently, machine learning techniques, such as
conditional random fields (CRFs) and deep learning [
        <xref ref-type="bibr" rid="ref10 ref3 ref6">3, 6, 10</xref>
        ], have gained popularity
and shown performance gains. However, these techniques usually require large
amounts of training data, which is costly to obtain in the biomedical domain.
      </p>
      <p>
        Transfer learning techniques have shown their potential in overcoming the
lack of training data. Transfer learning is a method where a model developed for
one task is exploited to improve generalization on another task [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Giorgi and
Bader have found that a transfer learning approach is beneficial for biomedical
NER and can improve state-of-the-art results, particularly for datasets with
fewer than 6000 labels [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Devlin et al. have recently published a model
called BERT (Bidirectional Encoder Representations from Transformers), which
was trained on a 3.3B-word corpus and achieved outstanding performance on 11
natural language processing (NLP) tasks, including NER [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In the biomedical
context, BioBERT, which has a similar architecture to BERT and was trained
on more than 18B words from PubMed abstracts and PMC articles, achieved
high performance in NER on several benchmarks [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>In this paper, we investigate if transfer learning can outperform traditional
approaches for ADR recognition on three different datasets. The pre-trained
BioBERT model was fine-tuned on these datasets and compared to a
dictionary-based method, a CRF, and a BiLSTM. Our main contribution is empirical and shows
that a transfer learning method based on BioBERT can achieve considerably
higher performance in recognizing ADRs than traditional methods. Our results
suggest that transfer learning based on transformer architectures is a promising
approach to addressing the lack of training data in biomedical IE.
2</p>
    </sec>
    <sec id="sec-2">
      <title>2 Data &amp; Models</title>
      <p>
        In our experiments, we use two open benchmarks for ADR recognition (the only
ones publicly available for ADR recognition to the best of our knowledge) and
complement them with Elsevier’s dataset for better reliability of the results.
TAC2017: The data are spontaneous adverse event reports submitted to the
FDA Adverse Event Reporting System (FAERS) by drug manufacturers in a
Structured Product Labeling (SPL) format since 2009. These XML documents
are converted to the Brat standoff format [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        ADE corpus: ADE is an open-source corpus consisting of information
extracted from PubMed articles, with annotated entities (ADRs, drugs,
doses) and relations. The ADR annotations were used in this work [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Elsevier’s gold set: Elsevier’s gold set consists of XML text files that come
from DAILYMED in SPL format and contain information on human drugs from
2015 to the present 3. They are similar to the TAC2017 SPL documents and follow the
same annotation guidelines.
      </p>
      <p>
        Three traditional NER approaches were compared with transfer learning.
        Dictionary-Based Approach. The dictionary-based approach relies on the
Aho-Corasick string-matching algorithm: a dictionary of keywords for a given
entity type is used to build a finite state machine, which then scans the
text for matches [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
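      <p>
        For illustration, the matcher can be sketched in plain Python: a
minimal Aho-Corasick automaton built from a keyword dictionary. This is a
sketch under our own naming, not the implementation used in the experiments.
      </p>
      <preformat>
```python
from collections import deque

def build_automaton(keywords):
    # Trie states: trie[s] maps a character to the next state.
    trie = [{}]
    fail = [0]      # failure link per state
    output = [[]]   # keywords that end at each state
    for word in keywords:
        state = 0
        for ch in word:
            nxt = trie[state].get(ch)
            if nxt is None:
                nxt = len(trie)
                trie.append({})
                fail.append(0)
                output.append([])
                trie[state][ch] = nxt
            state = nxt
        output[state].append(word)
    # Breadth-first pass sets each failure link to the longest proper
    # suffix of the current path that is also a prefix of some keyword.
    queue = deque(trie[0].values())
    while queue:
        r = queue.popleft()
        for ch, s in trie[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in trie[f]:
                f = fail[f]
            cand = trie[f].get(ch, 0)
            fail[s] = cand if cand != s else 0
    return trie, fail, output

def find_matches(text, trie, fail, output):
    matches = []
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in trie[state]:
            state = fail[state]
        state = trie[state].get(ch, 0)
        s = state
        while s:  # follow failure links to report nested matches
            for word in output[s]:
                matches.append((i - len(word) + 1, word))
            s = fail[s]
    return matches

trie, fail, output = build_automaton(["headache", "nausea", "ache"])
print(find_matches("patient reported headache and nausea", trie, fail, output))
```
      </preformat>
      <p>
        The automaton scans the text once, regardless of dictionary size,
which is why this approach scales to large biomedical vocabularies.
      </p>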
      <p>
        CRFs. A conditional random field is a probabilistic graphical model that
predicts a sequence of labels for a sequence of input samples (sentences in our
case). It assigns each candidate label sequence a probability between 0 and 1,
computed from hand-crafted features and learned
weights [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
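      <p>
        For intuition, once a linear-chain CRF's weights are known, prediction
reduces to Viterbi decoding over emission and transition scores. A minimal
sketch with hand-set, illustrative weights (not the trained models from our
experiments):
      </p>
      <preformat>
```python
def viterbi_decode(emissions, transitions, labels):
    """Most likely label sequence under a linear-chain CRF.

    emissions:   one dict per token mapping label to emission score
    transitions: dict mapping (prev_label, label) to transition score
    """
    best = {l: emissions[0][l] for l in labels}  # best path score ending in l
    backpointers = []
    for em in emissions[1:]:
        new_best, ptr = {}, {}
        for l in labels:
            scores = {p: best[p] + transitions.get((p, l), 0.0) for p in labels}
            prev = max(scores, key=scores.get)
            new_best[l] = scores[prev] + em[l]
            ptr[l] = prev
        best = new_best
        backpointers.append(ptr)
    # Walk back from the best final label to recover the path.
    label = max(best, key=best.get)
    path = [label]
    for ptr in reversed(backpointers):
        label = ptr[label]
        path.append(label)
    path.reverse()
    return path

labels = ["B", "I", "O"]
emissions = [{"B": 2.0, "I": 0.0, "O": 1.0},
             {"B": 0.0, "I": 2.0, "O": 1.0},
             {"B": 0.0, "I": 2.0, "O": 1.0}]
# Reward staying inside an entity; strongly penalize O followed by I.
transitions = {("B", "I"): 1.0, ("I", "I"): 1.0, ("O", "I"): -5.0}
print(viterbi_decode(emissions, transitions, labels))  # ['B', 'I', 'I']
```
      </preformat>
      <p>
        The transition scores let the model enforce label consistency (e.g.
an I tag must continue an entity), which token-independent classifiers cannot
do.
      </p>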
      <p>
        BiLSTM-CRF. We use a bidirectional LSTM model with a conditional
random field (CRF) layer on top of it. Character and word embeddings are given as
input to the model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The best F1-score reported for TAC2017 was achieved
by using the same architecture [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>3 https://dailymed.nlm.nih.gov/dailymed/</title>
        <p>
          BioBERT. A transformer-based model that follows BERT's bidirectional
architecture but is pre-trained on large-scale biomedical corpora. This model, with
minimal task-specific modifications (fine-tuning), was applied to the biomedical NER
task [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3 Experiments</title>
      <p>In our evaluation,4 we focus on predicting the ADR label as it is of great
importance and is one of the predominant labels, representing 9-14% of the total
number of words in all our datasets. Due to computational constraints, only
sentences with fewer than 130 words were selected. This filtered out 0-9%
of the sentences depending on the dataset. Duplicate sentences were removed.
The number of sentences, words and labelled entities of all datasets are shown
in Table 1. The same data pre-processing was applied for all approaches
and all datasets. The NLTK tokenizer5 was used for the conventional models.
For BioBERT, we used the default WordPiece tokenizer. In order to minimise
tokenisation differences, we implemented post-processing of its output to make
it similar to the output of the NLTK tokenizer. We use 5-fold cross validation
to split the processed data and evaluate the models. In the BioBERT case, the
model was fine-tuned on an NVIDIA Tesla T4 16GB GPU using the default
hyperparameter values, with only the maximum sequence length set to 150 tokens.
The official BioBERT implementation was used.6 Entities are marked based on
the standard BIO tagging scheme. The standard precision, recall and F1-score
were used as entity-level evaluation metrics where only exact matches of the full
entity are counted (i.e. both "B" and "I" tags must match if they belong to the
same entity) and "O" is excluded.</p>
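      <p>
        This entity-level scoring can be sketched as follows, assuming a
single entity type (ADR) and well-formed BIO sequences; the function names
are ours:
      </p>
      <preformat>
```python
def bio_to_spans(tags):
    """Extract (start, end, type) spans from a BIO tag sequence.

    Assumes well-formed sequences: every I- tag continues the entity
    opened by the preceding B- tag.
    """
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes a trailing span
        if tag.startswith("I-") and start is not None:
            continue
        if start is not None:
            spans.append((start, i, etype))
            start = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def entity_scores(gold_sentences, pred_sentences):
    """Entity-level precision, recall and F1 with exact-match spans."""
    gold, pred = set(), set()
    for sid, (g, p) in enumerate(zip(gold_sentences, pred_sentences)):
        gold.update((sid,) + span for span in bio_to_spans(g))
        pred.update((sid,) + span for span in bio_to_spans(p))
    tp = len(gold.intersection(pred))  # boundaries and type must match exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

gold = [["B-ADR", "I-ADR", "O", "B-ADR"]]
pred = [["B-ADR", "I-ADR", "O", "O"]]
print(entity_scores(gold, pred))  # one exact match out of two gold entities
```
      </preformat>
      <p>
        Partial overlaps count as errors under this scheme, so a model that
truncates a multi-token ADR receives no credit for it.
      </p>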
    </sec>
    <sec id="sec-5">
      <title>4 Results</title>
      <p>The mean scores and 95% confidence intervals are shown in Table 2. BioBERT
outperformed all other models on all datasets. The absolute improvement
compared to the second-best results was 6.93, 10.46 and 13.31 units on TAC2017,
ADE and Elsevier’s gold set, respectively.7 Interestingly, the results show that
BioBERT achieved much higher recall improvements than precision
improvements. This means that the model was able to miss far fewer ADR entities
than the other methods. The simultaneous capture of bidirectional contextual
information, an important characteristic of BioBERT, seems to be beneficial for
model performance. Another important remark concerns the amount of
annotated data used and the corresponding model performance. In
particular, only around 7000 labelled examples were needed in the case of
Elsevier’s gold set to achieve results above an 80% F1-score. No significant
differences were observed in training time between the BiLSTM and BioBERT
models, considering the boost in performance. Fine-tuning BioBERT
requires about 30 minutes, while the BiLSTM needs about 20 minutes on the
TAC2017 dataset. However, a GPU is required for the pre-trained BioBERT
model due to its high number (110M) of parameters. Overall, the
BioBERT-based approach clearly outperformed the traditional approaches for ADR
recognition on all our datasets.</p>
      <sec id="sec-5-1">
        <title>4 Source code: https://github.com/AnthiS/MasterThesis_DS</title>
        <p>
          5 https://www.nltk.org
6 https://github.com/dmis-lab/biobert
7 The best F1-score reported on TAC2017 is 85.2% [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and 86.78% on ADE [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 Conclusion</title>
      <p>A transfer learning approach for ADR recognition was tested using a
domain-specific language model, BioBERT. The fine-tuned BioBERT model achieved
better performance on three different biomedical corpora than three traditional
methods. An interesting observed property of this model is its ability to find more
entities compared to existing methods, using only a few thousand examples and
requiring comparable amounts of training time. Based on our results, we believe
that transformer-based neural models are a promising approach for complex
biomedical NER problems, such as ADR recognition, and can be a key ingredient
for biomedical knowledge graph construction. Additional experiments on more
datasets, for ADR and other entities, should bring more empirical evidence to
validate our conjecture.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aho</surname>
            ,
            <given-names>A.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corasick</surname>
            ,
            <given-names>M.J.:</given-names>
          </string-name>
          <article-title>Efficient string matching: An aid to bibliographic search</article-title>
          .
          <source>Commun. ACM</source>
          <volume>18</volume>
          (
          <issue>6</issue>
          ),
          <fpage>333</fpage>
          -
          <lpage>340</lpage>
          (
          <year>1975</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aramaki</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miura</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tonoike</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohkuma</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masuichi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waki</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohe</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Extraction of adverse drug effects from clinical records</article-title>
          .
          <source>Studies in health technology and informatics 160</source>
          ,
          <fpage>739</fpage>
          -
          <lpage>43</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bundschus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dejori</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stetter</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tresp</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          :
          <article-title>Extraction of semantic biomedical relations from text using conditional random fields</article-title>
          .
          <source>BMC Bioinformatics</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hunter</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Getting started in text mining</article-title>
          .
          <source>PLOS Computational Biology</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . CoRR abs/1810.04805 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A multiclass classification method based on deep learning for named entity recognition in electronic medical records</article-title>
          .
          <source>In: 2016 New York Scientific Data Summit (NYSDS)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Giorgi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bader</surname>
            ,
            <given-names>G.D.</given-names>
          </string-name>
          :
          <article-title>Transfer learning for biomedical named entity recognition with neural networks</article-title>
          .
          <source>Bioinformatics</source>
          (Oxford, England)
          <volume>34</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gurulingappa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajput</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fluck</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann-Apitius</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toldo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>45</volume>
          (
          <issue>5</issue>
          ),
          <fpage>885</fpage>
          -
          <lpage>892</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Harpaz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vilar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DuMouchel</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salmasian</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haerian</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chase</surname>
            ,
            <given-names>H.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Combing signals from spontaneous reports and electronic health records for detection of adverse drug reactions</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <fpage>413</fpage>
          -
          <lpage>419</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawakami</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Neural architectures for named entity recognition</article-title>
          .
          <source>In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <fpage>260</fpage>
          -
          <lpage>270</lpage>
          . Association for Computational Linguistics, San Diego, California (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lazarou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pomeranz</surname>
            ,
            <given-names>B.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corey</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          :
          <article-title>Incidence of Adverse Drug Reactions in Hospitalized PatientsA Meta-analysis of Prospective Studies</article-title>
          .
          <source>JAMA</source>
          <volume>279</volume>
          (
          <issue>15</issue>
          ),
          <fpage>1200</fpage>
          -
          <lpage>1205</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoon</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>So</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Leser</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hakenberg</surname>
          </string-name>
          , J.:
          <article-title>What makes a gene name? Named entity recognition in the biomedical literature</article-title>
          .
          <source>Briefings in Bioinformatics</source>
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <fpage>357</fpage>
          -
          <lpage>369</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Pubmed and beyond: a survey of web tools for searching biomedical literature</article-title>
          .
          <source>Database : the journal of biological databases and curation</source>
          <year>2011</year>
          , baq036 (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ono</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hishigaki</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanigami</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takagi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Automated extraction of information on protein-protein interactions from the biological literature</article-title>
          .
          <source>Bioinformatics</source>
          <volume>17</volume>
          (
          <issue>2</issue>
          ),
          <fpage>155</fpage>
          -
          <lpage>161</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>A survey on transfer learning</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>22</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1345</fpage>
          -
          <lpage>1359</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Ramamoorthy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murugan</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>An attentive sequence model for adverse drug event extraction from biomedical text</article-title>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tonning</surname>
            ,
            <given-names>J.M.:</given-names>
          </string-name>
          <article-title>Overview of the TAC 2017 adverse reaction extraction from drug labels track</article-title>
          .
          <source>In: TAC</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>