<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Large Biomedical Question Answering Models with ALBERT and ELECTRA</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sultan Alrowili</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>K. Vijay-Shanker</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer and Information Science, University of Delaware</institution>
          ,
          <addr-line>Newark, Delaware</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The majority of systems that participated in the BioASQ8 challenge are based on the BioBERT model [1]. We adopt a different approach in our participation in the BioASQ9B challenge by taking advantage of large biomedical language models built on the ELECTRA [2] and ALBERT [3] architectures, namely BioM-ELECTRA and BioM-ALBERT [4]. Moreover, we examine the advantage of transferability [5] between BioASQ and other text classification tasks such as Multi-Genre Natural Language Inference (MultiNLI) [6]. Our results show that both BioM-ELECTRA and BioM-ALBERT significantly outperform the BioBERT model on the BioASQ9B task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. System Description</title>
      <p>
        One of the primary differences between our system and prior systems that participated in the BioASQ8B challenge is our use of large-scale biomedical language models. In our participation in the BioASQ9B challenge, we use both of our models, BioM-ELECTRA and BioM-ALBERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
ELECTRA is built upon the Transformer encoder and the attention mechanism [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] that the BERT model uses. However, ELECTRA changes the training objective by eliminating the
Next Sentence Prediction (NSP) objective, a decision also taken by the RoBERTa model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Moreover, ELECTRA improves the loss function
by incorporating ideas from the GAN model [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], where corrupted (fake) tokens are generated by
a small Masked Language Model (MLM). A discriminator model then judges
those corrupted tokens and decides whether each is an "original" or a "replaced" token.
      </p>
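      <p>
        To make the replaced-token detection objective concrete, the following minimal sketch (in Python, using the Hugging Face Transformers library) asks a publicly available general-domain ELECTRA discriminator to label each token of a hand-corrupted sentence. The checkpoint name is illustrative and stands in for, but is not, our BioM-ELECTRA model.
      </p>
      <preformat>
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

# Illustrative general-domain checkpoint, not BioM-ELECTRA.
name = "google/electra-small-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(name)
model = ElectraForPreTraining.from_pretrained(name)

# A hand-corrupted sentence standing in for generator (MLM) output.
fake_sentence = "the quick brown fox fake over the lazy dog"
inputs = tokenizer(fake_sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0]

# Positive logits mean the discriminator judges a token as "replaced".
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, logits):
    print(token, "replaced" if score > 0 else "original")
      </preformat>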
      <p>
        To shift the contextual representation of ELECTRA to the biomedical domain, we pretrain ELECTRA on PubMed
abstracts using a domain-specific vocabulary learned from PubMed abstracts. We pretrain our
BioM-ELECTRA for 434K steps using TPUv3-512 units with a batch size of 4096.
The ALBERT model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] makes a decision similar to ELECTRA's regarding the loss function by dropping
the Next Sentence Prediction (NSP) objective. Furthermore, ALBERT introduces a self-supervised
loss for the sentence-order prediction (SOP) objective. Additionally, the ALBERT model improves
the efficiency of the Transformer model by introducing both parameter-sharing and factorization
of the embedding layers. The parameter-sharing technique improves the architecture
by reducing parameter redundancy inside the model.
      </p>
      <p>On the other hand, factorization of the embedding layers allows the model to increase its hidden
layer size up to 4096 while having only 235M parameters in the case of ALBERT-xxlarge. We
build BioM-ALBERT-xxlarge by pretraining ALBERT-xxlarge on PubMed abstracts using a
TPUv3-512 unit for 264K steps with a batch size of 8192. As with BioM-ELECTRA, we pretrain
BioM-ALBERT on PubMed abstracts only.</p>
      <p>
        Table 1 shows the architecture design and the reported results [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] of our models on SQuAD2.0
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and BioASQ7B-Factoid tasks against other SOTA models. We include this table to show
a head-to-head comparison between the different architectures that have been used by
participants’ systems in the BioASQ9B challenge [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We should also note that it is common
practice in the literature to fine-tune a biomedical language model on the SQuAD dataset
first and then on the BioASQ dataset. The reason for following this approach is that the SQuAD2.0
dataset has more than 130K examples, which is much larger than the BioASQ dataset.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Setup</title>
      <sec id="sec-3-1">
        <title>3.1. Pre-Processing phase</title>
        <p>
          For BioASQ9B factoid and list questions, we convert all questions to SQuADv1.1 format.
Specifically, we duplicate the snippet (context) for each question in the training and test datasets
instead of having a group of snippets and one corresponding question. For yes/no questions,
we adopt a binary classification approach, with the context (snippet)
as "sentence 1", the question as "sentence 2", and the answer (yes/no) as the "label." We use a
preprocessing script developed by [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] to generate the BioASQ classification dataset.
        </p>
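        <p>
          The sketch below illustrates this conversion. It assumes the public BioASQ JSON layout ("questions", "body", "snippets", "exact_answer"); the function names and details are hypothetical and stand in for, rather than reproduce, the scripts we actually used.
        </p>
        <preformat>
def flatten(xs):
    # exact_answer may hold strings or synonym lists, depending on the task.
    out = []
    for x in xs:
        out.extend(x if isinstance(x, list) else [x])
    return out

def bioasq_to_squad(bioasq):
    # One SQuADv1.1-style entry per (question, snippet) pair.
    paragraphs = []
    for q in bioasq["questions"]:
        if q["type"] not in ("factoid", "list"):
            continue
        for i, snippet in enumerate(q["snippets"]):
            context = snippet["text"]
            answers = [{"text": a, "answer_start": context.find(a)}
                       for a in flatten(q.get("exact_answer", []))
                       if a in context]
            paragraphs.append({"context": context,
                               "qas": [{"id": f"{q['id']}_{i}",
                                        "question": q["body"],
                                        "answers": answers}]})
    return {"version": "v1.1",
            "data": [{"title": "BioASQ9B", "paragraphs": paragraphs}]}

def bioasq_yesno_pairs(bioasq):
    # Binary classification rows: (snippet, question, yes/no label).
    return [(s["text"], q["body"], q["exact_answer"])
            for q in bioasq["questions"] if q["type"] == "yesno"
            for s in q["snippets"]]
        </preformat>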
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Environmental Design</title>
        <p>
          We fine-tune our models on factoid and list questions using Google Cloud Compute Engine with
TPUv3-8 units and TensorFlow 1.15. For the yes/no task, we use the Hugging Face Transformers
library [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and a V100 GPU on the Google Colab Pro environment.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Hyperparameters</title>
        <p>
          For factoid and list questions, we use the same hyperparameter settings that we used in our
previous work [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], as shown in Table 2. We made this decision to examine the consistency
and reproducibility of both BioM-ELECTRA and BioM-ALBERT on the BioASQ9B challenge.
For yes/no questions, we use the training and testing datasets of the BioASQ8B challenge to
determine our hyperparameter choices.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Task-to-Task Transfer Learning</title>
        <p>
          The early work done by [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] shows that transferability (task-to-task transfer
learning) between general-domain tasks such as MultiNLI [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and SQuAD helps to improve
results on the SQuAD and BioASQ8B tasks. We follow a similar approach by fine-tuning both
BioM-ALBERT and BioM-ELECTRA first on the MNLI task, then on SQuAD, and finally on the BioASQ
training dataset, as sketched below. We investigate and report the impact of this transferability on BioASQ9B in
the results section.
        </p>
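        <p>
          A minimal sketch of this three-stage pipeline is shown below, assuming the Hugging Face Trainer API. The base checkpoint name is illustrative (standing in for BioM-ELECTRA or BioM-ALBERT), the hyperparameter values are placeholders, and mnli_dataset, squad_dataset, and bioasq_dataset are assumed to be already-tokenized datasets.
        </p>
        <preformat>
from transformers import (AutoModelForQuestionAnswering,
                          AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

BASE = "google/electra-large-discriminator"  # illustrative checkpoint

def finetune(model, dataset, outdir):
    args = TrainingArguments(output_dir=outdir, num_train_epochs=2,
                             per_device_train_batch_size=8,
                             learning_rate=3e-5)
    Trainer(model=model, args=args, train_dataset=dataset).train()
    model.save_pretrained(outdir)

# Stage 1: MNLI (sentence-pair classification with three labels).
clf = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=3)
finetune(clf, mnli_dataset, "out/mnli")

# Stage 2: SQuAD, reusing the MNLI-tuned encoder under a fresh QA head.
qa = AutoModelForQuestionAnswering.from_pretrained("out/mnli")
finetune(qa, squad_dataset, "out/squad")

# Stage 3: BioASQ, continuing from the SQuAD-tuned weights.
qa = AutoModelForQuestionAnswering.from_pretrained("out/squad")
finetune(qa, bioasq_dataset, "out/bioasq")
        </preformat>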
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>We participated in the BioASQ9B challenge under the name "UDEL-LAB". The results reported
in this section are obtained from the official BioASQ9B leaderboard. We participated in the
BioASQ9B-Factoid challenge starting from batch 3, using batch 2 only to test the format of
our submission. Therefore, we include results of the BioASQ-Factoid challenge only from
batch 3 onward. We participated in the yes/no and list questions on batch 5 only, since both types of
tasks require extra pre-processing that we could not develop at an early stage.</p>
      <sec id="sec-4-1">
        <title>4.1. Factoid Task</title>
        <p>
          Table 3 shows the results of our system on the BioASQ9B-Factoid challenge. We show only the
top five systems for each batch based on the mean reciprocal rank (MRR) score. The Fudan
University team participated with four systems under the name ir_sys [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Their systems
combined SpanBERT [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], PubMedBERT [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and XLNet models [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. On the other hand,
the "bioanswerfinder" system uses the BioELECTRA model [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], which its authors developed earlier based
on the ELECTRA architecture. The results of BioM-ALBERT and BioM-ELECTRA against other
models on both batch 3 and batch 5 suggest that our models are more consistent on
BioASQ than other models. The results also highlight that language model scale is
a dominant factor in performance on BioASQ-Factoid questions. Only large-scale models
based on ALBERT-xxlarge, ELECTRA-large, and XLNet take the lead in all three
batches.
        </p>
        <p>
          On the other hand, using transferability between the MNLI and SQuAD tasks improves the
score of our systems in the third batch by almost 2% in MRR. However, this improvement
is not consistent across batches 4 and 5. We attribute this inconsistency to the fact that the
fine-tuning layer of BERT-like models is randomly initialized. This randomness causes fluctuation
in the results, especially with a small evaluation dataset [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Meanwhile, the scores
of BioM-ALBERT and BioM-ELECTRA in both batches 3 and 5 suggest that an ensemble
model could help further improve the results.
        </p>
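        <p>
          One simple way to realize such an ensemble, sketched below under the assumption that each fine-tuned model emits confidence scores for its candidate answers, is to normalize each model's scores and average them before re-ranking. The answer strings in the usage example are hypothetical.
        </p>
        <preformat>
from collections import defaultdict

def ensemble_answers(per_model_candidates, top_k=5):
    # per_model_candidates: one dict per model, mapping answer string to score.
    totals = defaultdict(float)
    for candidates in per_model_candidates:
        norm = sum(candidates.values()) or 1.0
        for answer, score in candidates.items():
            totals[answer] += score / norm
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [answer for answer, _ in ranked[:top_k]]

# Hypothetical candidate scores from two fine-tuned models:
electra = {"TP53": 0.7, "BRCA1": 0.2, "EGFR": 0.1}
albert = {"TP53": 0.5, "EGFR": 0.4, "BRCA1": 0.1}
print(ensemble_answers([electra, albert]))  # ['TP53', 'EGFR', 'BRCA1']
        </preformat>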
      </sec>
      <sec id="sec-4-2">
        <title>4.2. List and Yes/No Tasks</title>
        <p>
          Table 4 shows the results of our system on the BioASQ9B list and yes/no challenges. In the list
task, our systems ranked in first and second place. We achieved this score on list questions
despite using the same hyperparameters that we use for the factoid task. On the yes/no task,
BioM-ALBERT performs significantly better than BioM-ELECTRA but falls behind the
"KU-DMIS-2" system, which uses BioBERT-Large [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. We should also note that the numbers
of list questions (18) and yes/no questions (19) are relatively small compared with factoid questions
(36). Tasks with small datasets are usually sensitive to hyperparameter choices and fluctuate
between fine-tuning runs, especially in the case of a binary classification (yes/no) task.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>We demonstrate that the BioM-ELECTRA and BioM-ALBERT models are effective in addressing the
BioASQ challenge. Our systems took the lead in two batches of the factoid task, by a significant
margin (2%) in batch 5. Additionally, we show that applying transferability between MNLI and
SQuAD led our systems to score first place on factoid (batch 3) and list (batch 5) questions.
For future work, we plan to build a large ensemble QA system based on both BioM-ELECTRA
and BioM-ALBERT to address the BioASQ and pandemic challenges.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>We would like to acknowledge the support of the TensorFlow Research Cloud (TFRC)
team, who granted us access to TPUv3 units.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Nentidis, A. Krithara, K. Bougiatiotis, M. Krallinger, C. Rodriguez-Penagos, M. Villegas, G. Paliouras, Overview of BioASQ 2020: The eighth BioASQ challenge on large-scale biomedical semantic indexing and question answering, in: International Conference of the Cross-Language Evaluation Forum for European Languages, Springer, 2020. URL: https://link.springer.com/chapter/10.1007/978-3-030-58219-7_16.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] K. Clark, M.-T. Luong, Q. V. Le, C. D. Manning, ELECTRA: Pre-training text encoders as discriminators rather than generators, 2020. arXiv:2003.10555.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut, ALBERT: A lite BERT for self-supervised learning of language representations, 2020. arXiv:1909.11942.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] S. Alrowili, V. Shanker, BioM-Transformers: Building large biomedical language models with BERT, ALBERT and ELECTRA, in: Proceedings of the 20th Workshop on Biomedical Language Processing, Association for Computational Linguistics, Online, 2021, pp. 221-227. URL: https://www.aclweb.org/anthology/2021.bionlp-1.24. doi:10.18653/v1/2021.bionlp-1.24.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] M. Jeong, M. Sung, G. Kim, D. Kim, W. Yoon, J. Yoo, J. Kang, Transferability of natural language inference to biomedical question answering, 2021. arXiv:2007.00217.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, S. Bowman, GLUE: A multi-task benchmark and analysis platform for natural language understanding, in: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 353-355. URL: https://www.aclweb.org/anthology/W18-5446. doi:10.18653/v1/W18-5446.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, Bioinformatics (2019). URL: https://doi.org/10.1093/bioinformatics/btz682. doi:10.1093/bioinformatics/btz682.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171-4186. URL: https://www.aclweb.org/anthology/N19-1423. doi:10.18653/v1/N19-1423.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] A. Nentidis, K. Bougiatiotis, A. Krithara, G. Paliouras, Results of the seventh edition of the BioASQ challenge, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2019. URL: https://arxiv.org/pdf/2006.09174.pdf.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. arXiv:1907.11692.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, Q. V. Le, XLNet: Generalized autoregressive pretraining for language understanding, 2020. arXiv:1906.08237.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] M. Shoeybi, M. Patwary, R. Puri, P. LeGresley, J. Casper, B. Catanzaro, Megatron-LM: Training multi-billion parameter language models using model parallelism, 2020. arXiv:1909.08053.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] P. Lewis, M. Ott, J. Du, V. Stoyanov, Pretrained language models for biomedical and clinical tasks: Understanding and extending the state-of-the-art, in: Proceedings of the 3rd Clinical Natural Language Processing Workshop, Association for Computational Linguistics, Online, 2020, pp. 146-157. URL: https://www.aclweb.org/anthology/2020.clinicalnlp-1.17. doi:10.18653/v1/2020.clinicalnlp-1.17.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] H.-C. Shin, Y. Zhang, E. Bakhturina, R. Puri, M. Patwary, M. Shoeybi, R. Mani, BioMegatron: Larger biomedical domain language model, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 4700-4706. URL: https://www.aclweb.org/anthology/2020.emnlp-main.379. doi:10.18653/v1/2020.emnlp-main.379.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Y. Gu, R. Tinn, H. Cheng, M. Lucas, N. Usuyama, X. Liu, T. Naumann, J. Gao, H. Poon, Domain-specific language model pretraining for biomedical natural language processing, 2021. arXiv:2007.15779.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 6000-6010.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems, volume 27, Curran Associates, Inc., 2014. URL: https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] P. Rajpurkar, R. Jia, P. Liang, Know what you don't know: Unanswerable questions for SQuAD, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 784-789. URL: https://www.aclweb.org/anthology/P18-2124. doi:10.18653/v1/P18-2124.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] A. Nentidis, G. Katsimpras, E. Vandorou, A. Krithara, L. Gasco, M. Krallinger, G. Paliouras, Overview of BioASQ 2021: The ninth BioASQ challenge on large-scale biomedical semantic indexing and question answering, 2021. arXiv:2106.14885.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38-45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] H. Zhang, H. Zhao, C. Liu, D. Yu, Task-to-task transfer learning with parameter-efficient adapter, in: X. Zhu, M. Zhang, Y. Hong, R. He (Eds.), Natural Language Processing and Chinese Computing, Springer International Publishing, Cham, 2020, pp. 391-402.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] M. Joshi, D. Chen, Y. Liu, D. S. Weld, L. Zettlemoyer, O. Levy, SpanBERT: Improving pre-training by representing and predicting spans, arXiv preprint arXiv:1907.10529 (2019).</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] I. B. Ozyurt, On the effectiveness of small, discriminatively pre-trained language representation models for biomedical text mining, in: Proceedings of the First Workshop on Scholarly Document Processing, Association for Computational Linguistics, Online, 2020, pp. 104-112. URL: https://aclanthology.org/2020.sdp-1.12. doi:10.18653/v1/2020.sdp-1.12.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>