<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AI-NLM exploration of the Acronym Identification Shared Task at SDU@AAAI-21</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Willie Rogers</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alastair Rae</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dina Demner-Fushman</string-name>
          <email>ddemnerg@mail.nih.gov</email>
        </contrib>
        <aff>National Library of Medicine, Rockville Pike, Bethesda</aff>
      </contrib-group>
      <abstract>
        <p>The National Library of Medicine (NLM) has developed systems for recognition of named entities in biomedical and clinical text. The systems primarily leverage the Unified Medical Language System (UMLS) to recognize terms and link them to the terminology component of the UMLS (the Metathesaurus). Biomedical and clinical texts are rife with acronyms and abbreviations. Acronym identification and disambiguation therefore play an important role in processing such text with UMLS-based approaches. To test the existing rule-based approaches developed at NLM and to explore state-of-the-art deep learning (DL) approaches, we participated in the SDU Acronym Identification shared task. Not surprisingly, our existing rule-based approach achieved high precision (over 96%) but very low recall, whereas the LSTM- and BERT-based approaches had almost equal recall and precision and achieved F1 scores in the low 90s.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        One of the major problems in machine understanding of
biomedical and clinical text is disambiguation of acronyms
and abbreviations. In the scientific literature, acronyms are
often introduced along with the full form of the term, for
example, Coronavirus Disease 2019 (COVID-19). Full terms
are not provided when the term is well known and relatively
unambiguous, such as HIV or NSAIDs. This observation led
to the implementation of algorithms that leverage the full
term to derive the meaning of the acronym in the local
context
        <xref ref-type="bibr" rid="ref1 ref20 ref22">(Schwartz and Hearst 2002; Aronson 1996; Zhou,
Torvik, and Smalheiser 2006)</xref>
        . Clinical notes, on the other
hand, almost never contain full terms and, unlike scientific
papers, can use the same acronym for different terms
in different parts of a single note. For example, BS could
denote Breath Sound, Bowel Sound, or Blood Sugar levels,
and only the sections of the note can help disambiguate the
acronym. Not surprisingly, there is a large body of work on
resolving biomedical acronyms, most recently employing
neural approaches
        <xref ref-type="bibr" rid="ref11 ref21 ref8">(Li et al. 2019; Joopudi, Dandala, and
Devarakonda 2018; Wu et al. 2017)</xref>
        .
      </p>
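      <p>As an illustration of such definition-based algorithms, the character-matching heuristic of Schwartz and Hearst (2002) can be sketched as follows. This is our own minimal Python reimplementation for illustration (the function name is ours), not code from any of the cited systems.</p>
      <preformat>
```python
def best_long_form(short_form, candidate):
    """Match short_form against candidate text right to left,
    returning the expansion, or None if no match is possible."""
    s = len(short_form) - 1     # index into the short form
    l = len(candidate) - 1      # index into the candidate long form
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():    # skip characters such as '-'
            s -= 1
            continue
        # move left through the candidate until ch is found; the first
        # character of the short form must start a word
        while l >= 0 and (candidate[l].lower() != ch
                          or (s == 0 and l > 0 and candidate[l - 1].isalnum())):
            l -= 1
        if l >= 0:
            l -= 1
            s -= 1
        else:
            return None
    # extend the match left to the nearest word boundary
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]
```
      </preformat>
      <p>For example, best_long_form("UMLS", "Unified Medical Language System") recovers the full phrase, while a short form whose letters do not occur in order in the candidate yields None.</p>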
      <p>Our participation in the task was motivated by MetaMap, the
general-purpose biomedical named entity recognition system
developed at the National Library of Medicine.</p>
      <table-wrap id="table-1">
        <label>Table 1</label>
        <caption>
          <p>Hyperparameters used for the Bi-LSTM-CRF models with and without EMA.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Parameter</th>
              <th>Value</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>dimension</td><td>300</td></tr>
            <tr><td>dimension char</td><td>100</td></tr>
            <tr><td>dropout</td><td>0.5</td></tr>
            <tr><td>num oov buckets</td><td>1</td></tr>
            <tr><td>training epochs</td><td>25</td></tr>
            <tr><td>batch size</td><td>20</td></tr>
            <tr><td>buffer</td><td>15000</td></tr>
            <tr><td>char lstm size*</td><td>25</td></tr>
            <tr><td>kernel size**</td><td>3</td></tr>
            <tr><td>lstm size</td><td>100</td></tr>
            <tr><td>minimum steps</td><td>8000</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        There are currently two implementations of MetaMap: a
Prolog-based version
        <xref ref-type="bibr" rid="ref2">(Aronson and Lang 2010)</xref>
        that has
accrued multiple processing options over the years, and a
lightweight Java implementation
        <xref ref-type="bibr" rid="ref3">(Demner-Fushman, Rogers, and Aronson 2017)</xref>
        intended to facilitate inclusion of the
tool in local clinical text processing. Both versions of
MetaMap implement a rule-based acronym disambiguation
algorithm that relies on the presence of the full form of the
term. Participating in the shared task gave us an opportunity
to evaluate this algorithm, and also to explore the
state-of-the-art approaches that we previously used for word sense
disambiguation and other tasks. To train and validate our
approaches, we used the data developed for the Acronym
Identification task
        <xref ref-type="bibr" rid="ref16">(Pouran Ben Veyseh et al. 2020b)</xref>
        . The
task is described in the overview provided by the
organizers
        <xref ref-type="bibr" rid="ref15">(Pouran Ben Veyseh et al. 2020a)</xref>
        .
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>We tried both the rule-based algorithmic approach described
above and machine learning methods on the acronym identification task.</p>
      <p>
        We initially attempted acronym identification using the
MetaMap
        <xref ref-type="bibr" rid="ref2">(Aronson and Lang 2010)</xref>
        implementation of an author-defined abbreviation detection
algorithm that only detects acronyms whose author definitions occur
in the same document. The SDU@AAAI-21 Task 1 corpus contains
acronyms both with and without definitions, mostly without. To
handle the acronyms without local definitions, we applied two deep
learning approaches: Bi-directional LSTM with CRF and Transformer
models.
      </p>
      <sec id="sec-2-2">
        <title>Deep learning approaches</title>
        <p>(Table 2: precision and recall on the test set for the
Bi-LSTM-CRF variants, with and without convolution, stacking,
and EMA, and for the BERT, RoBERTa, DistilBERT, and BioBERT
models.)</p>
        <p>
          Three variations of the Bi-directional LSTM-CRF
approach
          <xref ref-type="bibr" rid="ref6">(Genthial 2020)</xref>
          were applied to the Acronym
Identification corpus: Bi-directional LSTM with CRF
          <xref ref-type="bibr" rid="ref7">(Huang, Xu,
and Yu 2015)</xref>
          , Stacked Bi-directional LSTM and CRF
          <xref ref-type="bibr" rid="ref9">(Lample et al. 2016)</xref>
          , and Bi-directional LSTM and CRF with
convolution and max-pooling
          <xref ref-type="bibr" rid="ref13">(Ma and Hovy 2016)</xref>
          . All
three variations used GloVe embeddings. We also used an
Exponential Moving Average (EMA) of the weights with all three
Bi-LSTMs: during training, the EMA of the weights was used to
determine the weights for the next iteration. This approach
improves the effectiveness of the above methods by a small margin
          <xref ref-type="bibr" rid="ref14">(NIST/SEMATECH
2020)</xref>
          .
        </p>
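        <p>The EMA update itself is a one-line blend per weight; as a sketch (our own illustration with an arbitrary decay value, not the exact update used in training):</p>
        <preformat>
```python
def ema_update(shadow, weights, decay=0.999):
    """Blend the current weights into the shadow (EMA) copy in place;
    the smoothed shadow weights are then used for the next iteration."""
    for name, w in weights.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * w
    return shadow
```
        </preformat>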
        <p>The hyperparameters used for all three models with and
without EMA are shown in Table 1.</p>
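        <p>For concreteness, the Table 1 settings amount to a configuration of roughly the following form. The dictionary keys are our own paraphrase of the table's parameter names, not necessarily the identifiers used in the actual training code.</p>
        <preformat>
```python
# Hyperparameters from Table 1, shared by the Bi-LSTM-CRF models.
params = {
    "dimension": 300,        # word embedding dimension
    "dimension_char": 100,   # character embedding dimension
    "dropout": 0.5,
    "num_oov_buckets": 1,
    "training_epochs": 25,
    "batch_size": 20,
    "buffer": 15000,
    "char_lstm_size": 25,
    "kernel_size": 3,        # convolutional variant only
    "lstm_size": 100,
    "minimum_steps": 8000,
}
```
        </preformat>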
        <p>We submitted runs for Stacked Bi-directional LSTM and
CRF and Stacked Bi-directional LSTM and CRF with
Exponential Moving Average, the two highest performing runs
on the development set.</p>
        <p>
          We also applied the Simple Transformers NER
implementation (version 0.49.5) to the task
          <xref ref-type="bibr" rid="ref18">(Rajapaksee 2020)</xref>
          . We
used four transformer models fine-tuned on the SDU Task 1
training set: BERT
          <xref ref-type="bibr" rid="ref4">(Devlin et al. 2018)</xref>
          , BioBERT
          <xref ref-type="bibr" rid="ref10">(Lee et al.
2020)</xref>
          , DistilBERT
          <xref ref-type="bibr" rid="ref19">(Sanh et al. 2019)</xref>
          , and RoBERTa
          <xref ref-type="bibr" rid="ref12">(Liu et al. 2019)</xref>
          . Only the batch size was modified for the task;
the other parameters are the defaults for the Simple
Transformers NER implementation. For BioBERT, we used the
batch size 16 suggested in the Simple Transformers
documentation. For RoBERTa, we used the default batch size 8.
We submitted runs for RoBERTa and BioBERT transformer
models.
        </p>
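        <p>Simple Transformers' NERModel consumes token-level rows of the form (sentence_id, word, label). A sketch of flattening a shared-task-style record into that format (the example record and helper are our own; the B/I-short and B/I-long label names follow the shared-task annotation scheme as we understand it):</p>
        <preformat>
```python
# A record in the style of the AI shared-task data: tokens plus BIO labels.
record = {
    "tokens": ["Coronavirus", "Disease", "2019", "(", "COVID-19", ")"],
    "labels": ["B-long", "I-long", "I-long", "O", "B-short", "O"],
}

def to_ner_rows(records):
    """Flatten records into (sentence_id, word, label) rows, the shape
    expected by NERModel.train_model after conversion to a DataFrame."""
    rows = []
    for sid, rec in enumerate(records):
        for word, label in zip(rec["tokens"], rec["labels"]):
            rows.append((sid, word, label))
    return rows
```
        </preformat>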
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>As expected, the rule-based approach had high precision and
low recall on the development set; we therefore did not
submit the results of that approach on the test set. MetaMap
achieved 96.24% precision and 17.24% recall (F1 = 29.24%)
on the training set, and 96.55% precision and 59.36% recall
(F1 = 73.52%) on the development set. Interestingly, the
recall on the development set is much higher for this
rule-based approach, but not high enough to make it competitive.</p>
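      <p>The reported F1 scores follow directly from the precision and recall figures; as a quick check (our own snippet):</p>
      <preformat>
```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(96.24, 17.24), 2))  # training set: 29.24
print(round(f1(96.55, 59.36), 2))  # development set: 73.52
```
      </preformat>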
      <p>We submitted test results for the Bi-LSTM and
BERT-based approaches that performed best on the development
set. All results are shown in Table 2.</p>
      <p>The models using Simple Transformers are considerably
more resource intensive than the Bi-LSTM models. The
Bi-LSTM models were trained on a computer with a 4GB
GTX 1050 Ti graphics card. Fine-tuning the Transformer models,
however, required much more memory, and it was necessary
to train them on an Nvidia Tesla K80 with 24GB of memory.
The Bi-LSTM results are comparable to the Transformer
results while using fewer resources and a similar training time
(see Table 3).</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>Our results demonstrate that although the algorithmic
approaches currently implemented in our biomedical
named entity recognition tools have higher precision than
the explored transformer-based approaches, they clearly
miss many important terms. We look forward to
learning more about the approaches explored by the other
participants in the shared task. We believe that implementing
the best approaches in our tools will significantly improve
recognition of named entities in clinical notes. We hope to
improve recall while maintaining the precision of our
algorithmic approach, which is not far behind the human
performance reported by the task organizers.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the intramural research program
at the U.S. National Library of Medicine, National Institutes
of Health.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A. R.</given-names>
          </string-name>
          <year>1996</year>
          .
          <source>MetaMap Technical Notes. Technical report</source>
          , NLM. URL https://ii.nlm.nih.gov/Publications/Papers/metamap.tech.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A. R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lang</surname>
            ,
            <given-names>F.-M.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>An overview of MetaMap: historical perspective and recent advances</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>17</volume>
          (
          <issue>3</issue>
          ):
          <fpage>229</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>W. J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A. R.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>MetaMap Lite: an evaluation of a new Java implementation of MetaMap</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>24</volume>
          (
          <issue>4</issue>
          ):
          <fpage>841</fpage>
          -
          <lpage>844</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Chang, M.-W.;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Genthial</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Named Entity Recognition with Tensorflow</article-title>
          . URL https://github.com/guillaumegenthial/tf_ner.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Bidirectional LSTM-CRF models for sequence tagging</article-title>
          .
          <source>arXiv preprint arXiv:1508.01991</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Joopudi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dandala</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and Devarakonda,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>A convolutional route to abbreviation disambiguation in clinical text</article-title>
          .
          <source>Journal of biomedical informatics</source>
          <volume>86</volume>
          :
          <fpage>71</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ; Ballesteros,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ;
            <surname>Kawakami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ; and
            <surname>Dyer</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <year>2016</year>
          .
          <article-title>Neural architectures for named entity recognition</article-title>
          .
          <source>arXiv preprint arXiv:1603.01360</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yoon</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; Kim,
          <string-name>
            <given-names>S.</given-names>
            ;
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ;
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ;
            <surname>So</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            ; and
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2020</year>
          .
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          .
          <source>Bioinformatics</source>
          <volume>36</volume>
          (4):
          <fpage>1234</fpage>
          -
          <lpage>1240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ; Yasunaga,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Nuzumlalı</surname>
          </string-name>
          , M. Y.;
          <string-name>
            <surname>Caraballo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mahajan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Krumholz, H.; and
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>A Neural Topic-Attention Model for Medical Term Abbreviation Disambiguation</article-title>
          . arXiv preprint arXiv:1910.14076.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; et al.
          <year>2019</year>
          .
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          . arXiv preprint arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF</article-title>
          .
          <source>arXiv preprint arXiv:1603.01354</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>NIST/SEMATECH</source>
          .
          <year>2020</year>
          .
          <article-title>e-Handbook of Statistical Methods: Single Exponential Smoothing</article-title>
          . National Institute of Standards and Technology. URL https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc431.htm.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Pouran</given-names>
            <surname>Ben Veyseh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ;
            <surname>Dernoncourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ;
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            ;
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            ; and
            <surname>Celi</surname>
          </string-name>
          ,
          <string-name>
            <surname>L. A.</surname>
          </string-name>
          <year>2020a</year>
          .
          <article-title>Acronym Identification and Disambiguation shared tasks for Scientific Document Understanding</article-title>
          .
          <source>In Proceedings of the AAAI-21 Workshop on Scientific Document Understanding.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Pouran</given-names>
            <surname>Ben Veyseh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ;
            <surname>Dernoncourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ;
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <surname>Q. H.</surname>
          </string-name>
          ; and Nguyen,
          <string-name>
            <surname>T. H.</surname>
          </string-name>
          <year>2020b</year>
          .
          <article-title>What Does This Acronym Mean? Introducing a New Dataset for Acronym Identification and Disambiguation</article-title>
          .
          <source>In Proceedings of the 28th International Conference on Computational Linguistics</source>
          ,
          <fpage>3285</fpage>
          -
          <lpage>3301</lpage>
          . Barcelona, Spain (Online):
          <source>International Committee on Computational Linguistics</source>
          . doi:10.18653/v1/2020.coling-main.292. URL https://www.aclweb.org/anthology/2020.coling-main.292.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Rajapaksee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Simple Transformers</article-title>
          . URL https://simpletransformers.ai/.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Sanh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Debut</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chaumond</surname>
            , J.; and Wolf,
            <given-names>T.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>
          . arXiv preprint arXiv:1910.01108.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          ; and Hearst,
          <string-name>
            <surname>M. A.</surname>
          </string-name>
          <year>2002</year>
          .
          <article-title>A simple algorithm for identifying abbreviation definitions in biomedical text</article-title>
          .
          <source>In Biocomputing</source>
          <year>2003</year>
          ,
          <fpage>451</fpage>
          -
          <lpage>462</lpage>
          . World Scientific.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Denny</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Trent Rosenbloom</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R. A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Giuse</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Blanquicett</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Soysal</surname>
          </string-name>
          , E.;
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and Xu,
          <string-name>
            <surname>H.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD)</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>24</volume>
          (
          <issue>e1</issue>
          ):
          <fpage>e79</fpage>
          -
          <lpage>e86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Torvik</surname>
            ,
            <given-names>V. I.;</given-names>
          </string-name>
          and
          <string-name>
            <surname>Smalheiser</surname>
            ,
            <given-names>N. R.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>ADAM: another database of abbreviations in MEDLINE</article-title>
          .
          <source>Bioinformatics</source>
          <volume>22</volume>
          (22):
          <fpage>2813</fpage>
          -
          <lpage>2818</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>