<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Acronym Expander at SDU@AAAI-21: an Acronym Disambiguation Module</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>João L. M. Pereira</string-name>
          <email>joaoplmpereira@tecnico.ulisboa.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Helena Galhardas</string-name>
          <email>helena.galhardas@tecnico.ulisboa.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dennis Shasha</string-name>
          <email>shasha@cs.nyu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Courant Institute</institution>
          ,
          <addr-line>NYU</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>INESC-ID and Instituto Superior Técnico, Universidade de Lisboa</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>To determine which of several possible meanings an acronym A has in a sentence s, a system must understand the context of s. This paper describes the techniques we use for that problem on the SDU@AAAI benchmark, in which context is provided in the form of sentences in which acronym A is present and defined. As a capsule summary of our results, Support Vector Machines with Doc2Vec achieve a higher Macro F1-Measure score than Cosine similarity with Classic Context Vectors. Although these techniques usually work better with documents (i.e., many sentences rather than the one sentence offered in this benchmark), they achieved Macro F1-Measure scores of 86-89%. While these results were 5.65% worse than the best in the benchmark experiment, the high speed of our approach (at most 0.6 seconds on average per sentence on a virtual machine allocated 4 CPU cores and 32GB of RAM on a shared server) and the possibility that our methods are complementary to those of other groups may lead to high-performance hybrid systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The proper expansion of an acronym depends on context.
For example, “HD” can mean Harmonic Distortion in a
signal context, High Definition in a video context, and
Huntington’s Disease in a medical context. Thus, any system that
hopes to help readers understand the intended meaning of an
undefined acronym in a sentence must expand that acronym
using its context.</p>
      <p>
        An acronym expander system comprises the following
steps: (i) Extraction of both the acronym and (when present) its
expansion within a text. For example, if a given text has “HD
(High Definition)” then HD would be the acronym and High
Definition would be the expansion. We call this in-expansion
because it can be done for a particular article on its own.
(ii) In the case that an acronym is not expanded in a text,
out-expansion chooses an expansion from a large, previously
parsed corpus (training corpus) like Wikipedia (https://www.wikipedia.org/). The choice
of which of several possible expansions to choose is based
on some notion of article domain similarity between the text
with a non-expanded acronym A and the articles
containing expansions for A. We participated in the SDU@AAAI
benchmark presented in
        <xref ref-type="bibr" rid="ref30">Veyseh et al. (2020)</xref>
        that tests
out-expansion only (i.e., acronym disambiguation).
      </p>
      <p>
        Our system includes two techniques for out-expansion:
Cosine similarity of Classic Context Vectors
        <xref ref-type="bibr" rid="ref18 ref25 ref9">(Abdalgader
and Skabar 2012; Prokofyev et al. 2013; Li, Ji, and Yan
2015)</xref>
        and Doc2Vec
        <xref ref-type="bibr" rid="ref7">(Le and Mikolov 2014)</xref>
        whose outputs
are used as features for Support Vector Machines (SVMs)
to create a new out-expansion technique. Moreover, we used
Wikipedia articles to enrich the training data for these
techniques. Our results show that Doc2Vec together with
Support Vector Machines (SVMs) gives the best prediction
results when using Wikipedia data. Without extra data, context
vector works best.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        To our knowledge, systems that expand abbreviations
and/or acronyms use a pre-defined dictionary of
acronym-expansions
        <xref ref-type="bibr" rid="ref4">(Gooch 2012; ABBREX, http://abbrex.com/)</xref>
        as opposed to trying
to discover the proper expansion based on context.
      </p>
      <p>Ciosici and Assent (2018) proposed an
abbreviation/acronym expansion system architecture that performs
out-expansion. Unfortunately, their demo paper does not
provide enough technical details and their code is
proprietary.</p>
      <p>The remaining part of this section describes previous
work on out-expansion.</p>
      <p>
        Li, Ji, and Yan (2015) proposed two approaches to
out-expansion based on word embeddings from Word2Vec
        <xref ref-type="bibr" rid="ref12 ref14">(Mikolov et al. 2013a)</xref>
        . Their best approach, called Surrounding Based
Embedding (SBE), combines the Word2Vec embeddings of
the words surrounding the acronym or the expansion.
Similarly to SBE, Ciosici, Sommer, and Assent (2019)
proposed Unsupervised Acronym Disambiguation (UAD), which
replaces each expansion occurrence in the text collection by
a normalized token and retrains the Word2Vec Google News
model
        <xref ref-type="bibr" rid="ref12 ref14">(Mikolov et al. 2013a)</xref>
        on that collection. The
resulting model produces an embedding for each normalized
token, i.e., an expansion embedding.
      </p>
      <p>
        <xref ref-type="bibr" rid="ref29">Thakker, Barot, and Bagul (2017</xref>
        ) create document
vector embeddings using Doc2Vec for each document. For each
set of documents D containing an expansion for an acronym
A, the system trains a Doc2Vec model on D which is used
to infer the embedding for an input document i containing
an undefined acronym A.
      </p>
      <p>Charbonnier and Wartena (2018) proposed an
out-expansion approach based on Word2Vec embeddings
weighted by Term Frequency-Inverse Document Frequency
(TF-IDF) scores to find out-expansions for acronyms in
scientific article captions.</p>
      <p>
        More recently, Pouran Ben
        <xref ref-type="bibr" rid="ref30">Veyseh et al. (2020)</xref>
        compared previous work on a new dataset (i.e., the Acronym
Disambiguation dataset used in the SDU@AAAI competition).
The authors also proposed a new model called Graph-based
Acronym Disambiguation (GAD). GAD uses word and
sentence representations obtained from a Bidirectional Long
Short-Term Memory (BiLSTM) neural network. Those
representations are complemented with syntactic structure
from a dependency tree to model distant but important
dependencies between words using a Graph Convolutional
Network (GCN)
        <xref ref-type="bibr" rid="ref5">(Kipf and Welling 2017)</xref>
        . Finally,
a two-layer feedforward neural network classifier is used to
guess the expansion.
      </p>
      <p>
        A related line of work explored the expansion of
acronyms in enterprise texts
        <xref ref-type="bibr" rid="ref2 ref8">(Feng et al. 2009; Li et al.
2018)</xref>
        . For instance, in
        <xref ref-type="bibr" rid="ref8">Li et al. (2018)</xref>
        , enterprise textual
documents are used as training data as well as Wikipedia
articles and a set of features like statistics based on word
frequencies, word co-occurrences, and TF-IDF. Other works
explored acronym disambiguation in biomedical domains
        <xref ref-type="bibr" rid="ref15 ref16 ref18 ref21 ref26 ref31 ref33">(Pustejovsky et al. 2001; Pakhomov, Pedersen, and Chute
2005; Yu et al. 2006; Stevenson et al. 2009; Moon,
Pakhomov, and Melton 2012; Moon, McInnes, and Melton 2015;
Wu et al. 2015; 2017)</xref>
        .
      </p>
      <p>
        Less directly related, but insightful, is the literature on
Word Sense Disambiguation (WSD)
        <xref ref-type="bibr" rid="ref18 ref19">(Navigli 2009; Moro
and Navigli 2015)</xref>
        because that work also must make use of
the context around a token (in our case, an acronym; in the
word sense literature, a word).
      </p>
    </sec>
    <sec id="sec-3">
      <title>Out-Expansion Strategy</title>
      <p>Our out-expansion strategy consists of: (i) a
Representator to map an input sentence to a document representation
that holds contextual information and (ii) an Out-Expansion
Predictor to choose a context-appropriate out-expansion for
each acronym found in the input sentence.</p>
      <sec id="sec-3-1">
        <title>Representator</title>
        <p>Representators summarize text (documents or sentences) in
order to capture information signals about their semantics.</p>
        <p>For the competition, we tested two representator
techniques: Classic Context Vector and Doc2Vec.</p>
        <p>
          Classic Context Vector The context vector technique is
an unsupervised method used as a baseline in Word Sense
Disambiguation problems (Abdalgader and Skabar 2012)
and also in acronym disambiguation problems
          <xref ref-type="bibr" rid="ref25">(Prokofyev
et al. 2013)</xref>
          <xref ref-type="bibr" rid="ref18 ref9">(Li, Ji, and Yan 2015)</xref>
          . We denote it as classic
to distinguish it from variants or other techniques that also
provide vectors to contexts.
        </p>
        <p>A context vector represents a term (e.g., an acronym or
expansion) by a vector based on the words that co-occur
with the term in each document of the corpus containing
that term. Thus, a context vector is a sparse vector where
each position corresponds to a word in any document in the
corpus. If the word is in a document that contains the term,
then the vector position has some positive value; otherwise,
the value is zero. In the classic approach, the value at each
vector position corresponds to the number of co-occurrences
of the term and the co-occurring word across all the documents
of the corpus.</p>
        <p>In acronym disambiguation, the acronym in a particular
sentence yields a context vector (which we call the “target
context vector”) which contains the words occurring in that
sentence and their number of occurrences.</p>
        <p>Each possible expansion for the acronym will have a
context vector as well (“potential context vector”). Classic
context vector chooses the expansion associated with the
potential context vector that is most similar to the target context
vector. The simplest similarity metric is cosine similarity.</p>
        <p>Figure 1 presents an example of a context vector for the Portable
Document Format expansion using two documents. For
instance, the words “the” and “file” occur once in each
document, so the positions reserved for these two words in the
vector contain the value 2 while the others contain 1.</p>
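        <p>As a rough illustration of the counting scheme described above, the following Python sketch builds a potential context vector from two documents. The function name context_vector and the two example documents are hypothetical; the documents were chosen only so that the resulting counts match those reported for Figure 1, and lowercase whitespace tokenization is a simplifying assumption rather than the preprocessing used in our system.</p>
        <preformat>
```python
from collections import Counter


def context_vector(documents):
    """Count how often each word occurs across the documents of one term."""
    counts = Counter()
    for document in documents:
        # Assumption: lowercase whitespace tokenization, no stemming.
        counts.update(document.lower().split())
    return counts


# Hypothetical documents containing the Portable Document Format expansion.
documents = [
    "the file format increases in popularity",
    "formats including the file",
]
potential = context_vector(documents)
print(potential["the"], potential["file"], potential["format"])
```
        </preformat>
        <p>A Counter behaves like a sparse vector here: words that never co-occur with the term are simply absent and read as zero.</p>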
        <p>
          Doc2Vec Doc2Vec
          <xref ref-type="bibr" rid="ref7">(Le and Mikolov 2014)</xref>
          is a document
embedding and an unsupervised learning technique that adds
the capability of automatically learning document (or
paragraph) vectors to Word2Vec
          <xref ref-type="bibr" rid="ref12 ref14">(Mikolov et al. 2013a)</xref>
          . Given a
list of words (e.g., a text document) as input, the output of
Doc2Vec is a dense vector of real numbers (i.e., an
embedding).
        </p>
        <p>Just as Word2Vec assigns a vector to a word, Doc2Vec
assigns a vector of N dimensions called a document vector to
a document (or in the case of this benchmark to a sentence).</p>
        <p>The training problem consists of finding the best set
of embedding values for each word and document (i.e.,
Doc2Vec model parameters) that, given a document,
predicts the set of words in that document. For example,
consider a document consisting of a list containing the countries
in Figure 2. If the document is known to the Doc2Vec model
(i.e., it was included in the training data) then we have a
document embedding available, otherwise, a document
embedding d is computed by finding the best values that maximize
the prediction of the country names given d.</p>
        <p>
          In contrast to Word2Vec, which averages word vectors to
represent a particular document, Doc2Vec creates a trained
vector for each document in the corpus
          <xref ref-type="bibr" rid="ref18">(Dai, Olah, and Le
2015)</xref>
          . By comparing those document vectors through
cosine similarity, Doc2Vec can infer semantically similar
documents.
        </p>
        <p>Figure 1: Potential context vector for the Portable Document Format expansion, built from the documents containing that expansion. Word counts: the = 2, file = 2, format = 1, increases = 1, in = 1, popularity = 1, formats = 1, including = 1.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Out-Expansion Predictor</title>
        <p>Out-expansion predictors select an out-expansion for a given
acronym A in an input sentence ins. For this purpose, a
predictor considers each sentence containing a valid expansion
E for A. Sentences are characterized by the representator
output explained in the previous section.</p>
        <p>In the case of the Classic Context Vector, we compare the
input sentence context vector with the vector resulting from
summing the context vectors of the sentences for E. In this
classical approach, because we have only a context vector
per expansion E, we use cosine similarity to evaluate
similarity.</p>
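        <p>The cosine-similarity predictor described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our implementation: the names to_vector, cosine, and predict_expansion, the whitespace tokenization, and the toy training sentences are all hypothetical. For each candidate expansion, the context vectors of its sentences are summed into one potential context vector, and the expansion whose vector is most similar to the target context vector is returned.</p>
        <preformat>
```python
import math
from collections import Counter


def to_vector(sentence):
    """Context vector of one sentence (assumed whitespace tokenization)."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_expansion(input_sentence, sentences_by_expansion):
    """Return the expansion whose summed context vector is closest."""
    target = to_vector(input_sentence)
    best_expansion, best_score = None, -1.0
    for expansion, sentences in sentences_by_expansion.items():
        potential = Counter()
        for sentence in sentences:
            potential.update(to_vector(sentence))  # sum per-sentence vectors
        score = cosine(target, potential)
        if score > best_score:
            best_expansion, best_score = expansion, score
    return best_expansion


# Hypothetical training sentences for the acronym HD.
training = {
    "high definition": [
        "hd video has high resolution",
        "the hd screen shows sharp video",
    ],
    "huntington disease": [
        "hd is a genetic brain disorder",
        "patients with hd show motor symptoms",
    ],
}
print(predict_expansion("sharp hd video on screen", training))
```
        </preformat>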
        <p>
          We consider the use of machine learning classifiers as
alternatives to cosine similarity when more than one training
sample is possible for an expansion (i.e., label). This is the
case for Doc2Vec, whose embeddings represent a set of words
(e.g., a document or sentence), so we have as many
samples per expansion as sets of words (e.g., sentences) in which it
occurs. However, for Classic Context Vector, it is not
possible to use such a machine learning approach because we
have one context vector per expansion and so only one
sample per expansion. Specifically, for the competition we used
Support Vector Machines (SVMs), where non-binary
classification was performed by a “one-vs-all” approach in which a
binary SVM classifier predicts with a certain probability whether
a sample belongs to a particular class. The class with the
highest probability is selected. We used the LibLinear
          <xref ref-type="bibr" rid="ref1">(Fan et
al. 2008)</xref>
          implementation included in the scikit-learn toolkit
          <xref ref-type="bibr" rid="ref23">(Pedregosa et al. 2011)</xref>
          .
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>SDU@AAAI Benchmark of Out-expansion Techniques</title>
      <p>This section describes the SDU@AAAI benchmark used to
evaluate the out-expansion techniques described in the
previous section.</p>
      <sec id="sec-5-1">
        <title>Datasets</title>
        <sec id="sec-5-1-1">
          <title>The datasets that we use in this benchmark are:</title>
          <p>
            SciAD contains sentences from human-annotated scientific
articles extracted from arXiv (https://arxiv.org/). Each sentence contains an
acronym to disambiguate. This is the dataset provided for
the SDU@AAAI Acronym Disambiguation (AD)
competition and it was proposed in
            <xref ref-type="bibr" rid="ref24 ref30">(Pouran Ben Veyseh
et al. 2020)</xref>
            . There are three data splits: (i) Train with
50,033 sentences, (ii) Dev with 6,188 sentences, and (iii)
Test with 6,217 sentences where acronym expansion is
unknown. This dataset also contains a dictionary with
acronyms and their possible expansions.
          </p>
          <p>
            Wikipedia contains all English articles of Wikipedia.org
taken from the Wikipedia dump of March 1, 2020 (https://dumps.wikimedia.org/enwiki). We
used the WikiExtractor software (http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) to obtain the articles in
plain text, and we used the
            <xref ref-type="bibr" rid="ref27">Schwartz and Hearst (2003)</xref>
            algorithm to extract acronyms and expansions from each
Wikipedia article.
          </p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Data Preparation</title>
        <p>We process the datasets by removing punctuation and
normalizing tokens in order to create a better textual
representation. That is, we perform the following operations on each
dataset:</p>
        <p>SciAD We remove non-alphanumeric tokens, punctuation
characters, and stop-words. Then, we transform each
token to its stem, e.g., expander, expanding, and expanded
all map to expand. We use the Porter Stemmer algorithm
from the Natural Language Toolkit (NLTK) (Bird, Klein,
and Loper 2009) for that purpose.</p>
        <p>Wikipedia Because expansions in Wikipedia may be
written in different formats and with plurals, we normalize
the expansions found against the dictionary of acronym
expansions shared with SciAD. So, each expansion in the
Wikipedia documents is replaced by the closest expansion
in the SciAD dictionary. Distance is computed by comparing
the expansion in Wikipedia against a SciAD expansion:
if the first 4 characters of each word are equal, we
consider the expansions to be equal (distance = 0); otherwise,
the distance is the edit distance between both expansions. If
the edit distance is below 3, the expansions are considered close
enough; otherwise, they are considered two distinct
expansions. Wikipedia expansions not close to any expansion in
the SciAD dictionary and their corresponding documents
are not considered for prediction because only the
expansions in the dictionary are valid for the SciAD evaluation
set. Furthermore, while keeping the expansions in text,
we apply the tokenizer from NLTK and remove the
non-alphanumeric tokens, punctuation characters, and
stop-words. Finally, we transform each token to its stem as we
did for the SciAD dataset.</p>
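        <p>The normalization rule for Wikipedia expansions can be sketched in Python as follows. This is a hedged illustration and not our code: the function names, the requirement that both expansions have the same number of words for the 4-character prefix test, the lowercase comparison, and the three-entry dictionary are assumptions made for the example. The edit distance is a plain Levenshtein distance computed over the whole expansion strings.</p>
        <preformat>
```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def expansion_distance(wiki_expansion, sciad_expansion, prefix_len=4):
    """Distance 0 on matching word prefixes, else the edit distance."""
    wiki_words = wiki_expansion.lower().split()
    sciad_words = sciad_expansion.lower().split()
    # Assumption: the prefix test applies only when word counts agree.
    if len(wiki_words) == len(sciad_words) and all(
        w[:prefix_len] == s[:prefix_len] for w, s in zip(wiki_words, sciad_words)
    ):
        return 0
    return edit_distance(wiki_expansion.lower(), sciad_expansion.lower())


def normalize_expansion(wiki_expansion, dictionary, max_distance=3):
    """Return the closest dictionary expansion, or None if none is close."""
    closest = min(dictionary, key=lambda e: expansion_distance(wiki_expansion, e))
    if expansion_distance(wiki_expansion, closest) >= max_distance:
        return None  # treated as a distinct expansion and discarded
    return closest


# Hypothetical dictionary entries, for illustration only.
dictionary = [
    "support vector machine",
    "hidden markov model",
    "convolutional neural network",
]
print(normalize_expansion("support vector machines", dictionary))
```
        </preformat>
        <p>Returning None for expansions that are not close to any dictionary entry mirrors the step in which such expansions and their documents are left out of prediction.</p>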
      </sec>
      <sec id="sec-5-3">
        <title>Out-expansion Techniques</title>
        <p>
          For the SDU@AAAI AD competition, we test the following
out-expansion techniques: we use (i) the Cosine similarity
(Cossim) with the Classic Context Vector
          <xref ref-type="bibr" rid="ref18 ref9">(Li, Ji, and Yan
2015)</xref>
          and (ii) the outputs of Doc2Vec as features for Support
Vector Machines (SVMs).
        </p>
      </sec>
      <sec id="sec-5-4">
        <title>Prediction and Performance Metrics</title>
        <p>For the SDU@AAAI competition, we use the following
metrics:</p>
      </sec>
      <sec id="sec-5-5">
        <title>Out-expansion Macro Averages</title>
        <p>The averages of the Precision, Recall, and F1-Measure over all expansions. These
are the official metrics used in the SDU@AAAI
competition; the Macro F1-Measure is used to rank the
competitors based on expansion prediction quality.</p>
        <p>Training execution times: the execution time to create the
representator model based on the training sentences
and/or documents.</p>
      </sec>
      <sec id="sec-5-6">
        <title>Average execution times per sentence</title>
        <p>The average execution time to predict the expansions for the acronym in a
sentence.</p>
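        <p>For concreteness, the macro averages listed above can be computed as in the sketch below. It is an illustration under assumptions, not the official scorer: the function name macro_scores is hypothetical, the set of expansions is taken as the union of gold and predicted labels, and the macro F1 is taken as the average of the per-expansion F1 values.</p>
        <preformat>
```python
from collections import Counter


def macro_scores(gold, predicted):
    """Return (macro precision, macro recall, macro F1) over all labels."""
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1  # p was predicted but is wrong
            false_neg[g] += 1  # g was expected but missed
    labels = sorted(set(gold) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted_count = true_pos[label] + false_pos[label]
        gold_count = true_pos[label] + false_neg[label]
        precision = true_pos[label] / predicted_count if predicted_count else 0.0
        recall = true_pos[label] / gold_count if gold_count else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with two expansions, "a" and "b".
gold = ["a", "a", "b", "b"]
predicted = ["a", "b", "b", "b"]
macro_p, macro_r, macro_f1 = macro_scores(gold, predicted)
print(round(macro_p, 4), round(macro_r, 4), round(macro_f1, 4))
```
        </preformat>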
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Experimental Evaluation</title>
      <p>This section reports on the out-expansion experiments.</p>
      <p>We run the experiments on a machine with the
following specifications: Virtual Machine (VM) with 4 CPU cores
from an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz,
32GiB of RAM (Random Access Memory), and Ubuntu
18.04.3 LTS.</p>
      <sec id="sec-6-1">
        <title>Out-Expansions on the AAAI Benchmark</title>
        <p>In this section, we report the results obtained using the
SciAD dataset. We used the Train set as training data and the Dev set as test
data to tune the hyperparameters of Doc2Vec and SVMs.
Then, we used the Train and Dev sets as training data and the
Test set for evaluation. Evaluation quality measures were
provided by the CodaLab competition evaluation system (https://competitions.codalab.org/competitions/26611). We
submitted two combinations of out-expansion
predictors and representators: (i) cosine similarity (Cossim) as
predictor with Classic Context Vector as representator, and (ii)
Support Vector Machine (SVM) as predictor with Doc2Vec
as representator.</p>
        <p>In Table 1, we report the out-expansion macro averages
for predicting expansions for acronyms in sentences:
Precision (P), Recall (R), and F1-measure (F1). Macro
F1-measure is the official measure for ranking competitors in
the SDU@AAAI competition. We also report the best
results obtained for both the Dev set used for hyperparameter
selection and the Test set as testing data. In addition, we
report the execution times for training and the average per
predicted acronym in a sentence (note that each sentence
contains only one acronym to expand).</p>
        <p>Table 1: Acronym out-expansion techniques (Predictor, Representator): Cossim with Classic Context Vector; SVM with Doc2Vec. P (%): 88.24%, 91.54%.</p>
        <sec id="sec-6-1-1">
          <p>We can see that Cossim with Classic Context Vector
achieved the best results. In general, both techniques have
slightly lower recall (less than 1%) in the Test set than in
the Dev set but higher macro precisions (1-2%). Since the
gains in macro precisions are higher than the losses in macro
recalls for the Test set, the harmonic means of both macros
(i.e., macro F1-measures) are higher in the Test set.
Differences among the techniques are consistent across various
test sets. Regarding execution times, Cossim with Classic
Context Vector is faster in training (91s) and SVM with
Doc2Vec is faster on average per sentence (0.09s).
Classic Context Vector counts word occurrences at training time
while Doc2Vec trains a neural network for word and
document embeddings with several iterations over the training
corpus (e.g., 200). Both training and sentence processing
times are low given that they are executed on a regular
machine (4 CPU cores and 32GB of RAM). Both techniques
are lightweight solutions for this problem.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>Cross-Training and Additional Data for the SDU@AAAI Competition</title>
        <p>For our next set of experiments for the
SDU@AAAI competition, we increase the training data
provided by the competition sets with documents obtained
from Wikipedia, i.e., the Wikipedia dataset. We wanted to
test whether additional data and cross-training data help to
solve this problem and which techniques can benefit from
such a data increment.</p>
        <p>Table 2 shows the macro averages on the SciAD Dev
set using SciAD train and Wikipedia documents as
training data; and the macro averages and execution times on the
SciAD test set using the above training sets plus the Dev set
as training data.</p>
        <p>In contrast to previous results where Wikipedia data was
not used, after adding Wikipedia documents to the training
data, SVM with Doc2Vec obtains the best results. That
combination also benefits from using Wikipedia data. The three
macro averages are lower when applied to the Dev set than
when applied to the Test set.</p>
        <p>Consistent with the experimental results without
Wikipedia, Cossim with Classic Context Vector is faster
in training than SVM with Doc2Vec, while slower in
per-sentence processing. Training both techniques is much
slower with the addition of Wikipedia data, yet fast enough
for a regular machine (e.g., 2 hours to train the Doc2Vec
model). On average, to process a sentence, the incorporation
of Wikipedia slows down Cossim with Classic Context
Vector by 0.43 seconds and slows down SVM with Doc2Vec
by 0.03 seconds.</p>
        <p>
          Most of the excellent efforts by other research groups
submitted to the competition are transformer-based models
that use pretrained models like BERT (Devlin et al. 2018),
RoBERTa
          <xref ref-type="bibr" rid="ref10">(Liu et al. 2019)</xref>
          , and SciBERT (Beltagy, Lo,
and Cohan 2019). Those works mostly distinguish
themselves by how they adapt such transformer models to
out-expansion
          <xref ref-type="bibr" rid="ref24 ref30">(Veyseh et al. 2020)</xref>
          . The three leaderboard works
use transformers and their macro F1-measures range from
93.19% to 94.05%. To our understanding, only three works,
including ours, explored alternatives to
transformers, and no other work explored Doc2Vec or SVMs. Although
our best technique scores about 6% lower than the best in the
competition, we believe that our techniques are distinct enough to
complement transformer-based techniques, or may
offer a lighter and faster approach to this problem, since
transformer models, even when using GPUs (Graphics Processing Units)
or TPUs (Tensor Processing Units), usually take more
time to train and to process data than Doc2Vec and SVMs.
Further, our approaches could work better when the context
consists of entire documents rather than single sentences,
which is our core use case.
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusions and Future Work</title>
      <p>We have evaluated two rapid techniques for acronym
disambiguation using the SDU@AAAI benchmarks. We have
found that Cosine similarity with Classic Context Vector
works best when no Wikipedia data is used. SVM with
Doc2Vec outperforms Cosine similarity with Classic
Context Vector when using Wikipedia data. Our overall
results, as measured by F1-measure score, are within 5.7% of
the best system in the competition. By analyzing the execution
times of each phase (training and evaluation of sentences),
we showed that our approach is lightweight even on a
standard computer.</p>
      <p>We believe we could have improved performance if we
had used data sources in addition to Wikipedia such as
abstracts from articles in web repositories to make the domain
closer to the SDU@AAAI competition data.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>Pereira’s work was supported by national funds through
FCT (Fundação para a Ciência e a Tecnologia), under
the PhD Scholarship SFRH/BD/135719/2018. Furthermore,
Pereira and Galhardas’ work was supported by national
funds through FCT under the project UIDB/50021/2020.</p>
      <p>Shasha’s work has been partly supported by (i) the New
York University Abu Dhabi Center for Interacting Urban
Networks (CITIES), funded by Tamkeen under the NYUAD
Research Institute Award CG001 and by the Swiss Re
Institute under the Quantum Cities initiative, (ii) NYU Wireless,
and (iii) U.S. National Science Foundation grants 1934388,
1840761, and 1339362.</p>
      <p>The server virtual machine used to run the experiments
was supported by BioData.pt – Infraestrutura Portuguesa de
Dados Biológicos, project 22231/01/SAICT/2016, funded
by Portugal 2020.</p>
      <p>Abdalgader, K., and Skabar, A. 2012. Unsupervised
Similarity-based Word Sense Disambiguation Using
Context Vectors and Sentential Word Importance. ACM
Transactions on Speech and Language Processing 9(1):2–21.</p>
      <p>Beltagy, I.; Lo, K.; and Cohan, A. 2019. SciBERT:
Pretrained Language Model for Scientific Text. In Proceedings
of the 2019 Conference on Empirical Methods in Natural
Language Processing.</p>
      <p>Bird, S.; Klein, E.; and Loper, E. 2009. Natural Language
Processing with Python. O’Reilly Media, Inc., 1st edition.</p>
      <p>Charbonnier, J., and Wartena, C. 2018. Using word
embeddings for unsupervised acronym disambiguation. In
Proceedings of the Twenty-Seventh International Conference
on Computational Linguistics, 2610–2619. Santa Fe, New
Mexico: Association for Computational Linguistics.</p>
      <p>Ciosici, M. R., and Assent, I. 2018. Abbreviation expander
- a web-based system for easy reading of technical
documents. In Proceedings of the Twenty-Seventh International
Conference on Computational Linguistics: System
Demonstrations, 1–4. Santa Fe, New Mexico: Association for
Computational Linguistics.</p>
      <p>Ciosici, M. R.; Sommer, T.; and Assent, I. 2019.
Unsupervised abbreviation disambiguation contextual
disambiguation using word embeddings. arXiv preprint.
arXiv:1904.00929v2 [cs.CL]. Ithaca, NY: Cornell
University Library.</p>
      <p>Dai, A. M.; Olah, C.; and Le, Q. V. 2015.
Document embedding with paragraph vectors. arXiv preprint.
arXiv:1507.07998 [cs.CL]. Ithaca, NY: Cornell University
Library.</p>
      <p>Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018.
BERT: Pre-training of Deep Bidirectional Transformers for
Language Understanding. arXiv preprint. arXiv:1810.04805
[cs.CL]. Ithaca, NY: Cornell University Library.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>R.-E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.-W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hsieh</surname>
            ,
            <given-names>C.-J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.-R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.-J.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>LIBLINEAR: A Library for Large Linear Classification</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          :
          <fpage>1871</fpage>
          -
          <lpage>1874</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Acronym extraction and disambiguation in large-scale organizational web pages</article-title>
          .
          <source>In Proceedings of the Eighteenth ACM Conference on Information and Knowledge Management</source>
          ,
          <fpage>1693</fpage>
          -
          <lpage>1696</lpage>
          . New York, NY: Association for Computing Machinery.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Gooch</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>BADREX: in situ expansion and coreference of biomedical abbreviations using dynamic regular expressions</article-title>
          .
          <source>arXiv preprint. arXiv:1206.4522 [cs.CL]. Ithaca</source>
          , NY: Cornell University Library.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Kipf</surname>
            ,
            <given-names>T. N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Semi-supervised classification with graph convolutional networks</article-title>
          .
          <source>arXiv preprint.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>arXiv:1609.02907 [cs.CL]. Ithaca</source>
          , NY: Cornell University Library.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Distributed representations of sentences and documents</article-title>
          .
          <source>In Proceedings of the Thirty-First International Conference on Machine Learning</source>
          ,
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fuxman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Guess Me if You Can: Acronym Disambiguation for Enterprises</article-title>
          .
          <source>In Proceedings of the Fifty-Sixth Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <fpage>1308</fpage>
          -
          <lpage>1317</lpage>
          . Melbourne, Australia: Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Acronym Disambiguation Using Word Embedding</article-title>
          .
          <source>In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence</source>
          ,
          <fpage>4178</fpage>
          -
          <lpage>4179</lpage>
          . Menlo Park, Calif.: AAAI Press.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach</article-title>
          .
          <source>arXiv preprint. arXiv:1907.11692 [cs.CL]. Ithaca</source>
          , NY: Cornell University Library.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2013a</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          .
          <source>arXiv preprint. arXiv:1301.3781v3 [cs.CL]. Ithaca</source>
          , NY: Cornell University Library.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2013b</year>
          .
          <article-title>Distributed Representations of Words and Phrases and Their Compositionality</article-title>
          .
          <source>In Proceedings of the Twenty-Sixth International Conference on Neural Information Processing Systems</source>
          ,
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          . Red Hook, NY: Curran Associates Inc.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Moon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>McInnes</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Melton</surname>
            ,
            <given-names>G. B.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Challenges and practical approaches with word sense disambiguation of acronyms and abbreviations in the clinical domain</article-title>
          .
          <source>Healthcare Informatics Research</source>
          <volume>21</volume>
          (
          <issue>1</issue>
          ):
          <fpage>35</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Moon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pakhomov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Melton</surname>
            ,
            <given-names>G. B.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Automated disambiguation of acronyms and abbreviations in clinical texts: window and training size considerations</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>AMIA Annual Symposium proceedings</source>
          <year>2012</year>
          :
          <fpage>1310</fpage>
          -
          <lpage>1319</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Moro</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>SemEval-2015 task 13: Multilingual all-words sense disambiguation and entity linking</article-title>
          .
          <source>In Proceedings of the Ninth International Workshop on Semantic Evaluation</source>
          ,
          <fpage>288</fpage>
          -
          <lpage>297</lpage>
          . Denver, Colorado: Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Word sense disambiguation: A survey</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <source>ACM Computing Surveys</source>
          <volume>41</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Pakhomov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Chute</surname>
            ,
            <given-names>C. G.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>Abbreviation and acronym disambiguation in clinical discourse</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <source>AMIA Annual Symposium proceedings</source>
          <year>2005</year>
          :
          <fpage>589</fpage>
          -
          <lpage>593</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Pouran Ben Veyseh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dernoncourt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>Q. H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T. H.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>What does this acronym mean? introducing a new dataset for acronym identification and disambiguation</article-title>
          .
          <source>In Proceedings of the Twenty-Eighth International Conference on Computational Linguistics</source>
          ,
          <fpage>3285</fpage>
          -
          <lpage>3301</lpage>
          . Barcelona, Spain (Online): International Committee on Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Prokofyev</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Demartini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Boyarsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ruchayskiy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Ontology-based word sense disambiguation for scientific literature</article-title>
          .
          <source>In Proceedings of the Thirty-Fifth European Conference on Advances in Information Retrieval</source>
          ,
          <fpage>594</fpage>
          -
          <lpage>605</lpage>
          . Berlin, Heidelberg: Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Pustejovsky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Castaño</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cochran</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kotecki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Morrell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Automatic extraction of acronym-meaning pairs from MEDLINE databases</article-title>
          .
          <source>Studies in Health Technology and Informatics</source>
          <volume>84</volume>
          (
          <issue>Pt 1</issue>
          ):
          <fpage>371</fpage>
          -
          <lpage>375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hearst</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>A simple algorithm for identifying abbreviation definitions in biomedical text</article-title>
          .
          <source>In Pacific Symposium on Biocomputing</source>
          ,
          <fpage>451</fpage>
          -
          <lpage>462</lpage>
          . Singapore: World Scientific Press.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <year>2009</year>
          .
          <article-title>Disambiguation of Biomedical Abbreviations</article-title>
          .
          <source>In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing</source>
          ,
          <fpage>71</fpage>
          -
          <lpage>79</lpage>
          . Boulder, Colorado: Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>Thakker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Barot</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Bagul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Acronym Disambiguation: A Domain Independent Approach</article-title>
          . arXiv preprint.
          <source>arXiv:1711.09271v3 [cs.CL]. Ithaca</source>
          , NY: Cornell University Library.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Veyseh</surname>
            ,
            <given-names>A. P. B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dernoncourt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T. H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Celi</surname>
            ,
            <given-names>L. A.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Acronym identification and disambiguation shared tasks for scientific document understanding</article-title>
          .
          <source>In Proceedings of the AAAI-21 Workshop on Scientific Document Understanding.</source>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
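          ;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>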
          <year>2015</year>
          .
          <article-title>Clinical abbreviation disambiguation using neural word embeddings</article-title>
          .
          <source>In Proceedings of the Workshop on Biomedical Natural Language Processing</source>
          ,
          <fpage>171</fpage>
          -
          <lpage>176</lpage>
          . Beijing, China: Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Denny</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Trent Rosenbloom</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R. A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Giuse</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Blanquicett</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Soysal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD)</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>24</volume>
          (
          <issue>e1</issue>
          ):
          <fpage>e79</fpage>
          -
          <lpage>e86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hatzivassiloglou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wilbur</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <article-title>A Large Scale, Corpus-Based Approach for Automatically Disambiguating Biomedical Abbreviations</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          <volume>24</volume>
          (
          <issue>3</issue>
          ):
          <fpage>380</fpage>
          -
          <lpage>404</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>