=Paper=
{{Paper
|id=Vol-2831/paper32
|storemode=property
|title=Acronym Expander at SDU@AAAI-21: an Acronym Disambiguation Module
|pdfUrl=https://ceur-ws.org/Vol-2831/paper32.pdf
|volume=Vol-2831
|authors=João L. M. Pereira,Helena Galhardas,Dennis Shasha
|dblpUrl=https://dblp.org/rec/conf/aaai/PereiraGS21
}}
==Acronym Expander at SDU@AAAI-21: an Acronym Disambiguation Module==
João L. M. Pereira (1), Helena Galhardas (1), Dennis Shasha (2)
(1) INESC-ID and Instituto Superior Técnico, Universidade de Lisboa
(2) Courant Institute, NYU
joaoplmpereira@tecnico.ulisboa.pt, helena.galhardas@tecnico.ulisboa.pt, shasha@cs.nyu.edu
Abstract

In order to properly determine which of several possible meanings an acronym A in sentence s has, any system that aims to find the correct meaning for A must understand the context of s.

This paper describes the techniques we use for that problem in the SDU@AAAI benchmark, in which context was provided in the form of sentences in which acronym A is present and defined.

As a capsule summary of our results, Support Vector Machines with Doc2Vec techniques achieve a higher Macro F1-Measure score than Cosine similarity with Classic Context Vector techniques. Although these techniques usually work better with documents (i.e., many sentences rather than the one sentence offered in this benchmark), they achieved Macro F1-Measure scores of 86-89%.

While these results were 5.65% worse than the best in the benchmark experiment, the high speed of our approach (at most 0.6 seconds on average per sentence on a virtual machine allocated with 4 CPU cores and 32GB of RAM in a shared server) and the possibility that our methods are complementary to those of other groups may lead to high-performance hybrid systems.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Introduction

The proper expansion of an acronym depends on context. For example, "HD" can mean Harmonic Distortion in a signal context, High Definition in a video context, and Huntington's Disease in a medical context. Thus, any system that hopes to help readers understand the intended meaning of an undefined acronym in a sentence must expand that acronym using its context.

An acronym expander system comprises the following steps: (i) Extraction of both an acronym and (when present) its expansion within a text. For example, if a given text has "HD (High Definition)", then HD would be the acronym and High Definition would be the expansion. We call this in-expansion because it can be done for a particular article on its own. (ii) In the case that an acronym is not expanded in a text, out-expansion chooses an expansion from a previously parsed large corpus (training corpus) like Wikipedia (https://www.wikipedia.org/). The choice among several possible expansions is based on some notion of article domain similarity between the text with a non-expanded acronym A and the articles containing expansions for A. We participated in the SDU@AAAI benchmark presented in Veyseh et al. (2020), which tests out-expansion only (i.e., acronym disambiguation).

Our system includes two techniques for out-expansion: Cosine similarity of Classic Context Vectors (Abdalgader and Skabar 2012; Prokofyev et al. 2013; Li, Ji, and Yan 2015) and Doc2Vec (Le and Mikolov 2014), whose outputs are used as features for Support Vector Machines (SVMs) to create a new out-expansion technique. Moreover, we used Wikipedia articles to enrich the training data for these techniques. Our results show that Doc2Vec together with SVMs gives the best prediction results when using Wikipedia data. Without the extra data, the Classic Context Vector works best.

Related Work

To our knowledge, systems that expand abbreviations and/or acronyms use a pre-defined dictionary of acronym-expansions (Gooch 2012; ABBREX, http://abbrex.com/) as opposed to trying to discover the proper expansion based on context.

Ciosici and Assent (2018) proposed an abbreviation/acronym expansion system architecture that performs out-expansion. Unfortunately, their demo paper does not provide enough technical details and their code is proprietary.

The remainder of this section describes previous work on out-expansion.

Li, Ji, and Yan (2015) proposed two approaches based on word embeddings from Word2Vec (Mikolov et al. 2013a) to address the out-expansion problem. Their best approach, called Surrounding Based Embedding (SBE), combines the Word2Vec embeddings of the words surrounding the acronym or the expansion.
Similarly to SBE, Ciosici, Sommer, and Assent (2019) proposed Unsupervised Acronym Disambiguation (UAD), which replaces each expansion occurrence in the text collection by a normalized token and retrains the Word2Vec Google News model (Mikolov et al. 2013a) on that collection. The resulting model produces an embedding for each normalized token, i.e., an expansion embedding.

Thakker, Barot, and Bagul (2017) create document vector embeddings using Doc2Vec for each document. For each set of documents D containing an expansion for an acronym A, the system trains a Doc2Vec model on D, which is used to infer the embedding for an input document i containing an undefined acronym A.

Charbonnier and Wartena (2018) proposed an out-expansion approach based on Word2Vec embeddings weighted by Term Frequency-Inverse Document Frequency (TF-IDF) scores to find out-expansions for acronyms in scientific article captions.

More recently, Pouran Ben Veyseh et al. (2020) compare previous works on a new dataset (i.e., the Acronym Disambiguation dataset used in the SDU@AAAI competition). The authors also propose a new model called Graph-based Acronym Disambiguation (GAD). GAD uses word and sentence representations obtained from a Bidirectional Long Short-Term Memory (BiLSTM) neural network. Those representations are complemented with syntactic structure from a dependency tree graph, which models distant but important dependencies between words using a Graph Convolutional Network (GCN) (Kipf and Welling 2017). Finally, a two-layer feedforward neural network classifier is used to guess the expansion.

A related line of work explored the expansion of acronyms in enterprise texts (Feng et al. 2009; Li et al. 2018). For instance, in Li et al. (2018), enterprise textual documents are used as training data as well as Wikipedia articles and a set of features like statistics based on word frequencies, word co-occurrences, and TF-IDF. Other works explored acronym disambiguation in biomedical domains (Pustejovsky et al. 2001; Pakhomov, Pedersen, and Chute 2005; Yu et al. 2006; Stevenson et al. 2009; Moon, Pakhomov, and Melton 2012; Moon, McInnes, and Melton 2015; Wu et al. 2015; 2017).

Less directly related, but insightful, is the literature on Word Sense Disambiguation (WSD) (Navigli 2009; Moro and Navigli 2015) because that work also must make use of the context around a token (in our case, an acronym; in the word sense literature, a word).

Out-Expansion Strategy

Our out-expansion strategy consists of: (i) a Representator to map an input sentence to a document representation that holds contextual information and (ii) an Out-Expansion Predictor to choose a context-appropriate out-expansion for each acronym found in the input sentence.

Representator

Representators summarize text (documents or sentences) in order to capture information signals about their semantics. For the competition, we tested two representator techniques: Classic Context Vector and Doc2Vec.

Classic Context Vector  The context vector technique is an unsupervised method used as a baseline in Word Sense Disambiguation problems (Abdalgader and Skabar 2012) and also in acronym disambiguation problems (Prokofyev et al. 2013; Li, Ji, and Yan 2015). We denote it as classic to distinguish it from variants or other techniques that also provide vectors for contexts.

A context vector represents a term (e.g., an acronym or expansion) by a vector based on the words that co-occur with the term in the documents of the corpus containing that term. Thus, a context vector is a sparse vector where each position corresponds to a word in some document of the corpus: if the word appears in a document that contains the term, then the vector position has some positive value; otherwise, the value is zero. In the classic approach, the value at each vector position corresponds to the number of co-occurrences of the term and the co-occurring word in all the documents of the corpus.

In acronym disambiguation, the acronym in a particular sentence yields a context vector (which we call the "target context vector") that contains the words occurring in that sentence and their numbers of occurrences.

Each possible expansion for the acronym has a context vector as well (a "potential context vector"). Classic context vector chooses the expansion associated with the potential context vector that is most similar to the target context vector. The simplest similarity metric is cosine similarity. Figure 1 presents an example of a context vector for the Portable Document Format expansion using two documents. For instance, the words "the" and "file" occur once in each document, so the positions reserved for these two words in the vector contain the value 2, while the others contain 1.
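As an illustration of the technique, the following is a minimal sketch in Python; the whitespace tokenization, the use of Python Counter objects as sparse vectors, and all names are our own illustrative choices rather than the competition implementation.

```python
from collections import Counter
from math import sqrt

def context_vector(sentences):
    # Classic context vector: counts of every word co-occurring with the
    # term across all sentences/documents that contain the term.
    vec = Counter()
    for sentence in sentences:
        vec.update(sentence.lower().split())
    return vec

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(c * v[w] for w, c in u.items() if w in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def disambiguate(sentence, expansion_sentences):
    # Pick the expansion whose potential context vector is most similar
    # to the target context vector of the input sentence.
    target = context_vector([sentence])
    return max(expansion_sentences,
               key=lambda e: cosine(target, context_vector(expansion_sentences[e])))

# Hypothetical training sentences for two candidate expansions of "HD".
corpus = {
    "High Definition": ["the video streams in high definition quality"],
    "Harmonic Distortion": ["the amplifier adds harmonic distortion to the audio signal"],
}
print(disambiguate("the signal shows audible distortion", corpus))
```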
Doc2Vec  Doc2Vec (Le and Mikolov 2014) is a document embedding and unsupervised learning technique that adds to Word2Vec (Mikolov et al. 2013a) the capability of automatically learning document (or paragraph) vectors. Given a list of words (e.g., a text document) as input, the output of Doc2Vec is a dense vector of real numbers (i.e., an embedding).

Just as Word2Vec assigns a vector to a word, Doc2Vec assigns a vector of N dimensions, called a document vector, to a document (or, in the case of this benchmark, to a sentence).

The training problem consists of finding the best set of embedding values for each word and document (i.e., the Doc2Vec model parameters) such that, given a document, the model predicts the set of words in that document. For example, consider a document consisting of a list containing the countries in Figure 2. If the document is known to the Doc2Vec model (i.e., it was included in the training data), then we have a document embedding available; otherwise, a document embedding d is computed by finding the values that maximize the prediction of the country names given d.

Figure 2: Countries and Capitals vectors. Modified from (Mikolov et al. 2013b).

In contrast to Word2Vec, which averages word vectors to represent a particular document, Doc2Vec creates a trained vector for each document in the corpus (Dai, Olah, and Le 2015). By comparing those document vectors through cosine similarity, Doc2Vec can infer semantically similar documents.
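For concreteness, here is a minimal sketch of training and querying a Doc2Vec model. The paper does not name its implementation, so the use of the Gensim library, the hyperparameter values, and the toy sentences below are all our own assumptions; the 200 epochs mirror the iteration count mentioned in the experiments section.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus: each training sentence is tagged with an identifier.
sentences = [
    "the video streams in high definition quality",
    "the amplifier adds harmonic distortion to the audio signal",
]
corpus = [TaggedDocument(words=s.split(), tags=[i]) for i, s in enumerate(sentences)]

# Learn word and document embeddings jointly (illustrative hyperparameters).
model = Doc2Vec(corpus, vector_size=100, min_count=1, epochs=200)

# For an unseen sentence, infer_vector searches for the document embedding d
# that best predicts the sentence's words, keeping the trained weights fixed.
d = model.infer_vector("the signal shows audible distortion".split())

# Rank training documents by cosine similarity of their document vectors.
print(model.dv.most_similar([d], topn=1))
```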
Figure 1: Classic context vector example for the Portable Document Format expansion. The potential context vector, built from two documents containing the expansion, is:

Word  | the | file | format | increases | in | popularity | formats | including
Count |  2  |  2   |   1    |     1     |  1 |     1      |    1    |     1
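For reference, the similarity measure used throughout is the standard cosine similarity; for a target context vector t and a potential context vector p:

$$\cos(t, p) = \frac{t \cdot p}{\lVert t \rVert \, \lVert p \rVert} = \frac{\sum_i t_i\, p_i}{\sqrt{\sum_i t_i^2}\,\sqrt{\sum_i p_i^2}}$$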
Out-Expansion Predictor

Out-expansion predictors select an out-expansion for a given acronym A in an input sentence ins. For this purpose, a predictor considers each sentence containing a valid expansion E for A. Sentences are characterized by the representator output explained in the previous section.

In the case of the Classic Context Vector, we compare the input sentence's context vector with the vector resulting from summing the context vectors of the sentences for E. In this classical approach, because we have only a context vector per expansion E, we use cosine similarity to evaluate similarity.

We consider the use of machine learning classifiers as alternatives to cosine similarity when more than one training sample is available for an expansion (i.e., label). This is the case for Doc2Vec, whose embeddings represent a set of words (e.g., a document or sentence), so we have as many samples per expansion as there are sets of words (e.g., sentences) in which it occurs. However, for the Classic Context Vector, it is not possible to use such a machine learning approach because we have one context vector per expansion and therefore only one sample per expansion. Specifically, for the competition we used Support Vector Machines (SVMs), where non-binary classification was performed by a "one-vs-all" approach in which a binary SVM classifier predicts with a certain probability whether a sample belongs to a particular class. The class with the highest probability is selected. We used the LibLinear (Fan et al. 2008) implementation included in the scikit-learn toolkit (Pedregosa et al. 2011).
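A minimal sketch of this predictor follows, under these assumptions: scikit-learn's LinearSVC (which wraps LibLinear) trains one binary classifier per class in a one-vs-rest scheme but exposes decision scores rather than calibrated probabilities, so the sketch selects the class with the highest decision score; the random feature matrix merely stands in for real Doc2Vec embeddings.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-ins for Doc2Vec embeddings: one 100-dimensional vector per training
# sentence, each labeled with the expansion that the sentence supports.
X_train = rng.random((9, 100))
y_train = (["High Definition"] * 3 + ["Harmonic Distortion"] * 3
           + ["Huntington's Disease"] * 3)

# LinearSVC wraps LibLinear; multi-class handling is one-vs-rest by default.
clf = LinearSVC().fit(X_train, y_train)

# predict() returns the class whose binary classifier gives the highest
# decision score for the input sentence embedding.
x = rng.random((1, 100))
print(clf.predict(x)[0])
```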
SDU@AAAI Benchmark of Out-expansion Techniques

This section describes the SDU@AAAI benchmark used to evaluate the out-expansion techniques described in the previous section.

Datasets

The datasets that we use in this benchmark are:

SciAD  contains sentences from human-annotated scientific articles extracted from arXiv (https://arxiv.org/). Each sentence contains an acronym to disambiguate. This is the dataset provided for the SDU@AAAI Acronym Disambiguation (AD) competition and it was proposed in (Pouran Ben Veyseh et al. 2020). There are three data splits: (i) Train with 50,033 sentences, (ii) Dev with 6,188 sentences, and (iii) Test with 6,217 sentences, for which the acronym expansions are unknown. This dataset also contains a dictionary with acronyms and their possible expansions.

Wikipedia  contains all English articles of Wikipedia.org taken from the Wikipedia dump of March 1, 2020 (https://dumps.wikimedia.org/enwiki). We used the WikiExtractor software (http://medialab.di.unipi.it/wiki/Wikipedia_Extractor) to obtain the articles in plain text, and we used the Schwartz and Hearst (2003) algorithm to extract acronyms and expansions from each Wikipedia article.

Data Preparation

We process the datasets by removing punctuation and normalizing tokens in order to create a better textual representation. That is, we perform the following operations on each dataset (a sketch of both steps follows the list):

SciAD  We remove non-alphanumeric tokens, punctuation characters, and stop-words. Then, we transform each token to its stem; e.g., expander, expanding, and expanded all map to expand. We use the Porter Stemmer algorithm from the Natural Language Toolkit (NLTK) (Bird, Klein, and Loper 2009) for that purpose.

Wikipedia  Because expansions in Wikipedia may be written in different formats and with plurals, we normalize
the expansions found against the dictionary of acronym expansions shared with SciAD. So, each expansion in the Wikipedia documents is replaced by the closest expansion in the SciAD dictionary. Distance is computed by comparing the expansion in Wikipedia against a SciAD expansion: if the first 4 characters of each word are equal, we consider the expansions to be equal (distance = 0); otherwise, the distance is the edit distance between the two expansions, and if that edit distance is below 3, the expansions are considered close enough to match; otherwise, they are considered two distinct expansions. Wikipedia expansions not close to any expansion in the SciAD dictionary, and their corresponding documents, are not considered for prediction because only the expansions in the dictionary are valid for the SciAD evaluation set. Furthermore, while keeping the expansions in the text, we apply the tokenizer from NLTK and remove the non-alphanumeric tokens, punctuation characters, and stop-words. Finally, we transform each token to its stem as we did for the SciAD dataset.
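A minimal sketch of both preparation steps, under the following assumptions: NLTK's English stop-word list, Porter stemmer, tokenizer, and edit-distance utility are used (with the corresponding NLTK data downloaded), and the function names and thresholds below simply restate the rules described above.

```python
from nltk import download, word_tokenize
from nltk.corpus import stopwords
from nltk.metrics.distance import edit_distance
from nltk.stem import PorterStemmer

download("punkt", quiet=True)        # tokenizer model
download("stopwords", quiet=True)    # stop-word list

STOP = set(stopwords.words("english"))
STEM = PorterStemmer()

def preprocess(text):
    # Drop non-alphanumeric tokens and stop-words, then stem each token.
    return [STEM.stem(t) for t in word_tokenize(text.lower())
            if t.isalnum() and t not in STOP]

def close_expansions(wiki_exp, sciad_exp):
    # Equal (distance 0) if the first 4 characters of each word match;
    # otherwise close enough if the overall edit distance is below 3.
    w, s = wiki_exp.lower().split(), sciad_exp.lower().split()
    if len(w) == len(s) and all(a[:4] == b[:4] for a, b in zip(w, s)):
        return True
    return edit_distance(wiki_exp.lower(), sciad_exp.lower()) < 3

print(preprocess("Acronyms are expanding, expanded, and expandable."))
# -> ['acronym', 'expand', 'expand', 'expand']
print(close_expansions("Portable Document Formats", "portable document format"))
# -> True
```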
Out-expansion Techniques

For the SDU@AAAI AD competition, we test the following out-expansion techniques: (i) Cosine similarity (Cossim) with the Classic Context Vector (Li, Ji, and Yan 2015) and (ii) the outputs of Doc2Vec as features for Support Vector Machines (SVMs).

Prediction and Performance Metrics

For the SDU@AAAI competition, we use the following metrics (a sketch of how the first can be computed follows the list):

Out-expansion Macro Averages: the averages of the Precision, Recall, and F1-Measure computed for each expansion. These are the official metrics used in the SDU@AAAI competition, with the Macro F1-Measure used to rank the competitors based on expansion prediction quality.

Training execution times: the execution time to create the representator model based on the training sentences and/or documents.

Average execution time per sentence: the average execution time to predict the expansion for the acronym in a sentence.
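The macro averages can be computed with scikit-learn as in the sketch below; the gold and predicted labels are hypothetical, and the CodaLab scorer used in the competition remains the authoritative implementation.

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical gold and predicted expansions for four test sentences.
y_true = ["High Definition", "Harmonic Distortion",
          "High Definition", "Harmonic Distortion"]
y_pred = ["High Definition", "High Definition",
          "High Definition", "Harmonic Distortion"]

# Macro averaging computes precision, recall, and F1 per expansion (class)
# and averages them, weighting every expansion equally regardless of frequency.
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                              average="macro", zero_division=0)
print(f"macro P={p:.2f} macro R={r:.2f} macro F1={f1:.2f}")
```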
Experimental Evaluation

This section reports on the out-expansion experiments. We run the experiments on a machine with the following specifications: a Virtual Machine (VM) with 4 CPU cores from an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, 32GiB of RAM (Random Access Memory), and Ubuntu 18.04.3 LTS.

Out-Expansions on the AAAI Benchmark

In this section, we report the results obtained using the SciAD dataset. We used the Train set for training and the Dev set as test data to tune the hyperparameters of Doc2Vec and the SVMs. Then, we used the Train and Dev sets as training data and the Test set for evaluation. Evaluation quality measures were provided by the CodaLab competition evaluation system (https://competitions.codalab.org/competitions/26611). We submitted two combinations of out-expander predictors and representators: (i) cosine similarity (Cossim) as predictor with Classic Context Vector as representator, and (ii) Support Vector Machine (SVM) as predictor with Doc2Vec as representator.

In Table 1, we report the out-expansion macro averages for predicting expansions for acronyms in sentences: Precision (P), Recall (R), and F1-measure (F1). Macro F1-measure is the official measure for ranking competitors in the SDU@AAAI competition. We report the best results obtained for both the Dev set, used for hyperparameter selection, and the Test set, used as testing data. In addition, we report the execution times for training and the average per predicted acronym in a sentence (note that each sentence contains only one acronym to expand).
Table 1: Out-expansion macro averages (Dev and Test) and execution times (training, and average per sentence) for the SciAD dataset.

Predictor | Representator          | Dev P (%) | Dev R (%) | Dev F1 (%) | Test P (%) | Test R (%) | Test F1 (%) | Training (s) | Avg. per sentence (s)
Cossim    | Classic Context Vector | 90.00     | 84.68     | 87.26      | 92.13      | 84.16      | 87.96       | 1.24         | 0.18
SVM       | Doc2Vec                | 90.49     | 81.01     | 85.48      | 91.95      | 80.15      | 85.65       | 92.79        | 0.09
Table 2: Out-expansion macro averages (Dev and Test) and execution times (training, and average per sentence) for the SciAD dataset with Wikipedia also used as training data.

Predictor | Representator          | Dev P (%) | Dev R (%) | Dev F1 (%) | Test P (%) | Test R (%) | Test F1 (%) | Training (s) | Avg. per sentence (s)
Cossim    | Classic Context Vector | 88.24     | 82.79     | 85.43      | 90.27      | 83.73      | 86.88       | 504.52       | 0.61
SVM       | Doc2Vec                | 91.54     | 81.50     | 86.23      | 93.57      | 83.77      | 88.40       | 7367.32      | 0.12
We can see that Cossim with Classic Context Vector achieved the best results. In general, both techniques have slightly lower recall (by less than 1%) in the Test set than in the Dev set but higher macro precision (by 1-2%). Since the gains in macro precision are larger than the losses in macro recall for the Test set, the harmonic means of the two macros (i.e., the macro F1-measures) are higher in the Test set. Differences among the techniques are consistent across the test sets. Regarding execution times, Cossim with Classic Context Vector is faster in training (by about 91s) and SVM with Doc2Vec is faster on average per sentence (0.09s). Classic Context Vector merely counts word occurrences at training time, while Doc2Vec trains a neural network for word and document embeddings with several iterations over the training corpus (e.g., 200). Both training and sentence-processing times are low given that they are executed on a regular machine (4 CPU cores and 32GB of RAM). Both techniques are lightweight solutions for this problem.

Cross-Training and Additional Data for the SDU@AAAI Competition  For our next set of experiments for the SDU@AAAI competition, we augment the training data provided by the competition sets with documents obtained from Wikipedia, i.e., the Wikipedia dataset. We wanted to test whether additional data and cross-training data help to solve this problem and which techniques can benefit from such a data increment.

Table 2 shows the macro averages on the SciAD Dev set using the SciAD Train set and Wikipedia documents as training data, and the macro averages and execution times on the SciAD Test set using the above training sets plus the Dev set as training data.

In contrast to the previous results, where Wikipedia data was not used, after adding Wikipedia documents to the training data, SVM with Doc2Vec obtains the best results; that combination is the one that benefits from using Wikipedia data. Its three macro averages are lower when applied to the Dev set than when applied to the Test set.

Consistent with the experimental results without Wikipedia, Cossim with Classic Context Vector is faster in training than SVM with Doc2Vec, while slower in per-sentence processing. Training both techniques is much slower with the addition of Wikipedia data, yet fast enough for a regular machine (e.g., 2 hours to train the Doc2Vec model). On average, to process a sentence, the incorporation of Wikipedia slows down Cossim with Classic Context Vector by 0.43 seconds and slows down SVM with Doc2Vec by 0.03 seconds.

Most of the excellent efforts by other research groups submitted to the competition are transformer-based models that use pretrained models like BERT (Devlin et al. 2018), RoBERTa (Liu et al. 2019), and SciBERT (Beltagy, Lo, and Cohan 2019). Those works mostly distinguish themselves in how they adapt such transformer models to out-expansion (Veyseh et al. 2020). The three leaderboard works use transformers, and their macro F1-measures range from 93.19% to 94.05%. To our understanding, only three works, including ours, explored techniques other than transformers; no other work explored Doc2Vec or SVMs. Although our best technique scores about 6% less than the best in the competition, we believe that our techniques are distinct enough to be complements to transformer-based techniques, or they may introduce a lighter and faster approach to this problem, since transformer models, even using GPUs (Graphics Processing Units) or TPUs (Tensor Processing Units), usually take more time to train and to process data than Doc2Vec and SVMs. Further, our approaches could work better when the context consists of entire documents rather than single sentences, which is our core use case.

Conclusions and Future Work

We have evaluated two rapid techniques for acronym disambiguation using the SDU@AAAI benchmarks. We have found that Cosine similarity with Classic Context Vector works best when no Wikipedia data is used. SVM with Doc2Vec outperforms Cosine similarity with Classic Context Vector when using Wikipedia data. Our overall results, as measured by Macro F1-measure, are within 5.7% of the best system in the competition. By analyzing the execution times of each phase (training and evaluation of sentences), we showed that our approach is lightweight even on a standard computer.

We believe we could have improved performance if we
had used data sources in addition to Wikipedia, such as abstracts from articles in web repositories, to make the domain closer to the SDU@AAAI competition data.

Acknowledgments

Pereira's work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the PhD Scholarship SFRH/BD/135719/2018. Furthermore, Pereira and Galhardas' work was supported by national funds through FCT under the project UIDB/50021/2020. Shasha's work has been partly supported by (i) the New York University Abu Dhabi Center for Interacting Urban Networks (CITIES), funded by Tamkeen under the NYUAD Research Institute Award CG001 and by the Swiss Re Institute under the Quantum Cities initiative, (ii) NYU Wireless, and (iii) U.S. National Science Foundation grants 1934388, 1840761, and 1339362. The server virtual machine used to run the experiments was supported by BioData.pt – Infraestrutura Portuguesa de Dados Biológicos, project 22231/01/SAICT/2016, funded by Portugal 2020.

References

Abdalgader, K., and Skabar, A. 2012. Unsupervised Similarity-based Word Sense Disambiguation Using Context Vectors and Sentential Word Importance. ACM Transactions on Speech and Language Processing 9(1):2–21.

Beltagy, I.; Lo, K.; and Cohan, A. 2019. SciBERT: Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing.

Bird, S.; Klein, E.; and Loper, E. 2009. Natural Language Processing with Python. O'Reilly Media, Inc., 1st edition.

Charbonnier, J., and Wartena, C. 2018. Using word embeddings for unsupervised acronym disambiguation. In Proceedings of the Twenty-Seventh International Conference on Computational Linguistics, 2610–2619. Santa Fe, New Mexico: Association for Computational Linguistics.

Ciosici, M. R., and Assent, I. 2018. Abbreviation expander - a web-based system for easy reading of technical documents. In Proceedings of the Twenty-Seventh International Conference on Computational Linguistics: System Demonstrations, 1–4. Santa Fe, New Mexico: Association for Computational Linguistics.

Ciosici, M. R.; Sommer, T.; and Assent, I. 2019. Unsupervised abbreviation disambiguation contextual disambiguation using word embeddings. arXiv preprint. arXiv:1904.00929v2 [cs.CL]. Ithaca, NY: Cornell University Library.

Dai, A. M.; Olah, C.; and Le, Q. V. 2015. Document embedding with paragraph vectors. arXiv preprint. arXiv:1507.07998 [cs.CL]. Ithaca, NY: Cornell University Library.

Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint. arXiv:1810.04805 [cs.CL]. Ithaca, NY: Cornell University Library.

Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; and Lin, C.-J. 2008. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research 9:1871–1874.

Feng, S.; Xiong, Y.; Yao, C.; Zheng, L.; and Liu, W. 2009. Acronym extraction and disambiguation in large-scale organizational web pages. In Proceedings of the Eighteenth ACM Conference on Information and Knowledge Management, 1693–1696. New York, NY: Association for Computing Machinery.

Gooch, P. 2012. BADREX: in situ expansion and coreference of biomedical abbreviations using dynamic regular expressions. arXiv preprint. arXiv:1206.4522 [cs.CL]. Ithaca, NY: Cornell University Library.

Kipf, T. N., and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. arXiv preprint. arXiv:1609.02907 [cs.CL]. Ithaca, NY: Cornell University Library.

Le, Q. V., and Mikolov, T. 2014. Distributed representations of sentences and documents. In Proceedings of the Thirty-First International Conference on Machine Learning, 1188–1196.

Li, Y.; Zhao, B.; Fuxman, A.; and Tao, F. 2018. Guess Me if You Can: Acronym Disambiguation for Enterprises. In Proceedings of the Fifty-Sixth Annual Meeting of the Association for Computational Linguistics, 1308–1317. Melbourne, Australia: Association for Computational Linguistics.

Li, C.; Ji, L.; and Yan, J. 2015. Acronym Disambiguation Using Word Embedding. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 4178–4179. Menlo Park, Calif.: AAAI Press.

Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint. arXiv:1907.11692 [cs.CL]. Ithaca, NY: Cornell University Library.

Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013a. Efficient Estimation of Word Representations in Vector Space. arXiv preprint. arXiv:1301.3781v3 [cs.CL]. Ithaca, NY: Cornell University Library.

Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. 2013b. Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of the Twenty-Sixth International Conference on Neural Information Processing Systems, 3111–3119. Red Hook, NY: Curran Associates Inc.

Moon, S.; McInnes, B.; and Melton, G. B. 2015. Challenges and practical approaches with word sense disambiguation of acronyms and abbreviations in the clinical domain. Healthcare Informatics Research 21(1):35–42.

Moon, S.; Pakhomov, S.; and Melton, G. B. 2012. Automated disambiguation of acronyms and abbreviations in clinical texts: window and training size considerations. AMIA Annual Symposium Proceedings 2012:1310–1319.

Moro, A., and Navigli, R. 2015. SemEval-2015 task 13: Multilingual all-words sense disambiguation and entity linking. In Proceedings of the Ninth International Workshop on Semantic Evaluation, 288–297. Denver, Colorado: Association for Computational Linguistics.

Navigli, R. 2009. Word sense disambiguation: A survey. ACM Computing Surveys 41(2):1–69.

Pakhomov, S.; Pedersen, T.; and Chute, C. G. 2005. Abbreviation and acronym disambiguation in clinical discourse. AMIA Annual Symposium Proceedings 2005:589–593.

Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; and Duchesnay, E. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12:2825–2830.

Pouran Ben Veyseh, A.; Dernoncourt, F.; Tran, Q. H.; and Nguyen, T. H. 2020. What does this acronym mean? Introducing a new dataset for acronym identification and disambiguation. In Proceedings of the Twenty-Eighth International Conference on Computational Linguistics, 3285–3301. Barcelona, Spain (Online): International Committee on Computational Linguistics.

Prokofyev, R.; Demartini, G.; Boyarsky, A.; Ruchayskiy, O.; and Cudré-Mauroux, P. 2013. Ontology-based word sense disambiguation for scientific literature. In Proceedings of the Thirty-Fifth European Conference on Advances in Information Retrieval, 594–605. Berlin, Heidelberg: Springer-Verlag.

Pustejovsky, J.; Castaño, J.; Cochran, B.; Kotecki, M.; and Morrell, M. 2001. Automatic extraction of acronym-meaning pairs from MEDLINE databases. Studies in Health Technology and Informatics 84(Pt 1):371–375.

Schwartz, A. S., and Hearst, M. A. 2003. A simple algorithm for identifying abbreviation definitions in biomedical text. In Pacific Symposium on Biocomputing, 451–462. Singapore: World Scientific Press.

Stevenson, M.; Guo, Y.; Al Amri, A.; and Gaizauskas, R. 2009. Disambiguation of Biomedical Abbreviations. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, 71–79. Boulder, Colorado: Association for Computational Linguistics.

Thakker, A.; Barot, S.; and Bagul, S. 2017. Acronym Disambiguation: A Domain Independent Approach. arXiv preprint. arXiv:1711.09271v3 [cs.CL]. Ithaca, NY: Cornell University Library.

Veyseh, A. P. B.; Dernoncourt, F.; Nguyen, T. H.; Chang, W.; and Celi, L. A. 2020. Acronym identification and disambiguation shared tasks for scientific document understanding. In Proceedings of the AAAI-21 Workshop on Scientific Document Understanding.

Wu, Y.; Xu, J.; Zhang, Y.; and Xu, H. 2015. Clinical abbreviation disambiguation using neural word embeddings. In Proceedings of the Workshop on Biomedical Natural Language Processing, 171–176. Beijing, China: Association for Computational Linguistics.

Wu, Y.; Denny, J. C.; Trent Rosenbloom, S.; Miller, R. A.; Giuse, D. A.; Wang, L.; Blanquicett, C.; Soysal, E.; Xu, J.; and Xu, H. 2017. A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD). Journal of the American Medical Informatics Association 24(e1):e79–e86.

Yu, H.; Kim, W.; Hatzivassiloglou, V.; and Wilbur, J. 2006. A Large Scale, Corpus-Based Approach for Automatically Disambiguating Biomedical Abbreviations. ACM Transactions on Information Systems 24(3):380–404.