Acronym Identification and Disambiguation Shared Tasks for Scientific Document Understanding

Amir Pouran Ben Veyseh1, Franck Dernoncourt2, Thien Huu Nguyen1, Walter Chang2, and Leo Anthony Celi3,4
1 University of Oregon, Eugene, OR, USA
2 Adobe Research, San Jose, CA, USA
3 Harvard University, Cambridge, MA, USA
4 Massachusetts Institute of Technology, Cambridge, MA, USA
{thien, apouranb}@cs.uoregon.edu
{franck.dernoncourt, wachang}@adobe.com
lceli@bidmc.harvard.edu

Abstract

Acronyms are the short forms of longer phrases, and they are frequently used in writing, especially scholarly writing, to save space and facilitate the communication of information. As such, every text understanding tool should be capable of recognizing acronyms in text (i.e., acronym identification) and also finding their correct meanings (i.e., acronym disambiguation). As most of the prior works on these tasks are restricted to the biomedical domain and use unsupervised methods or models trained on limited datasets, they fail to perform well for scientific document understanding. To push forward research in this direction, we have organized two shared tasks for acronym identification and acronym disambiguation in scientific documents, named AI@SDU and AD@SDU, respectively. The two shared tasks have attracted 52 and 43 participants, respectively. While the submitted systems make substantial improvements over the existing baselines, they are still far from human-level performance. This paper reviews the two shared tasks and the prominent participating systems for each of them.

Introduction

One of the common practices in writing to save space and make the flow of information smoother is to avoid the repetition of long phrases, which might waste space and the reader's time. To this end, acronyms, i.e., the shortened forms of long phrases, are often used in various types of writing, especially in scientific documents. However, this prevalence might introduce more challenges for text understanding tools. More specifically, as acronyms might not be defined in dictionaries, especially locally defined acronyms whose long form is only provided in the document that introduces them, a text processing model should be able to identify the acronyms and their long forms in the text (i.e., acronym identification). For instance, in the sentence "The main key performance indicator, herein referred to as KPI, is the E2E throughput", the text processing system must recognize KPI and E2E as acronyms and the phrase key performance indicator as the long form. Another issue related to acronyms that text understanding tools encounter is that the correct meaning (i.e., long form) of an acronym might not be provided in the document itself (e.g., the acronym E2E in the running example). In these cases, the correct meaning can be obtained by looking it up in an acronym dictionary. However, as different long forms could share the same acronym (e.g., the two long forms Cable News Network and Convolutional Neural Network share the acronym CNN), this look-up is not straightforward and the system must disambiguate the acronym (i.e., acronym disambiguation). Both AI and AD models could be used in downstream applications including definition extraction (Veyseh et al. 2020a; Spala et al. 2020, 2019; Espinosa-Anke and Schockaert 2018; Jin et al. 2013), various information extraction tasks (Liu et al. 2019; Pouran Ben Veyseh, Nguyen, and Dou 2019), and question answering (Ackermann et al. 2020; Veyseh 2016).

Due to the importance of the two aforementioned tasks, i.e., acronym identification (AI) and acronym disambiguation (AD), there is a wealth of prior work on AI and AD (Park and Byrd 2001; Schwartz and Hearst 2002; Nadeau and Turney 2005; Kuo et al. 2009; Taneva et al. 2013; Kirchhoff and Turner 2016; Li et al. 2018; Ciosici, Sommer, and Assent 2019; Jin, Liu, and Lu 2019; Veyseh et al. 2021). However, there are two major limitations in the existing systems. First, for the AD task, the existing models are mainly limited to the biomedical domain, ignoring the challenges in other domains. Second, for the AI task, the existing models employ either unsupervised methods or models trained on limited manually annotated AI datasets. The unsupervised methods or the small size of the AI datasets result in errors in acronym identification, which could also propagate to the acronym disambiguation task.

To address the above issues in the prior works, we recently released the largest manually annotated acronym identification dataset for scientific documents (viz., SciAI) (Veyseh et al. 2020b). This dataset consists of 17,506 sentences from 6,786 English papers published on arXiv. The annotation of each sentence involves the acronyms and long forms mentioned in the sentence. Using this manually annotated AI dataset, we also created a dictionary of 732 acronyms with multiple corresponding long forms (i.e., ambiguous acronyms), which is the largest available acronym dictionary for scientific documents. Moreover, using the prepared dictionary and 2,031,592 sentences extracted from arXiv papers, we created a dataset for the acronym disambiguation task (viz., SciAD) (Veyseh et al. 2020b). This dataset consists of 62,441 sentences, which is larger than the prior AD datasets for the scientific domain.

Using the two datasets SciAI and SciAD, we organize two shared tasks for acronym identification and acronym disambiguation for scientific document understanding (i.e., AI@SDU and AD@SDU, respectively). The AI@SDU shared task has attracted 52 participant teams with 19 submissions during the evaluation phase. The AD@SDU task has also attracted 43 participant teams with 10 submissions during the evaluation phase. The participant teams made considerable progress on both shared tasks compared to the provided baselines. However, the top-performing models (viz., AT-BERT-E for AI@SDU with a 93.3% F1 score and DeepBlueAI for AD@SDU with a 94.0% F1 score) underperform humans (with 96.0% and 96.1% F1 scores for the AI@SDU and AD@SDU shared tasks, respectively), leaving room for future research. In this paper, we review the dataset creation process, the details of the shared tasks, and the prominent submitted systems.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Dataset | Size | # Unique Acronyms | # Unique Meanings | # Documents | Publicly Available | Domain
LSAEF (Liu et al. 2011) | 6,185 | 255 | 1,372 | N/A | No | Wikipedia
AESM (Nautial, Sristy, and Somayajulu 2014) | 355 | N/A | N/A | N/A | No | Wikipedia
MHIR (Harris and Srinivasan 2019) | N/A | N/A | N/A | 50 | No | Scientific Papers
MHIR (Harris and Srinivasan 2019) | N/A | N/A | N/A | 50 | No | Patents
MHIR (Harris and Srinivasan 2019) | N/A | N/A | N/A | 50 | No | News
SciAI (ours) | 17,506 | 7,964 | 9,775 | 6,786 | Yes | Scientific Papers

Table 1: Comparison of non-medical manually annotated acronym identification datasets. Note that size refers to the number of sentences in the dataset.

Dataset | Size | Annotation | Avg. Number of Samples per Long Form
Science WISE (Prokofyev et al. 2013) | 5,217 | Disambiguation manually annotated | N/A
NOA (Charbonnier and Wartena 2018) | 19,954 | No manual annotation | 4
SciAD (ours) | 62,441 | Acronym identification manually annotated | 22

Table 2: Comparison of scientific acronym disambiguation (AD) datasets. Note that size refers to the number of sentences in the dataset.

Dataset & Task Description

Acronym Identification

As mentioned in the introduction, the existing AI datasets are either created using unsupervised methods (e.g., by character-matching the acronyms with their surrounding words in the text) or they are small-sized and thus inappropriate for data-hungry deep learning models. To address these limitations, we aim to create the largest manually labeled acronym identification dataset. To this end, we first collect 6,786 English papers from arXiv. This collection contains 2,031,592 sentences. As not all of these sentences contain acronyms and their long forms, we first filter out the sentences without any candidate acronym and long form. To identify the candidate acronyms, we use the rule that a word wt is a candidate acronym if half of its characters are upper-cased. To identify the candidate long forms, we employ the rule that the consecutive words [wj, wj+1, ..., wj+k] are a candidate long form if the concatenation of their first one, two, or three characters can form a candidate acronym, i.e., wt, in the sentence. After filtering out sentences without any candidate acronym and long form, 17,506 sentences are selected and annotated by three annotators from Amazon Mechanical Turk (MTurk). More specifically, MTurk workers annotated the acronyms, the long forms, and the mapping between identified acronyms and long forms. In case of disagreements, if two out of three workers agree on an annotation, we use majority voting to decide the correct annotation. Otherwise, a fourth annotator is hired to resolve the conflict. The inter-annotator agreement (IAA) using Krippendorff's alpha (Krippendorff 2011) with the MASI distance metric (Passonneau 2006) is 0.80 for short forms (i.e., acronyms) and 0.86 for long forms (i.e., phrases). This dataset is called SciAI.
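The two candidate-filtering rules described above can be sketched as follows. This is a minimal illustration of the stated rules, not the authors' actual preprocessing code; the function names and the case-insensitive prefix matching are our assumptions.

```python
def is_candidate_acronym(word: str) -> bool:
    # Dataset-construction rule: a token is a candidate acronym
    # if at least half of its characters are upper-cased.
    if not word:
        return False
    return sum(c.isupper() for c in word) >= len(word) / 2


def is_candidate_long_form(words, acronym: str) -> bool:
    # Rule: consecutive words are a candidate long form if the
    # concatenation of their first one, two, or three characters
    # can form the acronym (compared case-insensitively here).
    target = acronym.lower()

    def consume(i: int, pos: int) -> bool:
        # Try to cover target[pos:] with 1- to 3-character
        # prefixes of words[i:], one prefix per word.
        if pos == len(target):
            return i == len(words)
        if i == len(words):
            return False
        limit = min(3, len(words[i]), len(target) - pos)
        return any(
            words[i][:k].lower() == target[pos:pos + k]
            and consume(i + 1, pos + k)
            for k in range(1, limit + 1)
        )

    return consume(0, 0)
```

For example, `is_candidate_acronym("CNN")` holds, and `["key", "performance", "indicator"]` is accepted as a candidate long form for `KPI`.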
A comparison of the SciAI dataset with other existing manually annotated AI datasets is provided in Table 1.

The acronym identification (AI) task aims to recognize all acronyms and long forms mentioned in a sentence. Formally, given the sentence S = [w1, w2, ..., wN], the goal is to predict the sequence L = [l1, l2, ..., lN], where li ∈ {Ba, Ia, Bl, Il, O}. Note that Ba and Ia indicate the beginning and inside of an acronym, respectively, while Bl and Il indicate the beginning and inside of a long form, respectively, and O is the label for other words.

Team Name | Precision | Recall | F1
GCDH | 86.50 | 85.57 | 86.03
Aadarshsingh | 88.26 | 89.08 | 88.67
TAG-CIC | 89.70 | 88.16 | 88.92
Spark | 89.91 | 90.49 | 90.20
Dumb-AI | 89.72 | 90.94 | 90.33
Napsternxg | 90.15 | 91.15 | 90.65
Pikaqiu | 91.02 | 90.51 | 90.76
SciDr (Singh and Kumar 2020) | 90.98 | 90.83 | 90.90
Aliou | 90.78 | 91.12 | 90.95
AliBaba2020 | 90.30 | 92.87 | 91.57
RK | 89.93 | 93.88 | 91.86
DeepBlueAI | 92.01 | 91.84 | 91.92
EELM-SLP (Kubal and Nagvenkar 2020) | 89.70 | 94.59 | 92.08
Lufiedby | 92.64 | 91.74 | 92.19
Primer (Egan and Bohannon 2020) | 91.73 | 93.49 | 92.60
HowToSay | 91.93 | 93.70 | 92.81
N&E (Li et al. 2020) | 93.49 | 92.74 | 93.11
AT-BERT-E (Zhu et al. 2020) | 92.20 | 94.43 | 93.30
Baseline (Rule-based) | 91.31 | 77.93 | 84.09
Human Performance | 97.70 | 94.56 | 96.09

Table 3: Performance of the participating systems in the Acronym Identification task.

Acronym Disambiguation

The goal of the acronym disambiguation (AD) task is to find the correct meaning of a given acronym in a sentence. More specifically, given the sentence S = [w1, w2, ..., wN] and the index t, where wt is an acronym with multiple long forms L = {l1, l2, ..., lm}, the goal is to predict the long form li ∈ L that is the correct meaning of wt.

As discussed earlier, one of the issues with the existing AD datasets is that they mainly focus on the biomedical domain, ignoring the challenges in other domains. This domain shift is important, as some of the existing models for biomedical AD rely on domain-specific resources (e.g., BioBERT) which might not be suitable for other domains. Another issue with the existing AD datasets, especially the ones proposed for the scientific domain, is that they are based on unsupervised AI datasets. That is, acronyms and long forms in a corpus are identified using some rules, and the resulting AI dataset is employed to find acronyms with multiple long forms to create the AD dataset. This unsupervised method of creating an AD dataset could introduce noise and miss some challenging cases. To address these limitations, we created a new AD dataset using the manually labeled SciAI dataset. More specifically, first, using the mappings between annotated acronyms and long forms in SciAI, we create a dictionary of acronyms that have multiple long forms (i.e., ambiguous acronyms). This dictionary contains 732 acronyms with an average of 3.1 meanings (i.e., long forms) per acronym. Afterward, to create samples for the AD dataset, we look up all sentences in the collected corpus in which one of the ambiguous acronyms is locally defined (i.e., its long form is provided in the same sentence). Next, in the documents hosting these sentences, we automatically annotate every occurrence of the acronym with its locally defined long form. Using this process, a dataset consisting of 62,441 sentences is created. We call this dataset SciAD. A comparison of the SciAD dataset with other existing scientific AD datasets is provided in Table 2.
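The two SciAD construction steps, building the dictionary of ambiguous acronyms from the SciAI mappings and then auto-labeling acronym occurrences in documents with a locally defined long form, can be sketched as follows. This is a simplified sketch under assumed inputs (SciAI mappings as (acronym, long form) pairs, a document as a list of sentence strings); the function names and the whitespace/substring matching are ours, not the authors' pipeline.

```python
from collections import defaultdict


def build_ambiguous_dictionary(acronym_long_form_pairs):
    # Group the manually annotated SciAI mappings by acronym and
    # keep only acronyms with more than one distinct long form.
    meanings = defaultdict(set)
    for acronym, long_form in acronym_long_form_pairs:
        meanings[acronym].add(long_form)
    return {a: sorted(lfs) for a, lfs in meanings.items() if len(lfs) > 1}


def label_ad_samples(document_sentences, dictionary):
    # Pass 1: find acronyms that are locally defined, i.e., some
    # sentence of the document contains both the acronym and one of
    # its long forms. (If several long forms were defined, this
    # sketch simply keeps the last one found.)
    local_meaning = {}
    for sentence in document_sentences:
        for acronym, long_forms in dictionary.items():
            for lf in long_forms:
                if acronym in sentence.split() and lf.lower() in sentence.lower():
                    local_meaning[acronym] = lf
    # Pass 2: label every occurrence of a locally defined acronym
    # in the document with that long form.
    samples = []
    for sentence in document_sentences:
        for acronym, lf in local_meaning.items():
            if acronym in sentence.split():
                samples.append((sentence, acronym, lf))
    return samples
```

The design point is that the supervision comes for free: once an acronym is defined anywhere in a document, all its other occurrences in that document become AD training samples.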
Participating Systems & Results

Acronym Identification

For the AI task, we provide a rule-based baseline. In particular, inspired by (Schwartz and Hearst 2002), the baseline identifies acronyms and their long forms if they match one of the patterns long form (acronym) or acronym (long form). More specifically, if there is a word with more than 60% upper-cased characters that is inside parentheses or right before parentheses, it is predicted as an acronym. Afterward, we assess the words before or after the acronym (depending on which pattern the predicted acronym belongs to) that fall into a pre-defined window of size min(|A| + 5, 2 * |A|), where |A| is the number of characters in the acronym. In particular, if there is a sequence of characters in these words that can form the upper-cased characters of the acronym, then the words before or after the acronym are selected as its meaning (i.e., long form). Moreover, as the SciAI dataset annotates acronyms even if they do not have any locally defined long form, we extend the rule for identifying acronyms by relaxing the requirement of being inside or right before parentheses.

In the AI@SDU task, 54 teams participated and 18 of them submitted their system results in the evaluation phase. In total, the teams made 254 submissions for different versions of their models. The submitted systems employ various methods, including: (1) Rule-based Methods: similar to our baseline, some participants exploited manually designed rules, which could have high precision but low recall (Rogers, Rae, and Demner-Fushman 2020; Li et al. 2020); (2) Feature-based Models: these models extract various features from the texts to be used by a statistical model to predict the acronyms and long forms (Li et al. 2020); (3) Transformer-based Models: in these systems, the sentence is encoded with a pre-trained transformer-based language model and the labels are predicted using the obtained word embeddings (Kubal and Nagvenkar 2020; Li et al. 2020; Egan and Bohannon 2020). Some of these models also leverage adversarial training to make the model more robust to noise (Zhu et al. 2020), or they employ an ensemble model (Singh and Kumar 2020). Among all submitted models, the method proposed by (Zhu et al. 2020), i.e., AT-BERT-E, achieves the highest performance. This model employs an adversarial training approach to increase the model's robustness to noise. More specifically, they augment the training data with adversarially perturbed samples and fine-tune a BERT model followed by a feed-forward neural net on this task. For the adversarial perturbation, they leverage a gradient-based approach in which the sample representations are altered in the direction in which the gradient of the loss function increases.

We evaluate the systems based on the macro-averaged precision, recall, and F1 score of the acronym and long-form predictions. The results are shown in Table 3. This table shows that the participants have made considerable improvements over the provided baseline. However, there is still a gap between the performance of the task winner (i.e., AT-BERT-E (Zhu et al. 2020)) and human-level performance, suggesting that more improvement is required.
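The rule-based baseline described above can be sketched as follows. This is a simplified, assumption-laden rendering, not the official baseline code: it handles only the long form (acronym) pattern, assumes whitespace tokenization, and the helper names and the greedy right-to-left initial matching are our choices.

```python
def find_acronym_definitions(sentence: str):
    # Baseline rules: a token with more than 60% of its characters
    # upper-cased that sits inside parentheses is predicted as an
    # acronym; the preceding words within a window of
    # min(|A| + 5, 2 * |A|) tokens are checked as its long form.
    tokens = sentence.replace("(", " ( ").replace(")", " ) ").split()
    results = []
    for i, tok in enumerate(tokens):
        inside = (0 < i < len(tokens) - 1
                  and tokens[i - 1] == "(" and tokens[i + 1] == ")")
        if not inside:
            continue
        letters = [c for c in tok if c.isalpha()]
        if not letters or sum(c.isupper() for c in letters) / len(letters) <= 0.6:
            continue
        window = min(len(tok) + 5, 2 * len(tok))
        candidates = tokens[max(0, i - 1 - window):i - 1]
        long_form = match_long_form(candidates, tok)
        if long_form:
            results.append((tok, " ".join(long_form)))
    return results


def match_long_form(candidates, acronym):
    # Greedy right-to-left check: the upper-cased characters of the
    # acronym must appear, in order, as initials of the candidate words.
    initials = [c.lower() for c in acronym if c.isupper()]
    words = []
    j = len(initials) - 1
    for w in reversed(candidates):
        if j < 0:
            break
        words.insert(0, w)
        if w and w[0].lower() == initials[j]:
            j -= 1
    return words if j < 0 else None
```

On "A convolutional neural network (CNN) is used.", this sketch predicts CNN as an acronym with the long form "convolutional neural network".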
Acronym Disambiguation

For the AD task, we provide a baseline that employs the frequency of an acronym's long forms to disambiguate it. More specifically, for the acronym a with the long forms L = [l1, l2, ..., lm], we compute the number of occurrences of each of its long forms in the training data, i.e., F = [f1, f2, ..., fm], where fi = |A_a,i| and A_a,i is the set of sentences in the training data with the acronym a and the long form li. At inference time, the acronym a is expanded to its long form with the highest frequency, i.e., i* = argmax_i fi.

The AD@SDU task attracted 44 participants, with 12 submissions at the evaluation phase and 187 total submissions for different versions of the participating systems. This task has been approached with a variety of methods, including: (1) Feature-based Models: some systems extract features from the input sentence (e.g., word stems, part-of-speech tags, or special characters in the acronym); next, a statistical model, such as a Support Vector Machine, Naive Bayes, or K-nearest neighbors, is employed to predict the correct long form of the acronym (Jaber and Martinez 2020; Pereira, Galhardas, and Shasha 2020); (2) Neural Networks: a few of the participating systems employ deep architectures, e.g., convolutional neural networks (CNN) or long short-term memory (LSTM) networks (Rogers, Rae, and Demner-Fushman 2020); (3) Transformer-based Models: the majority of the participants resort to transformer-based language models, e.g., BERT, SciBERT, or RoBERTa, to encode the input sentence. However, they differ in how they leverage the outputs of these language models for prediction and also in how they formulate the task. Whereas most of the existing works formulate the task as a classification problem (Pan et al. 2020; Zhong et al. 2020), the authors of (Egan and Bohannon 2020) use an information retrieval approach. More specifically, the cosine similarity between the embeddings of the candidates and the input is employed to compute the score of each candidate and then to rank the candidates based on their scores. Moreover, the authors of (Singh and Kumar 2020) model this task as a span prediction problem. Specifically, the concatenation of the different candidate long forms with the acronym and the input sentence is encoded by a transformer-based language model. Afterward, a sequence labeling component predicts the sub-sequence with the highest probability of being the correct long form. Among all submitted systems, the DeepBlueAI model proposed by (Pan et al. 2020) obtained the highest performance for acronym disambiguation on the SciAD test set. This model formulates the task as a binary classification problem in which each candidate long form is assigned a score by a binary classifier, and the candidate with the highest score is selected as the final model prediction. For the classifier, the authors employ a pre-trained BERT model that takes the input in the form Li [SEP] w1, w2, ..., start, wa, end, ..., wn, where Li is the long-form candidate, wi are the words of the input sentence, wa is the ambiguous acronym in the input sentence, and start and end are two special tokens that provide the position of the acronym to the model.

We evaluate the systems using their macro-averaged precision, recall, and F1 score for predicting the correct long form. The results are shown in Table 4. Again, this table shows that the participating systems considerably improved the performance over the provided baseline. However, the gap between the best-performing model, i.e., DeepBlueAI (Pan et al. 2020), and human-level performance shows that more research is required.

Team Name | Precision | Recall | F1
UC3M (Jaber and Martinez 2020) | 92.15 | 77.97 | 84.37
AccAcE (Pereira, Galhardas, and Shasha 2020) | 93.57 | 83.77 | 88.40
GCDH | 94.88 | 87.03 | 90.79
Spark | 94.87 | 87.23 | 90.89
AI-NLM (Rogers, Rae, and Demner-Fushman 2020) | 90.73 | 91.96 | 91.34
Primer (Egan and Bohannon 2020) | 94.72 | 88.64 | 91.58
Sansansanye | 95.18 | 88.93 | 91.95
Zhuyeu | 95.48 | 89.07 | 92.16
Dumb AI | 95.95 | 89.59 | 92.66
SciDr (Singh and Kumar 2020) | 96.52 | 90.09 | 93.19
hdBERT (Zhong et al. 2020) | 96.94 | 90.73 | 93.73
DeepBlueAI (Pan et al. 2020) | 96.95 | 91.32 | 94.05
Baseline (Freq.) | 89.00 | 46.36 | 60.97
Human Performance | 97.82 | 94.45 | 96.10

Table 4: Performance of the participating systems in the Acronym Disambiguation task.

Conclusion

In this paper, we summarized the tasks of acronym identification and acronym disambiguation at the Scientific Document Understanding workshop (AI@SDU and AD@SDU). For these tasks, we provided two novel datasets that address the limitations of the prior work. Both tasks attracted a substantial number of participants, with considerable performance improvements over the provided baselines. However, the lower performance of the best-performing models compared to human-level performance shows that more research should be conducted on both tasks.
arXiv preprint specifically, the cosine similarity between the embeddings arXiv:1904.00929 . of the candidates and the input is employed to compute the Egan, N.; and Bohannon, J. 2020. Primer AI’s Sys- score of each candidate and then to rank them based on their tems for Acronym Identification and Disambiguation. In scores. Moreover, authors in (Singh and Kumar 2020) model SDU@AAAI-21. this task as a span prediction problem. Specifically, the con- catenation of the different candidate long forms with the Espinosa-Anke, L.; and Schockaert, S. 2018. Syntactically acronym and the input sentence is encoded by a transformer- Aware Neural Architectures for Definition Extraction. In based language model. Afterward, a sequence labeling com- NAACL-HLT. ponent predicts the sub-sequence with the highest proba- Harris, C. G.; and Srinivasan, P. 2019. My Word! Machine bility of being the correct long form. Among all submit- versus Human Computation Methods for Identifying and ted systems, the DeepBlueAI model proposed by (Pan et al. Resolving Acronyms. Computación y Sistemas 23(3). 2020) obtained the highest performance for acronym dis- ambiguation on SciAD test set. This model formulate this Jaber, A.; and Martinez, P. 2020. Participation of UC3M task as a binary classification problem in which each can- in SDU@AAAI-21: A Hybrid Approach to Disambiguate didate long-form is assigned a score by a binary classifier Scientific Acronyms. In SDU@AAAI-21. Jin, Q.; Liu, J.; and Lu, X. 2019. Deep Contextual- Passonneau, R. 2006. Measuring agreement on set-valued ized Biomedical Abbreviation Expansion. arXiv preprint items (MASI) for semantic and pragmatic annotation. In arXiv:1906.03360 . LREC. Jin, Y.; Kan, M.-Y.; Ng, J.-P.; and He, X. 2013. Mining Sci- Pereira, J. L. M.; Galhardas, H.; and Shasha, D. 2020. entific Terms and their Definitions: A Study of the ACL An- Acronym Expander at SDU@AAAI-21: an Acronym Dis- thology. In EMNLP. ambiguation Module. In SDU@AAAI-21. 
Kirchhoff, K.; and Turner, A. M. 2016. Unsupervised res- Pouran Ben Veyseh, A.; Nguyen, T. H.; and Dou, D. 2019. olution of acronyms and abbreviations in nursing notes us- Graph based Neural Networks for Event Factuality Predic- ing document-level context models. In Proceedings of the tion using Syntactic and Semantic Structures. In ACL. Seventh International Workshop on Health Text Mining and Prokofyev, R.; Demartini, G.; Boyarsky, A.; Ruchayskiy, O.; Information Analysis, 52–60. and Cudré-Mauroux, P. 2013. Ontology-based word sense Krippendorff, K. 2011. Computing Krippendorff’s alpha- disambiguation for scientific literature. In European confer- reliability. ence on information retrieval, 594–605. Springer. Kubal, D.; and Nagvenkar, A. 2020. Effective Ensembling Rogers, W.; Rae, A.; and Demner-Fushman, D. 2020. AI- of Transformer based Language Models for Acronyms Iden- NLM exploration of the Acronym Identification and Disam- tification. In SDU@AAAI-21. biguation Shared Tssks at SDU@AAAI-21. In SDU@AAAI- 21. Kuo, C.-J.; Ling, M. H.; Lin, K.-T.; and Hsu, C.-N. 2009. Schwartz, A. S.; and Hearst, M. A. 2002. A simple algorithm BIOADI: a machine learning approach to identifying ab- for identifying abbreviation definitions in biomedical text. In breviations and definitions in biological literature. In BMC Biocomputing 2003, 451–462. World Scientific. bioinformatics, volume 10, S7. Springer. Singh, A.; and Kumar, P. 2020. SciDr at SDU-2020 : IDEAS Li, F.; Mai, Z.; Zou, W.; Ou, W.; Qin, X.; Lin, Y.; and Zhang, - Identifying and Disambiguating Everyday Acronyms for W. 2020. Systems at SDU-2021 Task 1: Transformers for Scientific Domain. In SDU@AAAI-21. Sentence Level Sequence Label. In SDU@AAAI-21. Spala, S.; Miller, N.; Dernoncourt, F.; and Dockhorn, C. Li, Y.; Zhao, B.; Fuxman, A.; and Tao, F. 2018. Guess 2020. SemEval-2020 Task 6: Definition Extraction from Me if You Can: Acronym Disambiguation for Enterprises. Free Text with the DEFT Corpus. 
In Proceedings of the In Proceedings of the 56th Annual Meeting of the Associa- Fourteenth Workshop on Semantic Evaluation. tion for Computational Linguistics (Volume 1: Long Papers), 1308–1317. Melbourne, Australia: Association for Com- Spala, S.; Miller, N. A.; Yang, Y.; Dernoncourt, F.; and putational Linguistics. doi:10.18653/v1/P18-1121. URL Dockhorn, C. 2019. DEFT: A corpus for definition extrac- https://www.aclweb.org/anthology/P18-1121. tion in free- and semi-structured text. In Proceedings of the 13th Linguistic Annotation Workshop. Liu, J.; Chen, J.; Zhang, Y.; and Huang, Y. 2011. Learn- Taneva, B.; Cheng, T.; Chakrabarti, K.; and He, Y. 2013. ing conditional random fields with latent sparse features for Mining acronym expansions and their meanings using query acronym expansion finding. In Proceedings of the 20th click log. In Proceedings of the 22nd international confer- ACM international conference on Information and knowl- ence on World Wide Web, 1261–1272. edge management, 867–872. Veyseh, A. P. B. 2016. Cross-lingual question answering us- Liu, Y.; Meng, F.; Zhang, J.; Xu, J.; Chen, Y.; and Zhou, ing common semantic space. In Proceedings of TextGraphs- J. 2019. Gcdt: A global context enhanced deep transition 10: the workshop on graph-based methods for natural lan- architecture for sequence labeling. In ACL. guage processing, 15–19. Nadeau, D.; and Turney, P. D. 2005. A supervised learning Veyseh, A. P. B.; Dernoncourt, F.; Chang, W.; and Nguyen, approach to acronym identification. In Conference of the T. H. 2021. MadDog: A Web-based System for Acronym Canadian Society for Computational Studies of Intelligence, Identification and Disambiguation. In EACL. 319–329. Springer. Veyseh, A. P. B.; Dernoncourt, F.; Dou, D.; and Nguyen, Nautial, A.; Sristy, N. B.; and Somayajulu, D. V. 2014. Find- T. H. 2020a. A Joint Model for Definition Extraction with ing acronym expansion using semi-Markov conditional ran- Syntactic Connection and Semantic Consistency. 
In AAAI, dom fields. In Proceedings of the 7th ACM India Computing 9098–9105. Conference, 1–6. Veyseh, A. P. B.; Dernoncourt, F.; Tran, Q. H.; and Nguyen, Pan, C.; Song, B.; Wang, S.; and Luo, Z. 2020. BERT-based T. H. 2020b. What Does This Acronym Mean? Introducing Acronym Disambiguation with Multiple Training Strategies. a New Dataset for Acronym Identification and Disambigua- In SDU@AAAI-21. tion. In Proceedings of COLING. Park, Y.; and Byrd, R. J. 2001. Hybrid text mining for find- Zhong, Q.; Zeng, G.; Zhu, D.; Zhang, Y.; Lin, W.; Chen, ing abbreviations and their definitions. In Proceedings of the B.; and Tang, J. 2020. Leveraging Domain Agnostic 2001 conference on empirical methods in natural language and Specific Knowledge for Acronym Disambiguation. In processing. SDU@AAAI-21. Zhu, D.; Lin, W.; Zhang, Y.; Zhong, Q.; Zeng, G.; Wu, W.; and Tang, J. 2020. AT-BERT: Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21. In SDU@AAAI-21.