Acronym Identification and Disambiguation Shared Tasks for Scientific Document Understanding

Amir Pouran Ben Veyseh1, Franck Dernoncourt2, Thien Huu Nguyen1, Walter Chang2, and Leo Anthony Celi3,4
1 University of Oregon, Eugene, OR, USA
2 Adobe Research, San Jose, CA, USA
3 Harvard University, Cambridge, MA, USA
4 Massachusetts Institute of Technology, Cambridge, MA, USA
{thien, apouranb}@cs.uoregon.edu
{franck.dernoncourt, wachang}@adobe.com
lceli@bidmc.harvard.edu

Abstract

Acronyms are the short forms of longer phrases, and they are frequently used in writing, especially scholarly writing, to save space and facilitate the communication of information. As such, every text understanding tool should be capable of recognizing acronyms in text (i.e., acronym identification) and also finding their correct meanings (i.e., acronym disambiguation). As most of the prior works on these tasks are restricted to the biomedical domain and use unsupervised methods or models trained on limited datasets, they fail to perform well for scientific document understanding. To push forward research in this direction, we have organized two shared tasks for acronym identification and acronym disambiguation in scientific documents, named AI@SDU and AD@SDU, respectively. The two shared tasks have attracted 52 and 43 participants, respectively. While the submitted systems make substantial improvements over the existing baselines, they are still far from human-level performance. This paper reviews the two shared tasks and the prominent participating systems for each of them.

Introduction

One of the common practices in writing to save space and make the flow of information smoother is to avoid the repetition of long phrases, which might waste space and the reader's time. To this end, acronyms, i.e., the shortened forms of long phrases, are often used in various types of writing, especially in scientific documents. However, this prevalence might introduce more challenges for text understanding tools. More specifically, as acronyms might not be defined in dictionaries, especially locally defined acronyms whose long form is only provided in the document that introduces them, a text processing model should be able to identify the acronyms and their long forms in the text (i.e., acronym identification). For instance, in the sentence "The main key performance indicator, herein referred to as KPI, is the E2E throughput", the text processing system must recognize KPI and E2E as acronyms and the phrase key performance indicator as the long form. Another issue related to acronyms that text understanding tools encounter is that the correct meaning (i.e., long form) of an acronym might not be provided in the document itself (e.g., the acronym E2E in the running example). In these cases, the correct meaning can be obtained by looking it up in an acronym dictionary. However, as different long forms could share the same acronym (e.g., the two long forms Cable News Network and Convolutional Neural Network share the acronym CNN), this look-up is not straightforward and the system must disambiguate the acronym (i.e., acronym disambiguation). Both AI and AD models could be used in downstream applications including definition extraction (Veyseh et al. 2020a; Spala et al. 2020, 2019; Espinosa-Anke and Schockaert 2018; Jin et al. 2013), various information extraction tasks (Liu et al. 2019; Pouran Ben Veyseh, Nguyen, and Dou 2019), and question answering (Ackermann et al. 2020; Veyseh 2016).

Due to the importance of the two aforementioned tasks, i.e., acronym identification (AI) and acronym disambiguation (AD), there is a wealth of prior work on AI and AD (Park and Byrd 2001; Schwartz and Hearst 2002; Nadeau and Turney 2005; Kuo et al. 2009; Taneva et al. 2013; Kirchhoff and Turner 2016; Li et al. 2018; Ciosici, Sommer, and Assent 2019; Jin, Liu, and Lu 2019; Veyseh et al. 2021). However, there are two major limitations in the existing systems. First, for the AD task, the existing models are mainly limited to the biomedical domain, ignoring the challenges in other domains. Second, for the AI task, the existing models employ either unsupervised methods or models trained on limited manually annotated AI datasets. The unsupervised methods or the small size of the AI datasets result in errors in acronym identification, which could also propagate to the acronym disambiguation task.

To address the above issues in the prior works, we recently released the largest manually annotated acronym identification dataset for scientific documents (viz., SciAI) (Veyseh et al. 2020b). This dataset consists of 17,506 sentences from 6,786 English papers published on arXiv. The annotation of each sentence involves the acronyms and long forms mentioned in the sentence. Using this manually annotated AI dataset, we also created a dictionary of 732 acronyms with multiple corresponding long forms (i.e., ambiguous acronyms), which is the largest available acronym dictionary for scientific documents. Moreover, using the prepared dictionary and 2,031,592 sentences extracted from arXiv papers, we created a dataset for the acronym disambiguation task (viz., SciAD) (Veyseh et al. 2020b). This dataset consists of 62,441 sentences, which is larger than the prior AD datasets for the scientific domain.

Using the two datasets SciAI and SciAD, we organize two shared tasks for acronym identification and acronym disambiguation for scientific document understanding (i.e., AI@SDU and AD@SDU, respectively). The AI@SDU shared task has attracted 52 participant teams with 19 submissions during the evaluation phase. The AD@SDU task has also attracted 43 participant teams with 10 submissions during the evaluation phase. The participant teams made considerable progress on both shared tasks compared to the provided baselines. However, the top-performing models (viz., AT-BERT-E for AI@SDU with a 93.3% F1 score and DeepBlueAI for AD@SDU with a 94.0% F1 score) underperform humans (with 96.0% and 96.1% F1 scores for the AI@SDU and AD@SDU shared tasks, respectively), leaving room for future research. In this paper, we review the dataset creation process, the details of the shared tasks, and the prominent submitted systems.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Dataset | Size | # Unique Acronyms | # Unique Meanings | # Documents | Publicly Available | Domain
LSAEF (Liu et al. 2011) | 6,185 | 255 | 1,372 | N/A | No | Wikipedia
AESM (Nautial, Sristy, and Somayajulu 2014) | 355 | N/A | N/A | N/A | No | Wikipedia
MHIR (Harris and Srinivasan 2019) | N/A | N/A | N/A | 50 | No | Scientific Papers
MHIR (Harris and Srinivasan 2019) | N/A | N/A | N/A | 50 | No | Patents
MHIR (Harris and Srinivasan 2019) | N/A | N/A | N/A | 50 | No | News
SciAI (ours) | 17,506 | 7,964 | 9,775 | 6,786 | Yes | Scientific Papers

Table 1: Comparison of non-medical manually annotated acronym identification datasets. Note that size refers to the number of sentences in the dataset.

Dataset | Size | Annotation | Avg. Number of Samples per Long Form
Science WISE (Prokofyev et al. 2013) | 5,217 | Disambiguation manually annotated | N/A
NOA (Charbonnier and Wartena 2018) | 19,954 | No manual annotation | 4
SciAD (ours) | 62,441 | Acronym identification manually annotated | 22

Table 2: Comparison of scientific acronym disambiguation (AD) datasets. Note that size refers to the number of sentences in the dataset.

Dataset & Task Description

Acronym Identification

As mentioned in the introduction, the existing AI datasets are either created using unsupervised methods (e.g., by character-matching the acronyms with their surrounding words in the text) or they are small-sized and thus inappropriate for data-hungry deep learning models. To address these limitations, we aim to create the largest manually labeled acronym identification dataset. To this end, we first collect 6,786 English papers from arXiv. This collection contains 2,031,592 sentences. As not all of these sentences contain acronyms and their long forms, we first filter out the sentences without any candidate acronym and long form. To identify the candidate acronyms, we use the rule that a word wt is a candidate acronym if half of its characters are upper-cased. To identify the candidate long forms, we employ the rule that the consecutive words [wj, wj+1, ..., wj+k] are a candidate long form if the concatenation of their first one, two, or three characters can form a candidate acronym, i.e., wt, in the sentence. After filtering out sentences without any candidate acronym and long form, 17,506 sentences are selected and annotated by three annotators from Amazon Mechanical Turk (MTurk). More specifically, MTurk workers annotated the acronyms, the long forms, and the mapping between identified acronyms and long forms. In case of disagreements, if two out of three workers agree on an annotation, we use majority voting to decide the correct annotation. Otherwise, a fourth annotator is hired to resolve the conflict. The inter-annotator agreement (IAA) using Krippendorff's alpha (Krippendorff 2011) with the MASI distance metric (Passonneau 2006) is 0.80 for short forms (i.e., acronyms) and 0.86 for long forms (i.e., phrases). This dataset is called SciAI.
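The two candidate-filtering rules described above can be sketched as follows. This is a minimal illustration of the stated rules, not the authors' actual preprocessing code; the function names and the case-insensitive prefix matching are our assumptions.

```python
def is_candidate_acronym(word: str) -> bool:
    # Dataset-construction rule: a token is a candidate acronym
    # if at least half of its characters are upper-cased.
    if not word:
        return False
    return sum(c.isupper() for c in word) >= len(word) / 2


def is_candidate_long_form(words, acronym: str) -> bool:
    # Rule: consecutive words are a candidate long form if the
    # concatenation of their first one, two, or three characters
    # can form the acronym (compared case-insensitively here).
    target = acronym.lower()

    def consume(i: int, pos: int) -> bool:
        # Try to cover target[pos:] with 1- to 3-character
        # prefixes of words[i:], one prefix per word.
        if pos == len(target):
            return i == len(words)
        if i == len(words):
            return False
        limit = min(3, len(words[i]), len(target) - pos)
        return any(
            words[i][:k].lower() == target[pos:pos + k]
            and consume(i + 1, pos + k)
            for k in range(1, limit + 1)
        )

    return consume(0, 0)
```

For example, `is_candidate_acronym("CNN")` holds, and `["key", "performance", "indicator"]` is accepted as a candidate long form for `KPI`.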
A comparison of the SciAI dataset with other existing manually annotated AI datasets is provided in Table 1.

The acronym identification (AI) task aims to recognize all acronyms and long forms mentioned in a sentence. Formally, given the sentence S = [w1, w2, ..., wN], the goal is to predict the sequence L = [l1, l2, ..., lN], where li ∈ {Ba, Ia, Bl, Il, O}. Note that Ba and Ia indicate the beginning and inside of an acronym, respectively, while Bl and Il indicate the beginning and inside of a long form, respectively, and O is the label for other words.

Team Name | Precision | Recall | F1
GCDH | 86.50 | 85.57 | 86.03
Aadarshsingh | 88.26 | 89.08 | 88.67
TAG-CIC | 89.70 | 88.16 | 88.92
Spark | 89.91 | 90.49 | 90.20
Dumb-AI | 89.72 | 90.94 | 90.33
Napsternxg | 90.15 | 91.15 | 90.65
Pikaqiu | 91.02 | 90.51 | 90.76
SciDr (Singh and Kumar 2020) | 90.98 | 90.83 | 90.90
Aliou | 90.78 | 91.12 | 90.95
AliBaba2020 | 90.30 | 92.87 | 91.57
RK | 89.93 | 93.88 | 91.86
DeepBlueAI | 92.01 | 91.84 | 91.92
EELM-SLP (Kubal and Nagvenkar 2020) | 89.70 | 94.59 | 92.08
Lufiedby | 92.64 | 91.74 | 92.19
Primer (Egan and Bohannon 2020) | 91.73 | 93.49 | 92.60
HowToSay | 91.93 | 93.70 | 92.81
N&E (Li et al. 2020) | 93.49 | 92.74 | 93.11
AT-BERT-E (Zhu et al. 2020) | 92.20 | 94.43 | 93.30
Baseline (Rule-based) | 91.31 | 77.93 | 84.09
Human Performance | 97.70 | 94.56 | 96.09

Table 3: Performance of the participating systems in the Acronym Identification task.

Acronym Disambiguation

The goal of the acronym disambiguation (AD) task is to find the correct meaning of a given acronym in a sentence. More specifically, given the sentence S = [w1, w2, ..., wN] and the index t, where wt is an acronym with multiple long forms L = {l1, l2, ..., lm}, the goal is to predict the long form li ∈ L that is the correct meaning of wt.

As discussed earlier, one of the issues with the existing AD datasets is that they mainly focus on the biomedical domain, ignoring the challenges in other domains. This domain shift is important, as some of the existing models for biomedical AD rely on domain-specific resources (e.g., BioBERT) which might not be suitable for other domains. Another issue with the existing AD datasets, especially the ones proposed for the scientific domain, is that they are based on unsupervised AI datasets. That is, acronyms and long forms in a corpus are identified using some rules, and the resulting AI dataset is employed to find acronyms with multiple long forms to create the AD dataset. This unsupervised method of creating an AD dataset could introduce noise and miss some challenging cases. To address these limitations, we created a new AD dataset using the manually labeled SciAI dataset. More specifically, first, using the mappings between annotated acronyms and long forms in SciAI, we create a dictionary of acronyms that have multiple long forms (i.e., ambiguous acronyms). This dictionary contains 732 acronyms with an average of 3.1 meanings (i.e., long forms) per acronym. Afterward, to create samples for the AD dataset, we look up all sentences in the collected corpus in which one of the ambiguous acronyms is locally defined (i.e., its long form is provided in the same sentence). Next, in the documents hosting these sentences, we automatically annotate every occurrence of the acronym with its locally defined long form. Using this process, a dataset consisting of 62,441 sentences is created. We call this dataset SciAD. A comparison of the SciAD dataset with other existing scientific AD datasets is provided in Table 2.
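The two SciAD construction steps, building the dictionary of ambiguous acronyms from the SciAI mappings and then auto-labeling acronym occurrences in documents with a locally defined long form, can be sketched as follows. This is a simplified sketch under assumed inputs (SciAI mappings as (acronym, long form) pairs, a document as a list of sentence strings); the function names and the whitespace/substring matching are ours, not the authors' pipeline.

```python
from collections import defaultdict


def build_ambiguous_dictionary(acronym_long_form_pairs):
    # Group the manually annotated SciAI mappings by acronym and
    # keep only acronyms with more than one distinct long form.
    meanings = defaultdict(set)
    for acronym, long_form in acronym_long_form_pairs:
        meanings[acronym].add(long_form)
    return {a: sorted(lfs) for a, lfs in meanings.items() if len(lfs) > 1}


def label_ad_samples(document_sentences, dictionary):
    # Pass 1: find acronyms that are locally defined, i.e., some
    # sentence of the document contains both the acronym and one of
    # its long forms. (If several long forms were defined, this
    # sketch simply keeps the last one found.)
    local_meaning = {}
    for sentence in document_sentences:
        for acronym, long_forms in dictionary.items():
            for lf in long_forms:
                if acronym in sentence.split() and lf.lower() in sentence.lower():
                    local_meaning[acronym] = lf
    # Pass 2: label every occurrence of a locally defined acronym
    # in the document with that long form.
    samples = []
    for sentence in document_sentences:
        for acronym, lf in local_meaning.items():
            if acronym in sentence.split():
                samples.append((sentence, acronym, lf))
    return samples
```

The design point is that the supervision comes for free: once an acronym is defined anywhere in a document, all its other occurrences in that document become AD training samples.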
Participating Systems & Results

Acronym Identification

For the AI task, we provide a rule-based baseline. In particular, inspired by (Schwartz and Hearst 2002), the baseline identifies acronyms and their long forms if they match one of the patterns long form (acronym) or acronym (long form). More specifically, if there is a word with more than 60% upper-cased characters that is inside parentheses or right before parentheses, it is predicted as an acronym. Afterward, we assess the words before or after the acronym (depending on which pattern the predicted acronym belongs to) that fall into a pre-defined window of size min(|A| + 5, 2 * |A|), where |A| is the number of characters in the acronym. In particular, if there is a sequence of characters in these words that can form the upper-cased characters of the acronym, then the words before or after the acronym are selected as its meaning (i.e., long form). Moreover, as the SciAI dataset annotates acronyms even if they do not have any locally defined long form, we extend the rule for identifying acronyms by relaxing the requirement of being inside or right before parentheses.

In the AI@SDU task, 54 teams participated and 18 of them submitted their system results in the evaluation phase. In total, the teams made 254 submissions for different versions of their models. The submitted systems employ various methods, including: (1) Rule-based Methods: similar to our baseline, some participants exploited manually designed rules, which could have high precision but low recall (Rogers, Rae, and Demner-Fushman 2020; Li et al. 2020); (2) Feature-based Models: these models extract various features from the texts to be used by a statistical model to predict the acronyms and long forms (Li et al. 2020); (3) Transformer-based Models: in these systems, the sentence is encoded with a pre-trained transformer-based language model and the labels are predicted using the obtained word embeddings (Kubal and Nagvenkar 2020; Li et al. 2020; Egan and Bohannon 2020). Some of these models also leverage adversarial training to make the model more robust to noise (Zhu et al. 2020), or they employ an ensemble model (Singh and Kumar 2020). Among all submitted models, the method proposed by (Zhu et al. 2020), i.e., AT-BERT-E, achieves the highest performance. This model employs an adversarial training approach to increase the model's robustness to noise. More specifically, they augment the training data with adversarially perturbed samples and fine-tune a BERT model followed by a feed-forward neural net on this task. For the adversarial perturbation, they leverage a gradient-based approach in which the sample representations are altered in the direction in which the gradient of the loss function increases.

We evaluate the systems based on the macro-averaged precision, recall, and F1 score of the acronym and long-form predictions. The results are shown in Table 3. This table shows that the participants have made considerable improvements over the provided baseline. However, there is still a gap between the performance of the task winner (i.e., AT-BERT-E (Zhu et al. 2020)) and human-level performance, suggesting that more improvement is required.
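The rule-based baseline described above can be sketched as follows. This is a simplified, assumption-laden rendering, not the official baseline code: it handles only the long form (acronym) pattern, assumes whitespace tokenization, and the helper names and the greedy right-to-left initial matching are our choices.

```python
def find_acronym_definitions(sentence: str):
    # Baseline rules: a token with more than 60% of its characters
    # upper-cased that sits inside parentheses is predicted as an
    # acronym; the preceding words within a window of
    # min(|A| + 5, 2 * |A|) tokens are checked as its long form.
    tokens = sentence.replace("(", " ( ").replace(")", " ) ").split()
    results = []
    for i, tok in enumerate(tokens):
        inside = (0 < i < len(tokens) - 1
                  and tokens[i - 1] == "(" and tokens[i + 1] == ")")
        if not inside:
            continue
        letters = [c for c in tok if c.isalpha()]
        if not letters or sum(c.isupper() for c in letters) / len(letters) <= 0.6:
            continue
        window = min(len(tok) + 5, 2 * len(tok))
        candidates = tokens[max(0, i - 1 - window):i - 1]
        long_form = match_long_form(candidates, tok)
        if long_form:
            results.append((tok, " ".join(long_form)))
    return results


def match_long_form(candidates, acronym):
    # Greedy right-to-left check: the upper-cased characters of the
    # acronym must appear, in order, as initials of the candidate words.
    initials = [c.lower() for c in acronym if c.isupper()]
    words = []
    j = len(initials) - 1
    for w in reversed(candidates):
        if j < 0:
            break
        words.insert(0, w)
        if w and w[0].lower() == initials[j]:
            j -= 1
    return words if j < 0 else None
```

On "A convolutional neural network (CNN) is used.", this sketch predicts CNN as an acronym with the long form "convolutional neural network".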
Acronym Disambiguation

For the AD task, we provide a baseline that employs the frequency of an acronym's long forms to disambiguate it. More specifically, for the acronym a with the long forms L = [l1, l2, ..., lm], we compute the number of occurrences of each of its long forms in the training data, i.e., F = [f1, f2, ..., fm], where fi = |A_a,i| and A_a,i is the set of sentences in the training data with the acronym a and the long form li. At inference time, the acronym a is expanded to its long form with the highest frequency, i.e., i* = argmax_i fi.

The AD@SDU task attracted 44 participants, with 12 submissions at the evaluation phase and 187 total submissions for different versions of the participating systems. This task has been approached with a variety of methods, including: (1) Feature-based Models: some systems extract features from the input sentence (e.g., word stems, part-of-speech tags, or special characters in the acronym); next, a statistical model, such as a Support Vector Machine, Naive Bayes, or K-nearest neighbors, is employed to predict the correct long form of the acronym (Jaber and Martinez 2020; Pereira, Galhardas, and Shasha 2020); (2) Neural Networks: a few of the participating systems employ deep architectures, e.g., convolutional neural networks (CNN) or long short-term memory (LSTM) networks (Rogers, Rae, and Demner-Fushman 2020); (3) Transformer-based Models: the majority of the participants resort to transformer-based language models, e.g., BERT, SciBERT, or RoBERTa, to encode the input sentence. However, they differ in how they leverage the outputs of these language models for prediction and also in how they formulate the task. Whereas most of the existing works formulate the task as a classification problem (Pan et al. 2020; Zhong et al. 2020), the authors of (Egan and Bohannon 2020) use an information retrieval approach. More specifically, the cosine similarity between the embeddings of the candidates and the input is employed to compute the score of each candidate and then to rank the candidates based on their scores. Moreover, the authors of (Singh and Kumar 2020) model this task as a span prediction problem. Specifically, the concatenation of the different candidate long forms with the acronym and the input sentence is encoded by a transformer-based language model. Afterward, a sequence labeling component predicts the sub-sequence with the highest probability of being the correct long form. Among all submitted systems, the DeepBlueAI model proposed by (Pan et al. 2020) obtained the highest performance for acronym disambiguation on the SciAD test set. This model formulates the task as a binary classification problem in which each candidate long form is assigned a score by a binary classifier, and the candidate with the highest score is selected as the final model prediction. For the classifier, the authors employ a pre-trained BERT model that takes the input in the form Li [SEP] w1, w2, ..., start, wa, end, ..., wn, where Li is the long-form candidate, wi are the words of the input sentence, wa is the ambiguous acronym in the input sentence, and start and end are two special tokens that provide the position of the acronym to the model.

We evaluate the systems using their macro-averaged precision, recall, and F1 score for predicting the correct long form. The results are shown in Table 4. Again, this table shows that the participating systems considerably improved the performance over the provided baseline. However, the gap between the best-performing model, i.e., DeepBlueAI (Pan et al. 2020), and human-level performance shows that more research is required.

Team Name | Precision | Recall | F1
UC3M (Jaber and Martinez 2020) | 92.15 | 77.97 | 84.37
AccAcE (Pereira, Galhardas, and Shasha 2020) | 93.57 | 83.77 | 88.40
GCDH | 94.88 | 87.03 | 90.79
Spark | 94.87 | 87.23 | 90.89
AI-NLM (Rogers, Rae, and Demner-Fushman 2020) | 90.73 | 91.96 | 91.34
Primer (Egan and Bohannon 2020) | 94.72 | 88.64 | 91.58
Sansansanye | 95.18 | 88.93 | 91.95
Zhuyeu | 95.48 | 89.07 | 92.16
Dumb AI | 95.95 | 89.59 | 92.66
SciDr (Singh and Kumar 2020) | 96.52 | 90.09 | 93.19
hdBERT (Zhong et al. 2020) | 96.94 | 90.73 | 93.73
DeepBlueAI (Pan et al. 2020) | 96.95 | 91.32 | 94.05
Baseline (Freq.) | 89.00 | 46.36 | 60.97
Human Performance | 97.82 | 94.45 | 96.10

Table 4: Performance of the participating systems in the Acronym Disambiguation task.

Conclusion

In this paper, we summarized the tasks of acronym identification and acronym disambiguation at the Scientific Document Understanding workshop (AI@SDU and AD@SDU). For these tasks, we provided two novel datasets that address the limitations of the prior work. Both tasks attracted a substantial number of participants, with considerable performance improvements over the provided baselines. However, the lower performance of the best-performing models compared to human-level performance shows that more research should be conducted on both tasks.
arXiv preprint specifically, the cosine similarity between the embeddings arXiv:1904.00929 . of the candidates and the input is employed to compute the Egan, N.; and Bohannon, J. 2020. Primer AI’s Sys- score of each candidate and then to rank them based on their tems for Acronym Identification and Disambiguation. In scores. Moreover, authors in (Singh and Kumar 2020) model SDU@AAAI-21. this task as a span prediction problem. Specifically, the con- catenation of the different candidate long forms with the Espinosa-Anke, L.; and Schockaert, S. 2018. Syntactically acronym and the input sentence is encoded by a transformer- Aware Neural Architectures for Definition Extraction. In based language model. Afterward, a sequence labeling com- NAACL-HLT. ponent predicts the sub-sequence with the highest proba- Harris, C. G.; and Srinivasan, P. 2019. My Word! Machine bility of being the correct long form. Among all submit- versus Human Computation Methods for Identifying and ted systems, the DeepBlueAI model proposed by (Pan et al. Resolving Acronyms. Computación y Sistemas 23(3). 2020) obtained the highest performance for acronym dis- ambiguation on SciAD test set. This model formulate this Jaber, A.; and Martinez, P. 2020. Participation of UC3M task as a binary classification problem in which each can- in SDU@AAAI-21: A Hybrid Approach to Disambiguate didate long-form is assigned a score by a binary classifier Scientific Acronyms. In SDU@AAAI-21. Jin, Q.; Liu, J.; and Lu, X. 2019. Deep Contextual- Passonneau, R. 2006. Measuring agreement on set-valued ized Biomedical Abbreviation Expansion. arXiv preprint items (MASI) for semantic and pragmatic annotation. In arXiv:1906.03360 . LREC. Jin, Y.; Kan, M.-Y.; Ng, J.-P.; and He, X. 2013. Mining Sci- Pereira, J. L. M.; Galhardas, H.; and Shasha, D. 2020. entific Terms and their Definitions: A Study of the ACL An- Acronym Expander at SDU@AAAI-21: an Acronym Dis- thology. In EMNLP. ambiguation Module. In SDU@AAAI-21. 
Kirchhoff, K.; and Turner, A. M. 2016. Unsupervised res- Pouran Ben Veyseh, A.; Nguyen, T. H.; and Dou, D. 2019. olution of acronyms and abbreviations in nursing notes us- Graph based Neural Networks for Event Factuality Predic- ing document-level context models. In Proceedings of the tion using Syntactic and Semantic Structures. In ACL. Seventh International Workshop on Health Text Mining and Prokofyev, R.; Demartini, G.; Boyarsky, A.; Ruchayskiy, O.; Information Analysis, 52–60. and Cudré-Mauroux, P. 2013. Ontology-based word sense Krippendorff, K. 2011. Computing Krippendorff’s alpha- disambiguation for scientific literature. In European confer- reliability. ence on information retrieval, 594–605. Springer. Kubal, D.; and Nagvenkar, A. 2020. Effective Ensembling Rogers, W.; Rae, A.; and Demner-Fushman, D. 2020. AI- of Transformer based Language Models for Acronyms Iden- NLM exploration of the Acronym Identification and Disam- tification. In SDU@AAAI-21. biguation Shared Tssks at SDU@AAAI-21. In SDU@AAAI- 21. Kuo, C.-J.; Ling, M. H.; Lin, K.-T.; and Hsu, C.-N. 2009. Schwartz, A. S.; and Hearst, M. A. 2002. A simple algorithm BIOADI: a machine learning approach to identifying ab- for identifying abbreviation definitions in biomedical text. In breviations and definitions in biological literature. In BMC Biocomputing 2003, 451–462. World Scientific. bioinformatics, volume 10, S7. Springer. Singh, A.; and Kumar, P. 2020. SciDr at SDU-2020 : IDEAS Li, F.; Mai, Z.; Zou, W.; Ou, W.; Qin, X.; Lin, Y.; and Zhang, - Identifying and Disambiguating Everyday Acronyms for W. 2020. Systems at SDU-2021 Task 1: Transformers for Scientific Domain. In SDU@AAAI-21. Sentence Level Sequence Label. In SDU@AAAI-21. Spala, S.; Miller, N.; Dernoncourt, F.; and Dockhorn, C. Li, Y.; Zhao, B.; Fuxman, A.; and Tao, F. 2018. Guess 2020. SemEval-2020 Task 6: Definition Extraction from Me if You Can: Acronym Disambiguation for Enterprises. Free Text with the DEFT Corpus. 
In Proceedings of the In Proceedings of the 56th Annual Meeting of the Associa- Fourteenth Workshop on Semantic Evaluation. tion for Computational Linguistics (Volume 1: Long Papers), 1308–1317. Melbourne, Australia: Association for Com- Spala, S.; Miller, N. A.; Yang, Y.; Dernoncourt, F.; and putational Linguistics. doi:10.18653/v1/P18-1121. URL Dockhorn, C. 2019. DEFT: A corpus for definition extrac- https://www.aclweb.org/anthology/P18-1121. tion in free- and semi-structured text. In Proceedings of the 13th Linguistic Annotation Workshop. Liu, J.; Chen, J.; Zhang, Y.; and Huang, Y. 2011. Learn- Taneva, B.; Cheng, T.; Chakrabarti, K.; and He, Y. 2013. ing conditional random fields with latent sparse features for Mining acronym expansions and their meanings using query acronym expansion finding. In Proceedings of the 20th click log. In Proceedings of the 22nd international confer- ACM international conference on Information and knowl- ence on World Wide Web, 1261–1272. edge management, 867–872. Veyseh, A. P. B. 2016. Cross-lingual question answering us- Liu, Y.; Meng, F.; Zhang, J.; Xu, J.; Chen, Y.; and Zhou, ing common semantic space. In Proceedings of TextGraphs- J. 2019. Gcdt: A global context enhanced deep transition 10: the workshop on graph-based methods for natural lan- architecture for sequence labeling. In ACL. guage processing, 15–19. Nadeau, D.; and Turney, P. D. 2005. A supervised learning Veyseh, A. P. B.; Dernoncourt, F.; Chang, W.; and Nguyen, approach to acronym identification. In Conference of the T. H. 2021. MadDog: A Web-based System for Acronym Canadian Society for Computational Studies of Intelligence, Identification and Disambiguation. In EACL. 319–329. Springer. Veyseh, A. P. B.; Dernoncourt, F.; Dou, D.; and Nguyen, Nautial, A.; Sristy, N. B.; and Somayajulu, D. V. 2014. Find- T. H. 2020a. A Joint Model for Definition Extraction with ing acronym expansion using semi-Markov conditional ran- Syntactic Connection and Semantic Consistency. 
In AAAI, dom fields. In Proceedings of the 7th ACM India Computing 9098–9105. Conference, 1–6. Veyseh, A. P. B.; Dernoncourt, F.; Tran, Q. H.; and Nguyen, Pan, C.; Song, B.; Wang, S.; and Luo, Z. 2020. BERT-based T. H. 2020b. What Does This Acronym Mean? Introducing Acronym Disambiguation with Multiple Training Strategies. a New Dataset for Acronym Identification and Disambigua- In SDU@AAAI-21. tion. In Proceedings of COLING. Park, Y.; and Byrd, R. J. 2001. Hybrid text mining for find- Zhong, Q.; Zeng, G.; Zhu, D.; Zhang, Y.; Lin, W.; Chen, ing abbreviations and their definitions. In Proceedings of the B.; and Tang, J. 2020. Leveraging Domain Agnostic 2001 conference on empirical methods in natural language and Specific Knowledge for Acronym Disambiguation. In processing. SDU@AAAI-21. Zhu, D.; Lin, W.; Zhang, Y.; Zhong, Q.; Zeng, G.; Wu, W.; and Tang, J. 2020. AT-BERT: Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21. In SDU@AAAI-21.