<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Acronym Identification using Transformers and Flair Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>F. Balouchzahi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>O. Vitman</string-name>
          <email>ovitman2021@cic.ipn.mx</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>H.L. Shashirekha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>G. Sidorov</string-name>
          <email>sidorov@cic.ipn.mx</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Gelbukh</string-name>
          <email>gelbukh@cic.ipn.mx</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Mangalore University</institution>
          ,
          <addr-line>Mangalore</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC)</institution>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The number of acronyms in texts is growing with the increase in the number of scientific articles, and this growth is not limited to English texts. The Acronym Extraction (AE) task aims at automatically identifying and extracting acronyms and their long forms in a given text. To tackle the challenge of AE in different languages, this paper describes the participation of team MUCIC in the AE shared task at the AAAI-22 Workshop on Scientific Document Understanding (SDU@AAAI-22). This shared task aims at identifying and extracting acronyms and their long forms from English, Spanish, French, Danish, Persian, and Vietnamese texts. The proposed methodology consists of data transformation using Spacy and/or other libraries, depending on the language, and the Flair framework to fine-tune transformers of the corresponding languages to extract acronyms and their long-forms. For the Spanish language, the proposed methodology secured the second rank, and for all other languages the results obtained are reasonable.</p>
      </abstract>
      <kwd-group>
<kwd>Acronym</kwd>
        <kwd>Expansion</kwd>
        <kwd>Flair</kwd>
        <kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1https://github.com/amirveyseh/AAAI-22-SDU-shared-task-1</title>
        <p>AE</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Methodology</title>
      <p>texts in six languages, namely: English, Spanish, French, they adopted several strategies including dynamic
negDanish, Persian, and Vietnamese. ative sample selection, task adaptive pretraining,
adver</p>
      <p>
        The proposed methodology to identify acronyms in sarial training and pseudo-labeling for AD. The
experithe given text contains Data transformation and Model ments conducted won the first place in AD shared task
Fine-Tuning and is based on our previous work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] that at SDU@AAAI-2021 with F1-score of 0.94.
utilized Flair framework to fine-tune transformers. Our Three models based on Bidirectional Long Short-Term
proposed model obtained promising results for almost Memory (BiLSTM) and Conditional Random Field (CRF)2,
all high-resource languages and the best performance is namely: BiLSTM with CRF Huang et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], Stacked
achieved for Spanish with a F1-score of 0.90 leading to BiLSTM and CRF Lample et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and Bi-LSTM and
second rank in the AE shared task. CRF with convolution and max-pooling Ma et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]
      </p>
      <p>
        The rest of the paper is organized as follows: Sec- were adopted by Rogers et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] for AI shared task with
tion 2 describes some of the good performing models Glove embedding for all the models. They also employed
submitted to Acronym Identification (AI) shared task at four transformer models, namely: BERT, BioBERT,
DistilAAAI-21 Workshop on Scientific Document Understand- BERT, and RoBERTa as well for AI shared task. The best
ing (SDU@AAAI-21) followed by the proposed method- performance was obtained using stacked BiLSTM with
ology in Section 3. Experiments and results are discussed CRF with a F1-score of 0.91.
in Section 4 and the paper concludes in Section 5. Despite several models, the complexity of AI/AE
provides scope for further experimentation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <sec id="sec-3-1">
        <title>Researchers have developed several eficient models start</title>
        <p>
          ing from traditional rule-based to advanced DL meth- The proposed methodology is adopted from our
previods for AI, AE and Acronym Disambiguation (AD) tasks. ous work on Automatic Detection of Occupations and
Given an acronym and several possible expansions, AD Profession in Medical Texts using Flair and BERT [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]
task has to determine the correct expansion for the given applied only on Spanish language texts. With minor
modcontext. AD task is challenging due to the high ambigu- ifications to the existing architecture, the methodology is
ity of acronyms. The organizers of SDU@AAAI-21 have extended for the AE task in six languages text provided
released two large datasets of English scientific papers by the organizers. The workflow of the methodology
conpublished at arXiv for two shared tasks: AI [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and AD tains two major parts: Data Transformation and Model
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The studies and models related to AI, AE and AD Fine-Tuning, which are explained in the following
subare described below: sections:
        </p>
        <p>
          Traditional approaches of sequence labeling, mainly
rule-based or feature-based, are introduced by Schwartz 3.1. Data Transformation
et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for AI. Their model builds a dictionary of
local acronyms by utilizing character-match between This phase contains the necessary steps to transform the
acronym letters and corresponding long-forms in the data to a representation that can be used to train and
same sentence to discover the acronym and its long-form. fine-tune the model. The data provided for our previous
        </p>
        <p>
          Zhu et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] proposed AT-BERT - a Bidirectional En- work [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] was in Brat standof format 3 and this data
coder Representations from Transformers (BERT)-based was transformed to CONLL IOB4 format as it is easy to
model for AI shared task at SDU@AAAI-21. A Fast Gra- process data in CONLL IOB format rather than in Brat
dient Method (FGM)-based adversarial training strategy format. Brat format consists of a collection of text (.txt)
was incorporated in the fine-tuning of BERT variants, and their corresponding annotation files (.ann).
and an average ensemble mechanism was devised to cap- The datasets for the AI shared task consists of JSON
ture the better representation from multi-BERT variants. files. Each JSON file contains a collection of 4
compoThis proposed model secured first rank in AI shared task nents comprising of text, beginning and ending ofsets
with an average macro F1-score of 0.94. of acronyms and their corresponding long-forms and
        </p>
        <p>
          The model proposed by Egan et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] uses a trans- an id of that text. A sample JSON file is shown in
Figformer followed by linear projection for AI and finds ure 1. These JSON files are first transformed to Brat
similar examples with embeddings learned from Twin representation as shown in Figure 2 and then the Brat
Networks for AD. With ensemble of diferent transform- representations are transformed to CONLL IOB
repreers, the models obtained F1-scores of 0.93 and 0.91 for AI sentation as described in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and is shown in Figure 3.
and AD shared tasks respectively.
        </p>
        <p>
          Pan et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] introduces a binary classification model
for AD. Using BERT encoder for input representations,
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2https://github.com/guillaumegenthial/tf_ner 3https://brat.nlplab.org/standof.html 4https://nlp.lsi.upc.edu/freeling/node/83</title>
        <p>3.2. Model Fine-Tuning</p>
      </sec>
      <sec id="sec-3-3">
        <title>Model Fine-Tuning employs Flair framework to fine-tune</title>
        <p>the pre-trained transformer language model to build a
sequence tagger for the task of AE - a downstream task.</p>
        <p>
          Figure 2: Transformation of data from JSON to Brat format Flair8 is a PyTorch based NLP tool that provides the
facility of utilizing individual or combination of word
embeddings and language models [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Sequence Tagger
module from Flair has BiLSTM backend with CRF layer
The input JSON files of all the languages in the given on top of this model (which is not used in this work).
dataset are first converted to a collection of text (.txt) and Since fine-tuning the transformers is time
consumtheir corresponding annotation files (.ann) according to ing and require significant resources such as RAM and
Brat format based on the provided beginning and ending GPU, models are fine-tuned only for 3 epochs which may
ofsets corresponding to acronyms and their long-forms. probably lead to lower results. As the overall
perforAs the proposed methodology is based on our previous mance of the proposed methodology also depends on the
work, the direct transformation of JSON files to CONLL language model, for each language, the most popular
lanIOB format is avoided. guage model is selected and fine-tuned. The pre-trained
        </p>
        <p>Spacy5 library which provides various tools for pro- transformer language models used for each language
cessing texts in diferent languages is used specifically are presented in Table 1 and the overview of proposed
to extract tokens and sentences from text. However, as methodology is shown in Figure 4.
Spacy does not support low resource languages such as
Persian and Vietnamese, the tools pyvi6 and HAZM7 are
used to extract tokens and sentences from Vietnamese 4. Experiments and Results
and Persian texts respectively.</p>
      </sec>
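<p>The transformation described in Section 3.1 (JSON records with character offsets to token-level IOB tags) can be sketched in Python. This is a minimal illustration rather than the exact conversion scripts: the field names ("text", "acronyms", "long-forms" as offset pairs) follow the 4-component layout described above but are assumptions, whitespace splitting stands in for the Spacy/pyvi/HAZM tokenizers, and the intermediate Brat step is skipped.</p>

```python
def to_iob(sample):
    """Convert one JSON record with character-offset spans to CONLL IOB pairs.

    The record layout is assumed: raw text plus offset pairs for acronyms
    ("short" mentions) and their long-forms ("long" mentions).
    """
    text = sample["text"]
    spans = [(s, e, "short") for s, e in sample["acronyms"]]
    spans += [(s, e, "long") for s, e in sample["long-forms"]]

    pairs, pos = [], 0
    for tok in text.split():  # stand-in for the language-specific tokenizers
        start = text.index(tok, pos)
        end = start + len(tok)
        pos = end
        tag = "O"
        for s, e, label in spans:
            # the token overlaps a span when the intersection is non-empty
            if min(end, e) - max(start, s) > 0:
                # "B-" on the span-initial token, "I-" on continuations
                tag = ("I-" if start > s else "B-") + label
                break
        pairs.append((tok, tag))
    return pairs
```

<p>Each (token, tag) pair is then written out as one line of the CONLL IOB file consumed by the sequence tagger.</p>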
      <sec id="sec-3-4">
        <title>5https://spacy.io/ 6https://pypi.org/project/pyvi/ 7https://github.com/sobhe/hazm</title>
      </sec>
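<p>The fine-tuning step of Section 3.2 can be sketched with the Flair API roughly as follows. The corpus paths, column layout, checkpoint name and hyper-parameters other than the 3 training epochs are illustrative assumptions, not the exact configuration used; a per-language checkpoint from Table 1 would replace the model name shown here.</p>

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# CONLL IOB files produced by the Data Transformation phase (paths assumed)
corpus = ColumnCorpus("data/english", {0: "text", 1: "ner"},
                      train_file="train.txt", dev_file="dev.txt")

# trainable transformer embeddings; one popular checkpoint per language
embeddings = TransformerWordEmbeddings("bert-base-uncased", fine_tune=True)

# sequence tagger on top of the embeddings; the CRF layer is switched off,
# as it is not used in this work
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=corpus.make_label_dictionary(label_type="ner"),
                        tag_type="ner",
                        use_crf=False)

# fine-tune for only 3 epochs due to the resource constraints mentioned above
ModelTrainer(tagger, corpus).fine_tune("models/acronym-tagger", max_epochs=3)
</p>
```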
      <sec id="sec-3-5">
        <title>The primary requirement to promote research in any</title>
        <p>NLP task is the availability of annotated dataset. AE
shared task organizers have provided the participants
with labeled training and development set as well as
unlabelled test set for evaluating the developed models. The</p>
      </sec>
      <sec id="sec-3-6">
        <title>8https://github.com/flairNLP/flair</title>
        <p>
          datasets are provided in six languages, namely: English, The reason for lower results in Persian and Vietnamese
Spanish, Danish, French, Persian, and Vietnamese and could be due to the presence of only acronyms and their
only English language dataset consists of legal and scien- long forms in English (in some cases no long forms also)
tific domains [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Description of the datasets is available and the rest of the text in their native script. As the
transin the GitHub page9 and their statistics are shown in formers used for these languages are monolingual, they
Table 2. It can be observed that the datasets are highly usually do not support other scripts. The proposed model
imbalanced. Further, more number of samples in lan- obtained its best performance in Spanish language and
guages such as Spanish and French may lead to better obtained second rank in the shared task.
performance of the task as compared to less number of Comparison of macro-averaged F1-scores of the top
samples in Vietnamese and Persian languages. models in the shared task for all languages is illustrated
        </p>
        <p>The models submitted to the shared task are evalu- in Figure 5. It can be observed that, as per the
expecated on the blinded test set for predicting the boundaries tations most of models obtained higher performance in
of acronyms and their long-forms based on the macro- English and Spanish languages. The results also prove
averaged scores such as Precision, Recall and F1-score. that as the proposed methodology with only 3 epochs
Participating teams are ranked based on macro-averaged training has shown promising results, experiments could
F1-score and the results obtained by the proposed method be conducted on improving the results by increasing the
for all languages are presented in Table 3. As expected epochs.
the proposed method obtained lower results in Persian
and Vietnamese languages (Spacy does not support these
languages) compared to the results in other languages. 5. Conclusion and Future Work</p>
      </sec>
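<p>The evaluation protocol above can be made concrete: per-class precision, recall and F1 are computed over predicted spans and then macro-averaged over the acronym ("short") and long-form ("long") classes. The sketch below is an illustration of the metric under the assumption of exact-boundary matching, not the official scorer.</p>

```python
def macro_scores(gold, pred, labels=("short", "long")):
    """Macro-averaged precision, recall and F1 over exact-boundary spans.

    `gold` and `pred` are sets of (start, end, label) triples.
    """
    p_sum = r_sum = f_sum = 0.0
    for label in labels:
        g = {s for s in gold if s[2] == label}
        p = {s for s in pred if s[2] == label}
        tp = len(g.intersection(p))  # spans count as correct only on exact match
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
        p_sum += prec
        r_sum += rec
        f_sum += f1
    n = len(labels)
    return p_sum / n, r_sum / n, f_sum / n
```

<p>A model that over-predicts short forms, for instance, loses precision on the "short" class, which directly lowers the macro average used for ranking.</p>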
      <sec id="sec-3-7">
        <title>This paper provides the description of the methodology</title>
        <p>9https://github.com/amirveyseh/AAAI-22-SDU-shared-task-1- and the results obtained by team MUCIC for AE shared
task at SDU@AAAI-22. Data transformation which deals
with diferent data representations is the primary step
in this methodology. The sentences and tokens required
for this step are extracted using Spacy or other libraries
depending on the language. Flair framework used for
ifne-tuning the pre-trained transformer language model
for NER task is extended by building a sequence tagger to
extract acronyms and their long forms. Results obtained
for diferent languages prove that more number of
samples in the training set leads to the higher performances
in identifying the acronyms and their long-forms. The
proposed model obtained its best performance in Spanish
language and obtained second rank in the shared task
and for all other languages, the results obtained are quite
reasonble. As future work we would like to experiment
the combination of embeddings and language models
using Flair frame work as well as other DL methods for
the task of AE in diferent languages.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <sec id="sec-4-1">
        <title>The work was done with partial support from the Mex</title>
        <p>ican Government through the grant A1-S-47854 of the
CONACYT, Mexico, grants 20211784, 20211884, and
20211178 of the Secretaría de Investigación y Posgrado
of the Instituto Politécnico Nacional, Mexico. The
authors thank the CONACYT for the computing resources
brought to them through the Plataforma de Aprendizaje
Profundo para Tecnologías del Lenguaje of the
Laboratorio de Supercómputo of the INAOE, Mexico and
acknowledge the support of Microsoft through the Microsoft Latin
America PhD Award.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Mack</surname>
          </string-name>
          ,
          <article-title>How to Write a Good Scientific Paper: Acronyms</article-title>
          , Journal of micro/nanolithography, MEMS, and MOEMS 11 (
          <year>2012</year>
          )
          <fpage>040102</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barnett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Doubleday</surname>
          </string-name>
          ,
          <article-title>The Growth of Acronyms in the Scientific Literature</article-title>
          ,
          <source>eLife Sciences Publications, Ltd</source>
          <volume>9</volume>
          (
          <year>2020</year>
          )
          <article-title>e60080</article-title>
. URL: https://doi.org/10.7554/eLife.60080. doi:10.7554/eLife.60080.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Taghva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gilbreth</surname>
          </string-name>
          ,
          <source>Finding Acronyms and their Definitions, IJDAR</source>
          <volume>1</volume>
          (
          <year>1999</year>
          )
          <fpage>191</fpage>
          -
          <lpage>198</lpage>
. doi:10.1007/s100320050018.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          , C. Liu,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          <article-title>, Multi-Granularity Sequence Labeling Model for Acronym Expansion Identification</article-title>
          ,
          <source>Information Sciences 378</source>
          (
          <year>2017</year>
          )
          <fpage>462</fpage>
          -
          <lpage>474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Itai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wintner</surname>
          </string-name>
, Acronyms: Identification, Expansion and Disambiguation,
          <source>Annals of Mathematics and Artificial Intelligence</source>
          <volume>88</volume>
          (
          <year>2020</year>
          )
          <fpage>517</fpage>
          -
          <lpage>532</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
<surname>Kirchhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Turner</surname>
          </string-name>
          ,
          <article-title>Unsupervised Resolution of Acronyms and Abbreviations in Nursing Notes using Document-level Context Models</article-title>
          ,
          <source>in: Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Charbonnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wartena</surname>
          </string-name>
          ,
          <article-title>Using Word Embeddings for Unsupervised Acronym Disambiguation</article-title>
          ,
          <source>in: Proceedings of the 27th International Conference on Computational Linguistics</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2610</fpage>
          -
          <lpage>2619</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , W.-t. Yih, Dissecting Contextual Word Embeddings:
          <article-title>Architecture and Representation</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1499</fpage>
          -
          <lpage>1509</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , Bert:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</article-title>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
<string-name><given-names>A. P. B.</given-names><surname>Veyseh</surname></string-name>
, <string-name><given-names>N.</given-names><surname>Meister</surname></string-name> , et al.,
          <source>Multilingual Acronym Extraction and Disambiguation Shared Tasks at SDU</source>
          <year>2022</year>
          ,
          <source>in: Proceedings of SDU@AAAI-22</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Balouchzahi</surname>
          </string-name>
          , G. Sidorov,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Shashirekha</surname>
          </string-name>
          ,
          <article-title>ADOP FERT-Automatic Detection of Occupations and Profession in Medical Texts using Flair and BERT</article-title>
          ,
          <source>in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2021</year>
          )
          <article-title>co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), XXXVII International Conference of the Spanish Society for Natural Language Processing</article-title>
, Málaga, Spain, September,
          <year>2021</year>
          , volume
          <volume>2943</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>747</fpage>
          -
          <lpage>757</lpage>
. URL: http://ceur-ws.org/Vol-2943/meddoprof_paper2.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
<string-name><given-names>A. P. B.</given-names><surname>Veyseh</surname></string-name>
,
<string-name><given-names>F.</given-names><surname>Dernoncourt</surname></string-name>
,
<string-name><given-names>T. H.</given-names><surname>Nguyen</surname></string-name>
,
<string-name><given-names>W.</given-names><surname>Chang</surname></string-name>
,
<string-name><given-names>L. A.</given-names><surname>Celi</surname></string-name>
          ,
          <article-title>Acronym Identification and Disambiguation Shared Tasks for Scientific Document Understanding</article-title>
          ,
<source>in: Proceedings of the Workshop on Scientific Document Understanding co-located with 35th AAAI Conference on Artificial Intelligence, SDU@AAAI</source>
          <year>2021</year>
          ,
Virtual Event
          ,
          <source>February</source>
          <volume>9</volume>
          ,
          <year>2021</year>
          , volume
          <volume>2831</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2021</year>
. URL: http://ceur-ws.org/Vol-2831/paper33.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
<string-name><given-names>A. P. B.</given-names><surname>Veyseh</surname></string-name>
,
<string-name><given-names>F.</given-names><surname>Dernoncourt</surname></string-name>
,
<string-name><given-names>Q. H.</given-names><surname>Tran</surname></string-name>
,
<string-name><given-names>T. H.</given-names><surname>Nguyen</surname></string-name>
          ,
          <article-title>What does this Acronym Mean? Introducing a New Dataset for Acronym Identification and Disambiguation</article-title>
          ,
          <source>in: Proceedings of COLING</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hearst</surname>
          </string-name>
          ,
          <article-title>A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text</article-title>
          ,
          <source>Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing</source>
          <volume>4</volume>
          (
          <year>2003</year>
          )
          <fpage>451</fpage>
          -
          <lpage>62</lpage>
. doi:10.1142/9789812776303_0042.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhong</surname>
          </string-name>
          , G. Zeng,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          , AT-BERT:
          <article-title>Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21</article-title>
          , CEUR Workshop Proceedings (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Egan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bohannon</surname>
          </string-name>
          ,
          <article-title>Primer AI's Systems for Acronym Identification and Disambiguation</article-title>
          ,
<source>in: Proceedings of the Workshop on Scientific Document Understanding co-located with 35th AAAI Conference on Artificial Intelligence, SDU@AAAI</source>
          <year>2021</year>
          ,
Virtual Event
          ,
          <source>February</source>
          <volume>9</volume>
          ,
          <year>2021</year>
          , volume
          <volume>2831</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2021</year>
. URL: http://ceur-ws.org/Vol-2831/paper30.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>BERT-based Acronym Disambiguation with Multiple Training Strategies</article-title>
          ,
          <source>in: Proceedings of the Workshop on Scientific Document Understanding co-located with 35th AAAI Conference on Artificial Intelligence, SDU@AAAI 2021, Virtual Event, February 9, 2021, volume 2831 of CEUR Workshop Proceedings</source>
          , CEUR-WS.org,
          <year>2021</year>
          . URL: http://ceur-ws.org/Vol-2831/paper25.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Bidirectional LSTM-CRF Models for Sequence Tagging</article-title>
          ,
          <source>arXiv preprint arXiv:1508.01991</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ballesteros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kawakami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <article-title>Neural Architectures for Named Entity Recognition</article-title>
          ,
          <source>in: Proceedings of NAACL-HLT</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>260</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <article-title>End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF</article-title>
          ,
          <source>in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1064</fpage>
          -
          <lpage>1074</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>W.</given-names>
            <surname>Rogers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Rae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <article-title>AI-NLM Exploration of the Acronym Identification Shared Task at SDU@AAAI-21</article-title>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A. P. B.</given-names>
            <surname>Veyseh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Meister</surname>
          </string-name>
          , S. Y., R. J., F. D., T. H. N.,
          <article-title>MACRONYM: A Large-Scale Dataset for Multilingual and Multi-Domain Acronym Extraction</article-title>
          ,
          <source>arXiv preprint</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>