=Paper=
{{Paper
|id=Vol-3164/paper12
|storemode=property
|title=Acronym Identification using Transformers and Flair Framework
|pdfUrl=https://ceur-ws.org/Vol-3164/paper12.pdf
|volume=Vol-3164
|authors=Fazlourrahman Balouchzahi,Oxana Vitman,Hosahalli Lakshmaiah Shashirekha,Grigori Sidorov,Alexander Gelbukh
|dblpUrl=https://dblp.org/rec/conf/aaai/BalouchzahiVSSG22
}}
==Acronym Identification using Transformers and Flair Framework==
F. Balouchzahi¹, O. Vitman¹, H. L. Shashirekha², G. Sidorov¹ and A. Gelbukh¹
¹ Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico
² Department of Computer Science, Mangalore University, Mangalore, India
Contact: frs_b@yahoo.com (F. Balouchzahi); ovitman2021@cic.ipn.mx (O. Vitman); hlsrekha@gmail.com (H. L. Shashirekha); sidorov@cic.ipn.mx (G. Sidorov); gelbukh@cic.ipn.mx (A. Gelbukh)
Abstract
The number of acronyms in texts is growing with the increasing number of scientific articles, and this growth is not limited to English texts. The Acronym Extraction (AE) task aims at automatically identifying and extracting acronyms and their long-forms in a given text. To tackle the challenge of AE in different languages, this paper describes the participation of team MUCIC in the AE shared task at the AAAI-22 Workshop on Scientific Document Understanding (SDU@AAAI-22). This shared task aims at identifying and extracting acronyms and their long-forms from English, Spanish, French, Danish, Persian, and Vietnamese texts. The proposed methodology consists of data transformation using Spacy and/or other libraries depending on the language, followed by the Flair framework to fine-tune transformers of the corresponding languages to extract acronyms and their long-forms. For the Spanish language, the proposed methodology secured the second rank, and for all other languages the results obtained are reasonable.
Keywords
Acronym, Expansion, Flair, BERT
1. Introduction

The term “acronym” is defined as a word or name formed by taking the first letters of each word of a phrase [1]. For instance, AIDS is an acronym for “Acquired Immune Deficiency Syndrome”. Acronyms are used in a text to familiarize readers with the abbreviations. They also serve important purposes such as speeding up reading, avoiding repetition of unwieldy technical terms and easing the understanding of the content of a scientific paper.

Scientists frequently over-use acronyms. According to the report [2], an analysis of more than 24 million article titles and 18 million article abstracts published between 1950 and 2019 found at least one acronym in 19% of titles and 73% of abstracts. This shows that the number of acronyms is constantly increasing with the increase in the number of scientific papers published every year. Thus, the widespread usage of acronyms poses a challenge to machines and to non-expert human readers of scientific documents.

Understanding the correlation between acronyms and their expansions is critical for several Natural Language Processing (NLP) applications such as Text Classification (TC), Information Retrieval (IR) and text summarization. Therefore, it is necessary to develop a system that can automatically extract acronyms and their meanings (i.e., long-forms or expansions) in the given documents.

Most of the existing dominant approaches to identifying acronyms and their expansions in free text focus on local acronyms, whose expansions appear in the same document, typically in the same sentence or in nearby sentences, usually enclosed within parentheses. In contrast, non-local (global) acronyms are unaccompanied by their expansion in the same document. They are usually written with the (not necessarily correct) assumption that the reader is already familiar with the acronyms’ intended meanings. Non-local acronyms are more challenging to interpret since their expansions are not found in the neighbourhood.

Over the past two decades, several techniques have been proposed to extract acronyms and their expansions from a given text corpus. These techniques use pattern matching [3], Machine Learning (ML) (e.g., CRF and SVM) [4, 5] or word embeddings [6] to extract acronyms. More recently, Deep Learning (DL) methods [7] are showing promising results for AE. Further, pre-trained language models such as ELMo [8] and BERT [9] have also shown their effectiveness in contextual representation for extracting acronyms.
The usage of acronyms is common in many high-resource as well as low-resource languages. This paper describes the model submitted by our team MUCIC to the AE shared task at SDU@AAAI-22 (https://github.com/amirveyseh/AAAI-22-SDU-shared-task-1-AE) [10]. The shared task consists of identifying the acronyms and long-forms from texts in six languages, namely: English, Spanish, French, Danish, Persian, and Vietnamese.
The proposed methodology to identify acronyms in the given text consists of Data Transformation and Model Fine-Tuning, and is based on our previous work [11] that utilized the Flair framework to fine-tune transformers. Our proposed model obtained promising results for almost all high-resource languages, and the best performance is achieved for Spanish with an F1-score of 0.90, leading to the second rank in the AE shared task.

The rest of the paper is organized as follows: Section 2 describes some of the well-performing models submitted to the Acronym Identification (AI) shared task at the AAAI-21 Workshop on Scientific Document Understanding (SDU@AAAI-21), followed by the proposed methodology in Section 3. Experiments and results are discussed in Section 4 and the paper concludes in Section 5.

2. Related Work

Researchers have developed several efficient models, ranging from traditional rule-based to advanced DL methods, for the AI, AE and Acronym Disambiguation (AD) tasks. Given an acronym and several possible expansions, the AD task has to determine the correct expansion for the given context. The AD task is challenging due to the high ambiguity of acronyms. The organizers of SDU@AAAI-21 released two large datasets of English scientific papers published at arXiv for two shared tasks: AI [12] and AD [13]. The studies and models related to AI, AE and AD are described below.

Traditional approaches to sequence labeling, mainly rule-based or feature-based, were introduced by Schwartz and Hearst [14] for AI. Their model builds a dictionary of local acronyms by utilizing character matching between acronym letters and the corresponding long-forms in the same sentence to discover the acronym and its long-form.
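To make the character-matching idea concrete, the following is a toy sketch inspired by [14]; it is a simplification of the original algorithm rather than the implementation used in that work, and only the search-window heuristic is taken from [14].

```python
# Toy sketch of the character-matching heuristic behind Schwartz and Hearst [14]:
# scan the words preceding a parenthesised acronym and check that every letter of
# the acronym appears, in order, when reading the candidate long-form backwards.
def find_long_form(acronym, preceding_words):
    window = min(len(acronym) + 5, 2 * len(acronym))   # search window suggested in [14]
    candidate = " ".join(preceding_words[-window:])
    text = candidate.lower()
    i = len(text) - 1
    for ch in reversed(acronym.lower()):
        while i >= 0 and text[i] != ch:
            i -= 1
        if i < 0:
            return None                                # some acronym letter has no match
        i -= 1
    start = candidate.rfind(" ", 0, i + 1) + 1         # extend back to the word boundary
    return candidate[start:]

print(find_long_form("AIDS", "Acquired Immune Deficiency Syndrome".split()))
# -> "Acquired Immune Deficiency Syndrome"
```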
Zhu et al. [15] proposed AT-BERT, a Bidirectional Encoder Representations from Transformers (BERT)-based model, for the AI shared task at SDU@AAAI-21. A Fast Gradient Method (FGM)-based adversarial training strategy was incorporated in the fine-tuning of BERT variants, and an average ensemble mechanism was devised to capture a better representation from the multiple BERT variants. This model secured the first rank in the AI shared task with an average macro F1-score of 0.94.

The model proposed by Egan et al. [16] uses a transformer followed by a linear projection for AI, and finds similar examples with embeddings learned from Twin Networks for AD. With an ensemble of different transformers, the models obtained F1-scores of 0.93 and 0.91 for the AI and AD shared tasks respectively.

Pan et al. [17] introduce a binary classification model for AD. Using a BERT encoder for input representations, they adopted several strategies including dynamic negative sample selection, task-adaptive pretraining, adversarial training and pseudo-labeling. The experiments conducted won the first place in the AD shared task at SDU@AAAI-21 with an F1-score of 0.94.

Three models based on Bidirectional Long Short-Term Memory (BiLSTM) and Conditional Random Fields (CRF) (https://github.com/guillaumegenthial/tf_ner), namely BiLSTM with CRF (Huang et al. [18]), stacked BiLSTM with CRF (Lample et al. [19]), and BiLSTM-CRF with convolution and max-pooling (Ma et al. [20]), were adopted by Rogers et al. [21] for the AI shared task, with GloVe embeddings for all the models. They also employed four transformer models, namely BERT, BioBERT, DistilBERT, and RoBERTa, for the AI shared task. Their best performance was obtained using stacked BiLSTM with CRF with an F1-score of 0.91.

Despite these several models, the complexity of AI/AE provides scope for further experimentation.

3. Methodology

The proposed methodology is adopted from our previous work on the automatic detection of occupations and professions in medical texts using Flair and BERT [11], which was applied only to Spanish texts. With minor modifications to the existing architecture, the methodology is extended to the AE task on the texts in six languages provided by the organizers. The workflow of the methodology contains two major parts, Data Transformation and Model Fine-Tuning, which are explained in the following subsections.

3.1. Data Transformation

This phase contains the necessary steps to transform the data into a representation that can be used to train and fine-tune the model. The data provided for our previous work [11] was in Brat standoff format (https://brat.nlplab.org/standoff.html) and was transformed to CONLL IOB format (https://nlp.lsi.upc.edu/freeling/node/83), as it is easier to process data in CONLL IOB format than in Brat format. Brat format consists of a collection of text (.txt) files and their corresponding annotation (.ann) files.

The datasets for the AE shared task consist of JSON files. Each JSON file contains a collection of four components comprising the text, the beginning and ending offsets of the acronyms and their corresponding long-forms, and the ID of that text. A sample JSON file is shown in Figure 1. These JSON files are first transformed to the Brat representation as shown in Figure 2, and then the Brat representations are transformed to the CONLL IOB representation as described in [11] and shown in Figure 3.

Figure 1: A sample JSON file

Figure 2: Transformation of data from JSON to Brat format

Figure 3: Transformation of data from Brat to CONLL IOB format

The input JSON files of all the languages in the given dataset are first converted to a collection of text (.txt) files and their corresponding annotation (.ann) files according to the Brat format, based on the provided beginning and ending offsets of the acronyms and their long-forms. As the proposed methodology is based on our previous work, a direct transformation of the JSON files to CONLL IOB format is avoided.
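As an illustration of the final representation, the following is a minimal sketch of the JSON-to-CONLL-IOB step (skipping the intermediate Brat stage for brevity). The field names ("text", "acronyms", "long-forms") and the tag set (B-short/I-short for acronyms, B-long/I-long for long-forms, O otherwise) are illustrative assumptions, not the exact format of the released files.

```python
# Minimal sketch: tokenize a record and emit one "token TAB tag" line per token.
# A Spacy-like tokenizer is assumed, i.e. tokens expose .text and .idx (character offset).
def to_conll_iob(record, tokenizer):
    spans = [(s, e, "short") for s, e in record["acronyms"]] + \
            [(s, e, "long") for s, e in record["long-forms"]]
    lines = []
    for token in tokenizer(record["text"]):
        start, end = token.idx, token.idx + len(token.text)
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:                 # token lies inside an annotated span
                tag = ("B-" if start == s else "I-") + label
                break
        lines.append(f"{token.text}\t{tag}")
    return "\n".join(lines)
```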
The Spacy library (https://spacy.io/), which provides various tools for processing texts in different languages, is used specifically to extract tokens and sentences from the text. However, as Spacy does not support low-resource languages such as Persian and Vietnamese, the tools pyvi (https://pypi.org/project/pyvi/) and HAZM (https://github.com/sobhe/hazm) are used to extract tokens and sentences from Vietnamese and Persian texts respectively.
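A minimal sketch of this language-dependent tokenization step is given below. The pipeline names (e.g. en_core_web_sm, es_core_news_sm) and the exact calls are illustrative assumptions; in particular, pyvi returns a single whitespace-joined string for the whole input, so the Vietnamese branch is a simplification.

```python
import spacy
from pyvi import ViTokenizer
from hazm import Normalizer, sent_tokenize, word_tokenize

# Spacy pipelines assumed to be installed for the languages Spacy supports
SPACY_MODELS = {"english": "en_core_web_sm", "spanish": "es_core_news_sm",
                "french": "fr_core_news_sm", "danish": "da_core_news_sm"}

def split_into_sentences_and_tokens(text, language):
    """Return a list of sentences, each given as a list of tokens."""
    if language == "vietnamese":
        # pyvi joins the syllables of multi-syllable words with underscores
        return [ViTokenizer.tokenize(text).split()]
    if language == "persian":
        normalized = Normalizer().normalize(text)
        return [word_tokenize(sentence) for sentence in sent_tokenize(normalized)]
    nlp = spacy.load(SPACY_MODELS[language])
    return [[token.text for token in sentence] for sentence in nlp(text).sents]
```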
3.2. Model Fine-Tuning

Model Fine-Tuning employs the Flair framework (https://github.com/flairNLP/flair) to fine-tune a pre-trained transformer language model and build a sequence tagger for the downstream task of AE. Flair is a PyTorch-based NLP tool that provides the facility of utilizing individual or combined word embeddings and language models [11]. The Sequence Tagger module of Flair has a BiLSTM backend with a CRF layer on top of this model (the CRF layer is not used in this work).

Since fine-tuning the transformers is time consuming and requires significant resources such as RAM and GPU, the models are fine-tuned for only 3 epochs, which may lead to lower results. As the overall performance of the proposed methodology also depends on the language model, the most popular language model is selected and fine-tuned for each language. The pre-trained transformer language models used for each language are presented in Table 1 and an overview of the proposed methodology is shown in Figure 4.

Table 1: Transformer used for each language

Language   | Transformer
English    | bert-base-uncased
Spanish    | dccuchile/bert-base-spanish-wwm-cased
Danish     | Maltehb/danish-bert-botxo
French     | gilf/french-camembert-postag-model
Persian    | HooshvareLab/bert-fa-zwnj-base
Vietnamese | lamhieu/distilbert-base-multilingual-cased-vietnamese-topicifier

Figure 4: Overview of proposed method
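A minimal Flair fine-tuning sketch for one language is shown below, assuming the CONLL IOB files produced by the data transformation step. The folder layout, column mapping and hyperparameters (hidden size, learning rate, batch size) are illustrative assumptions rather than the exact settings of the submitted system.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# CONLL IOB files produced by the data transformation step (paths are illustrative)
columns = {0: "text", 1: "ner"}
corpus = ColumnCorpus("data/english", columns, train_file="train.txt", dev_file="dev.txt")

# Language-specific transformer from Table 1, fine-tuned through Flair
embeddings = TransformerWordEmbeddings("bert-base-uncased", fine_tune=True)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=corpus.make_tag_dictionary(tag_type="ner"),
    tag_type="ner",
    use_crf=False,   # the CRF layer of the Sequence Tagger is not used in this work
)

trainer = ModelTrainer(tagger, corpus)
trainer.train("models/english-ae", learning_rate=5e-5, mini_batch_size=16, max_epochs=3)
```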
4. Experiments and Results

The primary requirement to promote research in any NLP task is the availability of annotated datasets. The AE shared task organizers provided the participants with labeled training and development sets as well as an unlabelled test set for evaluating the developed models. The datasets are provided in six languages, namely: English, Spanish, Danish, French, Persian, and Vietnamese, and only the English dataset covers two domains, legal and scientific [22]. A description of the datasets is available on the GitHub page of the shared task (https://github.com/amirveyseh/AAAI-22-SDU-shared-task-1-AE) and their statistics are shown in Table 2. It can be observed that the datasets are highly imbalanced. Further, the larger number of samples in languages such as Spanish and French may lead to better performance on the task compared to the smaller number of samples in Vietnamese and Persian.
Table 2: Statistics of the datasets used in the shared task

Language             | Train: Texts | Train: Acronyms | Train: Long forms | Dev: Texts | Dev: Acronyms | Dev: Long forms | Test: Texts
English (Legal)      | 3,563        | 9,532           | 5,246             | 444        | 1,213         | 669             | 445
English (Scientific) | 3,979        | 7,689           | 5,715             | 469        | 970           | 720             | 497
Spanish              | 5,927        | 13,016          | 9,393             | 740        | 1,538         | 1,108           | 740
Danish               | 3,081        | 6,282           | 2,119             | 384        | 784           | 271             | 385
French               | 7,782        | 21,746          | 13,638            | 972        | 2,651         | 1,628           | 972
Persian              | 1,335        | 2,451           | 209               | 166        | 311           | 17              | 167
Vietnamese           | 1,273        | 1,332           | 62                | 158        | 175           | 8               | 159
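As a small illustration, Table 2-style statistics could be gathered from the shared-task JSON files roughly as sketched below. The file layout and field names ("text", "acronyms", "long-forms") are assumptions about the released data, not a description of the official format.

```python
import json
from pathlib import Path

def dataset_statistics(json_dir):
    """Count texts, acronym spans and long-form spans over all JSON files in a folder."""
    n_texts = n_acronyms = n_long_forms = 0
    for path in Path(json_dir).glob("*.json"):
        records = json.loads(path.read_text(encoding="utf-8"))   # assumed: a list of records per file
        for record in records:
            n_texts += 1
            n_acronyms += len(record.get("acronyms", []))
            n_long_forms += len(record.get("long-forms", []))
    return {"texts": n_texts, "acronyms": n_acronyms, "long-forms": n_long_forms}

# e.g. dataset_statistics("data/spanish/train") -> counts comparable to a row of Table 2
```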
The models submitted to the shared task are evaluated on the blinded test set for predicting the boundaries of acronyms and their long-forms, using macro-averaged scores, namely Precision, Recall and F1-score. Participating teams are ranked based on the macro-averaged F1-score, and the results obtained by the proposed method for all languages are presented in Table 3. As expected, the proposed method obtained lower results for Persian and Vietnamese (languages that Spacy does not support) compared to the results for the other languages. The reason for the lower results in Persian and Vietnamese could be that only the acronyms and their long-forms appear in English (in some cases with no long-forms at all) while the rest of the text is in the native script; as the transformers used for these languages are monolingual, they usually do not support other scripts. The proposed model obtained its best performance for the Spanish language and secured the second rank in the shared task.
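As a rough illustration of boundary-level scoring of this kind, the sketch below compares gold and predicted spans given as (start, end, label) character-offset triples and macro-averages the F1-score over the two label types; this is an assumption-laden approximation, not the official scorer of the shared task.

```python
# Illustrative span-level scoring: exact-match comparison of (start, end, label) spans.
def precision_recall_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def macro_f1(gold_spans, pred_spans, labels=("short", "long")):
    """Average the F1-score over the label types (acronyms vs. long-forms)."""
    scores = []
    for label in labels:
        g = [s for s in gold_spans if s[2] == label]
        p = [s for s in pred_spans if s[2] == label]
        scores.append(precision_recall_f1(g, p)[2])
    return sum(scores) / len(scores)
```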
Table 3: Performance of the proposed methodology

Language             | F1-score | Precision | Recall | Rank
English (Legal)      | 0.87     | 0.84      | 0.89   | 5
English (Scientific) | 0.83     | 0.80      | 0.86   | 5
Spanish              | 0.90     | 0.90      | 0.91   | 2
Danish               | 0.81     | 0.78      | 0.84   | 5
French               | 0.81     | 0.81      | 0.80   | 4
Persian              | 0.59     | 0.92      | 0.43   | 3
Vietnamese           | 0.36     | 0.37      | 0.36   | 6

A comparison of the macro-averaged F1-scores of the top models in the shared task for all languages is illustrated in Figure 5. It can be observed that, as expected, most models obtained higher performance for the English and Spanish languages. The results also suggest that, since the proposed methodology achieved promising results with only 3 epochs of training, further experiments could improve the results by increasing the number of epochs.

Figure 5: Comparison of macro-averaged F1-scores of top models in the shared task

5. Conclusion and Future Work

This paper describes the methodology and the results obtained by team MUCIC for the AE shared task at SDU@AAAI-22. Data transformation, which deals with the different data representations, is the primary step in this methodology. The sentences and tokens required for this step are extracted using Spacy or other libraries depending on the language. The Flair framework used for fine-tuning pre-trained transformer language models for NER tasks is extended by building a sequence tagger to extract acronyms and their long-forms. The results obtained for the different languages indicate that a larger number of samples in the training set leads to higher performance in identifying the acronyms and their long-forms. The proposed model obtained its best performance for the Spanish language and secured the second rank in the shared task, and for all other languages the results obtained are quite reasonable. As future work, we would like to experiment with combinations of embeddings and language models using the Flair framework as well as with other DL methods for the task of AE in different languages.

Acknowledgments

The work was done with partial support from the Mexican Government through grant A1-S-47854 of CONACYT, Mexico, and grants 20211784, 20211884, and 20211178 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank CONACYT for the computing resources provided through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America PhD Award.

References
[1] C. A. Mack, How to Write a Good Scientific Paper: Acronyms, Journal of Micro/Nanolithography, MEMS, and MOEMS 11 (2012) 040102.
[2] A. Barnett, Z. Doubleday, The Growth of Acronyms in the Scientific Literature, eLife 9 (2020) e60080. URL: https://doi.org/10.7554/eLife.60080. doi:10.7554/eLife.60080.
[3] K. Taghva, J. Gilbreth, Finding Acronyms and their Definitions, IJDAR 1 (1999) 191–198. doi:10.1007/s100320050018.
[4] J. Liu, C. Liu, Y. Huang, Multi-Granularity Sequence Labeling Model for Acronym Expansion Identification, Information Sciences 378 (2017) 462–474.
[5] K. Jacobs, A. Itai, S. Wintner, Acronyms: Identification, Expansion and Disambiguation, Annals of Mathematics and Artificial Intelligence 88 (2020) 517–532.
[6] K. Kirchhoff, A. M. Turner, Unsupervised Resolution of Acronyms and Abbreviations in Nursing Notes using Document-level Context Models, in: Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis, 2016, pp. 52–60.
[7] J. Charbonnier, C. Wartena, Using Word Embeddings for Unsupervised Acronym Disambiguation, in: Proceedings of the 27th International Conference on Computational Linguistics, 2018, pp. 2610–2619.
[8] M. Peters, M. Neumann, L. Zettlemoyer, W.-t. Yih, Dissecting Contextual Word Embeddings: Architecture and Representation, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 1499–1509.
[9] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.
[10] A. P. B. Veyseh, N. Meister, S. Yoon, R. Jain, F. Dernoncourt, T. H. Nguyen, Multilingual Acronym Extraction and Disambiguation Shared Tasks at SDU 2022, in: Proceedings of SDU@AAAI-22, 2022.
[11] F. Balouchzahi, G. Sidorov, H. L. Shashirekha, ADOP FERT: Automatic Detection of Occupations and Profession in Medical Texts using Flair and BERT, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021), co-located with the XXXVII International Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), Málaga, Spain, September 2021, volume 2943 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 747–757. URL: http://ceur-ws.org/Vol-2943/meddoprof_paper2.pdf.
[12] A. P. B. Veyseh, F. Dernoncourt, T. H. Nguyen, W. Chang, L. A. Celi, Acronym Identification and Disambiguation Shared Tasks for Scientific Document Understanding, in: Proceedings of the Workshop on Scientific Document Understanding co-located with the 35th AAAI Conference on Artificial Intelligence, SDU@AAAI 2021, Virtual Event, February 9, 2021, volume 2831 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: http://ceur-ws.org/Vol-2831/paper33.pdf.
[13] A. P. B. Veyseh, F. Dernoncourt, Q. H. Tran, T. H. Nguyen, What does this Acronym Mean? Introducing a New Dataset for Acronym Identification and Disambiguation, in: Proceedings of COLING, 2020.
[14] A. Schwartz, M. Hearst, A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text, Pacific Symposium on Biocomputing 4 (2003) 451–462. doi:10.1142/9789812776303_0042.
[15] D. Zhu, W. Lin, Y. Zhang, Q. Zhong, G. Zeng, W. Wu, J. Tang, AT-BERT: Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21, CEUR Workshop Proceedings (2021).
[16] N. Egan, J. Bohannon, Primer AI's Systems for Acronym Identification and Disambiguation, in: Proceedings of the Workshop on Scientific Document Understanding co-located with the 35th AAAI Conference on Artificial Intelligence, SDU@AAAI 2021, Virtual Event, February 9, 2021, volume 2831 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: http://ceur-ws.org/Vol-2831/paper30.pdf.
[17] C. Pan, B. Song, S. Wang, Z. Luo, BERT-based Acronym Disambiguation with Multiple Training Strategies, in: Proceedings of the Workshop on Scientific Document Understanding co-located with the 35th AAAI Conference on Artificial Intelligence, SDU@AAAI 2021, Virtual Event, February 9, 2021, volume 2831 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: http://ceur-ws.org/Vol-2831/paper25.pdf.
[18] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF Models for Sequence Tagging, arXiv preprint arXiv:1508.01991 (2015).
[19] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural Architectures for Named Entity Recognition, in: Proceedings of NAACL-HLT, 2016, pp. 260–270.
[20] X. Ma, E. Hovy, End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016, pp. 1064–1074.
[21] W. Rogers, A. R. Rae, D. Demner-Fushman, AI-NLM Exploration of the Acronym Identification Shared Task at SDU@AAAI-21, 2021.
[22] A. P. B. Veyseh, N. Meister, S. Yoon, R. Jain, F. Dernoncourt, T. H. Nguyen, MACRONYM: A Large-Scale Dataset for Multilingual and Multi-Domain Acronym Extraction, arXiv preprint, 2022.