=Paper=
{{Paper
|id=Vol-3164/paper12
|storemode=property
|title=Acronym Identification using Transformers and Flair Framework
|pdfUrl=https://ceur-ws.org/Vol-3164/paper12.pdf
|volume=Vol-3164
|authors=Fazlourrahman Balouchzahi,Oxana Vitman,Hosahalli Lakshmaiah Shashirekha,Grigori Sidorov,Alexander Gelbukh
|dblpUrl=https://dblp.org/rec/conf/aaai/BalouchzahiVSSG22
}}
==Acronym Identification using Transformers and Flair Framework==
F. Balouchzahi¹, O. Vitman¹, H. L. Shashirekha², G. Sidorov¹ and A. Gelbukh¹
¹ Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico
² Department of Computer Science, Mangalore University, Mangalore, India
Contact: frs_b@yahoo.com (F. Balouchzahi); ovitman2021@cic.ipn.mx (O. Vitman); hlsrekha@gmail.com (H. L. Shashirekha); sidorov@cic.ipn.mx (G. Sidorov); gelbukh@cic.ipn.mx (A. Gelbukh)
Abstract
The number of acronyms in texts is growing with the increasing number of scientific articles, and this growth is not limited to English texts. The Acronym Extraction (AE) task aims at automatically identifying and extracting acronyms and their long-forms in a given text. To tackle the challenge of AE in different languages, this paper describes the participation of team MUCIC in the AE shared task at the AAAI-22 Workshop on Scientific Document Understanding (SDU@AAAI-22). This shared task aims at identifying and extracting acronyms and their long-forms from English, Spanish, French, Danish, Persian, and Vietnamese texts. The proposed methodology consists of data transformation using Spacy and/or other libraries depending on the language, followed by the Flair framework to fine-tune transformers of the corresponding languages to extract acronyms and their long-forms. For the Spanish language, the proposed methodology secured the second rank, and for all other languages the results obtained are reasonable.
Keywords
Acronym, Expansion, Flair, BERT
1. Introduction

The term “acronym” is defined as a word or name formed by taking the first letters of each word of a phrase [1]. For instance, AIDS is an acronym for “Acquired Immune Deficiency Syndrome”. Acronyms are used in a text to familiarize readers with the abbreviations. They also serve important purposes such as speeding up reading, avoiding repetition of unwieldy technical terms and easing the understanding of the content of a scientific paper.

Scientists frequently over-use acronyms. According to the report [2], an analysis of more than 24 million article titles and 18 million article abstracts published between 1950 and 2019 found at least one acronym in 19% of titles and 73% of abstracts. This shows that the number of acronyms is constantly increasing with the increase in the number of scientific papers published every year. Thus, the widespread usage of acronyms poses a challenge to machines and to non-expert human readers of scientific documents.

Understanding the correlation between acronyms and their expansions is critical for several Natural Language Processing (NLP) applications such as Text Classification (TC), Information Retrieval (IR) and text summarization. Therefore, it is necessary to develop a system that can automatically extract acronyms and their meanings (i.e., long-forms or expansions) in the given documents.

Most of the existing dominant approaches to identifying acronyms and their expansions in free text focus on local acronyms, whose expansions appear in the same document, typically in the same sentence or in nearby sentences, usually enclosed within parentheses. In contrast, non-local (global) acronyms are unaccompanied by their expansion in the same document. They are usually written with the (not necessarily correct) assumption that the reader is already familiar with the acronyms’ intended meanings. Non-local acronyms are more challenging to interpret since their expansions are not found in the neighbourhood.

Over the past two decades, several techniques have been proposed to extract acronyms and their expansions from a given text corpus. These techniques use pattern matching [3], Machine Learning (ML) (e.g., CRF and SVM) [4, 5] or word embeddings [6] to extract acronyms. More recently, Deep Learning (DL) methods [7] are showing promising results for AE. Further, pre-trained language models such as ELMo [8] and BERT [9] have also shown their effectiveness in contextual representation for extracting acronyms.
The usage of acronyms is common in many high-resource as well as low-resource languages. This paper describes the model submitted by our team MUCIC to the AE shared task at SDU@AAAI-22 (https://github.com/amirveyseh/AAAI-22-SDU-shared-task-1-AE) [10]. The shared task consists of identifying the acronyms and long-forms from texts in six languages, namely: English, Spanish, French, Danish, Persian, and Vietnamese.
The proposed methodology to identify acronyms in the given text consists of Data Transformation and Model Fine-Tuning, and is based on our previous work [11] that utilized the Flair framework to fine-tune transformers. Our proposed model obtained promising results for almost all high-resource languages, and the best performance is achieved for Spanish with an F1-score of 0.90, leading to the second rank in the AE shared task.

The rest of the paper is organized as follows: Section 2 describes some of the well-performing models submitted to the Acronym Identification (AI) shared task at the AAAI-21 Workshop on Scientific Document Understanding (SDU@AAAI-21), followed by the proposed methodology in Section 3. Experiments and results are discussed in Section 4 and the paper concludes in Section 5.

2. Related Work

Researchers have developed several efficient models, ranging from traditional rule-based to advanced DL methods, for the AI, AE and Acronym Disambiguation (AD) tasks. Given an acronym and several possible expansions, the AD task has to determine the correct expansion for the given context. The AD task is challenging due to the high ambiguity of acronyms. The organizers of SDU@AAAI-21 released two large datasets of English scientific papers published at arXiv for two shared tasks: AI [12] and AD [13]. The studies and models related to AI, AE and AD are described below.

Traditional approaches to sequence labeling, mainly rule-based or feature-based, were introduced by Schwartz and Hearst [14] for AI. Their model builds a dictionary of local acronyms by utilizing character matching between acronym letters and the corresponding long-forms in the same sentence to discover the acronym and its long-form.
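To make the character-matching idea concrete, the following is a toy sketch inspired by [14]; it is a simplification of the original algorithm rather than the implementation used in that work, and only the search-window heuristic is taken from [14].

```python
# Toy sketch of the character-matching heuristic behind Schwartz and Hearst [14]:
# scan the words preceding a parenthesised acronym and check that every letter of
# the acronym appears, in order, when reading the candidate long-form backwards.
def find_long_form(acronym, preceding_words):
    window = min(len(acronym) + 5, 2 * len(acronym))   # search window suggested in [14]
    candidate = " ".join(preceding_words[-window:])
    text = candidate.lower()
    i = len(text) - 1
    for ch in reversed(acronym.lower()):
        while i >= 0 and text[i] != ch:
            i -= 1
        if i < 0:
            return None                                # some acronym letter has no match
        i -= 1
    start = candidate.rfind(" ", 0, i + 1) + 1         # extend back to the word boundary
    return candidate[start:]

print(find_long_form("AIDS", "Acquired Immune Deficiency Syndrome".split()))
# -> "Acquired Immune Deficiency Syndrome"
```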
Zhu et al. [15] proposed AT-BERT, a Bidirectional Encoder Representations from Transformers (BERT)-based model, for the AI shared task at SDU@AAAI-21. A Fast Gradient Method (FGM)-based adversarial training strategy was incorporated in the fine-tuning of BERT variants, and an average ensemble mechanism was devised to capture a better representation from the multiple BERT variants. This model secured the first rank in the AI shared task with an average macro F1-score of 0.94.

The model proposed by Egan et al. [16] uses a transformer followed by a linear projection for AI, and finds similar examples with embeddings learned from Twin Networks for AD. With an ensemble of different transformers, the models obtained F1-scores of 0.93 and 0.91 for the AI and AD shared tasks respectively.

Pan et al. [17] introduce a binary classification model for AD. Using a BERT encoder for input representations, they adopted several strategies including dynamic negative sample selection, task-adaptive pretraining, adversarial training and pseudo-labeling. The experiments conducted won the first place in the AD shared task at SDU@AAAI-21 with an F1-score of 0.94.

Three models based on Bidirectional Long Short-Term Memory (BiLSTM) and Conditional Random Fields (CRF) (https://github.com/guillaumegenthial/tf_ner), namely BiLSTM with CRF (Huang et al. [18]), stacked BiLSTM with CRF (Lample et al. [19]), and BiLSTM-CRF with convolution and max-pooling (Ma et al. [20]), were adopted by Rogers et al. [21] for the AI shared task, with GloVe embeddings for all the models. They also employed four transformer models, namely BERT, BioBERT, DistilBERT, and RoBERTa, for the AI shared task. Their best performance was obtained using stacked BiLSTM with CRF with an F1-score of 0.91.

Despite these several models, the complexity of AI/AE provides scope for further experimentation.

3. Methodology

The proposed methodology is adopted from our previous work on the automatic detection of occupations and professions in medical texts using Flair and BERT [11], which was applied only to Spanish texts. With minor modifications to the existing architecture, the methodology is extended to the AE task on the texts in six languages provided by the organizers. The workflow of the methodology contains two major parts, Data Transformation and Model Fine-Tuning, which are explained in the following subsections.

3.1. Data Transformation

This phase contains the necessary steps to transform the data into a representation that can be used to train and fine-tune the model. The data provided for our previous work [11] was in Brat standoff format (https://brat.nlplab.org/standoff.html) and was transformed to CONLL IOB format (https://nlp.lsi.upc.edu/freeling/node/83), as it is easier to process data in CONLL IOB format than in Brat format. Brat format consists of a collection of text (.txt) files and their corresponding annotation (.ann) files.

The datasets for the AE shared task consist of JSON files. Each JSON file contains a collection of four components comprising the text, the beginning and ending offsets of the acronyms and their corresponding long-forms, and the ID of that text. A sample JSON file is shown in Figure 1. These JSON files are first transformed to the Brat representation as shown in Figure 2, and then the Brat representations are transformed to the CONLL IOB representation as described in [11] and shown in Figure 3.

Figure 1: A sample JSON file

Figure 2: Transformation of data from JSON to Brat format

Figure 3: Transformation of data from Brat to CONLL IOB format

The input JSON files of all the languages in the given dataset are first converted to a collection of text (.txt) files and their corresponding annotation (.ann) files according to the Brat format, based on the provided beginning and ending offsets of the acronyms and their long-forms. As the proposed methodology is based on our previous work, a direct transformation of the JSON files to CONLL IOB format is avoided.
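As an illustration of the final representation, the following is a minimal sketch of the JSON-to-CONLL-IOB step (skipping the intermediate Brat stage for brevity). The field names ("text", "acronyms", "long-forms") and the tag set (B-short/I-short for acronyms, B-long/I-long for long-forms, O otherwise) are illustrative assumptions, not the exact format of the released files.

```python
# Minimal sketch: tokenize a record and emit one "token TAB tag" line per token.
# A Spacy-like tokenizer is assumed, i.e. tokens expose .text and .idx (character offset).
def to_conll_iob(record, tokenizer):
    spans = [(s, e, "short") for s, e in record["acronyms"]] + \
            [(s, e, "long") for s, e in record["long-forms"]]
    lines = []
    for token in tokenizer(record["text"]):
        start, end = token.idx, token.idx + len(token.text)
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:                 # token lies inside an annotated span
                tag = ("B-" if start == s else "I-") + label
                break
        lines.append(f"{token.text}\t{tag}")
    return "\n".join(lines)
```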
The Spacy library (https://spacy.io/), which provides various tools for processing texts in different languages, is used specifically to extract tokens and sentences from the text. However, as Spacy does not support low-resource languages such as Persian and Vietnamese, the tools pyvi (https://pypi.org/project/pyvi/) and HAZM (https://github.com/sobhe/hazm) are used to extract tokens and sentences from Vietnamese and Persian texts respectively.
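A minimal sketch of this language-dependent tokenization step is given below. The pipeline names (e.g. en_core_web_sm, es_core_news_sm) and the exact calls are illustrative assumptions; in particular, pyvi returns a single whitespace-joined string for the whole input, so the Vietnamese branch is a simplification.

```python
import spacy
from pyvi import ViTokenizer
from hazm import Normalizer, sent_tokenize, word_tokenize

# Spacy pipelines assumed to be installed for the languages Spacy supports
SPACY_MODELS = {"english": "en_core_web_sm", "spanish": "es_core_news_sm",
                "french": "fr_core_news_sm", "danish": "da_core_news_sm"}

def split_into_sentences_and_tokens(text, language):
    """Return a list of sentences, each given as a list of tokens."""
    if language == "vietnamese":
        # pyvi joins the syllables of multi-syllable words with underscores
        return [ViTokenizer.tokenize(text).split()]
    if language == "persian":
        normalized = Normalizer().normalize(text)
        return [word_tokenize(sentence) for sentence in sent_tokenize(normalized)]
    nlp = spacy.load(SPACY_MODELS[language])
    return [[token.text for token in sentence] for sentence in nlp(text).sents]
```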
3.2. Model Fine-Tuning

Model Fine-Tuning employs the Flair framework (https://github.com/flairNLP/flair) to fine-tune a pre-trained transformer language model and build a sequence tagger for the downstream task of AE. Flair is a PyTorch-based NLP tool that provides the facility of utilizing individual or combined word embeddings and language models [11]. The Sequence Tagger module of Flair has a BiLSTM backend with a CRF layer on top of this model (the CRF layer is not used in this work).

Since fine-tuning the transformers is time consuming and requires significant resources such as RAM and GPU, the models are fine-tuned for only 3 epochs, which may lead to lower results. As the overall performance of the proposed methodology also depends on the language model, the most popular language model is selected and fine-tuned for each language. The pre-trained transformer language models used for each language are presented in Table 1 and an overview of the proposed methodology is shown in Figure 4.

Table 1: Transformer used for each language

Language   | Transformer
English    | bert-base-uncased
Spanish    | dccuchile/bert-base-spanish-wwm-cased
Danish     | Maltehb/danish-bert-botxo
French     | gilf/french-camembert-postag-model
Persian    | HooshvareLab/bert-fa-zwnj-base
Vietnamese | lamhieu/distilbert-base-multilingual-cased-vietnamese-topicifier

Figure 4: Overview of proposed method
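A minimal Flair fine-tuning sketch for one language is shown below, assuming the CONLL IOB files produced by the data transformation step. The folder layout, column mapping and hyperparameters (hidden size, learning rate, batch size) are illustrative assumptions rather than the exact settings of the submitted system.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# CONLL IOB files produced by the data transformation step (paths are illustrative)
columns = {0: "text", 1: "ner"}
corpus = ColumnCorpus("data/english", columns, train_file="train.txt", dev_file="dev.txt")

# Language-specific transformer from Table 1, fine-tuned through Flair
embeddings = TransformerWordEmbeddings("bert-base-uncased", fine_tune=True)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=corpus.make_tag_dictionary(tag_type="ner"),
    tag_type="ner",
    use_crf=False,   # the CRF layer of the Sequence Tagger is not used in this work
)

trainer = ModelTrainer(tagger, corpus)
trainer.train("models/english-ae", learning_rate=5e-5, mini_batch_size=16, max_epochs=3)
```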
4. Experiments and Results

The primary requirement to promote research in any NLP task is the availability of annotated datasets. The AE shared task organizers provided the participants with labeled training and development sets as well as an unlabelled test set for evaluating the developed models. The datasets are provided in six languages, namely: English, Spanish, Danish, French, Persian, and Vietnamese, and only the English dataset covers two domains, legal and scientific [22]. A description of the datasets is available on the GitHub page of the shared task (https://github.com/amirveyseh/AAAI-22-SDU-shared-task-1-AE) and their statistics are shown in Table 2. It can be observed that the datasets are highly imbalanced. Further, the larger number of samples in languages such as Spanish and French may lead to better performance on the task compared to the smaller number of samples in Vietnamese and Persian.
Table 2: Statistics of the datasets used in the shared task

Language             | Train: Texts | Train: Acronyms | Train: Long forms | Dev: Texts | Dev: Acronyms | Dev: Long forms | Test: Texts
English (Legal)      | 3,563        | 9,532           | 5,246             | 444        | 1,213         | 669             | 445
English (Scientific) | 3,979        | 7,689           | 5,715             | 469        | 970           | 720             | 497
Spanish              | 5,927        | 13,016          | 9,393             | 740        | 1,538         | 1,108           | 740
Danish               | 3,081        | 6,282           | 2,119             | 384        | 784           | 271             | 385
French               | 7,782        | 21,746          | 13,638            | 972        | 2,651         | 1,628           | 972
Persian              | 1,335        | 2,451           | 209               | 166        | 311           | 17              | 167
Vietnamese           | 1,273        | 1,332           | 62                | 158        | 175           | 8               | 159
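As a small illustration, Table 2-style statistics could be gathered from the shared-task JSON files roughly as sketched below. The file layout and field names ("text", "acronyms", "long-forms") are assumptions about the released data, not a description of the official format.

```python
import json
from pathlib import Path

def dataset_statistics(json_dir):
    """Count texts, acronym spans and long-form spans over all JSON files in a folder."""
    n_texts = n_acronyms = n_long_forms = 0
    for path in Path(json_dir).glob("*.json"):
        records = json.loads(path.read_text(encoding="utf-8"))   # assumed: a list of records per file
        for record in records:
            n_texts += 1
            n_acronyms += len(record.get("acronyms", []))
            n_long_forms += len(record.get("long-forms", []))
    return {"texts": n_texts, "acronyms": n_acronyms, "long-forms": n_long_forms}

# e.g. dataset_statistics("data/spanish/train") -> counts comparable to a row of Table 2
```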
The models submitted to the shared task are evaluated on the blinded test set for predicting the boundaries of acronyms and their long-forms, using macro-averaged scores, namely Precision, Recall and F1-score. Participating teams are ranked based on the macro-averaged F1-score, and the results obtained by the proposed method for all languages are presented in Table 3. As expected, the proposed method obtained lower results for Persian and Vietnamese (languages that Spacy does not support) compared to the results for the other languages. The reason for the lower results in Persian and Vietnamese could be that only the acronyms and their long-forms appear in English (in some cases with no long-forms at all) while the rest of the text is in the native script; as the transformers used for these languages are monolingual, they usually do not support other scripts. The proposed model obtained its best performance for the Spanish language and secured the second rank in the shared task.
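As a rough illustration of boundary-level scoring of this kind, the sketch below compares gold and predicted spans given as (start, end, label) character-offset triples and macro-averages the F1-score over the two label types; this is an assumption-laden approximation, not the official scorer of the shared task.

```python
# Illustrative span-level scoring: exact-match comparison of (start, end, label) spans.
def precision_recall_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def macro_f1(gold_spans, pred_spans, labels=("short", "long")):
    """Average the F1-score over the label types (acronyms vs. long-forms)."""
    scores = []
    for label in labels:
        g = [s for s in gold_spans if s[2] == label]
        p = [s for s in pred_spans if s[2] == label]
        scores.append(precision_recall_f1(g, p)[2])
    return sum(scores) / len(scores)
```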
Table 3: Performance of the proposed methodology

Language             | F1-score | Precision | Recall | Rank
English (Legal)      | 0.87     | 0.84      | 0.89   | 5
English (Scientific) | 0.83     | 0.80      | 0.86   | 5
Spanish              | 0.90     | 0.90      | 0.91   | 2
Danish               | 0.81     | 0.78      | 0.84   | 5
French               | 0.81     | 0.81      | 0.80   | 4
Persian              | 0.59     | 0.92      | 0.43   | 3
Vietnamese           | 0.36     | 0.37      | 0.36   | 6

A comparison of the macro-averaged F1-scores of the top models in the shared task for all languages is illustrated in Figure 5. It can be observed that, as expected, most models obtained higher performance for the English and Spanish languages. The results also suggest that, since the proposed methodology achieved promising results with only 3 epochs of training, further experiments could improve the results by increasing the number of epochs.

Figure 5: Comparison of macro-averaged F1-scores of top models in the shared task

5. Conclusion and Future Work

This paper describes the methodology and the results obtained by team MUCIC for the AE shared task at SDU@AAAI-22. Data transformation, which deals with the different data representations, is the primary step in this methodology. The sentences and tokens required for this step are extracted using Spacy or other libraries depending on the language. The Flair framework used for fine-tuning pre-trained transformer language models for NER tasks is extended by building a sequence tagger to extract acronyms and their long-forms. The results obtained for the different languages indicate that a larger number of samples in the training set leads to higher performance in identifying the acronyms and their long-forms. The proposed model obtained its best performance for the Spanish language and secured the second rank in the shared task, and for all other languages the results obtained are quite reasonable. As future work, we would like to experiment with combinations of embeddings and language models using the Flair framework as well as with other DL methods for the task of AE in different languages.

Acknowledgments

The work was done with partial support from the Mexican Government through grant A1-S-47854 of CONACYT, Mexico, and grants 20211784, 20211884, and 20211178 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank CONACYT for the computing resources provided through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America PhD Award.

References
[1] C. A. Mack, How to Write a Good Scientific Paper: Acronyms, Journal of Micro/Nanolithography, MEMS, and MOEMS 11 (2012) 040102.
[2] A. Barnett, Z. Doubleday, The Growth of Acronyms in the Scientific Literature, eLife 9 (2020) e60080. URL: https://doi.org/10.7554/eLife.60080. doi:10.7554/eLife.60080.
[3] K. Taghva, J. Gilbreth, Finding Acronyms and their Definitions, IJDAR 1 (1999) 191–198. doi:10.1007/s100320050018.
[4] J. Liu, C. Liu, Y. Huang, Multi-Granularity Sequence Labeling Model for Acronym Expansion Identification, Information Sciences 378 (2017) 462–474.
[5] K. Jacobs, A. Itai, S. Wintner, Acronyms: Identification, Expansion and Disambiguation, Annals of Mathematics and Artificial Intelligence 88 (2020) 517–532.
[6] K. Kirchhoff, A. M. Turner, Unsupervised Resolution of Acronyms and Abbreviations in Nursing Notes using Document-level Context Models, in: Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis, 2016, pp. 52–60.
[7] J. Charbonnier, C. Wartena, Using Word Embeddings for Unsupervised Acronym Disambiguation, in: Proceedings of the 27th International Conference on Computational Linguistics, 2018, pp. 2610–2619.
[8] M. Peters, M. Neumann, L. Zettlemoyer, W.-t. Yih, Dissecting Contextual Word Embeddings: Architecture and Representation, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 1499–1509.
[9] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.
[10] A. P. B. Veyseh, N. Meister, S. Yoon, R. Jain, F. Dernoncourt, T. H. Nguyen, Multilingual Acronym Extraction and Disambiguation Shared Tasks at SDU 2022, in: Proceedings of SDU@AAAI-22, 2022.
[11] F. Balouchzahi, G. Sidorov, H. L. Shashirekha, ADOP FERT: Automatic Detection of Occupations and Profession in Medical Texts using Flair and BERT, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021), co-located with the XXXVII International Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), Málaga, Spain, September 2021, volume 2943 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 747–757. URL: http://ceur-ws.org/Vol-2943/meddoprof_paper2.pdf.
[12] A. P. B. Veyseh, F. Dernoncourt, T. H. Nguyen, W. Chang, L. A. Celi, Acronym Identification and Disambiguation Shared Tasks for Scientific Document Understanding, in: Proceedings of the Workshop on Scientific Document Understanding co-located with the 35th AAAI Conference on Artificial Intelligence, SDU@AAAI 2021, Virtual Event, February 9, 2021, volume 2831 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: http://ceur-ws.org/Vol-2831/paper33.pdf.
[13] A. P. B. Veyseh, F. Dernoncourt, Q. H. Tran, T. H. Nguyen, What does this Acronym Mean? Introducing a New Dataset for Acronym Identification and Disambiguation, in: Proceedings of COLING, 2020.
[14] A. Schwartz, M. Hearst, A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text, Pacific Symposium on Biocomputing 4 (2003) 451–462. doi:10.1142/9789812776303_0042.
[15] D. Zhu, W. Lin, Y. Zhang, Q. Zhong, G. Zeng, W. Wu, J. Tang, AT-BERT: Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21, CEUR Workshop Proceedings (2021).
[16] N. Egan, J. Bohannon, Primer AI's Systems for Acronym Identification and Disambiguation, in: Proceedings of the Workshop on Scientific Document Understanding co-located with the 35th AAAI Conference on Artificial Intelligence, SDU@AAAI 2021, Virtual Event, February 9, 2021, volume 2831 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: http://ceur-ws.org/Vol-2831/paper30.pdf.
[17] C. Pan, B. Song, S. Wang, Z. Luo, BERT-based Acronym Disambiguation with Multiple Training Strategies, in: Proceedings of the Workshop on Scientific Document Understanding co-located with the 35th AAAI Conference on Artificial Intelligence, SDU@AAAI 2021, Virtual Event, February 9, 2021, volume 2831 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: http://ceur-ws.org/Vol-2831/paper25.pdf.
[18] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF Models for Sequence Tagging, arXiv preprint arXiv:1508.01991 (2015).
[19] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural Architectures for Named Entity Recognition, in: Proceedings of NAACL-HLT, 2016, pp. 260–270.
[20] X. Ma, E. Hovy, End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016, pp. 1064–1074.
[21] W. Rogers, A. R. Rae, D. Demner-Fushman, AI-NLM Exploration of the Acronym Identification Shared Task at SDU@AAAI-21, 2021.
[22] A. P. B. Veyseh, N. Meister, S. Yoon, R. Jain, F. Dernoncourt, T. H. Nguyen, MACRONYM: A Large-Scale Dataset for Multilingual and Multi-Domain Acronym Extraction, arXiv preprint, 2022.