ABCD Team at ABSAPT 2024: Classification-Based versus Generation-Based Approach for
Aspect-Based Sentiment Analysis in Portuguese
Dang Van Thin¹,²,*, Nguyen Tuan Kiet¹,², Duong Ngoc Hao¹,² and Ngan Luu-Thuy Nguyen¹,²
¹ University of Information Technology-VNUHCM, Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam
² Vietnam National University, Ho Chi Minh City, Vietnam


                                            Abstract
                                            This paper presents our solutions for Aspect Term Extraction (ATE) and Sentiment Orientation Extraction
                                            (SOE), the core tasks of the ABSAPT 2024 shared task. We investigate both classification-based and
                                            generation-based approaches using different pre-trained language models on the provided TripAdvisor
                                            review dataset for these two sub-tasks. Our system achieved the top ranking (1st place) for Task 1 -
                                            Aspect Term Extraction and a top 3 ranking for Task 2 - Sentiment Orientation Extraction. Additionally,
                                            our work showcases the performance of these models for both tasks on Portuguese reviews, contributing
                                            valuable insights for further research in this area.

                                            Keywords
                                            Aspect-based Sentiment Analysis, Portuguese language, Classification-based approach, Generation-based
                                            approach


                                1. Introduction
                                Aspect-based Sentiment Analysis (ABSA) is a fine-grained approach to sentiment analysis that
                                addresses the limitations of traditional sentiment analysis, in which people’s comments on
                                specific aspects in reviews are overlooked.
                                   Inspired by similar competitions such as SemEval [1, 2, 3] and EVALITA [4], the Aspect-Based
                                Sentiment Analysis in Portuguese (ABSAPT) 2024 [5] shared task at IberLEF 2024 [6] calls for
                                Aspect-Based Sentiment Analysis systems for TripAdvisor reviews written in Portuguese. This shared
                                task involves two subtasks:

                                     • Aspect Term Extraction (ATE): given a set of reviews, identify all the aspects
                                       discussed within them.


                                IberLEF 2024, September 2024, Valladolid, Spain
                                * Corresponding author.
                                Email: thindv@uit.edu.vn (D. V. Thin); 21521042@gm.uit.edu.vn (N. T. Kiet); haodn@uit.edu.vn (D. N. Hao);
                                ngannlt@uit.edu.vn (N. L. Nguyen)
                                URL: https://nlp.uit.edu.vn/ (D. V. Thin); https://nlp.uit.edu.vn/ (N. L. Nguyen)
                                ORCID: 0000-0001-8340-1405 (D. V. Thin); 0000-0002-3136-0661 (D. N. Hao); 0000-0003-3931-849X (N. L. Nguyen)
                                © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

    • Sentiment Orientation Extraction (SOE): given a set of reviews and pre-identified
      aspects, determine the sentiment (positive, neutral, negative) expressed towards each
      mentioned aspect.

   The two subtasks of ABSA, Aspect Term Extraction (ATE) and Sentiment Orientation Ex-
traction (SOE), are indeed interrelated and build upon each other. ATE acts as the foundation,
identifying the specific aspects mentioned in the review. SOE then analyzes the sentiment
expressed towards those identified aspects.
   For instance, a review might mention “the food was delicious” and “the service was slow.” ATE
would identify “food” and “service” as aspects. SOE would then analyze the surrounding text to
determine the sentiment towards each aspect: positive for “food” and negative for “service.”
   These subtasks are crucial for various applications. In the context of online reviews, they
can help businesses pinpoint areas for improvement and focus customer satisfaction efforts on
specific aspects that matter most to their customers. Furthermore, ABSA finds applications in
social media analysis, product recommendation systems, and any domain where understanding
user opinions on specific aspects is valuable.
   By improving ABSA systems for Portuguese reviews, we can gain valuable insights from
customer feedback on various platforms like TripAdvisor. This can benefit businesses by helping
them understand customer satisfaction across different aspects of their services. Additionally,
improved ABSA can be used for market research, product development, and enhancing customer
service strategies.
   This shared task encourages participants to develop and evaluate ABSA methods for Por-
tuguese reviews, ultimately contributing to an enhanced understanding of how opinions are
expressed in Portuguese.


2. Related work
Several recent approaches have addressed the challenges of Aspect Term Extraction (ATE)
and Sentiment Orientation Extraction (SOE) using various techniques. Here, we explore some
relevant works that participated in a similar shared task.
   Team Deep Learning Brasil [7] leveraged a BERT-based model for the ATE task, while SOE
was addressed as a sentence pair classification task. Another team, NILC [8], converted the data
from document-level to sentence-level and utilized Conditional Random Fields to extract the
aspect terms in the review. Instead of training supervised learning models, [9] adopted a simpler
strategy relying on string matching techniques for Task 1. They also fine-tuned a BERT-base
model for sequence classification to identify the sentiment for an aspect term. Similarly, the
authors in [10] employed a BERT-base model fine-tuned for sequence classification to classify
the sentiment polarity for aspect terms. Another work, [11], focused on enriching features
through POS tagging, dependency parsing, and lemmatization to extract aspects. For SOE, they
implemented a two-stage process: first extracting meaningful surrounding text for each aspect
and then generating the sentiment polarity for each extracted aspect.
   Beyond these works, several other approaches have explored different methodologies, includ-
ing hybrid models and techniques for optimizing resource usage. [12] proposed a hybrid model
combining rule-based and machine-learning methods for Aspect Term Extraction (ATE) and
Sentiment Orientation Extraction (SOE). They used predefined rules based on POS tagging to
identify potential aspects, filtering out incorrect ones with a classifier, and employed Gradient
Boosting with 800 Decision Trees for sentiment analysis. Meanwhile, [13] combined transfer
learning, zero-shot learning, and ONNX optimization to use the BERT-based AlBERTo [14] model
efficiently. For ATE and SOE, they fine-tuned the model using Ktrain, and for sentiment analysis,
they used zero-shot learning with AlBERTo embeddings and a BiLSTM classifier, achieving
high inference speed with minimal CPU usage.


3. Methodology
3.1. Classification-based approach
For the classification-based approach, we fine-tune pre-trained BERT-based language models
to address the two sub-tasks. We treat the Aspect Term Extraction task as a token-classification
task and employ the BIO tagging scheme to represent the aspect terms in the review. Let
$X = \{x_1, x_2, \ldots, x_n\}$ be the input review with $n$ words. The purpose of this task is to
predict a label sequence $Y = \{y_1, y_2, \ldots, y_n\}$, where each
$y_i \in \{\text{B-Aspect}, \text{I-Aspect}, \text{O}\}$ denotes the BIO tag for the token $x_i$.
B-Aspect marks the first word of an aspect term, I-Aspect marks the continuing words within a
multi-word term, and O marks the words outside any aspect term.
We formulate the Sentiment Orientation Extraction task as a sequence classification task. The
model takes two inputs: the review and the aspect term. These are concatenated into a single
sequence using the special tokens [CLS] and [SEP], so the final representation becomes
[CLS] review [SEP] aspect term [SEP]. The output is one of three classes, “negative”, “neutral”,
or “positive”, encoded as a one-hot vector with three dimensions. Our purpose in this approach is
to utilize the power of pre-trained BERT-based language models for the Portuguese language;
therefore, we investigate the following models:
    • BERTimbau [15]: a pre-trained BERT model for Brazilian Portuguese that achieves state-
      of-the-art performances on three downstream NLP tasks: Named Entity Recognition,
      Sentence Textual Similarity and Recognizing Textual Entailment.
    • mDeBERTa_v3 [16]: a multilingual version of the DeBERTa architecture trained
      on CC100 multilingual data.
    • mBERT [17]: a BERT model pre-trained on the 104 languages with the largest
      Wikipedias using a masked language modelling (MLM) objective.
    • XLM-R [18]: a multilingual model pre-trained on 2.5TB of text covering
      100 languages, including Portuguese.
    • InfoXLM [19]: an XLM-RoBERTa model that focuses on maximizing the mutual information
      between texts in different languages and at various granularities.
    • XLM-Align [20]: an XLM-RoBERTa model that specifically targets word-level alignment
      between parallel text corpora.
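
The BIO encoding and the sentence-pair input described above can be illustrated with a short, library-free sketch. The example sentence, the whitespace tokenization, and the helper names (`bio_tags`, `decode_aspects`, `soe_input`) are assumptions made for illustration; in practice a subword tokenizer inserts [CLS] and [SEP] itself and the tags must be aligned to subwords.

```python
from typing import List


def bio_tags(tokens: List[str], aspects: List[List[str]]) -> List[str]:
    """Label each token as B-Aspect, I-Aspect, or O by exact span matching."""
    tags = ["O"] * len(tokens)
    for aspect in aspects:
        n = len(aspect)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            span_free = all(t == "O" for t in tags[start:start + n])
            if tokens[start:start + n] == aspect and span_free:
                tags[start] = "B-Aspect"          # first word of the term
                for k in range(start + 1, start + n):
                    tags[k] = "I-Aspect"          # continuing words
    return tags


def decode_aspects(tokens: List[str], tags: List[str]) -> List[str]:
    """Recover aspect terms from a BIO tag sequence."""
    aspects, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-Aspect":
            if current:
                aspects.append(" ".join(current))
            current = [token]
        elif tag == "I-Aspect" and current:
            current.append(token)
        else:
            # "O", or an I-Aspect with no open term (treated as outside).
            if current:
                aspects.append(" ".join(current))
            current = []
    if current:
        aspects.append(" ".join(current))
    return aspects


def soe_input(review: str, aspect: str) -> str:
    """Show the layout of the sentence-pair input for SOE."""
    return f"[CLS] {review} [SEP] {aspect} [SEP]"


# Hypothetical review, not taken from the dataset.
tokens = "o café da manhã é ótimo e o quarto é limpo".split()
tags = bio_tags(tokens, [["café", "da", "manhã"], ["quarto"]])
print(list(zip(tokens, tags)))
print(decode_aspects(tokens, tags))
print(soe_input(" ".join(tokens), "quarto"))
```

Decoding the tags back into strings is what turns token-level predictions into the aspect list that is scored for Task 1.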

3.2. Generation-based approach
In addition to the classification-based approach, we implement a generation-based
approach to extract the aspect terms and the sentiment polarity for a given aspect. Inspired by
previous works [21, 22, 23], we cast the two sub-tasks as conditional text generation
by utilizing the power of pre-trained generative language models. To do that, we transform
the labels into a natural language string and fine-tune an encoder-decoder architecture on the
two sub-tasks. For Task 1 - Aspect Term Extraction, we use a special [sep] word to separate
the aspect terms mentioned in the review. For Task 2 - Sentiment Orientation Extraction, we
convert the numeric sentiment classes (-1, 0, 1) into the corresponding sentiment polarity words
(negativa, neutra, positiva) in Portuguese.
   To train the models, we implement two fine-tuning strategies: single-task and multi-task.
For the single-task strategy, we train a separate model for each sub-task. For the multi-task
strategy, we merge the data of the two sub-tasks and train on them simultaneously, adding an
instruction prompt to distinguish the sub-tasks. In this work, we
employ the pre-trained generative language models mT5 [24] and mT0 [25] in two versions
(base and large). Both are multilingual language models that support the Portuguese language.

    • mT5 [24]: pre-trained on a large multilingual dataset covering many languages. The
      model is trained using a unified text-to-text approach, where NLP tasks are framed as
      generating text from input text. This design choice simplifies using the model across
      different tasks and languages, providing a consistent methodology for various applications.
    • mT0 [25]: a variant of mT5 [24] obtained by multi-task prompted fine-tuning on
      various NLP tasks. mT0 was trained on 30 new multilingual datasets, including one
      for sentiment analysis. Portuguese was one of the top 3 most-represented languages in
      the training set.
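
The label linearization described above can be sketched as follows. The [sep] separator and the Portuguese polarity words come from the text; the instruction prefixes, the way the aspect is appended to the review for Task 2, and the helper names are hypothetical, since the exact prompt wording is not specified here.

```python
from typing import List, Tuple

SEP = "[sep]"
POLARITY_WORDS = {-1: "negativa", 0: "neutra", 1: "positiva"}
WORD_TO_POLARITY = {word: label for label, word in POLARITY_WORDS.items()}

# Hypothetical instruction prefixes used to tell the sub-tasks apart
# in the multi-task setting; the actual prompts are an assumption.
ATE_PROMPT = "extrair aspectos: "
SOE_PROMPT = "classificar sentimento: "


def ate_target(aspects: List[str]) -> str:
    """Join the aspect terms of one review into a single target string."""
    return f" {SEP} ".join(aspects)


def parse_ate_output(generated: str) -> List[str]:
    """Split a generated string back into a list of aspect terms."""
    return [part.strip() for part in generated.split(SEP) if part.strip()]


def ate_example(review: str, aspects: List[str]) -> Tuple[str, str]:
    """Build a (source, target) pair for Task 1."""
    return ATE_PROMPT + review, ate_target(aspects)


def soe_example(review: str, aspect: str, polarity: int) -> Tuple[str, str]:
    """Build a (source, target) pair for Task 2."""
    return f"{SOE_PROMPT}{review} {SEP} {aspect}", POLARITY_WORDS[polarity]


# Multi-task training data is the union of both kinds of pairs.
review = "o quarto é limpo e a localização é ótima"  # hypothetical review
pairs = [
    ate_example(review, ["quarto", "localização"]),
    soe_example(review, "quarto", 1),
]
for source, target in pairs:
    print(source, "->", target)
```

In the single-task setting the prefixes could be dropped, because each model only ever sees one kind of pair.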


4. Experimental Setup
4.1. Data and Evaluation Metrics
Since the shared task accepts only a final submission for both sub-tasks, we split the official
training dataset into a new training set and a validation set with an 8:2 ratio for the development
phase. For the final submission, we retrain the best models on the full official training set provided
by the organizers. The official dataset is collected from TripAdvisor reviews written in
Portuguese, with 4,828 samples from 1,320 reviews in the training set; the test set comprises
283 unique review samples (Task 1) and 1,176 samples from 282 reviews (Task 2). As shown in
Table 1, the training data is imbalanced in both aspect terms and sentiment polarity.
Specifically, 69.28% of the samples belong to the 10 most frequent aspect terms, and the share
rises to 78.66% for the top 15 and 84.17% for the top 20. These findings suggest potential
challenges during model training: the model
might prioritize learning frequently occurring aspects due to their over-representation, leading
to a performance decline in identifying less frequent, yet potentially informative, aspects.
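
As a quick check of these shares, the sketch below recomputes them from the per-aspect totals in Table 1, assuming the denominator is the 4,828 training samples. The results agree with the reported percentages to within 0.01 percentage points.

```python
from typing import List

TOTAL_SAMPLES = 4828  # training samples reported for the train set

# "Total" column of Table 1, in rank order (IDs 1 to 20).
TOP20_COUNTS = [902, 662, 494, 346, 228, 160, 159, 133, 131, 130,
                108, 91, 88, 85, 81, 72, 63, 48, 48, 35]


def cumulative_share(counts: List[int], total: int, k: int) -> float:
    """Percentage of all samples covered by the k most frequent aspects."""
    return 100.0 * sum(counts[:k]) / total


for k in (10, 15, 20):
    share = cumulative_share(TOP20_COUNTS, TOTAL_SAMPLES, k)
    print(f"top {k}: {share:.4f}%")
```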
Table 1
Distribution of the top 20 most frequent aspect terms and their sentiment polarity in the training set.
 ID   Aspect Term      Negative   Neutral   Positive   Total
 1    hotel               116       135        651      902
 2    quarto              177        62        423      662
 3    localização          25         3        466      494
 4    café da manhã        67        38        241      346
 5    atendimento          30         4        194      228
 6    funcionários         18         2        140      160
 7    preço                28        10        121      159
 8    recepção             30        12         91      133
 9    serviço              27        17         87      131
 10   cama                 11         7        112      130
 11   limpeza              31         4         73      108
 12   internet             28         9         54       91
 13   elevador             62        11         15       88
 14   rua                  17        32         36       85
 15   chuveiro             43         5         33       81
 16   cidade                1        50         21       72
 17   apartamento           7        14         42       63
 18   lojas                 1        14         33       48
 19   aeroporto             0        44          4       48
 20   cassino               4        11         20       35


Table 2
Examples predicted by classification-based models
  Predicted Aspects                                                               True Aspects
  ['hotel', 'lojas', 'rua', 'quartos', 'elevador']                                ['rua']
  ['funcionários', 'quarto', 'café da manhã', 'padrão', 'localização']            ['café da manhã', 'localização']


   To evaluate our models' performance and account for the class imbalance to some extent, we
adopted different metrics for each sub-task. For Aspect Term Extraction (ATE), we employed the
shared task metric of Accuracy along with Precision, Recall, and F1-score. These metrics provide
a comprehensive view of the model's ability to identify both frequent and less frequent aspects
in the reviews. For Sentiment Orientation Extraction (SOE), we utilized Balanced Accuracy
(the shared task metric) along with Accuracy, F1-micro, F1-macro, and F1-weighted. This
combination assesses the model's performance across all sentiment classes, considering both
overall accuracy and the balance between classes.
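
To see why Balanced Accuracy is informative under class imbalance, the following sketch compares it with plain Accuracy, assuming the standard definition of Balanced Accuracy as the mean of per-class recall. The labels are toy values chosen for illustration, not data from the shared task.

```python
from collections import Counter
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of samples whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


# Toy example: 8 positive, 1 neutral, 1 negative sample, and a model
# that always predicts the majority class (positive = 1).
y_true = [1] * 8 + [0] + [-1]
y_pred = [1] * 10

print(f"accuracy          = {accuracy(y_true, y_pred):.3f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.3f}")
```

A majority-class predictor scores well on Accuracy but only one third on Balanced Accuracy with three classes, which is the kind of gap to look for when comparing the two columns of a results table.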

4.2. System Settings
In our classification approach, we experimented with all the BERT-based models mentioned
above as offered by HuggingFace, including both base and large versions. We limited the maximum
input sequence length to 512 tokens. The learning rate was set to 3e-5, and we
trained the models for 10 epochs. The batch size was chosen based on available resources and
the specific model size. To reduce training time, we employed an Early Stopping Callback
that automatically halts training when performance improvement plateaus. All experiments for
this approach were conducted using NVIDIA P100 GPUs.
   For the generation-based approach, we employ the mT5 language models in the base¹ and
large² versions downloaded directly from the HuggingFace Hub. The maximum input and output
lengths are set to 700 and 128 tokens, respectively. The learning rate is set to 3e-4, and the
models are trained for 20 epochs. We use beam search with a beam size of 5 to generate the
target sequence. The batch size is determined automatically depending on the resources and the
size of the model. All experiments for this approach were run on an NVIDIA A100 GPU with 80GB
of memory.
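
The settings above can be collected in a small configuration sketch. The `FineTuneConfig` class and the `early_stop_epoch` helper are hypothetical names, and the patience of 2 epochs is an assumption, since the stopping criterion is not specified beyond halting when improvement plateaus.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters stated in this section (batch size is resource-dependent)."""
    max_input_length: int
    learning_rate: float
    num_epochs: int
    max_output_length: Optional[int] = None  # generation only
    num_beams: Optional[int] = None          # generation only


CLASSIFICATION = FineTuneConfig(max_input_length=512, learning_rate=3e-5,
                                num_epochs=10)
GENERATION = FineTuneConfig(max_input_length=700, learning_rate=3e-4,
                            num_epochs=20, max_output_length=128, num_beams=5)


def early_stop_epoch(val_scores: List[float], patience: int = 2) -> int:
    """Return the 1-based epoch after which training would halt.

    Training stops once the validation score has failed to improve on
    its best value for `patience` consecutive epochs (assumed criterion).
    """
    best = float("-inf")
    stale = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_scores)


# Hypothetical validation curve capped at the 10-epoch budget.
scores = [0.70, 0.75, 0.74, 0.75, 0.73]
print(early_stop_epoch(scores[:CLASSIFICATION.num_epochs]))
```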
Table 3
Results on the development set for Task 1: Aspect Term Extraction
    Approach               Model            Version   Accuracy   Precision   Recall   F1-score
    Classification-based   BERTimbau        base        92.57      77.72      92.57     84.50
    Classification-based   BERTimbau        large       92.57      79.84      92.57     85.74
    Classification-based   mBERT            base        94.21      75.56      94.21     83.86
    Classification-based   XLM-Align        base        93.77      75.88      93.77     83.88
    Classification-based   Info-XLM         base        95.30      76.04      95.30     84.59
    Classification-based   Info-XLM         large       93.66      78.78      93.66     85.58
    Classification-based   mDeBERTa_v3      base        93.32      77.54      93.32     84.70
    Classification-based   XLM-R            base        94.75      76.14      94.75     84.43
    Classification-based   XLM-R            large       96.17      76.40      96.17     85.16
    Generation-based       single mT5       base        80.45      79.85      80.45     80.15
    Generation-based       single mT5       large       81.95      80.40      81.95     81.17
    Generation-based       single mT0       base        80.66      78.31      80.66     79.47
    Generation-based       single mT0       large       84.31      79.29      84.31     81.73
    Generation-based       multi-task mT5   base        80.77      78.99      80.77     79.87
    Generation-based       multi-task mT5   large       80.67      79.89      80.67     80.28
    Generation-based       multi-task mT0   base        80.88      80.53      80.88     80.71
    Generation-based       multi-task mT0   large       82.06      82.25      82.06     82.15


5. Main results
In Table 3 and Table 4, we present the performance of two approaches with different models on
the development set for Task 1 and Task 2, respectively.
   For ATE, classification-based approaches consistently outperformed generation-based ap-
proaches. This suggests that directly classifying aspects within the review text might be more
effective for this task compared to attempting to generate them from scratch. It’s possible
that the complexity of aspect term variations and their dependencies on context are not fully
captured by the generation process in this setting. Interestingly, the gap in performance between
classification and generation-based approaches was narrower for metrics like precision and
F1-score compared to accuracy. This might be because classification models tend to over-predict,
returning many candidate aspects in addition to the gold ones,
as shown in Table 2. While this can inflate accuracy, it might not reflect a true understanding
of the most relevant aspects of the review.
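
The effect of over-prediction can be made concrete with the two examples from Table 2. The sketch below assumes set-based exact-match scoring per review, which may differ from the official scorer; the function name is illustrative.

```python
from typing import List, Tuple


def precision_recall_f1(predicted: List[str], gold: List[str]) -> Tuple[float, float, float]:
    """Set-based exact-match precision, recall, and F1 for one review."""
    pred, true = set(predicted), set(gold)
    tp = len(pred & true)  # aspects that were both predicted and gold
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# The two rows of Table 2 as (predicted aspects, true aspects).
examples = [
    (["hotel", "lojas", "rua", "quartos", "elevador"], ["rua"]),
    (["funcionários", "quarto", "café da manhã", "padrão", "localização"],
     ["café da manhã", "localização"]),
]

for predicted, gold in examples:
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In both rows every gold aspect is found, so recall is perfect, while the extra predictions pull precision down to 0.2 and 0.4.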
   For SOE, the results revealed that generation-based approaches achieved competitive perfor-
mance, even surpassing most classification-based models in several metrics. This suggests that
these approaches can effectively capture sentiment orientation in Portuguese reviews.
   While some classification-based models achieved reasonable overall accuracy, they might exhibit
limitations in capturing the nuances of sentiment compared to generation-based approaches. For
most classification-based models in Table 4, Balanced Accuracy is far below Accuracy, which
points to a bias towards the more frequent sentiment classes.
¹ https://huggingface.co/google/mt5-base
² https://huggingface.co/google/mt5-large
Table 4
Results on the development set for Task 2: Sentiment Orientation Extraction
 Approach               Model            Version   Accuracy   Micro F1   Macro F1   Weighted F1   Balanced Acc
 Classification-based   BERTimbau        base        69.56      69.56      52.78       67.06         53.48
 Classification-based   mBERT            base        73.18      73.18      50.21       67.60         54.81
 Classification-based   XLM-Align        base        73.91      73.91      57.65       71.09         56.71
 Classification-based   Info-XLM         base        73.80      73.80      49.98       67.64         54.69
 Classification-based   mDeBERTa_v3      base        81.57      81.57      76.27       82.14         78.24
 Classification-based   XLM-R            base        71.84      71.84      54.14       68.97         55.27
 Generation-based       single mT5       base        70.08      70.08      47.53       65.07         50.39
 Generation-based       single mT5       large       69.15      69.15      53.41       67.30         53.48
 Generation-based       single mT0       base        81.25      81.25      73.64       80.73         73.42
 Generation-based       single mT0       large       82.40      82.40      74.99       81.90         74.21
 Generation-based       multi-task mT5   base        80.23      80.23      72.60       79.90         71.87
 Generation-based       multi-task mT5   large       81.37      81.37      73.57       80.85         72.65
 Generation-based       multi-task mT0   base        81.99      81.99      74.42       81.50         73.88
 Generation-based       multi-task mT0   large       82.92      82.92      75.67       82.60         75.00


Table 5
The official ranking of our system for the two sub-tasks.
                        Task 1                                      Task 2
 Team        Precision   Recall   F1-score   Top   Balanced Acc   Precision   Recall   F1-score   Top
 Emerson        6.57      6.68      6.68      2       78.40         76.80      78.40     77.47     1
 TeamUFPR        -         -         -        -       65.19         65.34      65.19     65.34     2
 Ours          85.52     73.04     63.73      1       57.13         56.83      56.93     56.83     3


   To further analyze the performance of our models, Figure 1 shows the confusion matrices of the
best model from each approach on Task 2. These confusion
matrices provide insightful comparisons of their performance across negative, neutral, and
positive sentiment categories. The generation-based approach demonstrates a high precision in
classifying positive sentiments (0.92), while its performance for neutral (0.54) and negative (0.79)
sentiments is comparatively moderate. Misclassification is most evident in neutral sentiments
being predicted as positive (0.32). In contrast, the classification-based approach exhibits a more
balanced performance with a precision of 0.86 for negative sentiments, 0.64 for neutral, and 0.85
for positive sentiments. Despite this balance, there is still a significant misclassification rate
between neutral and positive sentiments. These results indicate that while the generation-based
model excels in identifying positive sentiments, the classification-based model offers more
consistent performance across all categories, highlighting the strengths and weaknesses of each
approach in sentiment orientation extraction.
   Table 5 presents the official ranking results of Task 1 - ATE and Task 2 - SOE. Our team
secured first place in Task 1 with a precision of 85.52, a recall of 73.04, and an F1-score of
63.73. In Task 2, our system ranked third with a balanced accuracy of 57.13, a precision of 56.83,
a recall of 56.93, and an F1-score of 56.83. A more nuanced
analysis of these rankings is necessary, given the limited number of participating teams. In Task 1,
our system outperformed the only other competitor. However, the
presence of only two teams makes it challenging to assert the overall efficacy and robustness of
our approach, as the competition was not extensive. Similarly, in Task 2, the small number of
participants restricts the breadth of comparative analysis. Therefore, while our results indicate a
strong performance relative to the available competitors, further validation against a larger and
more diverse set of systems would be necessary to demonstrate the superiority of our approach
conclusively.

Figure 1: Confusion Matrices of the Best Models from each approach on Task 2: Sentiment Orientation
Extraction


6. Conclusion and Future Work
In this work, we explored two approaches for aspect term extraction and sentiment orientation
extraction in the ABSAPT 2024 shared task. We fine-tuned various BERT-based architectures
(BERT, XLM-RoBERTa, RoBERTa, DeBERTa) for a classification-based approach. Additionally,
we investigated the effectiveness of generative models (mT5, mT0) with single-task and multi-
task strategies for a generation-based approach. Our team achieved notable results in the
competition, securing first place in Task 1 and reaching the top 3 in Task 2. These rankings
underscore the strength and versatility of our approaches, although further validation is needed
against a larger and more diverse set of systems. Future efforts can investigate ensemble methods
combining classification and generation approaches and develop techniques for solving class
imbalance issues. Additionally, future work should address capturing implicit aspects and
developing explainable models to build trust and enable error analysis.


Acknowledgments
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM)
under grant number C2024-26-02. Dang Van Thin was funded by the Master, PhD Scholarship
Programme of Vingroup Innovation Foundation (VINIF), code VINIF.2023.TS117.
References
 [1] I. Pavlopoulos, Aspect based sentiment analysis, Athens University of Economics and
     Business (2014).
 [2] M. Pontiki, D. Galanis, H. Papageorgiou, S. Manandhar, I. Androutsopoulos, SemEval-2015
     task 12: Aspect based sentiment analysis, in: P. Nakov, T. Zesch, D. Cer, D. Jurgens (Eds.),
     Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015),
     Association for Computational Linguistics, Denver, Colorado, 2015, pp. 486–495.
 [3] M. Pontiki, D. Galanis, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, M. AL-Smadi,
     M. Al-Ayyoub, Y. Zhao, B. Qin, O. De Clercq, et al., SemEval-2016 task 5: Aspect based
     sentiment analysis, in: Proceedings of the 10th International Workshop on Semantic
     Evaluation (SemEval-2016), Association for Computational Linguistics, 2016, pp. 19–30.
 [4] L. De Mattei, G. De Martino, A. Iovine, A. Miaschi, M. Polignano, G. Rambelli, et al., Ate
     absita@ evalita2020: Overview of the aspect term extraction and aspect-based sentiment
     analysis task, in: CEUR WORKSHOP PROCEEDINGS, volume 2765, CEUR-WS, 2020, pp.
     67–74.
 [5] A. G. Gabriel, T. B. Alexandre, P. L. Emerson, A. d. F. Larissa, B. C. Ulisses, Overview
     of absapt at iberlef 2024: Overview of the task on aspect-based sentiment analysis in
     portuguese, Procesamiento del Lenguaje Natural 73 (2024).
 [6] L. Chiruzzo, S. M. Jiménez-Zafra, F. Rangel, Overview of IberLEF 2024: Natural Language
     Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the
     Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference
     of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org,
     2024.
 [7] J. R. S. Gomes, E. A. S. Garcia, A. F. B. Junior, R. C. Rodrigues, D. F. C. Silva, D. F. Maia,
     N. F. F. da Silva, A. d. S. Soares, et al., Deep learning brasil at absapt 2022: Portuguese
     transformer ensemble approaches, arXiv preprint arXiv:2311.05051 (2023).
 [8] M. T. Machado, T. A. S. Pardo, Nilc at absapt 2022: Aspect extraction for portuguese, in:
     Proceedings, 2022.
 [9] F. A. R. Neto, R. F. de Sousa, R. L. de Sales Santos, R. T. Anchiêta, R. S. Moura, Team piln at
     absapt 2022: Lexical and bert strategies for aspect-based sentiment analysis in portuguese.
     (2022).
[10] T. Heinrich, F. Marchi, Teamufpr at absapt 2022: Aspect extraction with crf and bert.
     (2022).
[11] F. M. Assi, G. B. Candido, L. N. dos Santos Silva, D. F. Silva, H. de Medeiros Caseli, Ufscar’s
     team at absapt 2022: Using syntax, semantics and context for solving the tasks. (2022).
[12] A. S. F. Mele, G. Vettigli, Sentna@ ate absita: Sentiment analysis of customer reviews using
     boosted trees with lexical and lexicon-based features, Proceedings of the 7th evaluation
     campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020),
     Online. CEUR. org (2020).
[13] M. Bennici, ghostwriter19@ ate_absita: Zero-shot and onnx to speed up bert on sentiment
     analysis tasks at evalita 2020, EVALITA Evaluation of NLP and Speech Tools for Italian-
     December 17th, 2020 (2020) 80.
[14] M. Polignano, P. Basile, M. De Gemmis, G. Semeraro, V. Basile, et al., Alberto: Italian
     bert language understanding model for nlp challenging tasks based on tweets, in: CEUR
     workshop proceedings, volume 2481, CEUR, 2019, pp. 1–6.
[15] F. Souza, R. Nogueira, R. Lotufo, BERTimbau: pretrained BERT models for Brazilian
     Portuguese, in: 9th Brazilian Conference on Intelligent Systems, BRACIS, Rio Grande do
     Sul, Brazil, October 20-23 (to appear), 2020.
[16] P. He, J. Gao, W. Chen, Debertav3: Improving deberta using electra-style pre-training with
     gradient-disentangled embedding sharing, 2021. arXiv:2111.09543 .
[17] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
     transformers for language understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.),
     Proceedings of the 2019 Conference of the North American Chapter of the Association for
     Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short
     Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp.
     4171–4186.
[18] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave,
     M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at
     scale, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual
     Meeting of the Association for Computational Linguistics, Association for Computational
     Linguistics, Online, 2020, pp. 8440–8451.
[19] Z. Chi, L. Dong, F. Wei, N. Yang, S. Singhal, W. Wang, X. Song, X.-L. Mao, H. Huang,
     M. Zhou, InfoXLM: An information-theoretic framework for cross-lingual language model
     pre-training, in: Proceedings of the 2021 Conference of the North American Chapter of the
     Association for Computational Linguistics: Human Language Technologies, Association
     for Computational Linguistics, Online, 2021, pp. 3576–3588.
[20] Z. Chi, L. Dong, B. Zheng, S. Huang, X.-L. Mao, H. Huang, F. Wei, Improving pretrained
     cross-lingual language models via self-labeled word alignment, in: Proceedings of the
     59th Annual Meeting of the Association for Computational Linguistics and the 11th
     International Joint Conference on Natural Language Processing (Volume 1: Long Papers),
     Association for Computational Linguistics, Online, 2021, pp. 3418–3430.
[21] S. U. S. Chebolu, F. Dernoncourt, N. Lipka, T. Solorio, Exploring conditional text generation
     for aspect-based sentiment analysis, in: K. Hu, J.-B. Kim, C. Zong, E. Chersoni (Eds.),
     Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation,
     Association for Computational Linguistics, Shanghai, China, 2021, pp. 119–129.
[22] H. Yan, J. Dai, T. Ji, X. Qiu, Z. Zhang, A unified generative framework for aspect-based
     sentiment analysis, in: C. Zong, F. Xia, W. Li, R. Navigli (Eds.), Proceedings of the 59th
     Annual Meeting of the Association for Computational Linguistics and the 11th International
     Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association
     for Computational Linguistics, Online, 2021, pp. 2416–2429.
[23] D. Van Thin, N. L. T. Nguyen, Aspect-category based sentiment analysis with unified
     sequence-to-sequence transfer transformers, VNU Journal of Science: Computer Science
     and Communication Engineering 39 (2023).
[24] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, C. Raffel,
     mT5: A massively multilingual pre-trained text-to-text transformer, in: K. Toutanova,
     A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell,
     T. Chakraborty, Y. Zhou (Eds.), Proceedings of the 2021 Conference of the North American
     Chapter of the Association for Computational Linguistics: Human Language Technologies,
     Association for Computational Linguistics, Online, 2021, pp. 483–498.
[25] N. Muennighoff, T. Wang, L. Sutawika, A. Roberts, S. Biderman, T. Le Scao, M. S. Bari,
     S. Shen, Z. X. Yong, H. Schoelkopf, X. Tang, D. Radev, A. F. Aji, K. Almubarak, S. Albanie,
     Z. Alyafeai, A. Webson, E. Raff, C. Raffel, Crosslingual generalization through multitask
     finetuning, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st
     Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),
     Association for Computational Linguistics, Toronto, Canada, 2023, pp. 15991–16111.