<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title/>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Analyzing Sentiment, Attraction Type, and Country in Spanish Language TripAdvisor Reviews Using Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedro Mirabal</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suilen Hernández-Alvarado</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Ignacio Abreu Salas</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Database Laboratory, Universidade da Coruña.</institution>
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Departamento de Ingeniería Informática. Universidad Católica de Temuco.</institution>
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>U.I. for Computer Research. University of Alicante.</institution>
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>This paper describes our participation in the Rest-Mex 2023 Sentiment Analysis Task. We proposed an ensemble of (i) a cascade of transformer-based two-class classifiers biased towards lowering the Mean Absolute Error in Polarity, and (ii) multi-class transformer-based classifiers for the prediction of the Type and Location of the messages. Our system achieved a Sentiment Track score of 0.719.</p>
      </abstract>
      <kwd-group>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Transformer Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Sentiment Analysis, a subfield of Natural Language Processing (NLP), enables the examination
of an individual's opinions towards various entities, including services and products, by categorizing
them into distinct classes. These classes can encompass positive, negative, neutral, or even more
nuanced gradations. This task has garnered significant interest because stakeholders can exploit
data obtained from social media platforms and specialized websites such as Tripadvisor to make
informed, data-driven decisions. Nonetheless, several challenges persist, such as the disparate
availability of linguistic resources across different languages [1].</p>
      <p>To advance the field of Sentiment Analysis, several shared tasks have been established to foster
research and development in this area. Notably, initiatives like SemEval, which held its first edition
in 2007, have played a crucial role in this regard, and other challenges such as IberLEF have also
contributed significantly. More recently, Rest-Mex has further enriched the landscape, providing
opportunities for researchers and practitioners to explore and address the complexities associated
with sentiment classification tasks [10, 11, 12, 13].</p>
      <p>
        In recent times, the field of Sentiment Analysis has witnessed notable advancements through
the utilization of Deep Learning techniques. A comprehensive exploration of this subject can be
found in a survey entitled "Deep learning for sentiment analysis: A survey" by Zhang et al. (2018)
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In a similar vein, several research teams have leveraged similar strategies and participated
in competitions pertaining to Sentiment Analysis, yielding commendable outcomes [
        <xref ref-type="bibr" rid="ref3">3, 4, 5</xref>
        ].
      </p>
      <p>It is worth noting the results obtained by the winners of the Rest-Mex 2022 edition, where
the winning team [6] utilized a large collection of attributes computed with the UMUTextStats
tool combined with models based on BERT [16] and RoBERTa [17]. They achieved a score of
0.892, which was very close to the score obtained by the second team. In the case of the second
place [8], the most relevant part of their work is the meticulous preprocessing of the data they
carried out. They removed duplicate instances from the training dataset and translated any
opinion that was not in Spanish; if a translation was not possible, they discarded that instance.
The second-place team achieved their best result by using a variant of BERT, similar to the
first-place team.</p>
      <p>In this paper, we further explored the proposal of a cascade of biased two-class classifiers for
predicting Polarity described in [9], and we ran a comparative study of different language models
for Spanish. We aim to provide a detailed account of our involvement in the Rest-Mex 2023
Sentiment Analysis Subtask [13], which primarily focuses on sentiment classification: the
objective is to develop systems capable of accurately predicting the Polarity, the Location,
and the Type of attraction of tourist opinions concerning various locations in Mexico, Cuba,
and Colombia. The report is structured as follows. Section 2 describes the training dataset. In
Section 3 we present details of our approach. Section 4 is devoted to the analysis of the results.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task and Data Description.</title>
      <p>Figure 1: Class distribution of the corpus for Polarity (1-5), Type (Attractive, Hotel,
Restaurant) and Location (Colombia, Cuba, Mexico).</p>
      <p>
        In this section, we describe the data provided by the organizers for this subtask and its
characterization. The corpus consists of 251,702 opinions. Each opinion is labeled with an integer
polarity between 1 and 5, where 1 represents the most negative polarity and 5 the most positive. For each
opinion, the organizers also provided information about the Location [Colombia, Cuba, Mexico] and
the Type [Attractive, Hotel, Restaurant]. The organizers split the corpus approximately 70%-30%:
70% of the data was delivered to the participants with complete information about each opinion,
and 30% was reserved for the final testing of competing models.
      </p>
      <p>Figure 1 shows the class distribution for Polarity, Type and Location. Analyzing the
representation of each of the classes, we detected a high level of imbalance, with
class 5 as the majority class, with a total of 157,095 instances, representing 62.41% of the
total; a great contrast with class 1, for which only 5,772 instances (2.29%) were provided.
The presence of the rest of the classes is as follows: 60,227 instances for class 4, 21,656
instances for class 3 and 6,952 instances for class 2, representing 23.93%, 8.60% and
2.76% respectively. Regarding the Location, the distribution is as follows:
66,703 instances for Colombia, 66,223 for Cuba, and 118,776 for Mexico. While the data
imbalance in terms of location is not as significant as in the case of polarity, there is still an
over-representation of instances from Mexico. Finally, in terms of attraction type, there are
76,042 instances categorized as Hotel, 64,472 instances categorized as Restaurant, and the
remaining 111,188 instances categorized as Attractive.</p>
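      <p>The class shares above can be recomputed directly from the per-class counts reported by the
organizers; the following minimal sketch (plain Python, counts copied from the text) does exactly that:</p>
      <p>
```python
# Recompute the Polarity class shares from the raw counts given in the text.
polarity_counts = {1: 5_772, 2: 6_952, 3: 21_656, 4: 60_227, 5: 157_095}

total = sum(polarity_counts.values())
assert total == 251_702  # matches the corpus size reported by the organizers

# Percentage of the corpus covered by each polarity class, rounded to 2 decimals.
shares = {label: round(100 * n / total, 2) for label, n in polarity_counts.items()}
print(shares)  # class 5 alone accounts for 62.41% of the opinions
```
      </p>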
    </sec>
    <sec id="sec-3">
      <title>3. System Description.</title>
      <p>Our approach (https://github.com/joseias/2023-rest-mex) implements an ensemble of three
classifiers for the multi-label classification problem posed by the challenge. We have separated
classifiers for the Polarity, the Type, and the Location of the review.</p>
      <p>For Type and Location, we use the architecture depicted in Fig. 2a, as implemented by the
[Bert|Roberta]ForSequenceClassification modules in Hugging Face (https://huggingface.co). This
architecture has been used for the task by other authors, such as [7] and [6].</p>
      <p>In the case of Polarity, our approach further studies the proposal of the cascade of biased
classifiers described in [9]. The architecture is depicted in Fig. 2b. The main hypothesis is that we can build
classifiers that learn the target class but are biased towards the other classes with lower
misclassification costs. That is, in case of a misclassification, the model is biased towards one
of the better options. For example, if the target polarity is 1, the error with the lowest cost in
terms of Mean Square Error (MSE) is to assign a polarity of 2. In the next section, we delve into
the details of this approach.</p>
      <p>3.1. Cascade of Biased Two-class Classifiers.</p>
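      <p>The cost asymmetry that motivates this bias can be checked directly: under a squared-error
cost, an adjacent-class error is far cheaper than a distant one. A minimal sketch:</p>
      <p>
```python
# Squared-error cost of predicting `pred` for an opinion whose true polarity is `true`.
def mse_cost(true: int, pred: int) -> int:
    return (true - pred) ** 2

# For a true polarity of 1, predicting 2 is the cheapest possible error,
# while predicting 5 is the most expensive one.
costs = {pred: mse_cost(1, pred) for pred in range(1, 6)}
print(costs)  # {1: 0, 2: 1, 3: 4, 4: 9, 5: 16}
```
      </p>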
      <p>As commented before, for Polarity we leveraged the second proposal of [9], an ensemble of
biased two-class classifiers arranged as shown in Fig. 2b. The ensemble is comprised of four
classifiers. The classifier at each stage i is trained to separate the instances from the original
5 possible polarities into two categories: (i) the target classes for the stage, let us call them T_i,
and (ii) the other classes, O_i. Both T_i and O_i are defined so as to bias the classifier towards
classes with low misclassification costs with respect to MSE.</p>
      <p>Figure 2: (a) BERT-based multi-class architecture (transformer feature extractor followed by
dense, dropout and dense layers); (b) cascade classifier, where the instances a stage assigns to
the complement of its target category are passed on to the binary classifier of the next stage.</p>
      <p>For example, for the binary classifier at stage 1, T_1 = {1} and O_1 = {2, 3, 4, 5}. This classifier
learns to tear apart instances with a Polarity of 1 from instances with other polarities. For
stage 2, the sets are defined as T_2 = {1, 2} and O_2 = {3, 4, 5}. This classifier is biased to
classify as 2 the instances with a Polarity of 1. At the inference step, this classifier is used after
the classifier at stage 1. Thus, if the latter model misclassifies a true 1 example, the hypothesis
is that the model at stage 2 will classify this instance as 2, contributing to minimizing the MSE. The
values of T_i and O_i are set in a similar fashion for stage 3. At stage 4 they are T_4 = {5}
and O_4 = {1, 2, 3, 4}. It is worth noting that no model for a Polarity of 4 is trained; this class is
inferred from the other models.</p>
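      <p>The stage-wise relabeling just described can be sketched as follows. The target set of each
stage (written T_i here; stage 3's set is our reading of "a similar fashion", and the function names
are ours, not taken from the released code) turns the original 1-5 polarities into binary training labels:</p>
      <p>
```python
# Target sets for each cascade stage, as described in the text (stage 3's set
# is assumed to follow the same pattern as stages 1 and 2).
STAGE_TARGETS = {
    1: {1},
    2: {1, 2},
    3: {1, 2, 3},
    4: {5},
}

def binary_label(stage: int, polarity: int) -> int:
    """Relabel an original 1-5 polarity for the given stage's binary classifier."""
    return int(polarity in STAGE_TARGETS[stage])

# Stage 2 treats true 1s and 2s alike, which is what biases its errors toward 2.
labels_stage2 = [binary_label(2, p) for p in [1, 2, 3, 4, 5]]
print(labels_stage2)  # [1, 1, 0, 0, 0]
```
      </p>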
      <p>To infer the class of an instance among the original 5 categories, the example is sent to the
model at stage 1. If it is classified as belonging to T_1, we assign a Polarity of 1; otherwise, the
example is sent to the model at stage 2, and the same procedure is followed at each stage. If the
instance reaches the last stage and is not claimed by it either, we assign a Polarity of 4.</p>
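      <p>This inference procedure amounts to a short loop over the four stages. A hedged sketch
(the toy keyword predictors below merely stand in for the fine-tuned binary transformers; the
function and variable names are ours):</p>
      <p>
```python
from typing import Callable, Dict

# Polarity assigned when a stage claims the example; stage 4's target set is
# {5}, and polarity 4 is the fallback inferred when no stage claims it.
STAGE_POLARITY = {1: 1, 2: 2, 3: 3, 4: 5}

def cascade_predict(text: str, stage_models: Dict[int, Callable[[str], bool]]) -> int:
    """Run the four-stage cascade; each model returns True when it claims the text."""
    for stage in (1, 2, 3, 4):
        if stage_models[stage](text):
            return STAGE_POLARITY[stage]
    return 4  # no stage claimed the example: infer polarity 4

# Toy predictors standing in for the fine-tuned binary classifiers.
models = {
    1: lambda t: "terrible" in t,
    2: lambda t: "bad" in t,
    3: lambda t: "okay" in t,
    4: lambda t: "excellent" in t,
}
print(cascade_predict("the hotel was okay", models))       # 3
print(cascade_predict("an excellent restaurant", models))  # 5
print(cascade_predict("nice enough place", models))        # 4
```
      </p>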
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results.</title>
      <p>As a contribution over [9], in this work we aim to compare the performance of different language
models for Spanish when fine-tuned for the task. It is worth noticing that the models we studied
are not the only ones available. We prioritized them considering that they are well documented,
they used different corpora for pre-training, they are BERT- or RoBERTa-based models, and also the time
we had to run experiments. In all cases, we chose the base cased version of the model. We
trained cascade-based models for Polarity, as well as multi-class models for Type, Location and
also for Polarity.</p>
      <p>In our experiments, we use as input the concatenation of the title and the content of the
review, together with the separator token. The training set was split, in a stratified fashion, into
training (90%), development (dev, 10%) and test (10%) subsets. We used the training and dev sets to fine-tune
each model. The best checkpoint was the one with the highest balanced accuracy, and in cases of
very small differences (less than 0.001), the lower loss on dev was the selection criterion.</p>
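      <p>A stratified split preserves the per-class proportions of Figure 1 in each subset. A minimal
pure-Python sketch of the idea (the paper does not say which library was used, and the 80/10/10
fractions here are illustrative):</p>
      <p>
```python
import random
from collections import defaultdict

def stratified_split(items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (text, label) pairs so each subset keeps the label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in items:
        by_label[item[1]].append(item)
    splits = [[] for _ in fractions]
    for label_items in by_label.values():
        rng.shuffle(label_items)
        start = 0
        for i, frac in enumerate(fractions):
            # The last split takes the remainder so no instance is dropped.
            end = len(label_items) if i == len(fractions) - 1 else start + round(frac * len(label_items))
            splits[i].extend(label_items[start:end])
            start = end
    return splits  # train, dev, test

data = [(f"review {i}", 5 if i % 2 else 1) for i in range(100)]
train, dev, test = stratified_split(data)
print(len(train), len(dev), len(test))  # 80 10 10
```
      </p>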
      <p>Table 1 summarizes some details about the models. For the fine-tuning, we set
hyperparameters such as the learning rate or the weight decay as reported by the authors, except for the batch
size (32) and the number of training epochs (5). When, to our knowledge, the authors did not
report the values, the TensorFlow 2.11 defaults were used. As the training algorithm, we used AdamW
with a linear learning rate decay scheduler. At evaluation time, we chose the best checkpoint
considering the balanced accuracy metric, since for Polarity the dataset is highly imbalanced.</p>
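      <p>The linear decay scheduler named above simply scales the base learning rate down to zero over
the training steps. A sketch under that assumption (whether warmup steps were used is not stated
in the text, so none are modeled here):</p>
      <p>
```python
# Linear learning-rate decay over a fixed number of training steps.
# The base rate of 5e-5 is a common fine-tuning default, not a value from the paper.
def linear_decay(step: int, total_steps: int, base_lr: float = 5e-5) -> float:
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining

total = 1000
print(linear_decay(0, total))     # full base rate at the first step
print(linear_decay(500, total))   # half the base rate at the midpoint
print(linear_decay(1000, total))  # 0.0 at the end of training
```
      </p>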
      <p>Examining Table 1, we observed that most models degraded their performance after 1 or 2
epochs, except for BERTIN and RoBERTuito. This is a signal that they might need more
hyperparameter tuning, in particular BERTIN. The cascade models at stages 1, 2, and 4 achieved 0.00
recall, i.e. all instances were classified as class 3 or 5.</p>
      <p>Table 2 shows the Macro F1 for Polarity, Type, and Country for each classifier over the test
set we separated. From our experiments, we observed some interesting facts. It seems that the
performance of BERTIN and RoBERTuito degrades notably in the presence of class imbalance, at least for
our experimental setup. This is not surprising in the case of RoBERTuito, since the model was
trained only with Twitter data. However, the poor results from BERTIN in the cascade classifier
were not expected and deserve further study. On the other hand, BETO and MarIA achieved
rather modest results, but in line with the challenge data, where our BETO model achieved 0.515.</p>
      <p>For Type, all systems except RoBERTuito achieved results of about 0.99. Given the global
results of the task, it seems this problem is not as hard as Polarity prediction. The same conclusion
holds for the Country classification. It is worth noting that in this case, RoBERTuito managed
to achieve a Macro F1 of 0.93.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>In this paper, we have described the model proposed by UCT-UA in the Sentiment Analysis
subtask at Rest-Mex 2023, and we studied the performance of different language models for
Spanish. The study is at a very preliminary stage, so the observed results are better interpreted
with caution.</p>
      <p>The results in our primary submission were obtained from the model described in the Results
section as BETO; this model ranked 5th out of 17 submissions, achieving 0.719 for the Sentiment
Track Score. Globally, it seems the cascade strategy designed to lower the MAE does not perform
well with respect to the Sentiment Track Score, at least with our experimental setup and the degree of
imbalance of the data. This is consistent with the results of [7] and [9], where a fine-tuned BERT
model achieved better results than the cascade-based approach.</p>
      <p>As future work, we are interested in further studying the performance of the different models
in order to draw sounder conclusions; in particular, models trained with general Spanish corpora, or
others from specialized domains such as review data. It would also be interesting to evaluate
multilingual models.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This research has been funded by: the Generalitat Valenciana (Conselleria d’Educació,
Investigació, Cultura i Esport), through the project NL4DISMIS: Natural Language Technologies for
Dealing with dis- and misinformation (CIPROM/2021/021); and MCIN/AEI/10.13039/501100011033
and the European Union NextGenerationEU/PRTR, through the project ClearText
&lt;2021-130707-00&gt;.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[4] Pastorini, M., Pereira, M., Zeballos, N., Chiruzzo, L., Rosá, A. and Etcheverry, M.:
RETUYT-InCo at TASS 2019: Sentiment Analysis in Spanish Tweets. In: Proc. of IberLEF@SEPLN (2019)</p>
      <p>[5] González, J., Pla, F. and Hurtado, L.: ELiRF-UPV at SemEval-2017 Task 4: Sentiment
analysis using deep learning. In: Proceedings of the 11th International Workshop on Semantic
Evaluation, SemEval-2017 (2017)</p>
      <p>[6] García-Díaz, J., Rodríguez-García, M., García-Sánchez, F. and Valencia-García, R.:
UMUTeam at REST-MEX 2022: Polarity Prediction using Knowledge Integration of Linguistic
Features and Sentence Embeddings based on Transformers (2022)</p>
      <p>[7] Vásquez, J., Gómez-Adorno, H. and Bel-Enguix, G.: BERT-based Approach for Sentiment
Analysis of Spanish Reviews from TripAdvisor, pages 165-170 (2021)</p>
      <p>[8] Ramírez, S., Daniel, D. and Bedmar, I.: Recommendation System Rest-Mex 2022 for Mexican
Tourism Using Natural Language Processing (2022)</p>
      <p>[9] Abreu, J., Mirabal, P. and Ballester-Espinosa, A.: Cascade of Biased Two-class Classifiers
for Multi-class Sentiment Analysis. In: Proceedings of the Iberian Languages Evaluation
Forum (IberLEF 2021), co-located with the Conference of the Spanish Society for Natural
Language Processing (SEPLN 2021), Málaga, Spain, September 2021. CEUR Workshop
Proceedings, vol. 2943, pages 185-191 (2021)</p>
      <p>[10] Álvarez-Carmona, M. Á., Aranda, R., Arce-Cárdenas, S., Fajardo-Delgado, D.,
Guerrero-Rodríguez, R., López-Monroy, A. P., Martínez-Miranda, J., Pérez-Espinosa, H. and
Rodríguez-González, A.: Overview of Rest-Mex at IberLEF 2021: Recommendation System for
Text Mexican Tourism. Procesamiento del Lenguaje Natural, vol. 67 (2021)</p>
      <p>[11] Álvarez-Carmona, M. Á., Díaz-Pacheco, Á., Aranda, R., Rodríguez-González, A. Y.,
Fajardo-Delgado, D., Guerrero-Rodríguez, R. and Bustio-Martínez, L.: Overview of Rest-Mex at
IberLEF 2022: Recommendation System, Sentiment Analysis and Covid Semaphore Prediction
for Mexican Tourist Texts. Procesamiento del Lenguaje Natural, vol. 69 (2022)</p>
      <p>[12] Álvarez-Carmona, M. Á., Aranda, R., Guerrero-Rodríguez, R., Rodríguez-González, A. Y.
and López-Monroy, A. P.: A Combination of Sentiment Analysis Systems for the Study of Online
Travel Reviews: Many Heads are Better than One. Computación y Sistemas, vol. 26 (2022)</p>
      <p>[13] Álvarez-Carmona, M. Á., Díaz-Pacheco, Á., Aranda, R., Rodríguez-González, A. Y.,
Bustio-Martínez, L., Muñiz-Sánchez, V., López-Monroy, A. P. and Sánchez-Vega, F.: Overview of
Rest-Mex at IberLEF 2023: Research on Sentiment Analysis Task for Mexican Tourist Texts.
Procesamiento del Lenguaje Natural, vol. 71 (2023)</p>
      <p>[14] Calvo, H. and Gambino, O.: Cascading classifiers for Twitter sentiment analysis with
emotion lexicons. In: Proc. Int. Conf. on Intelligent Text Processing and Computational
Linguistics, pp. 270-280 (2016)</p>
      <p>[15] Cañete, J., Chaperon, G., Fuentes, R. and Pérez, J.: Spanish pre-trained BERT model and
evaluation data. In: Proc. of PML4DC at ICLR (2020)</p>
      <p>[16] Devlin, J., Chang, M. W., Lee, K. and Toutanova, K.: BERT: Pre-training of deep
bidirectional transformers for language understanding (2018)</p>
      <p>[17] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D. and Stoyanov, V.: RoBERTa: A
Robustly Optimized BERT Pretraining Approach (2019)</p>
      <p>[18] Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y. and Potts, C.:
Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings
of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631-1642 (2013)</p>
      <p>[19] Gutiérrez-Fandiño, A., Armengol-Estapé, J., Pàmies, M., Llop-Palao, J., Silveira-Ocampo,
J., Carrino, C. P. and Villegas, M.: MarIA: Spanish Language Models. Procesamiento del Lenguaje
Natural, vol. 68, pages 39-60 (2022)</p>
      <p>[20] Cañete, J., Chaperon, G., Fuentes, R., Ho, J. H., Kang, H. and Pérez, J.: Spanish
pre-trained BERT model and evaluation data. In: PML4DC at ICLR (2020)</p>
      <p>[21] De la Rosa, J., Ponferrada, E. G., Romero, M., Villegas, P., de Prado Salas, P. G. and
Grandury, M.: BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity
Sampling. Procesamiento del Lenguaje Natural, vol. 68, pages 13-23 (2022)</p>
      <p>[22] Pérez, J. M., Furman, D. A., Alemany, L. A. and Luque, F. M.: RoBERTuito: a pretrained
language model for social media text in Spanish. In: Proceedings of the Thirteenth Language
Resources and Evaluation Conference, pp. 7235-7243 (2022)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Agüero-Torales</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abreu-Salas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>López-Herrera</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Deep learning and multilingual sentiment analysis on social media data: An overview</article-title>
          .
          <source>Applied Soft Computing</source>
          , vol
          <volume>107</volume>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Deep learning for sentiment analysis: A survey</article-title>
          .
          <source>In: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          . vol
          <volume>8</volume>
          , number 4 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>González</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurtado</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Pla</surname>
          </string-name>
          , F. :
          <string-name>
            <surname>ELiRF-UPV at</surname>
            <given-names>TASS</given-names>
          </string-name>
          2019:
          <article-title>Transformer Encoders for Twitter Sentiment Analysis in Spanish</article-title>
          .
          <source>In: Proc. of IberLEF@ SEPLN</source>
          , (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>