<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bert-based Approach for Sentiment Analysis of Spanish Reviews from TripAdvisor</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Vasquez</string-name>
          <email>juanmvs@pm.me</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Helena Gomez-Adorno</string-name>
          <email>helena.gomez@iimas.unam.mx</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gemma Bel-Enguix</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Instituto de Ingeniería, Universidad Nacional Autónoma de México</institution>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México</institution>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper presents our approach to the Sentiment Analysis Task at REST-MEX 2021. The goal of the task is to predict the polarity of opinions on Mexican tourist sites. Recent advances in transfer learning with models pre-trained in English have produced state-of-the-art results in sentiment analysis. In this work, we apply two Bert-based approaches for review classification in five classes. Our first approach consists of fine-tuning Beto, a Bert-like model pre-trained in Spanish. Our second approach combines Bert embeddings with feature vectors weighted with TF-IDF. The results obtained using the standalone Bert model ranked first in the task.</p>
      </abstract>
      <kwd-group>
        <kwd>Sentiment analysis</kwd>
        <kwd>Opinion mining</kwd>
        <kwd>Spanish language</kwd>
        <kwd>BERT</kwd>
        <kwd>Transfer learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Sentiment analysis is an area of research in Natural Language Processing (NLP)
whose goal is to extract the sentiment or emotion of a given opinion [
        <xref ref-type="bibr" rid="ref3">3</xref>
]. It is
considered a classification problem [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] where one sentiment is assigned to an
opinion. Recent advances in transfer learning have greatly improved the state
of the art in many NLP problems. Bert, for example, has achieved accuracy as
high as 89.7 in binary classification [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        To encourage the research community in NLP to develop new research areas
in Spanish, the Iberian Language Evaluation Forum (IberLEF) organizes yearly
a comparative evaluation campaign [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In 2021, the REST-MEX task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
(Recommendation System for Text Mexican Tourism) was proposed, including two
sub-tasks. The first one focused on a recommendation system for tourist sites
given a tourist's profile and their personal preferences. The second task required
a sentiment analysis system able to classify an opinion about a Mexican tourist
place with a score between 1 and 5. Our team focused only on the second task.
It is worth noting that our first submitted run obtained the best results in the
second subtask.
      </p>
      <p>This paper is organized as follows: Section 2 states the task we worked on
and the data sets provided by the task organizers; Section 3 describes the two
systems that we designed and implemented; Section 4 details our experiments
and the obtained results; and Section 5 reports our conclusions.
</p>
    </sec>
    <sec id="sec-2">
      <title>Task and Data Description</title>
      <p>The sentiment analysis task required the prediction of a class for each review
provided in the evaluation set. The available classes were integers between 1 and
5. The reviews were taken from the website TripAdvisor and were written by a
tourist who evaluated a landmark in Guanajuato, Mexico. All of them were in
Spanish.</p>
      <p>
The participants were provided with two different data sets: one for training
and one for evaluation. The training set was made up of 5197 rows and 9 columns
described as follows:
1. Index: The index of each opinion.
2. Title: The title that the tourist gave to their opinion.
3. Opinion: The opinion expressed by the tourist.
4. Place: The tourist site that the tourist visited and to which the opinion is
directed.
5. Gender: The gender of the tourist.
6. Age: The age of the tourist at the time of issuing the opinion.
7. Country: The country of origin of the tourist.
8. Date: The date the opinion was issued.
9. Label: The label that represents the polarity of the opinion: [
1, 2, 3, 4, 5].
      </p>
<p>For our experiments, we only trained the models with the "Opinion" column.</p>
      <p>The classes in the training set were not balanced. The distribution can be
seen in Table 1.</p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Approaches</title>
      <p>
At the 2020 edition of the task on Semantic Analysis at SEPLN, a Bert-like model
for sentiment analysis at three levels yielded the highest accuracy [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The
second best results were also obtained by applying a system based on Bert [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Even though these architectures were designed to classify tweets, their results
motivated us to work on Bert-based approaches.
      </p>
      <p>
Another reason for working with Bert was that fine-tuning it for downstream
tasks, such as sentiment analysis, is computationally inexpensive [
        <xref ref-type="bibr" rid="ref4">4</xref>
]. This process
for sentiment analysis consists of feeding Bert with task-specific inputs and
passing the outputs through a classification layer. The results obtained in the
original Bert paper established a new state of the art in sentiment analysis at
three levels [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Because the reviews in the data sets were written in Spanish, we decided to
use Beto as our baseline model. This Bert-like system was pre-trained using a
corpus in Spanish with a size similar to that of the corpus used to train
Bert-Base [
        <xref ref-type="bibr" rid="ref2">2</xref>
]. Considering that Beto follows the same design principles as Bert, we
proceeded to execute a fine-tuning process for five classes.
      </p>
      <p>
Section 3.1 describes the approach for fine-tuning Beto, which ranked first in
this task. Section 3.2 outlines a second approach we took in hopes of improving
our results. In this system, we added bag-of-words feature vectors weighted
with TF-IDF to the contextual embeddings generated by Beto after fine-tuning
it. The motivation behind this was that TF-IDF captures global information
from all the entries in the data set [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], while Bert only captures contextual
information from the attention mechanism [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
<p>It is important to mention that, before proceeding with the fine-tuning, we
performed a data pre-processing step, which consisted of removing the quotation
marks in the reviews.</p>
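<p>As a minimal sketch, the pre-processing step can be expressed as follows. The exact set of quotation-mark characters handled is our assumption, not the authors' code:</p>

```python
# Minimal sketch of the pre-processing step: stripping quotation marks
# from a review before tokenization. The set of quote characters covered
# here (straight, curly, and angle quotes) is an assumption.
QUOTE_CHARS = "\"'\u201c\u201d\u00ab\u00bb"

def remove_quotation_marks(review: str) -> str:
    """Return the review text with all quotation-mark characters removed."""
    return review.translate({ord(c): None for c in QUOTE_CHARS})

cleaned = remove_quotation_marks('\u00abMuy bonito\u00bb, "excelente" lugar')
# cleaned == 'Muy bonito, excelente lugar'
```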
<p>The code for implementing the systems listed here can be found in our GitHub
repository.</p>
      <sec id="sec-3-1">
<title>Fine-tuned Bert Approach</title>
<p>First, we passed the reviews through the Bert tokenizer (loaded with the weights
from Beto). Once the tokens were generated, we fed them to Beto with a
classification layer on top. This step executed the domain-specific training (or
fine-tuning). The next step was to repeat this same process on the official evaluation
set. Once we tokenized the reviews in this second set, we predicted the classes
by passing these tokens through the previously fine-tuned model.</p>
        <p>
The hyperparameters used for the fine-tuning were the ones recommended
in the original Bert paper [
          <xref ref-type="bibr" rid="ref4">4</xref>
]. These were:
- Max length = 512
- Batch size = 8
- Optimizer = AdamW
- Learning rate = 2e-5
- Epsilon = 1e-8
- Epochs = 4
        </p>
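<p>The setup above can be sketched with the Hugging Face transformers library. The checkpoint name below is the publicly released Beto model; that it matches the authors' exact configuration is an assumption:</p>

```python
# Sketch of the fine-tuning setup: Beto with a 5-way classification layer
# on top, optimized with AdamW using the hyperparameters listed above.
# The checkpoint name is the public Beto release; whether it matches the
# authors' exact setup is an assumption.
HYPERPARAMS = {
    "max_length": 512,     # maximum tokens per review
    "batch_size": 8,
    "learning_rate": 2e-5,
    "adam_epsilon": 1e-8,
    "epochs": 4,
}

def build_model_and_tokenizer(checkpoint="dccuchile/bert-base-spanish-wwm-cased"):
    # Lazy imports keep this sketch importable without downloading weights.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # One output logit per class 1..5.
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=5)
    return model, tokenizer

if __name__ == "__main__":
    import torch
    model, tokenizer = build_model_and_tokenizer()
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=HYPERPARAMS["learning_rate"],
                                  eps=HYPERPARAMS["adam_epsilon"])
    # ...standard training loop over the tokenized reviews for 4 epochs...
```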
      </sec>
      <sec id="sec-3-2">
        <title>Fine-tuned Bert Approach with TF-IDF vectors</title>
<p>We started by extracting the contextual embeddings generated after fine-tuning
Beto with the training set (we followed the same steps listed in Section 3.1).
Then, we obtained a new set of features by first tokenizing the original
training set using spaCy's model "es_dep_news_trf". Next, we calculated the TF-IDF
weights for those tokens. Once we had the two sets of features, we concatenated
them into one set, which we then used to train a logistic regression
algorithm. Next, we generated the corresponding contextual embeddings from Beto
and TF-IDF bag-of-words features for the evaluation set. Finally, we predicted
the classes with the previously trained logistic regression model.</p>
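<p>The feature-combination step can be sketched as follows; random vectors stand in for the real Beto embeddings, and the tiny corpus and labels are invented for illustration:</p>

```python
# Sketch of the combined-features approach: TF-IDF bag-of-words vectors
# concatenated with contextual embeddings, fed to a logistic regression.
# The random vectors below stand in for Beto's 768-dimensional embeddings,
# and the toy reviews/labels are invented.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

reviews = [
    "excelente lugar, muy recomendado",
    "pesimo servicio, no vuelvo",
    "bonito pero caro",
    "experiencia increible",
]
labels = [5, 1, 3, 5]

# TF-IDF features computed over the whole (toy) corpus.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(reviews).toarray()

# Stand-in for the contextual embeddings extracted from fine-tuned Beto.
rng = np.random.default_rng(0)
X_bert = rng.normal(size=(len(reviews), 768))

# Concatenate both feature sets and train the classifier.
X = np.hstack([X_bert, X_tfidf])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
preds = clf.predict(X)
```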
<p>For this system we utilized the same hyperparameters listed in 3.1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>
        For this competition, the ranking was determined by measuring the systems
with the mean absolute error (MAE). As can be seen in Equation 1, this metric
outputs a final number which is calculated by summing the magnitudes (absolute
values) of the errors to obtain the "total error" and then dividing the total error
by n [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
<p>MAE = (1/n) Σ_{i=1}^{n} |e_i| (1)</p>
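<p>Equation 1 can be transcribed directly; the example scores below are invented:</p>

```python
# Equation (1) in plain Python: the mean absolute error over the
# per-prediction errors e_i = predicted - true. Example values invented.
def mean_absolute_error(y_true, y_pred):
    errors = [abs(p - t) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)

mae = mean_absolute_error([5, 1, 3, 4], [4, 1, 5, 4])
# total error = |4-5| + |1-1| + |5-3| + |4-4| = 3, so MAE = 3/4 = 0.75
```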
<p>In order to get an overview of how our systems performed, we tested various
classification algorithms on a partition of the training set. For all the experiments
in Table 2, we tokenized the reviews using the spaCy model "es_dep_news_trf".
The next step was to generate the TF-IDF weights. Then, we trained the different
classification algorithms. Finally, we evaluated each model.</p>
<p>The results in Table 2 show the performance of the supervised algorithms,
of the fine-tuned Beto, and of Beto with the added TF-IDF feature vectors. It
can be observed that fine-tuning Beto yields the best results in this task.</p>
      <p>Table 3 presents the results provided by the organizers of the competition.
These metrics were obtained on the evaluation set.
</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
<p>The approach described in 3.1 obtained the lowest mean absolute error in
Subtask 2 of this year's sentiment analysis shared task at IberLEF. This suggests
that transfer learning is a functional approach for sentiment analysis with five
classes. Even though we achieved the lowest mean absolute error among all the
teams in the shared task, we consider that this number is still very high. This
could be due to the difficulty of classifying among five classes.</p>
      <p>One limitation of our approach is that transfer learning depends on the
corpora used to pre-train the model. This restricts the learning capabilities to
certain topics. Also, Beto was pre-trained using a corpus with a size similar to
that of Bert-Base. We hypothesize that using a pre-trained model with more
parameters would greatly improve our results.</p>
<p>Another limitation of Beto is its computational cost. Until now, these
architectures can only handle a maximum length of 512 tokens per sequence.
This means that reviews exceeding that token length are truncated before
the encoding, leading to a loss of data and, therefore, of learning.</p>
<p>Furthermore, we propose working on different approaches to multi-class
classification. While our second system did not produce better results than the first
one, it did achieve fifth place among the participants. We hypothesize that
combining different sets of features generated by different language models could
improve the classification results when faced with five classes.</p>
<p>Finally, by manually analyzing the data set, we observed that sometimes the
classes had very little relation to the review. For example, the following review
was labeled with class 1: "If you go as a couple this place is a must, it is special to
climb and kiss on the third step, very romantic and emblematic". When reading
the review, one cannot infer that it would be given such a low class. This kind
of annotation in the data set could be the reason behind the poor performance
obtained in this task.</p>
      <p>This research was partially funded by CONACyT project CB A1-S-27780 and
DGAPA-UNAM PAPIIT grants number TA100520 and TA400121.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
1. Álvarez-Carmona, M.A., Aranda, R., Arce-Cárdenas, S., Fajardo-Delgado, D., Guerrero-Rodríguez, R., López-Monroy, A.P., Martínez-Miranda, J., Pérez-Espinosa, H., Rodríguez-González, A.:
          <article-title>Overview of Rest-Mex at IberLEF 2021: Recommendation system for text Mexican tourism</article-title>.
          <source>Procesamiento del Lenguaje Natural</source> <volume>67</volume> (<year>2021</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
2. Cañete, J., Chaperon, G., Fuentes, R., Pérez, J.:
          <article-title>Spanish pre-trained BERT model and evaluation data</article-title>.
          <source>PML4DC at ICLR 2020</source> (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
3. Dale, R., Somers, H.L., Moisl, H.:
          <article-title>Handbook of Natural Language Processing</article-title>. Marcel Dekker, Inc., USA (<year>2000</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.:
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>. arXiv preprint arXiv:1810.04805 (<year>2018</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
5. Iberian Languages Evaluation Forum:
          <article-title>Iberian languages evaluation forum</article-title> (February <year>2021</year>), https://sites.google.com/view/iberlef2021
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
6. González, J., Moncho, J.A., Hurtado, L., Pla, F.:
          <article-title>ELiRF-UPV at TASS 2020: TWilBERT for sentiment analysis and emotion detection in Spanish tweets</article-title>.
          In: <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with the 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020), Málaga, Spain</source>. vol. <volume>23</volume> (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
7. Medhat, W., Hassan, A., Korashy, H.:
          <article-title>Sentiment analysis algorithms and applications: A survey</article-title>.
          <source>Ain Shams Engineering Journal</source> 5(<issue>4</issue>), 1093–1113 (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
8. Lim, W.M., Tayyar Madabushi, H.:
          <article-title>UoB at SemEval-2020 Task 12: Boosting BERT with corpus level information</article-title>. arXiv e-prints (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
9. Palomino, D., Ochoa-Luna, J.:
          <article-title>Palomino-Ochoa at TASS 2020: Transformer-based data augmentation for overcoming few-shot learning</article-title>.
          In: <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)</source> (<year>2020</year>)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
10. Willmott, C.J., Matsuura, K.:
          <article-title>Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance</article-title>.
          <source>Climate Research</source> 30(1), 79–82 (<year>2005</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>