<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BUAA's team in Rest-Mex 2023 - Sentiment Analysis: A Basic and Efficient Stylistic and Thematic Features Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lisette Guadalupe Castorena-Salas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fernando Sánchez-Vega</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrián Pastor López-Monroy</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Benemérita Universidad Autónoma de Aguascalientes (BUAA)</institution>
          ,
          <addr-line>Av. Universidad 940, 20100, Aguascalientes</addr-line>
          ,
          <country country="MX">México</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mathematics Research Center (CIMAT)</institution>
          ,
          <addr-line>Jalisco S/N, Valenciana, 36023 Guanajuato, GTO</addr-line>
          <country country="MX">México</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the BUAA team's participation in Rest-Mex 2023 and focuses on the exploration of essential features that capture writing style, such as character n-grams, as well as thematic elements like word usage and word n-grams. The Rest-Mex competition encompasses several subtasks. The identification of tourist attraction types involves a thematic perspective. The prediction of the review's country of origin includes significant writing style components that can reveal the specific variant of Spanish used. Lastly, polarity prediction is hypothesized to blend thematic components, or commonly used words that express positive or negative opinions about tourist attractions, with the author's writing style, such as the empathic or friendly tone used to describe the services and amenities of each attraction. The objective was to analyze different stylistic features of the texts and determine the most informative set in order to improve the classification of Spanish tourist texts using the SVM algorithm. To achieve this, stylistic and thematic attributes were explored and combined using a two-stage hyperparameter search strategy. With this approach, a sentiment track score of 0.72 was achieved, securing 6th place out of 17 participating teams. This result is notable given the simplicity of the proposed solution and the few computational resources the method requires.</p>
      </abstract>
      <kwd-group>
        <kwd>Rest-Mex 2023</kwd>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Bag of Words</kwd>
        <kwd>Char and word n-grams</kwd>
        <kwd>Stylistic Features</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The tourism industry recognizes the importance of sentiment analysis in tourism texts. By
understanding the emotions expressed by tourists, companies, tourism boards, and researchers
can acquire invaluable insights into the factors influencing positive or negative sentiments.
This understanding enables stakeholders to make informed decisions, personalize offerings, and
enhance the overall travel experience.</p>
      <p>
        Rest-Mex is an international competition that focuses on sentiment analysis of tourism texts
in Spanish. The 2023 edition of the competition focuses on sentiment expressed in tourism texts
written in the Spanish language obtained from the TripAdvisor platform [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Spanish, being one
of the most widely spoken languages in the world, provides a vast corpus of user-generated
content, allowing us to gain a deep understanding of travelers’ experiences and emotions in
various regions. The purpose of this shared task competition is to motivate research focused
on Spanish in the fields of sentiment analysis, detection of variants of Spanish, and specific
characteristics of the tourism industry, such as the types of attractions.
      </p>
      <p>
        The Rest-Mex competition problem is to determine the polarity, ranging from 1 to 5, of
opinions about tourist attractions. Additionally, we aim to predict the type of tourist place
visited, such as a hotel, restaurant, or attraction, as well as the country that was visited, whether
it is Mexico, Colombia, or Cuba [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Traditionally, sentiment analysis has been conducted by observing the words used in opinions.
Specific dictionaries or word sets associated with negative or positive polarity have been
employed, as well as ad hoc selections obtained through attribute selection techniques [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
While the words themselves can indicate polarity, how they are used must also be taken into
account. To address this aspect, we propose the incorporation of simple stylistic attributes [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
to discern the author’s style when determining the polarity expressed in a review. These simple
style attributes have proven effective in identifying polarity within contexts where texts exhibit
informality or lack meticulousness, such as social networks (as observed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Moreover,
they have found widespread applicability in languages with limited resources [
        <xref ref-type="bibr" rid="ref9">9, 10</xref>
        ], which
further underscores their utility. Hence, we advocate leveraging these simple style attributes
(specifically character n-grams) for sentiment analysis in the Spanish language. Furthermore, the
integration of these attributes holds significant potential for enhancing the recognition of
Spanish language variants since they reflect the stylistic features adopted by authors depending
on the variant they use.
      </p>
      <p>We believe that all subtasks could benefit from characterizing the writing style. Additionally,
we emphasize the importance of considering thematic aspects and sentiment analysis keywords.
To ensure comprehensive coverage of both thematic and stylistic characteristics, we have
employed character n-grams to capture these attributes. Furthermore, in order to reinforce
the thematic elements and polarity keywords, we have additionally incorporated attributes
obtained through a traditional bag-of-words approach [11].</p>
      <p>The proposed method, which concatenates optimized bags of words and characters,
has demonstrated its efficiency and competitive performance when compared to computationally
demanding deep learning approaches. This approach achieved 6th place in a ranking that included
a multitude of deep learning proposals. By combining carefully optimized word and character
representations, our method captures a comprehensive range of influences, offering an effective
alternative that mitigates the need for extensive computational resources.</p>
    </sec>
    <sec id="sec-2">
      <title>1. Task and corpus description</title>
      <p>This section describes the tasks addressed in the Rest-Mex 2023
sentiment analysis competition, together with the corpora used in the research,
which were crucial for developing and evaluating the sentiment analysis models.</p>
      <p><bold>1.1. Task</bold></p>
      <p>Given a specific opinion or review about a tourist destination, the main objective is to determine
the polarity or sentiment expressed in the text on a scale ranging from 1 to 5. The numerical
scale represents the spectrum of sentiment, where a rating of 1 indicates the highest degree of
dissatisfaction, while a rating of 5 reflects the highest degree of satisfaction.</p>
      <p>In addition to analyzing the polarity of sentiment, the competition also aimed to predict the
type of tourist site visited by the reviewer. The target categories include hotels, restaurants, and
attractions, allowing for a more comprehensive understanding of sentiment across different
aspects of the tourism industry.</p>
      <p>Furthermore, the competition sought to identify the origin of the reviewer, focusing on three
specific countries: Mexico, Colombia, and Cuba. By determining the geographical origin of the
reviewers, the analysis could potentially reveal cultural nuances and variations in sentiment
expression, providing valuable insights into regional perceptions and preferences.</p>
      <p>
        Through the Rest-Mex 2023 competition [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], participants were challenged to develop
innovative methodologies and machine learning models that can effectively address these aspects
of sentiment analysis in Spanish tourism texts. The ultimate goal was to accurately classify
sentiment polarity, determine the type of tourist site, and identify the origin of the reviewer,
thus enriching our understanding of travelers’ emotions, preferences, and perceptions across
various destinations.
      </p>
      <sec id="sec-2-1">
        <title>1.2. Development corpus: Rest-Mex 2022</title>
        <p>
          The database for the 2022 [12] competition consists of 30,212 instances collected since
2021 [13]. Collaborators searched the official TripAdvisor website for the 30 most relevant tourist
places in Guanajuato and Jalisco, such as hotels, restaurants, and attractions.
Each instance contains the following information:
          <list list-type="order">
            <list-item><p>Title: the title that the tourist assigned to their review.</p></list-item>
            <list-item><p>Opinion: the opinion expressed by the tourist.</p></list-item>
            <list-item><p>Polarity: the polarity of the opinion: [1, 2, 3, 4, 5].</p></list-item>
            <list-item><p>Attraction: the type of place for which the opinion is being expressed: [“Hotel”, “Restaurant” or “Attractive”].</p></list-item>
          </list>
        </p>
        <p>
          The polarity ranges from 1, indicating the highest degree of dissatisfaction, to 5,
representing the highest degree of satisfaction. It can be interpreted as follows: Very bad (1),
Bad (2), Neutral (3), Good (4), Very good (5).
        </p>
        <p>
          In 2023 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], the database from 2022 [12] was reused, and in addition, the 30 most relevant tourist
places from Puebla, Nuevo León, Veracruz, Cuba, and Colombia were added. This resulted in a
training dataset of 251,702 instances. Each instance contains the following information:
          <list list-type="order">
            <list-item><p>Title: the title that the tourist assigned to their review.</p></list-item>
            <list-item><p>Opinion: the opinion expressed by the tourist.</p></list-item>
            <list-item><p>Polarity: the polarity of the opinion: [1, 2, 3, 4, 5].</p></list-item>
            <list-item><p>Country: the country that was visited.</p></list-item>
            <list-item><p>Type: the type of place for which the opinion is being expressed: [“Hotel”, “Restaurant” or “Attractive”].</p></list-item>
          </list>
        </p>
        <p>Just like in the previous corpus, the polarity range is from 1 to 5.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Methodology</title>
      <p>This section describes the methodology used in the investigation: the steps and procedures
followed, the strategies used to obtain reliable results, and how we verified the relevance of the
stylistic characteristics in the different subtasks, as well as the convenience of joining them with
the thematic characteristics of the texts.</p>
      <sec id="sec-3-1">
        <title>Preprocessing</title>
        <p>Our dataset consists of reviews, which must be preprocessed before building a representation.
The preprocessing steps are as follows:
          <list list-type="bullet">
            <list-item><p>Tokenization: dividing a document into words (or character chunks, where applicable).</p></list-item>
            <list-item><p>Lowercasing: replacing all uppercase letters with lowercase letters.</p></list-item>
            <list-item><p>Stop words: removing common words that do not provide much information.</p></list-item>
          </list>
        </p>
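        <p>As an illustration, the three preprocessing steps can be sketched in a few lines of Python. This is a minimal sketch, not necessarily the pipeline used in the experiments: the regular-expression tokenizer, the function name <monospace>preprocess</monospace> and the tiny stop-word list are assumptions made for the example.</p>

```python
import re

# Hypothetical, deliberately tiny Spanish stop-word list (illustration only).
STOP_WORDS = {"el", "la", "los", "las", "de", "y", "en", "un", "una", "es", "muy"}

def preprocess(text, lowercase=True, remove_stop_words=True):
    """Tokenize a review into words, optionally lowercasing and dropping stop words."""
    if lowercase:
        text = text.lower()
    # \w+ matches runs of letters, digits and underscores, accented letters included.
    tokens = re.findall(r"\w+", text)
    if remove_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens

review = "El hotel es muy limpio y la comida excelente"
tokens = preprocess(review)
print(tokens)
```

        <p>Each step is an on/off switch here, which mirrors how lowercasing and stop-word removal are treated as hyperparameters later in the paper.</p>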
      </sec>
      <sec id="sec-3-2">
        <title>Text Representation</title>
        <p>To apply machine learning tools, we need to generate a representation of the documents. The
representation options we used are described below:
          <list list-type="bullet">
            <list-item><p>n-grams: sequences of n contiguous words or characters extracted from the text.</p></list-item>
            <list-item><p>Use idf: enables inverse document frequency (IDF) weighting.</p></list-item>
            <list-item><p>Smooth idf: adds one to the document frequencies, as if there were an additional document containing every term in the collection exactly once.</p></list-item>
            <list-item><p>Sublinear tf: replaces the term frequency tf with 1 + log(tf).</p></list-item>
            <list-item><p>Norm: two norms are available: the L2 norm, which ensures that the sum of the squared vector elements is 1, and the L1 norm, which ensures that the sum of the absolute values of the vector elements is 1 [14].</p></list-item>
          </list>
        </p>
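        <p>To make the weighting options concrete, the following sketch computes a TF-IDF vector by hand. It assumes the definitions used by scikit-learn, whose option names match the ones listed above (the text does not name the library): smooth IDF is taken as ln((1 + N) / (1 + df)) + 1 and sublinear TF as 1 + ln(tf). The function name <monospace>tfidf_vector</monospace> and the toy corpus are hypothetical.</p>

```python
import math
from collections import Counter

def tfidf_vector(doc_tokens, corpus_tokens, smooth_idf=True, sublinear_tf=True, norm="l2"):
    """Weight one tokenized document against a tokenized corpus."""
    n_docs = len(corpus_tokens)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    weights = {}
    for term, tf in Counter(doc_tokens).items():
        if term not in df:
            continue  # terms unseen in the corpus are ignored
        tf_weight = 1.0 + math.log(tf) if sublinear_tf else float(tf)
        if smooth_idf:
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
        else:
            idf = math.log(n_docs / df[term]) + 1.0
        weights[term] = tf_weight * idf
    if norm == "l2":
        scale = math.sqrt(sum(w * w for w in weights.values()))
    elif norm == "l1":
        scale = sum(abs(w) for w in weights.values())
    else:
        scale = 1.0  # no normalization
    return {t: w / scale for t, w in weights.items()} if scale else weights

corpus = [["buen", "hotel"], ["mal", "hotel"], ["buen", "servicio", "buen", "precio"]]
vector = tfidf_vector(corpus[2], corpus)
print(vector)
```

        <p>With the L2 norm the squared weights of the resulting vector sum to 1, and with the L1 norm the absolute weights do, which is the property described in the list above.</p>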
      </sec>
      <sec id="sec-3-3">
        <title>Feature Selection</title>
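        <p>
          <list list-type="bullet">
            <list-item><p>Max features: builds a vocabulary that keeps only the given number of most frequent terms.</p></list-item>
            <list-item><p>Min df: ignores terms whose document frequency is lower than the specified threshold.</p></list-item>
          </list>
        </p>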
        <p>The addition (or omission) of each of these techniques and the parameters with which they
are used define the set of hyperparameters of the representation. The subsection below describes
the hyperparameter selection methodology followed.</p>
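        <p>The two selection criteria can be sketched as a small vocabulary builder. This is an illustrative sketch under the assumption that the minimum threshold applies to document frequency and that the feature cap ranks terms by their total frequency in the corpus; the function name <monospace>build_vocabulary</monospace> and the toy corpus are hypothetical.</p>

```python
from collections import Counter

def build_vocabulary(corpus_tokens, min_df=1, max_features=None):
    """Filter terms by document frequency, then keep the most frequent ones."""
    term_freq = Counter(t for doc in corpus_tokens for t in doc)
    doc_freq = Counter(t for doc in corpus_tokens for t in set(doc))
    kept = [t for t in term_freq if doc_freq[t] >= min_df]
    # Rank by corpus frequency (descending), breaking ties alphabetically.
    kept.sort(key=lambda t: (-term_freq[t], t))
    if max_features is not None:
        kept = kept[:max_features]
    return sorted(kept)

corpus = [["buen", "hotel"], ["mal", "hotel"], ["buen", "servicio", "buen", "precio"]]
frequent_in_docs = build_vocabulary(corpus, min_df=2)
top_three = build_vocabulary(corpus, max_features=3)
print(frequent_in_docs, top_three)
```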
      </sec>
      <sec id="sec-3-4">
        <title>Procedure for Hyperparameter selection</title>
        <p>In this subsection, we outline the procedure followed for selecting the hyperparameters
in our study. Hyperparameters play a crucial role in determining the performance of machine
learning models, and their careful selection is essential for achieving accurate and robust
results. However, exhaustive exploration of all possible combinations through a grid search
becomes computationally infeasible and time-consuming. Therefore, to optimize the parameters
effectively, a systematic strategy is needed: an efficient
search approach that balances computational resources and performance optimization.</p>
        <p>
          We started by identifying the relevant hyperparameters for our model and their potential ranges
of values. We then divided the hyperparameters into two groups and evaluated
the model’s performance using each combination. The selection of hyperparameters was carried
out in two stages. First, the best values for the hyperparameters of group 1 were chosen with
group 2 fixed. Then, the values of group 2 were optimized using the best configurations from
group 1. This approach was taken because evaluating all hyperparameter combinations would
require an impractically large grid, exceeding our available resources. The evaluations were
performed using the 2022 [12] database for polarity and attraction type tasks since the number
of instances was optimal for testing. However, for country prediction, we used a 20% subset of
the 2023 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] training database since only Mexican texts were available in 2022.
        </p>
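        <p>The two-stage strategy described above can be sketched as follows. This is a simplified illustration, not the code used in the experiments: the option values are reduced, and <monospace>toy_score</monospace> is a hypothetical stand-in for the cross-validated evaluation of a configuration.</p>

```python
from itertools import product

def grid(options):
    """Yield every combination of a dict of option lists as a dict."""
    names = list(options)
    for values in product(*(options[n] for n in names)):
        yield dict(zip(names, values))

def two_stage_search(group1, group2, group2_defaults, score, top_k=4):
    """Stage 1 tunes group 1 with group 2 fixed; stage 2 tunes group 2 for the top_k winners."""
    stage1 = [{**g1, **group2_defaults} for g1 in grid(group1)]
    best1 = sorted(stage1, key=score, reverse=True)[:top_k]
    stage2 = [{**cfg, **g2} for cfg in best1 for g2 in grid(group2)]
    return max(stage2, key=score), len(stage1) + len(stage2)

def toy_score(cfg):
    """Hypothetical stand-in for a cross-validated F1 score."""
    score = 1.0 if cfg["analyzer"] == "char" else 0.0
    score += 0.5 if cfg["lowercase"] else 0.0
    score += 0.1 if cfg["norm"] == "l2" else 0.0
    return score + cfg["max_features"] / 100000

group1 = {"analyzer": ["word", "char", "char_wb"], "lowercase": [True, False]}
group2 = {"max_features": [5000, 25000], "norm": ["l1", "l2", None]}
defaults = {"max_features": 25000, "norm": "l2"}

best, evaluated = two_stage_search(group1, group2, defaults, toy_score, top_k=2)
print(best, evaluated)
```

        <p>In this toy setting the two stages evaluate 18 configurations, whereas the full cross product of both groups would require 36. The saving grows quickly as more options are added to either group, at the cost of possibly missing interactions between the two groups.</p>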
        <p>We used standard evaluation metrics, namely the F1 score (also known as F-measure)
and the Mean Absolute Error (MAE) [15], to assess the model’s performance for each
hyperparameter combination. To obtain reliable performance estimates and reduce the risk of overfitting,
we applied four-fold cross-validation. This technique allowed us to evaluate the
performance of different hyperparameter combinations on multiple data subsets and obtain
more robust results.</p>
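        <p>For reference, the evaluation ingredients can be written out directly. The sketch below computes a macro-averaged F1 score, the mean absolute error (the MAE reported in the results section) and the index splits of a four-fold cross-validation. The macro averaging and the contiguous, unshuffled folds are assumptions made for the example, and the labels are made up.</p>

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between true and predicted polarity."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def k_fold_indices(n_samples, k=4):
    """Split range(n_samples) into k contiguous (train, validation) index pairs."""
    folds = [list(range(i * n_samples // k, (i + 1) * n_samples // k)) for i in range(k)]
    return [([j for f in folds if f is not fold for j in f], fold) for fold in folds]

y_true = [1, 2, 3, 4, 5, 5]  # made-up polarity labels
y_pred = [1, 3, 3, 4, 5, 4]
f1 = macro_f1(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
splits = k_fold_indices(8, k=4)
print(f1, mae, splits[0])
```

        <p>The F1 score treats the five polarity values as unordered classes, while the MAE penalizes a prediction more the further it falls from the true rating, which is why the two metrics can favor different representations.</p>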
        <p>The explored combinations of hyperparameters covered various dimensions. Firstly, we
examined different units of analysis, including word-based analysis related to thematic attributes,
as well as individual characters and characters surrounded by word boundaries or white spaces,
which are associated with stylistic attributes. Additionally, we considered other preprocessing
features, such as case inclusion/exclusion, stop word removal, minimum frequency threshold,
and the option to apply inverse document frequency weighting or not.</p>
        <p>Regarding word n-grams, we considered a wide range of options, from unigrams to
5-grams, and also explored combinations between them, e.g., unigrams, bigrams, and trigrams.
For character n-grams, we set a minimum length of 2 characters and a maximum of 9, exploring
all combinations within that range. Additionally, the maximum feature limit was set at 25,000.</p>
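        <p>The n-gram ranges can be enumerated with a short helper. The sketch assumes that a range is a contiguous interval of n-gram lengths (for example, lengths 1 to 3 for unigrams, bigrams, and trigrams); the exact set of ranges evaluated in the experiments may differ. The helper names are hypothetical.</p>

```python
def ngram_ranges(min_n, max_n):
    """All contiguous (low, high) length ranges between min_n and max_n."""
    return [(lo, hi) for lo in range(min_n, max_n + 1) for hi in range(lo, max_n + 1)]

def char_ngrams(text, lo, hi):
    """Character n-grams of every length from lo to hi, spaces included."""
    return [text[i:i + n] for n in range(lo, hi + 1) for i in range(len(text) - n + 1)]

word_ranges = ngram_ranges(1, 5)  # unigrams up to 5-grams
char_ranges = ngram_ranges(2, 9)  # 2 to 9 characters
print(len(word_ranges), len(char_ranges))
print(char_ngrams("hola", 2, 3))
```

        <p>Under this assumption there are 15 candidate ranges for words and 36 for characters, which illustrates why the n-gram range alone contributes a large factor to the size of the search grid.</p>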
        <p>After analyzing the first-stage results based on the selected evaluation metric, we
identified the combinations that yielded the best performance with different ranges of word and
character n-grams. Then, in the second stage, we kept the features of these selected models fixed and performed
additional combinations by varying the maximum feature limit.</p>
        <p>In addition to variations in the maximum feature limit, we also explored different norms
for the representation, considering the options L1, L2, or none. Lastly, we evaluated
whether to use additional weighting options, such as smooth IDF and sublinear TF.</p>
        <p>The features of the top four models from the second group of hyperparameter combinations
were selected. The thematic-oriented and stylistic-oriented representations were then merged.
Finally, the final models were built from the merged representations that yielded the
best evaluations and were trained using the entire training dataset.</p>
        <p>By following this procedure, we aimed to find the optimal set of hyperparameters that
maximized our model’s performance. This iterative process allowed us to fine-tune the model
and improve its effectiveness.</p>
        <p>The diagram in Figure 1 summarizes the aforementioned process, illustrating the workflow
followed in the hyperparameter selection.</p>
        <p>[Figure 1: Workflow of the hyperparameter selection. Reviews of tourist places enter the first stage, where combinations of the Group 1 characteristics (analysis unit, lowercase, stop words, minimum frequency, IDF, n-grams) are evaluated using 4-fold cross-validation and the characteristics of the top four models with thematic and stylistic orientation are determined. In the second stage, combinations of the Group 2 characteristics (maximum features, norm, smooth IDF, sublinear term frequency) are evaluated using 4-fold cross-validation and the characteristics of the top four models are again determined. Two thematic and two stylistic representations are then selected and combined, their attributes are joined, the best models are trained using the complete training set, and predictions are made with the data from 2023.]</p>
      </sec>
      <sec id="sec-3-5">
        <title>Machine learning algorithm</title>
        <p>The algorithm applied for classification was Support Vector Machine (SVM) with a linear
kernel. The linear kernel allows the model to capture linear relationships between features and
sentiments.</p>
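        <p>A combined model of this kind can be sketched with scikit-learn. The library choice is an assumption (the text does not name the implementation), and the n-gram ranges and other settings below are illustrative placeholders rather than the tuned values. The four reviews and their labels are made up to show the calls.</p>

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

# Illustrative settings only; the tuned values are not reproduced here.
thematic = TfidfVectorizer(analyzer="word", ngram_range=(1, 2), sublinear_tf=True)
stylistic = TfidfVectorizer(analyzer="char", ngram_range=(3, 4), sublinear_tf=True)

model = make_pipeline(
    # FeatureUnion concatenates the word-based and character-based bags.
    FeatureUnion([("word", thematic), ("char", stylistic)]),
    # LinearSVC is one linear SVM implementation; SVC(kernel="linear") is another.
    LinearSVC(),
)

# Tiny made-up reviews with polarity labels, only to demonstrate fit and predict.
texts = [
    "excelente hotel muy limpio",
    "pésimo servicio y sucio",
    "excelente comida y servicio",
    "pésimo hotel muy sucio",
]
labels = [5, 1, 5, 1]
model.fit(texts, labels)
prediction = model.predict(["hotel excelente"])
print(prediction)
```

        <p>Keeping both vectorizers inside one pipeline means the vocabulary and IDF weights are learned only from the training portion of each cross-validation fold.</p>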
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Results</title>
      <sec id="sec-4-1">
        <title>First Stage</title>
        <p>In this section, we present the results of the experiments conducted in the research on sentiment
analysis of tourism texts, as well as the results obtained in the Rest-Mex 2023 competition.</p>
        <p><bold>3.1. Preliminary evaluation and hyperparameter setting</bold></p>
        <p>Next, we present the main findings and results obtained during the experiments. These results
are based on common evaluation metrics such as MAE and F-measure.</p>
        <p>In the first stage of the experiments, we explored the hyperparameter combinations from Group
1. These combinations varied the unit of analysis (character, character with
word boundaries, and words), inclusion/exclusion of capitalization, removal of stop words,
minimum frequency threshold (1, 3 or 5), whether to apply inverse document frequency weighting,
and the aforementioned ranges of n-grams. The maximum feature
limit was fixed at 25,000 for this stage. A total of 1584 hyperparameter configurations were
generated, and each combination was evaluated based on the F-measure. Figure 2 displays the
average, variability, maximum, and minimum values of the F-measure for each hyperparameter
while varying the remaining configurations. This analysis helps determine the degree of coupling
between each hyperparameter and the rest of the hyperparameters in Group 1.</p>
        <p>[Figure 2: F-measure for each Group 1 hyperparameter, polarity task.]</p>
        <p>Noticeable differences in the average F1 scores among the three units of analysis are observed,
with the "character" attribute analysis obtaining the highest average. To provide a clearer view
of the maximum value, Figure 3 presents a zoomed-in representation.</p>
        <p>These results highlight the importance of carefully selecting features and settings in sentiment
analysis of tourism texts. Additionally, they provide valuable insights into the factors influencing
the accuracy and effectiveness of sentiment analysis models applied to this specific domain.</p>
        <p>In Figure 5, the most relevant n-gram ranges are presented, with the limits of the graph
varying according to the task.</p>
        <p>[Figure 5: Evaluation of the F-measure with respect to different n-gram ranges. Recoverable panel labels: Polarity, Character.]</p>
        <p>For polarity prediction, the bags of character n-grams that combine 3-grams and 4-grams,
as well as those using 4-grams alone, achieve the highest F1 scores. In general, the
character 4-grams show the best average. For word n-grams, however, it was found that
combining n-grams of 1, 2, and 3 words yields better scores.</p>
        <p>Regarding the country and attraction tasks, the results are very similar for character n-grams,
with the 5-gram characters having the highest average. However, for word ranges, it was found
that the best model for the country task is based on word unigrams, while for the attraction
task, a combination of unigrams and bigrams achieves a better score.</p>
        <p>Based on these results, two n-gram ranges were selected, one narrower and
one wider, each among those with the best values in terms of the F-measure. These
selected ranges, along with the aforementioned attributes, were kept fixed for use in the second
stage.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Second stage</title>
        <p>In the second stage of the experiments, additional analyses were conducted with the aim of
refining and improving the results obtained in the first stage. The following are the most
relevant findings and results from this stage.</p>
        <p>In the first stage, certain hyperparameters were used to analyze the polarity task, including
different units of analysis (characters and words), from which three of the best configurations
were selected (the settings shown in Table 1). In the second stage, additional hyperparameters
were incorporated, such as different norms (L1, L2, or none) and variations in the maximum
number of features in the bag-of-words (5000, 10000, 25000, 30000, and 60000). Different
smoothers, such as smooth IDF and sublinear TF, were also combined. In total, 180 combinations
were obtained for the polarity task.</p>
        <p>For the prediction of country and attraction, the best hyperparameter configurations were
also selected in the first stage, but in this case, two were chosen for character-based analysis and
two for word-based analysis. In the second stage, the same hyperparameters as in the polarity
task were added, resulting in a total of 240 combinations for these tasks.</p>
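        <p>The reported totals are consistent with a full cross product of the second-stage options, as the following count shows. The sketch assumes that smooth IDF and sublinear TF are independent on/off switches; the variable and function names are hypothetical.</p>

```python
from itertools import product

norms = ["l1", "l2", None]
max_features = [5000, 10000, 25000, 30000, 60000]
smooth_idf = [True, False]
sublinear_tf = [True, False]

def second_stage_grid(n_base_configs):
    """Every (base configuration, norm, max features, smooth IDF, sublinear TF) tuple."""
    return list(product(range(n_base_configs), norms, max_features, smooth_idf, sublinear_tf))

polarity_grid = second_stage_grid(3)  # three first-stage configurations
other_grid = second_stage_grid(4)     # two character-based plus two word-based
print(len(polarity_grid), len(other_grid))  # 180 240
```

        <p>That is, 3 x 3 x 5 x 2 x 2 = 180 combinations for polarity and 4 x 3 x 5 x 2 x 2 = 240 for each of the other two tasks.</p>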
        <p>Figure 6 and Figure 7 show the variation in the use of Sublinear TF and Smooth IDF smoothers
when changing the maximum number of tokens or features in the bag-of-words matrix,
respectively.</p>
        <p>In general, we observe that as the maximum number of features increases, the F1 score also
tends to increase for both smoothers. This indicates that a larger number of features
allows for capturing more relevant information in sentiment analysis of tourism texts.</p>
        <p>[Figures 6 and 7: F-measure by task as the maximum number of features varies, for the Sublinear TF and Smooth IDF settings.]</p>
        <p>Table 2 summarizes the best configurations identified for each specific task, considering both
the representation features used in the first stage and the additional features incorporated in
this stage.</p>
        <p>These optimal configurations were selected based on their performance in terms of the
F-measure. Presenting them in a table facilitates the comparison
and selection of the most effective options for each particular task.</p>
        <p>Table 3 shows the F-measure obtained for the best configurations along with their respective
ID.</p>
        <p>After selecting the best configurations from the second stage, the attributes of the
topic-oriented and stylistic representations were combined to evaluate whether better performance could be
achieved in the models. The results obtained for each of the tasks are presented below (Table 4).</p>
        <sec id="sec-4-2-1">
          <title>3.2. Competition Results</title>
          <p>Finally, the following combinations were submitted from the best models, and the results
obtained are shown in Table 5.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion</title>
      <p>The BUAA team’s participation in Rest-Mex 2023 focused on exploring essential features that
capture writing style and thematic elements to improve the classification of Spanish tourist texts
using the SVM algorithm. When comparing the two types of explored attributes, character
n-grams and words, we found that character n-grams perform better in predicting the destination
and polarity in terms of the F-measure, while word n-grams are slightly better in predicting
the type of tourist place and in terms of the MAE for polarity. These findings were expected,
as recognizing the writer's language variant relies more on stylistic elements, while identifying the type of tourist
attraction is more thematic. On the other hand, for identifying the individual polarity classes,
character n-grams appear to be more effective, while for modeling the polarity trend using MAE,
words are a slightly better option. It was also found that combining stylistic and thematic
attributes, through the union of the bags of characters and words, consistently yielded
better classification with a higher F1 score in our experiments.</p>
      <p>In the sentiment analysis competition, we achieved a score of 0.72, ranking 6th out of 17
teams. This score surpasses the benchmark set by BERT (BaseLine-Beto-No-Fine-Tuning) and
is above the average of the participants. Our approach proved to be successful in the sentiment
analysis competition for tourist texts, achieving a notable score and outperforming most of the
competitors. Also, this result is particularly noteworthy given the simplicity of the proposed
solution and the minimal computational resources required.</p>
      <p>In conclusion, these results highlight the importance of considering different types of
attributes and their combination in sentiment analysis of tourism texts. By incorporating both
stylistic and thematic attributes, as well as using bag-of-words from characters and words,
the classification accuracy is improved, leading to better results in terms of the F1 measure.
These findings provide a solid foundation for the development of more accurate and effective
sentiment analysis models in the tourism domain, enabling companies and organizations in the
industry to make informed decisions and enhance the travelers’ experience.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Ethical Issues</title>
      <p>We consider it highly relevant to address tasks concerning languages from the global south,
which, despite having large populations, often lack attention in the development of language
technologies. While we acknowledge the value of exploring linguistic variants and their
identification, it is essential to recognize that such methodologies can inadvertently perpetuate
market segmentation and biases in the provision of tourist and cultural offerings across countries
and populations with divergent socioeconomic levels.</p>
      <p>Additionally, it is noteworthy that the proposed methodology in this study operates within
resource constraints, yet still achieves competitive outcomes when compared to approaches
necessitating access to robust computing systems and graphical processing units (GPUs). Hence,
it is imperative to maintain research endeavors that prioritize sustainable, straightforward,
and efficacious methodologies, thereby enabling widespread adoption among populations with
limited access to the requisite hardware resources for deep learning techniques.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We sincerely thank the Municipality of Rincón de Romos, and the Government of Aguascalientes,
through INCyTEA, for their valuable financial support. Thanks to their generosity, we have
been able to carry out our research and present our paper at the IBERLEF 2023 Congress in
Jaén, Spain. Their financial backing has been crucial to our academic success, and we look
forward to continuing our collaboration in the future to promote research and development
in our community. Sanchez-Vega would like to thank CONACYT for its support through the
Program “Investigadoras e Investigadores por México” by the project No. 1311, ID. 11989.</p>
      <p>supervised machine learning, in: A. Balahur, E. V. der Goot, A. Montoyo (Eds.), Proceedings
of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social
Media Analysis, WASSA@NAACL-HLT 2013, 14 June 2013, Atlanta, Georgia, USA, The
Association for Computer Linguistics, 2013, pp. 65–74. URL: https://aclanthology.org/W13-1609/.</p>
      <p>[10] J. Kapociute-Dzikiene, A. Krupavicius, T. Krilavicius, A comparison of approaches for
sentiment classification on Lithuanian internet comments, in: J. Piskorski, L. Pivovarova,
H. Tanev, R. Yangarber (Eds.), Proceedings of the 4th Biennial International Workshop on
Balto-Slavic Natural Language Processing, BSNLP@ACL 2013, Sofia, Bulgaria, August 8-9,
2013, Association for Computational Linguistics, 2013, pp. 2–11. URL: https://aclanthology.org/W13-2402/.</p>
      <p>[11] Y. Sari, M. Stevenson, A. Vlachos, Topic or style? Exploring the most useful features for
authorship attribution, in: E. M. Bender, L. Derczynski, P. Isabelle (Eds.), Proceedings of
the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe,
New Mexico, USA, August 20-26, 2018, Association for Computational Linguistics, 2018,
pp. 343–353. URL: https://aclanthology.org/C18-1029/.</p>
      <p>[12] M. Á. Álvarez-Carmona, Á. Díaz-Pacheco, R. Aranda, A. Y. Rodríguez-González,
D. Fajardo-Delgado, R. Guerrero-Rodríguez, L. Bustio-Martínez, Overview of Rest-Mex at IberLEF 2022:
Recommendation system, sentiment analysis and covid semaphore prediction for Mexican
tourist texts, Procesamiento del Lenguaje Natural 69 (2022) 289–299.</p>
      <p>[13] M. Á. Álvarez-Carmona, R. Aranda, S. Arce-Cardenas, D. Fajardo-Delgado,
R. Guerrero-Rodríguez, A. P. López-Monroy, J. Martínez-Miranda, H. Pérez-Espinosa,
A. Y. Rodríguez-González, Overview of Rest-Mex at IberLEF 2021: Recommendation system for text Mexican
tourism (2021).</p>
      <p>[14] S. Bird, E. Klein, E. Loper, Natural language processing with Python: analyzing text with
the natural language toolkit, O’Reilly Media, Inc., 2009.</p>
      <p>[15] I. D. Dinov, Data science and predictive analytics, Cham, Switzerland (2018).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guerrero-Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>Álvarez-Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>López-Monroy</surname>
          </string-name>
          ,
          <article-title>Studying online travel reviews related to tourist attractions using nlp methods: the case of guanajuato, mexico</article-title>
          ,
          <source>Current Issues in Tourism</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . doi:10.1080/13683500.2021.2007227.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Álvarez-Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Rodríguez-Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fajardo-Delgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pérez-Espinosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martínez-Miranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guerrero-Rodríguez</surname>
          </string-name>
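          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bustio-Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Á.</given-names>
            <surname>Díaz-Pacheco</surname>
          </string-name>
          ,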
          <article-title>Natural language processing applied to tourism research: A systematic review and future research directions</article-title>
          ,
          <source>Journal of King Saud University - Computer and Information Sciences</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <fpage>10125</fpage>
          -
          <lpage>10144</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1319157822003615. doi:10.1016/j.jksuci.2022.10.010.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Diaz-Pacheco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Álvarez-Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guerrero-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A. C.</given-names>
            <surname>Chávez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Rodríguez-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Ramírez-Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aranda</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence methods to support the research of destination image in tourism. a systematic review</article-title>
          ,
          <source>Journal of Experimental &amp; Theoretical Artificial Intelligence</source>
          <volume>0</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>31</lpage>
          . doi:10.1080/0952813X.2022.2153276.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>Álvarez-Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Á.</given-names>
            <surname>Díaz-Pacheco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Rodríguez-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bustio-Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muñis-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Pastor-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sánchez-Vega</surname>
          </string-name>
          ,
          <article-title>Overview of rest-mex at iberlef 2023: Research on sentiment analysis task for mexican tourist texts</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Molina-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Martínez-Cámara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M. J.</given-names>
            <surname>Zafra</surname>
          </string-name>
          ,
          <article-title>esolhotel: Generación de un lexicón de opinión en español adaptado al dominio turístico</article-title>
          ,
          <source>Proces. del Leng. Natural</source>
          <volume>54</volume>
          (
          <year>2015</year>
          )
          <fpage>21</fpage>
          -
          <lpage>28</lpage>
          . URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/5090.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno-Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Hernández</surname>
          </string-name>
          ,
          <article-title>Lexicon-based sentiment analysis of twitter messages in spanish</article-title>
          ,
          <source>Proces. del Leng. Natural</source>
          <volume>50</volume>
          (
          <year>2013</year>
          )
          <fpage>93</fpage>
          -
          <lpage>100</lpage>
          . URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/4664.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rocha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Scheirer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Forstall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cavalcante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Theophilo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <article-title>Authorship attribution for social media forensics</article-title>
          ,
          <source>IEEE Trans. Inf. Forensics Secur.</source>
          <volume>12</volume>
          (
          <year>2017</year>
          )
          <fpage>5</fpage>
          -
          <lpage>33</lpage>
          . URL: https://doi.org/10.1109/TIFS.2016.2603960. doi:10.1109/TIFS.2016.2603960.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          ,
          <article-title>Codex: Combining an SVM classifier and character n-gram language models for sentiment analysis on twitter text</article-title>
          , in: M. T. Diab, T. Baldwin, M. Baroni (Eds.),
          <source>Proceedings of the 7th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2013</source>
          , Atlanta, Georgia, USA, June 14-15, 2013, The Association for Computer Linguistics,
          <year>2013</year>
          , pp.
          <fpage>520</fpage>
          -
          <lpage>524</lpage>
          . URL: https://aclanthology.org/S13-2086/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Habernal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ptácek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Steinberger</surname>
          </string-name>
          ,
          <article-title>Sentiment analysis in czech social media using supervised machine learning</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>