<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multitask Classification of Mexican Tourist Reviews Using a Multi-Head Transformer Model Based on BETO</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Javier Monsivais Borjón</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miguel Ángel Álvarez-Carmona</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro de Investigación en Matemáticas (CIMAT)</institution>
          ,
          <addr-line>Sede Monterrey, Monterrey Nuevo León</addr-line>
          ,
          <country country="MX">México</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Opinion analysis has become a crucial tool for understanding public sentiment across a wide range of domains, including the tourism industry. In this study, we propose a deep learning approach for multitask classification of Spanish-language tourist reviews, leveraging the Rest-Mex 2025 dataset. We employ a pre-trained Transformer model, BETO, extended with a multi-head architecture capable of jointly predicting sentiment polarity, tourist town, and type of establishment. The textual data undergoes extensive preprocessing and label encoding. Our model achieves strong performance, notably in the classification of establishment type (macro F1 = 0.976) and competitive results in town prediction (macro F1 = 0.623), a task involving 40 distinct classes. These results underscore the power of multi-head Transformers in complex, domain-specific NLP tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Rest-Mex Track</kwd>
        <kwd>IberLEF 2025</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, online review platforms have become central to how travelers share their experiences,
generating vast repositories of user-generated content [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3, 4, 5</xref>
        ]. This textual data, though
unstructured, is a goldmine of insights into customer sentiments, service quality perceptions, and destination
appeal [
        <xref ref-type="bibr" rid="ref2">6, 2</xref>
        ]. In tourism, automated analysis of these reviews enables businesses to refine their offerings
while empowering travelers to make well-informed decisions, a dynamic especially relevant in Mexico
[7], a nation of exceptional cultural, gastronomic, and geographic diversity in which tourism is one of
the largest economic pillars [8, 9, 10].
      </p>
      <p>Since its inception in 2021, the Rest-Mex shared task has served as a leading benchmark in applying
NLP to Mexican tourist texts. In 2021, the challenge focused on two tasks: predicting overall satisfaction
scores (recommendation systems) and classifying sentiment polarity from TripAdvisor reviews [11].
The 2022 edition expanded to include a third track: predicting the federal “COVID-19 epidemiological
semaphore” status [12, 13] from news texts [14]. By 2023, Rest-Mex had added text clustering as a new
task and broadened its dataset to include reviews from Cuba and Colombia, though sentiment and
type classification remained core [15]. Throughout these versions, the primary focus has remained on
sentiment polarity, type of place, and in later years, country-level categorization.</p>
      <p>The 2025 edition introduces a novel dimension by directly involving “pueblos mágicos” (magical
towns): participants must now precisely identify the specific town mentioned among forty Mexican
localities. This addition elevates the geographic granularity of the task, requiring models to discern
fine-grained location cues, an essential step for geographically aware tourism analytics [16, 17].</p>
      <p>Transformer-based models such as BERT and its Spanish-adapted variant BETO have consistently
excelled in such NLP challenges due to their strong contextual understanding. Their capacity to encode
semantic subtleties makes them ideal for handling diverse and nuanced tourist reviews [18].</p>
      <p>In this study, we propose a multi-head architecture built atop BETO that simultaneously
predicts three outputs: sentiment polarity, town (one of the forty magical towns), and
establishment type. This multitask framework leverages shared representations and joint optimization
to achieve computational efficiency and improve generalization across tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>Before detailing the individual components (data collection, preprocessing, model design, and
evaluation), we begin with a high-level overview of our methodological approach. Drawing on best
practices in NLP, our pipeline follows a clear progression from data preparation to model deployment
[19].</p>
      <p>First, we carry out comprehensive cleaning and normalization to ensure consistency and reduce noise
in the input text. This step establishes a solid foundation, as preprocessing decisions can significantly
influence downstream model performance [20].</p>
      <p>Next, we design a multitask Transformer-based architecture that leverages shared representations to
handle simultaneous predictions of sentiment, location, and type. Training proceeds under a unified
objective, combining losses for each task head. Finally, we evaluate the system using accuracy and
macro F1-score, the latter being particularly suitable under class imbalance [21].</p>
      <p>In the following subsections, we detail each phase—starting with the dataset and preprocessing
steps—before exploring the model architecture, training setup, and evaluation strategies.</p>
      <sec id="sec-2-1">
        <title>2.1. Dataset and Preprocessing</title>
        <p>The foundation of any robust NLP model is high-quality, well-prepared training data. For this reason,
we began our methodology using the Rest-Mex 2025 training dataset, which comprises over 208,000
authentic Spanish-language tourist reviews. Figure shows the distribution for the three labels.</p>
        <p>To ensure reliable downstream performance, we implemented an extensive preprocessing pipeline
informed by established best practices in text classification and sentiment analysis:</p>
        <list list-type="order">
          <list-item><p>Encoding Correction: We detected and repaired common encoding anomalies, such as misrepresented accents or fragmented characters, which are prevalent in user-submitted web text.</p></list-item>
          <list-item><p>Pattern Removal: Using custom regular expressions, we filtered out HTML remnants and encoding artifacts (e.g., <monospace>_xABCD_</monospace>), preventing non-linguistic tokens from polluting the vocabulary.</p></list-item>
          <list-item><p>Whitespace &amp; Line Break Normalization: Multi-space sequences, tabs, and inconsistent newlines were standardized to single spaces, improving tokenization stability without altering semantics.</p></list-item>
          <list-item><p>Unicode Normalization: Using NFKC normalization via <monospace>unicodedata.normalize</monospace>, we mapped equivalent Unicode sequences to a single composed form, reducing variation in accented characters and special symbols.</p></list-item>
        </list>
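        <p>The four steps above can be sketched as follows. This is a minimal illustration rather than the exact pipeline used in this work: the regular expressions, the helper names <monospace>fix_mojibake</monospace> and <monospace>clean_review</monospace>, and the assumption that encoding errors come from UTF-8 text decoded as Latin-1 are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re
import unicodedata

# Hypothetical patterns: the exact regular expressions are not given in the text.
HTML_TAG = re.compile(r"<[^>]+>")
XML_ESCAPE = re.compile(r"_x[0-9A-Fa-f]{4}_")  # artifacts such as _x000D_
WHITESPACE = re.compile(r"\s+")


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of mojibake: leave the text untouched


def clean_review(text: str) -> str:
    text = fix_mojibake(text)                   # 1. encoding correction
    text = HTML_TAG.sub(" ", text)              # 2. pattern removal (HTML remnants)
    text = XML_ESCAPE.sub(" ", text)            # 2. pattern removal (_xABCD_ artifacts)
    text = WHITESPACE.sub(" ", text).strip()    # 3. whitespace and line-break normalization
    return unicodedata.normalize("NFKC", text)  # 4. Unicode normalization


# "\u00c3\u00b3" is the accented letter "\u00f3" after a wrong Latin-1 decoding.
raw = "Muy  buen\tlugar<br/>_x000D_\natenci\u00c3\u00b3n"
print(clean_review(raw))
```
]]></preformat>
        <p>Under these assumptions, the sample string is reduced to a single line of space-separated words with the accent restored. Other kinds of encoding damage (for example, Windows-1252 round trips) would need additional handling.</p>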
        <p>Empirical studies show that even simple cleaning decisions, such as normalization and encoding
repair, can significantly affect model performance, especially in sentiment tasks [4].</p>
        <p>Following preprocessing, we extracted the key columns: <monospace>Review</monospace>, <monospace>Polarity</monospace>, <monospace>Town</monospace>, and <monospace>Type</monospace>.
We discarded less relevant fields such as <monospace>Title</monospace> (present in only 2 samples) and <monospace>Region</monospace> (not used for
optimization). Target labels were then converted into numerical indices using <monospace>LabelEncoder</monospace>, enabling
efficient mapping during model training.</p>
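        <p>To make the label-encoding step concrete, the sketch below reproduces in plain Python the behaviour we understand <monospace>LabelEncoder</monospace> to have, namely assigning indices to the sorted distinct labels of each target column. The helper <monospace>fit_label_encoder</monospace> and the three toy rows are invented for illustration and are not taken from the dataset.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Any, Dict, List, Sequence, Tuple


def fit_label_encoder(labels: Sequence[Any]) -> Tuple[List[Any], Dict[Any, int]]:
    """Map each distinct label to an index, using sorted order."""
    classes = sorted(set(labels))
    return classes, {label: index for index, label in enumerate(classes)}


# Toy rows restricted to the four retained columns; all values are invented.
rows = [
    {"Review": "Excelente comida", "Polarity": 5, "Town": "Tulum", "Type": "Restaurant"},
    {"Review": "Hotel muy sucio", "Polarity": 1, "Town": "Bacalar", "Type": "Hotel"},
    {"Review": "Ruinas interesantes", "Polarity": 3, "Town": "Tulum", "Type": "Attractive"},
]

targets = ("Polarity", "Town", "Type")

# One independent encoder per target column.
encoders = {col: fit_label_encoder([row[col] for row in rows]) for col in targets}

# Replace every label by its integer index.
encoded = [{col: encoders[col][1][row[col]] for col in targets} for row in rows]
print(encoded[0])
```
]]></preformat>
        <p>Keeping the list of classes next to each mapping allows predicted indices to be converted back to the original labels at inference time.</p>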
        <p>This thorough approach to dataset cleaning and label encoding established a consistent,
noise-reduced foundation, crucial for the effective representation learning that underpins our multi-head
Transformer model.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Model Architecture: MultiHeadBETO</title>
        <p>Inspired by hard parameter sharing in multitask learning, we
designed a model named MultiHeadBETO, based on the BETO Transformer encoder [22]. The model
architecture consists of:</p>
        <list list-type="bullet">
          <list-item><p>A shared BETO encoder (<monospace>dccuchile/bert-base-spanish-wwm-uncased</monospace> [23]).</p></list-item>
          <list-item>
            <p>Three task-specific classification heads branching from the shared [CLS] token:</p>
            <list list-type="order">
              <list-item><p>Polarity Head: 5-class linear layer.</p></list-item>
              <list-item><p>Town Head: 40-class linear layer.</p></list-item>
              <list-item><p>Type Head: 3-class linear layer.</p></list-item>
            </list>
          </list-item>
        </list>
        <p>Training is guided by a joint loss function combining the cross-entropy of each head:</p>
        <disp-formula><tex-math>\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{polarity}} + \mathcal{L}_{\text{town}} + \mathcal{L}_{\text{type}}</tex-math></disp-formula>
        <p>This hard-sharing setup encourages shared contextual understanding while allowing task-specific
discrimination.</p>
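        <p>The joint objective can be illustrated for a single example with the dependency-free sketch below. It assumes an unweighted sum of softmax cross-entropies, as in the formula for the total loss. The function names and the example targets are hypothetical, and a real training loop would average this quantity over a mini-batch.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from typing import Dict, Sequence

# Output sizes of the three heads, as stated in the text.
NUM_CLASSES = {"polarity": 5, "town": 40, "type": 3}


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy of one example (log-sum-exp form for stability)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


def joint_loss(logits: Dict[str, Sequence[float]], targets: Dict[str, int]) -> float:
    """Unweighted sum of the per-head cross-entropies."""
    return sum(cross_entropy(logits[task], targets[task]) for task in NUM_CLASSES)


# With uniform logits, a head with K classes contributes log(K) to the loss.
uniform = {task: [0.0] * k for task, k in NUM_CLASSES.items()}
gold = {"polarity": 4, "town": 17, "type": 0}  # arbitrary example targets
total = joint_loss(uniform, gold)  # log(5) + log(40) + log(3) = log(600)
print(round(total, 3))
```
]]></preformat>
        <p>In this uniform case the town head contributes log(40), roughly 3.69, against roughly 1.10 for the type head. An unweighted sum is therefore dominated at the start of training by the task with the most classes.</p>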
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Training Setup</title>
        <p>The cleaned dataset was split 80/20 into training and validation sets using a fixed random seed.
Tokenization leveraged the BETO tokenizer with a maximum sequence length of 128.</p>
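        <p>A reproducible 80/20 split of this kind can be sketched with the standard library alone. The function name <monospace>train_val_split</monospace> and the seed value 42 are assumptions, since the seed actually used is not reported here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(
    items: Sequence[T], val_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle indices with a fixed seed and hold out a validation fraction."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # local generator: global state untouched
    n_val = round(len(items) * val_fraction)
    val_idx = set(indices[:n_val])
    train = [item for i, item in enumerate(items) if i not in val_idx]
    val = [item for i, item in enumerate(items) if i in val_idx]
    return train, val


reviews = [f"review {i}" for i in range(100)]  # stand-in for the cleaned dataset
train_set, val_set = train_val_split(reviews)
print(len(train_set), len(val_set))
```
]]></preformat>
        <p>Because the shuffle depends only on the seed and the dataset size, repeated runs yield the same partition, which keeps validation scores comparable across experiments. A stratified split would be an alternative when rare towns must be represented in both sets.</p>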
        <p>Training was performed using HuggingFace’s Trainer API with the following configuration:</p>
        <list list-type="bullet">
          <list-item><p>Epochs: 3</p></list-item>
          <list-item><p>Batch size: 16</p></list-item>
          <list-item><p>Mixed precision (FP16): Enabled</p></list-item>
          <list-item><p>Evaluation: Once per epoch</p></list-item>
          <list-item><p>Checkpoints: Max 2 saved</p></list-item>
          <list-item><p>Early stopping was optionally implemented through learning rate scheduling.</p></list-item>
        </list>
        <p>This setup balances computational efficiency with model robustness.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Evaluation Metrics</title>
        <p>Model performance was evaluated on the validation set using:</p>
        <list list-type="bullet">
          <list-item><p>Accuracy: Fraction of correct predictions.</p></list-item>
          <list-item><p>F1-score (macro): Unweighted mean of the per-class F1-scores (each the harmonic mean of precision and recall), so that every class counts equally regardless of its frequency.</p></list-item>
        </list>
        <p>These metrics are standard in multitask NLP evaluation, and macro F1 in particular is
sensitive to the class imbalance present in the Town and Polarity labels.</p>
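        <p>The difference between the two metrics under class imbalance can be seen in a small worked example. The sketch below implements both from their definitions. The toy labels are invented, and the union of observed and predicted labels is used as the class set, which is an assumption about how absent classes are handled.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of positions where prediction and gold label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Imbalanced toy data: class 0 dominates and the single class-1 example is missed.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]
print(accuracy(y_true, y_pred))  # 5/6
print(macro_f1(y_true, y_pred))  # (8/9 + 0 + 1) / 3 = 17/27
```
]]></preformat>
        <p>Here accuracy is 5/6 (about 0.83) while macro F1 is 17/27 (about 0.63), because the missed minority class contributes an F1 of zero with the same weight as the majority class.</p>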
        <p>[Table 1. Validation metrics. Surviving labels: Metric; F1-score (macro), Accuracy; Evaluation Loss, Eval Runtime, Samples/sec, Steps/sec; Polarity.]</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>Performance metrics on the validation set are summarized in Table 1.</p>
      <p>Type classification achieved near-perfect performance, likely due to distinguishable lexical patterns
across the three classes.</p>
      <p>Town classification was more challenging, given the class imbalance across the 40 towns and the lexical
overlap between towns.</p>
      <p>Polarity classification yielded moderate results, reflecting the difficulty of detecting nuanced
sentiment in natural language.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>The results demonstrate that our multi-head architecture based on BETO effectively handles the
multitask challenges posed by Rest-Mex 2025. By sharing a common Transformer encoder and using
task-specific classification heads, the model benefits from cross-task representation learning, a strategy
well supported by prior studies in multitask learning. For instance, trade-offs between sentiment and
type classification arise naturally, yet the shared layers amplify performance across related tasks by
extracting mutually informative features.</p>
      <p>Notably, classification of establishment type achieved near-perfect performance, suggesting that
lexical cues and domain-specific vocabulary are highly discriminative. However, sentiment polarity
remains the most demanding task, as it requires detecting nuanced tone, sarcasm, and implicit opinions
within user-generated text. Enhancements such as task-specific attention modules or dynamic loss
weighting could improve model focus on sentiment subtleties. Similarly, town classification accuracy
could benefit from incorporating geographical priors or embeddings that capture spatial adjacency and
regional similarity, techniques that have proven effective in geo-aware NLP research.</p>
      <p>Implementing alternative sampling strategies during training, such as square-root or proportional
scheduling, might also balance gradient contributions across tasks and reduce overfitting in
underrepresented classes. Additionally, external linguistic and geographic
knowledge sources, such as knowledge graphs or gazetteers, could provide disambiguation support,
especially for town names with overlapping feature sets.</p>
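      <p>One possible reading of the proportional and square-root schemes is that each group (a task or a class) is sampled with probability proportional to its size raised to an exponent: 1 for proportional sampling and 0.5 for square-root sampling. The sketch below follows that reading. The function name, the exponent parameter <monospace>alpha</monospace> and the counts are hypothetical, and this was not part of the experiments reported here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict


def sampling_probabilities(counts: Dict[str, int], alpha: float) -> Dict[str, float]:
    """Sample group g with probability proportional to counts[g] ** alpha."""
    weights = {group: n ** alpha for group, n in counts.items()}
    total = sum(weights.values())
    return {group: w / total for group, w in weights.items()}


# Invented counts for one frequent and one rare group.
counts = {"frequent": 900, "rare": 100}

proportional = sampling_probabilities(counts, alpha=1.0)  # 0.90 vs 0.10
square_root = sampling_probabilities(counts, alpha=0.5)   # 0.75 vs 0.25
print(proportional, square_root)
```
]]></preformat>
      <p>In this toy case the rare group rises from 10% to 25% of the draws, without being oversampled all the way to parity as uniform sampling would do.</p>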
      <p>Overall, while the current architecture offers a strong baseline, our findings indicate clear avenues
for refinement through modular enhancements and richer contextual grounding.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We introduced MultiHeadBETO, a multitask Transformer model designed to jointly predict sentiment
polarity, magical-town affiliation, and establishment type from Spanish-language tourist reviews. The
model achieved outstanding classification results, especially for establishment type (macro F1 = 0.976),
while delivering competitive performance in town and sentiment tasks. These outcomes confirm that
the combination of shared Transformer representations and task-specific heads is a powerful strategy
for multi-dimensional text classification in specialized domains like tourism.</p>
      <p>These findings also highlight the versatility of multitask Transformer architectures in leveraging the
interdependence between tasks, thereby improving overall robustness and generalization, an advantage
underscored in previous studies. Moreover, incorporating “pueblos
mágicos” in the classification schema demonstrates the model’s ability to accommodate more granular
geographic tasks without disproportionately affecting performance.</p>
      <p>Our work affirms the potential of end-to-end models in extracting valuable insights from real-world
tourist review datasets. At the same time, persistent challenges in sentiment and fine-grained geographic
classification point to promising directions involving enriched linguistic and spatial modeling.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Future Directions</title>
      <p>Beyond the current scope, several research avenues warrant exploration. One promising line involves
the introduction of task-specific attention layers or dynamic weighting mechanisms aimed at improving
task calibration—especially for sentiment prediction, which still lags behind in F1-score. Another
frontier is the integration of geographic embeddings or knowledge graphs to provide richer spatial
context, which could bolster performance in town identification and reduce confusion among similar
locales.</p>
      <p>Furthermore, improvements in class balance through data augmentation or sophisticated sampling
techniques may mitigate skewed distributions in polarity and town labels. A comprehensive error
analysis leveraging attention visualization or misclassification diagnostics could guide targeted refinements.
Finally, experimenting with more expressive or hierarchical classification heads—potentially involving
non-linear or deeper architectural blocks—may enhance the model’s capability to capture complex
dependencies and nuanced expression across tasks.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>The authors gratefully acknowledge the support provided by the Mexican Academy of Tourism Research
(AMIT) for the project “Balancing Tourism Text Data with Artificial Intelligence for Sentiment Analysis: A
Specialized Language Model Approach” funded through the Research Projects 2024 call. This
work was also supported by the project “Text Generation for Data Balancing in Sentiment Classification:
Application to Tourism Data” under the CICIMPI 2024 call of the Centro de Investigación en Matemáticas
(CIMAT).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>We declare that the present manuscript has been written entirely by the authors and that no generative
artificial intelligence tools were used in its preparation, drafting, or editing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] R. Guerrero-Rodriguez, M. Á. Álvarez-Carmona, R. Aranda, A. P. López-Monroy, <article-title>Studying online travel reviews related to tourist attractions using nlp methods: the case of guanajuato, mexico</article-title>, <source>Current issues in tourism</source> <volume>26</volume> (<year>2023</year>) <fpage>289</fpage>-<lpage>304</lpage>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] E. Olmos-Martínez, M. Á. Álvarez-Carmona, R. Aranda, A. Díaz-Pacheco, <article-title>What does the media tell us about a destination? the cancun case, seen from the usa, canada, and mexico</article-title>, <source>International Journal of Tourism Cities</source> <volume>10</volume> (<year>2024</year>) <fpage>639</fpage>-<lpage>661</lpage>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Diaz-Pacheco, M. Á. Álvarez-Carmona, R. Guerrero-Rodríguez, L. A. C. Chávez, A. Y. Rodríguez-González, J. P. Ramírez-Silva, R. Aranda, <article-title>Artificial intelligence methods to support the research of destination image in tourism. a systematic review</article-title>, <source>Journal of Experimental &amp; Theoretical Artificial Intelligence</source> <volume>36</volume> (<year>2024</year>) <fpage>1415</fpage>-<lpage>1445</lpage>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] M. Á. Álvarez-Carmona, R. Aranda, A. Y. Rodríguez-Gonzalez, D. Fajardo-Delgado, M. G. Sánchez, H. Pérez-Espinosa, J. Martínez-Miranda, R. Guerrero-Rodríguez, L. Bustio-Martínez, Á. Díaz-Pacheco, <article-title>Natural language processing applied to tourism research: A systematic review and future research directions</article-title>, <source>Journal of King Saud University-Computer and Information Sciences</source> <volume>34</volume> (<year>2022</year>) <fpage>10125</fpage>-<lpage>10144</lpage>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] I. Castillo-Ortiz, M. Á. Álvarez-Carmona, R. Aranda, Á. Díaz-Pacheco, <article-title>Evaluating culinary skill transfer: A deep learning approach to comparing student and chef dishes using image analysis</article-title>, <source>International Journal of Gastronomy and Food Science</source> <volume>38</volume> (<year>2024</year>) 101070.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Á. Díaz-Pacheco, R. Guerrero-Rodríguez, M. Á. Álvarez-Carmona, A. Y. Rodríguez-González, R. Aranda, <article-title>Quantifying differences between ugc and dmo’s image content on instagram using deep learning</article-title>, <source>Information Technology &amp; Tourism</source> <volume>26</volume> (<year>2024</year>) <fpage>293</fpage>-<lpage>329</lpage>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Á. Díaz-Pacheco, R. Guerrero-Rodríguez, M. Á. Álvarez-Carmona, A. Y. Rodríguez-González, R. Aranda, <article-title>A comprehensive deep learning approach for topic discovering and sentiment analysis of textual information in tourism</article-title>, <source>Journal of King Saud University-Computer and Information Sciences</source> <volume>35</volume> (<year>2023</year>) 101746.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Arce-Cardenas, D. Fajardo-Delgado, M. Á. Álvarez-Carmona, J. P. Ramírez-Silva, <article-title>A tourist recommendation system: a study case in mexico</article-title>, in: <source>Mexican international conference on artificial intelligence</source>, Springer, <year>2021</year>, pp. <fpage>184</fpage>-<lpage>195</lpage>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] E. P. Ramirez-Villaseñor, H. Pérez-Espinosa, M. A. Álvarez-Carmona, R. Aranda, <article-title>Design, development, and evaluation of a chatbot for hospitality services assistance in spanish</article-title>, <source>Acta universitaria</source> <volume>33</volume> (<year>2023</year>).</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] A. Diaz-Pacheco, M. A. Álvarez-Carmona, A. Y. Rodríguez-González, H. Carlos, R. Aranda, <article-title>Measuring the difference between pictures from controlled and uncontrolled sources to promote a destination. a deep learning approach</article-title> (<year>2023</year>).</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] M. Á. Álvarez-Carmona, R. Aranda, S. Arce-Cárdenas, D. Fajardo-Delgado, R. Guerrero-Rodríguez, A. P. López-Monroy, J. Martínez-Miranda, H. Pérez-Espinosa, A. Rodríguez-González, <article-title>Overview of rest-mex at iberlef 2021: Recommendation system for text mexican tourism</article-title>, <source>Procesamiento del Lenguaje Natural</source> <volume>67</volume> (<year>2021</year>). doi:10.26342/2021-67-14.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] M. Á. Alvarez-Carmona, R. Aranda, <article-title>Determinación automática del color del semáforo mexicano del covid-19 a partir de las noticias</article-title> (<year>2022</year>).</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] M. A. Álvarez-Carmona, R. Aranda, A. Y. Rodríguez-González, L. Pellegrin, H. Carlos, <article-title>Classifying the mexican epidemiological semaphore colour from the covid-19 text spanish news</article-title>, <source>Journal of Information Science</source> <volume>50</volume> (<year>2024</year>) <fpage>568</fpage>-<lpage>589</lpage>.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] M. Á. Álvarez-Carmona, Á. Díaz-Pacheco, R. Aranda, A. Y. Rodríguez-González, D. Fajardo-Delgado, R. Guerrero-Rodríguez, L. Bustio-Martínez, <article-title>Overview of rest-mex at iberlef 2022: Recommendation system, sentiment analysis and covid semaphore prediction for mexican tourist texts</article-title>, <source>Procesamiento del Lenguaje Natural</source> <volume>69</volume> (<year>2022</year>) <fpage>289</fpage>-<lpage>299</lpage>.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] M. Á. Álvarez-Carmona, Á. Díaz-Pacheco, R. Aranda, A. Y. Rodríguez-González, V. Muñiz-Sánchez, A. P. López-Monroy, F. Sánchez-Vega, L. Bustio-Martínez, <article-title>Overview of rest-mex at iberlef 2023: Research on sentiment analysis task for mexican tourist texts</article-title>, <source>Procesamiento del Lenguaje Natural</source> <volume>71</volume> (<year>2023</year>) <fpage>425</fpage>-<lpage>436</lpage>.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] M. Á. Álvarez-Carmona, Á. Díaz-Pacheco, R. Aranda, A. Y. Rodríguez-González, L. Bustio-Martínez, V. Herrera-Semenets, <article-title>Overview of rest-mex at iberlef 2025: Researching sentiment evaluation in text for mexican magical towns</article-title>, volume <volume>75</volume>, <year>2025</year>.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] J. Á. González-Barba, L. Chiruzzo, S. M. Jiménez-Zafra, <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>, in: <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025)</source>, CEUR-WS.org, <year>2025</year>.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] V. G. Morales-Murillo, H. Gómez-Adorno, D. Pinto, I. A. Cortés-Miranda, P. Delice, <article-title>Lke-iimas team at rest-mex 2023: Sentiment analysis on mexican tourism reviews using transformer-based domain adaptation</article-title> (<year>2023</year>).</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] M. Á. Álvarez-Carmona, R. Aranda, R. Guerrero-Rodríguez, A. Y. Rodríguez-González, A. P. López-Monroy, <article-title>A combination of sentiment analysis systems for the study of online travel reviews: Many heads are better than one</article-title>, <source>Computación y Sistemas</source> <volume>26</volume> (<year>2022</year>) <fpage>977</fpage>-<lpage>987</lpage>.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] O. G. Toledano-López, M. Á. Álvarez-Carmona, J. Madera, A. Simón-Cuevas, Y. A. López-Rodríguez, H. González Diéz, <article-title>Polarity prediction in tourism cuban reviews using transformer with estimation of distribution algorithms</article-title>, in: <source>International Workshop on Artificial Intelligence and Pattern Recognition</source>, Springer, <year>2023</year>, pp. <fpage>335</fpage>-<lpage>346</lpage>.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] J. D. Jurado-Buch, S. Minayo-Díaz, J. Tello, K. Chaucanes, L. Salazar, M. Oquendo-Coral, M. Á. Álvarez-Carmona, <article-title>A single model based on beto to classify spanish tourist opinions through the random instances selection</article-title>, <year>2023</year>.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] A. B. García-Gutiérrez, P. E. López-Ávila, P. A. Gallegos-Ávila, R. Aranda, M. Á. Álvarez-Carmona, <article-title>Balancing of tourist opinions for sentiment analysis task</article-title>, in: <source>IberLEF@SEPLN</source>, <year>2023</year>.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] J. Ortiz-Zambrano, C. Espin-Riofrio, A. Montejo-Ráez, <article-title>Lexical complexity assessment of spanish in ecuadorian public documents</article-title>, <source>Procesamiento del Lenguaje Natural</source> <volume>74</volume> (<year>2025</year>) <fpage>291</fpage>-<lpage>303</lpage>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>