<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of King Saud University</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>LPQ Team at Rest-Mex 2025: BERT and LLM Approaches in Tourism Review Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Le Phu Quy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dang Van Thin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Information Technology-VNUHCM</institution>
          ,
          <addr-line>Quarter 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vietnam National University</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This study addresses the Rest-Mex 2025 challenge by developing a multi-task framework for Spanish tourism review analysis, focusing on sentiment polarity (1-5 scale), destination type classification (hotel/restaurant/attraction), and Magical Town identification. We explore transformer-based models (BETO, XLM-RoBERTa), hybrid architectures (BERT embeddings with XGBoost), domain adaptation, ensemble strategies, and LLaMA-3 fine-tuned specifically for Magical Town recognition. The approach provides a scalable pipeline for enhancing destination analytics through advanced NLP techniques.</p>
      </abstract>
      <kwd-group>
        <kwd>Hope classification</kwd>
        <kwd>Spanish language</kwd>
        <kwd>English language</kwd>
        <kwd>sentiment analysis</kwd>
        <kwd>fine-tuning BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Mexico’s Pueblos Mágicos (Magical Towns) are vibrant destinations where history, culture, and economic
development intersect, attracting millions of travelers each year. These towns are celebrated for their
rich artisan traditions, historic landmarks, and breathtaking natural scenery, making them essential to
Mexico’s tourism industry. As digital platforms like TripAdvisor and social media grow in influence,
travelers share their experiences more widely than ever, creating a wealth of reviews that reveal
diverse sentiments, destination preferences, and regional identities. However, analyzing these texts
poses challenges, as they are written in various Spanish dialects and often include irony or local
slang[
        <xref ref-type="bibr" rid="ref1">1, 2, 3, 4</xref>
        ].
      </p>
      <p>Unlike past editions [5, 6, 7], the Rest-Mex 2025 [8, 9] shared task addresses these complexities by
introducing three key objectives. First, sentiment analysis helps determine the emotional tone of a
review, using a rating scale from 1 to 5. Second, destination classification identifies whether a review
refers to a hotel, restaurant, or tourist attraction. Lastly, geolocation detection pinpoints which Pueblo
Mágico the review describes. These tasks extend beyond academic interest—they play a crucial role in
shaping sustainable tourism strategies, improving infrastructure, and preserving the unique cultural
heritage of the 40 Pueblos Mágicos represented in the task. By extracting meaningful insights from online
reviews, researchers and policymakers can enhance visitor experiences while ensuring these towns
retain their distinctive charm for future generations.</p>
      <p>In this work, we present an evaluation of modeling strategies for tourism analytics, benchmarking
several approaches. First, we optimize baseline BERT models—including BETO (Spanish BERT), Roberto
(RoBERTa Spanish), and XLM-RoBERTa—to serve as our fundamental framework. Additionally, we
explore embedding-XGBoost hybrids, where sentence embeddings derived from these BERT variants
are fed into XGBoost classifiers fine-tuned with focal loss to better capture minority classes. To further
enhance destination classification, we introduce a domain-adapted BETO model, trained on 15GB of
Mexican tourism texts to effectively capture region-specific expressions. We also employ BERT ensemble
models by fine-tuning BETO, Roberto, and XLM-RoBERTa independently and aggregating their outputs
via both soft and hard voting, supplemented by metadata features such as regional keywords. Finally,
we fine-tune LLaMA-3 using LoRA, further enriching our approach.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Tourism review analytics has evolved significantly over the years, moving from early rule-based [10] and
traditional machine learning methods to sophisticated deep learning approaches. Early work in this field
focused on manual feature engineering for sentiment extraction and destination classification, but these
methods struggled with the variability and cultural nuances present in user-generated content. The
advent of transformer-based models, especially Spanish-specific variants like BETO [11] and Roberto,
has greatly enhanced our ability to capture complex expressions and regional idioms in tourism reviews.</p>
      <p>More recent studies have leveraged hybrid architectures that combine the semantic strength of
transformer models with the robustness of classical classifiers. In particular, embedding-XGBoost
hybrids—where sentence embeddings from BETO, Roberto, and XLM-RoBERTa are fed into XGBoost
classifiers fine-tuned [12] with focal loss—have been successful in addressing challenges related to class
imbalance and minority class emphasis [13]. Additionally, domain adaptation [14] via pre-training on
extensive tourism-specific corpora has proven effective for capturing regional expressions, enhancing
the performance of models in destination classification tasks.</p>
      <p>Ensemble methods [15] and large language models (LLMs) have further pushed the boundaries in
tourism review analysis. Independent fine-tuning of various BERT variants followed by ensemble
aggregation has demonstrated notable improvements in tasks like Magical Town identification. Moreover,
fine-tuning LLaMA-3 using parameter-efficient methods like LoRA on culturally annotated datasets
has enabled more precise disambiguation of similar regional references. Our work builds on these
independent modeling strategies, offering a modular approach that isolates and leverages the unique
strengths of each method to achieve state-of-the-art performance in tourism NLP.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Transformer-Based Classification Models</title>
        <p>The Transformer architecture has become a cornerstone in Natural Language Processing due to its
effective use of self-attention mechanisms. These mechanisms allow the model to capture the contextual
relationships between tokens, while positional encoding preserves the sequential order. Multi-head
attention further enables the parallel extraction of distinct patterns and representations from the input
text. Such principles form the theoretical basis of our classification approach.</p>
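        <p>As an illustrative sketch of the mechanism described above (a single attention head, no learned projections; this is a toy example, not the code used in our system):</p>

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    # Each token attends to every token: scaled dot-product scores are
    # softmax-normalized, and the output is the weighted mix of token vectors.
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

# Three toy 2-dimensional "token" vectors.
mixed = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(mixed)  # each output row is a convex combination of the input rows
```

        <p>Multi-head attention runs several such mixtures in parallel over learned projections of the input, and positional encodings are added to the token vectors beforehand so the mixture remains order-aware.</p>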
        <p>Implemented Model Variants: In our experiments, we employ four Transformer-based models:</p>
        <list list-type="bullet">
          <list-item><p><bold>BETO</bold>: A Spanish language model adapted from BERT, which utilizes dynamic gradient accumulation to address instability in gradient updates during training.</p></list-item>
          <list-item><p><bold>Roberto</bold>: A variant similar in foundation to BETO, optimized for our specific domain requirements.</p></list-item>
          <list-item><p><bold>XLM-RoBERTa</bold>: A robust multilingual model fine-tuned using conventional truncation settings to handle regional vocabulary and context.</p></list-item>
          <list-item><p><bold>Domain-Adapted BETO</bold>: This model undergoes an additional phase of continual pre-training on 18 million tokens from the hospitality domain, enhancing its understanding of tourism-related texts.</p></list-item>
        </list>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Hybrid Embedding-XGBoost Framework</title>
        <p>In our approach, the process is divided into two main phases: extracting information from text and then
using that information to make predictions. First, a transformer model transforms the raw text into a
vector, which serves as a semantic summary that efficiently captures the context and meaning of the
original content. This vector eliminates the need for continuous, heavy computation in the subsequent
stages. For sentiment analysis, the extracted embedding is passed to an XGBoost model designed to
predict ratings while naturally respecting their ordinal relationship. This means that the model is set up
so that a higher rating is always treated as more positive than a lower one, ensuring that the predictions
follow the expected order and are consistent with the natural ranking of sentiments.</p>
        <p>For town or geographical classification, a similar XGBoost model is employed, but it is fine-tuned to
focus on features that are most relevant to location information. By analyzing which parts of the text
provide the strongest geographic signals, the model filters out less useful features, which simplifies the
decision process and improves overall efficiency. This selective approach ensures that the classification
is both fast and accurate. To further improve the performance of the system, we apply Bayesian
hyperparameter optimization. This method carefully adjusts key parameters such as model complexity
and regularization factors, helping to balance the trade-offs between accuracy and speed while also
addressing potential class imbalances in the data. Overall, the hybrid framework not only separates the
heavy lifting of context extraction from the prediction tasks but also achieves faster inference times
compared to using a complete transformer model for every operation.</p>
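        <p>The focal loss applied at the XGBoost stage to emphasize minority classes (Section 1) can be stated compactly. The sketch below is a minimal per-example version with the standard gamma and alpha parameters; it is illustrative, not our training code:</p>

```python
import math

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    # Focal loss on the probability assigned to the true class:
    # the (1 - p)^gamma factor shrinks the loss of well-classified
    # examples, so training focuses on hard, often minority-class, cases.
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

# A confident correct prediction contributes almost nothing,
# while a poorly classified example keeps a large loss.
print(focal_loss(0.95), focal_loss(0.2))
```

        <p>With gamma = 0 the expression reduces to ordinary cross-entropy; raising gamma increases the relative weight of hard examples.</p>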
      </sec>
      <sec id="sec-3-3">
        <title>3.3. LLaMA-3 Instruction Tuning</title>
        <p>In this approach, we adjust the LLaMA-3-8B model to enhance its understanding of geocultural contexts
in tourism. We use a method called Low-Rank Adaptation (LoRA), which adds a small number of
trainable components to specific layers of the model while keeping most of the original parameters
unchanged. This allows the model to learn tourism-focused information without losing its general
language abilities.</p>
        <p>To incorporate tourism domain expertise, trainable adapters are inserted into the model’s query
and value projection layers. This selective tuning enables the model to effectively absorb and use
domain knowledge while still relying on its robust pre-trained capabilities. A well-structured, three-part
prompt strategy guides the model’s response. To ensure the geographical information is accurate, two
validation methods are applied. First, the system employs pattern matching to filter out any inputs that
do not meet the expected format for town names. Second, a character-level similarity check is used as
a fallback to correct minor errors in the town names by comparing them against an official list. This
dual-check approach minimizes errors in geographic details and ensures the output remains precise.</p>
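        <p>The two validation stages can be sketched as follows (pure Python; the town list shown is a placeholder subset of the 40 official names, and the pattern and similarity threshold are illustrative assumptions, not our exact configuration):</p>

```python
import re
import difflib

# Placeholder subset of the official list of 40 Magical Towns.
OFFICIAL_TOWNS = ["Bacalar", "Ajijic", "Tepoztlan"]

# Stage 1: accept only plausible town-name strings (letters, spaces, hyphens).
TOWN_PATTERN = re.compile(r"^[A-Za-zÁÉÍÓÚÑáéíóúñü' \-]+$")

def validate_town(raw):
    # Reject malformed model output, then snap near-misses to the closest
    # official name via character-level similarity (stage 2 fallback).
    name = raw.strip()
    if not TOWN_PATTERN.match(name):
        return None
    if name in OFFICIAL_TOWNS:
        return name
    close = difflib.get_close_matches(name, OFFICIAL_TOWNS, n=1, cutoff=0.8)
    return close[0] if close else None

print(validate_town("Bacalr"))   # minor typo corrected by the fallback
print(validate_town("123???"))   # rejected by pattern matching
```

        <p>The `cutoff` keeps the fallback conservative: a string must already be close to some official name before it is corrected, so unrelated text is discarded rather than mapped to a town.</p>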
        <p>Overall, this instruction tuning framework adapts the LLaMA-3 model to be both knowledgeable
and reliable within the tourism domain. By combining targeted parameter tuning with a structured
prompting and validation system, the model is capable of generating detailed, accurate responses while
maintaining eficiency—a quality that is essential for deployment in resource-constrained environments.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Ensemble Strategies</title>
        <p>Our ensemble approach is centered on two key voting techniques: soft voting and hard voting, each
contributing uniquely to the final decision-making process.</p>
        <p>In soft voting, the models provide probabilistic estimates that reflect the confidence of each prediction.
These probabilities are combined in a way that gives higher influence to models with stronger
performance. This method allows the ensemble to capture subtle distinctions in the data, efectively leveraging
the context-aware abilities of transformer-based models when the diferences between classes are not
pronounced.</p>
        <p>In contrast, hard voting involves each model casting a clear, discrete vote for its predicted outcome.
The final prediction is determined by a majority rule—if the votes are tied, a simple tie-breaking
procedure selects the outcome. This approach provides decisiveness and transparency, ensuring that
the ensemble can deliver a clear prediction even when the individual model opinions diverge.</p>
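        <p>A minimal sketch of the two voting rules (our actual ensemble additionally tunes the weights on the validation set; the probabilities below are illustrative):</p>

```python
from collections import Counter

def soft_vote(prob_lists, weights=None):
    # Weighted average of per-model class probabilities; argmax wins.
    weights = weights or [1.0] * len(prob_lists)
    n_classes = len(prob_lists[0])
    total = sum(weights)
    avg = [sum(w * p[c] for w, p in zip(weights, prob_lists)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])

def hard_vote(labels):
    # Majority vote over discrete predictions; ties go to the
    # smallest class index (one simple deterministic tie-break).
    counts = Counter(labels)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)

# Three models, three classes (e.g. hotel / restaurant / attraction).
probs = [[0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3]]
print(soft_vote(probs, weights=[1.0, 2.0, 1.0]))  # confidence-weighted choice
print(hard_vote([1, 2, 1]))                        # majority of discrete votes
```

        <p>Note that the two rules can disagree: a single confident model can sway the soft vote toward a class that loses the discrete majority.</p>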
        <table-wrap id="tab1">
          <caption>
            <p>Label distribution of the Rest-Mex 2025 training data; the polarity and type labels each cover the same 208,051 reviews.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Category</th><th>Label</th><th>Count</th></tr>
            </thead>
            <tbody>
              <tr><td>Polarity</td><td>1</td><td>5,441</td></tr>
              <tr><td>Polarity</td><td>2</td><td>5,496</td></tr>
              <tr><td>Polarity</td><td>3</td><td>15,519</td></tr>
              <tr><td>Polarity</td><td>4</td><td>45,034</td></tr>
              <tr><td>Polarity</td><td>5</td><td>136,561</td></tr>
              <tr><td>Type</td><td>Hotel</td><td>51,410</td></tr>
              <tr><td>Type</td><td>Restaurant</td><td>86,720</td></tr>
              <tr><td>Type</td><td>Attractive</td><td>69,921</td></tr>
              <tr><td>Total</td><td/><td>208,051</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>Our experiments employ the competition dataset from Rest-Mex 2025, containing tourist reviews of
Mexico’s special "magical towns". The data includes user opinions with sentiment ratings and venue
categories, showcasing real tourism feedback across diverse Mexican locations:</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Experiment Setting</title>
        <p>In this study, we employ the Rest-Mex 2025 dataset, a comprehensive corpus comprising 10,000
Spanish-language travel reviews collected from Mexican tourism destinations for the Rest-Mex 2025 challenge.
Each review is systematically annotated for three distinct classification tasks: polarity (rated 1–5),
destination type (restaurant, hotel, or attraction), and magical town (one of 40 distinct towns). The
corpus exhibits moderate class imbalance for polarity—skewed toward positive ratings (4 and 5)—and
for destination type, while the magical-town labels demonstrate high imbalance due to the rarity of
some towns. Pre-processing removes entries with missing values and concatenates review titles and
bodies using a [SEP] token to form a unified input sequence. Labels are numerically encoded (polarity:
0–4, destination type: 0–2, magical town: 0–39), and the data are partitioned through stratified sampling
into 80% training, 10% validation, and 10% test splits to preserve class distributions.</p>
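        <p>The pre-processing and split described above can be sketched as follows (the field names and random seed are illustrative assumptions, not the exact competition schema):</p>

```python
import random
from collections import defaultdict

def preprocess(rows):
    # Drop rows with missing values and join title and body with [SEP];
    # polarity labels 1-5 are re-encoded as 0-4.
    clean = []
    for r in rows:
        if r.get("title") and r.get("body") and r.get("polarity") is not None:
            clean.append({"text": r["title"] + " [SEP] " + r["body"],
                          "label": int(r["polarity"]) - 1})
    return clean

def stratified_split(rows, seed=13):
    # 80/10/10 split drawn per label so class proportions are preserved.
    by_label = defaultdict(list)
    for r in rows:
        by_label[r["label"]].append(r)
    rng = random.Random(seed)
    train, dev, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        a, b = int(0.8 * len(group)), int(0.9 * len(group))
        train += group[:a]
        dev += group[a:b]
        test += group[b:]
    return train, dev, test
```

        <p>The same encoding scheme extends to the destination-type (0–2) and magical-town (0–39) labels.</p>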
        <p>Three transformer models are fine-tuned for each task: BETO
(dccuchile/bert-base-spanish-wwm-cased), RoBERTa (bertin-project/bertin-roberta-base-spanish), and XLM-RoBERTa (xlm-roberta-large).
For polarity and destination type classification, models predict 5 and 3 classes respectively, with a
maximum sequence length of 128 tokens; the magical-town task utilizes 256 tokens and predicts 40
classes. Fine-tuning is conducted for three epochs with a learning rate of 2e-5, the AdamW optimizer,
a batch size of 16, and gradient accumulation (4 steps for BETO and RoBERTa, 2 for XLM-RoBERTa).
Training employs mixed precision (FP16) for enhanced memory efficiency. Additionally, a
domain-adapted BETO is created by pre-training on the corpus with masked-language modeling for two
epochs before task-specific fine-tuning. To explore complementary methodologies, final-layer [CLS]
embeddings from BETO, RoBERTa, and XLM-RoBERTa are fed to XGBoost classifiers. XGBoost is
configured with depth 6, learning rate 0.1, and 1,000 boosting rounds, utilizing early stopping on
validation loss and class weights, particularly for the highly imbalanced magical-town task. Ensemble
strategies include soft voting (probability averaging with weights tuned for macro-F1 on the validation
set) and hard voting (equal-weight majority vote). A LLaMA-3.2-3B-Instruct model is additionally
fine-tuned via LoRA (r = 16, α = 32) in a multi-task setting to generate structured outputs for all three
labels, though this remains exploratory due to output-parsing challenges. Transformer models provide
robust baselines for Spanish NLP, while XGBoost and ensemble methods leverage complementary
inductive biases to offset individual weaknesses.</p>
        <p>Evaluation relies on accuracy, macro-F1, and weighted F1 metrics to reflect performance across
imbalanced classes. These experimental settings balance computational feasibility with rigorous
analysis.</p>
        <p>[Table: Macro F1 on the Polarity, Type, and Town tasks for BETO, RoBERTa, XLM-RoBERTa, their
XGBoost hybrids, DA-BETO, the soft and hard ensembles, and the fine-tuned LLM (LLM-FT).]</p>
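        <p>For reference, a minimal macro-F1 computation (our own sketch, not the official task scorer):</p>

```python
def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1: each class counts equally, so
    # rare towns weigh as much as frequent ones in the final score.
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

# A majority-class predictor is penalized for the missed minority class.
print(macro_f1([0, 0, 0, 1], [0, 0, 0, 0]))
```

        <p>Weighted F1 instead averages per-class F1 in proportion to class frequency, which is why both are reported for the imbalanced tasks.</p>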
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Main Results</title>
        <p>Our experimental evaluation reveals distinct performance patterns across the three classification tasks.
For polarity classification, XLM-RoBERTa combined with XGBoost achieved the best overall
performance, demonstrating the effectiveness of hybrid approaches. The type classification task showed
consistently high performance across all methods, with the soft-ensemble approach slightly
outperforming individual models. Town classification exhibited significant variation, with the standalone RoBERTa
model surprisingly outperforming more complex ensemble approaches, highlighting that optimal model
selection is highly task-dependent. Overall, these findings stress the importance of matching model
complexity to task characteristics rather than adopting a one-size-fits-all solution.</p>
        <p>[Table: official results excerpt — Macro F1 (Polarity), Macro F1 (Type), and Macro F1 (Town) for
the 1st-, 2nd-, and 3rd-ranked systems and HM.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Error Analysis and Discussion</title>
      <p>The most prominent weakness in our approach is observed in the Town Classification task, where our
model achieved a macro-F1 score of 0.4690, significantly lower than the top-performing system’s score
of 0.6919. This performance gap stems primarily from our use of metadata (Region) during training to
distinguish between towns. While this metadata enhanced the model’s ability to differentiate towns in
the training data, it was not available in the test set. As a result, the model failed to generalize effectively,
particularly for less frequent towns, leading to poor classification performance.</p>
      <p>This issue highlights the critical importance of maintaining data consistency between training
and testing phases. The absence of the Region metadata during inference created a mismatch that
undermined the model’s predictive capability. To address this, potential solutions include eliminating
reliance on metadata entirely or integrating external knowledge sources, such as geographical or
cultural databases, to provide contextual cues independent of the training data. Additionally, improving
data balance—perhaps through oversampling or synthetic data generation—and enhancing the model’s
ability to handle linguistic variations, such as slang or dialects, could further increase its reliability and
robustness for future applications.</p>
      <sec id="sec-5-1">
        <title>5.1. Conclusion</title>
        <p>This study examines transformer-based and hybrid approaches for multi-task tourism review analysis,
focusing on the classification of magical towns in the Rest-Mex 2025 dataset. While the overall method
shows promise, the town classification task achieved a macro-F1 score of 0.4690—significantly lower
than the leading 0.6919—primarily due to using Region metadata during training that was not available
at testing, resulting in poor generalizability. These findings underscore the importance of consistent
feature availability across training and testing, suggesting that future models should avoid reliance
on such metadata by incorporating external knowledge, advanced data augmentation, and improved
handling of linguistic diversity like regional slang to ensure robust real-world performance.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This research was supported by The VNUHCM-University of Information Technology’s Scientific
Research Support Fund.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>We declare that the present manuscript has been written entirely by the authors and that no generative
artificial intelligence tools were used in its preparation, drafting, or editing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>R.</given-names> <surname>Guerrero-Rodriguez</surname></string-name>,
          <string-name><given-names>M. A.</given-names> <surname>Álvarez Carmona</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Aranda</surname></string-name>,
          <string-name><given-names>A. P.</given-names> <surname>López-Monroy</surname></string-name>,
          <article-title>Studying online travel reviews related to tourist attractions using NLP methods: the case of Guanajuato, Mexico</article-title>,
          <source>Current Issues in Tourism</source>
          <volume>26</volume>
          (<year>2023</year>)
          <fpage>289</fpage>-<lpage>304</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>