<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of King Saud University</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j</article-id>
      <title-group>
        <article-title>ELTSA-CUJAE at Rest-Mex 2025: A Novel Ensemble Learning Approach with Transformers for Mining Mexican Tourist Reviews</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Miguel Angel Rivero-Tapia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfredo Simon-Cuevas</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ray Maestre Peña</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technological University of Havana “José Antonio Echeverría”</institution>
          ,
          <addr-line>CUJAE</addr-line>
          ,
          <country country="CU">Cuba</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>75</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>This paper describes the results of the team's participation in the Rest-Mex 2025 competition, focused on the analysis of tourist reviews related to Mexico's Magical Towns. The proposed solution addresses three main tasks: sentiment polarity classification, establishment type identification, and detection of the associated Magical Town for each review. To this end, an ensemble of Transformer-based models was implemented, combined using the Zimmermann-Zysno operator. The proposed approach proves to be highly effective, outperforming individual models due to its ability to capture different nuances of language in the reviews. The results suggest that this strategy can be a valuable tool for opinion analysis in the tourism sector, enabling businesses and governments to monitor visitor satisfaction, optimize marketing strategies, and improve destination management. This work not only contributes to advances in natural language processing for domain-specific tasks but also offers a reproducible framework for similar applications in other contexts.</p>
      </abstract>
      <kwd-group>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Tourist reviews</kwd>
        <kwd>Transformer-based models</kwd>
        <kwd>Zimmermann-Zysno operator</kwd>
        <kwd>Text classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Sentiment analysis is the field dedicated to the computational study of opinions, sentiments, evaluations,
attitudes, and emotions expressed by individuals toward entities such as products, services, organizations,
people, issues, events, and their characteristics or attributes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Opinions refer to a person’s viewpoint
on a given topic, while sentiment or polarity refers to the emotion experienced by that person regarding
the topic, usually identified as positive or negative.
      </p>
      <p>
        The computational analysis of tourist reviews has become an essential tool for strategic
decision-making in the tourism sector [
        <xref ref-type="bibr" rid="ref2">2, 3, 4, 5</xref>
        ]. In this context, the present work introduces an innovative
solution developed for the Rest-Mex 2025 competition [6, 7], specifically focused on the analysis of
opinions related to Mexico’s Magical Towns. Unlike past shared tasks [8, 9, 10], this edition of the
challenge introduces significant complexities by requiring not only the classic sentiment polarity
classification and establishment type identification but also the accurate recognition of the specific
Magical Town mentioned in each review—a task particularly challenging due to the cultural and
linguistic traits of these destinations.
      </p>
      <p>Transformer models have revolutionized the computational approach to text classification in
domain-specific scenarios [11, 12]. Architectures such as BERT [13], RoBERTa [14], and their specialized variants
have demonstrated exceptional capabilities in capturing both emotional nuances and the linguistic
particularities of domains where formal and informal language, regionalisms, and implicit cultural
references coexist. These models, pre-trained on large corpora and later fine-tuned for specific tasks,
significantly outperform traditional approaches by learning contextualized representations that allow
them to accurately interpret complex expressions such as irony, contrast, or nuanced evaluations.</p>
      <p>The strategic combination of multiple Transformer architectures has emerged as an effective approach,
overcoming the limitations of individual models by capturing different levels of meaning and context
in reviews. This paradigm has established new benchmarks in classification accuracy, especially in
languages like Spanish, where specialized resources are more limited than in English.</p>
      <p>Unlike traditional approaches that use lexical models or monolithic architectures, the proposed
solution is based on a Transformer ensemble specifically adapted to capture the subtleties of
Spanish-language tourism content. The implemented solution strategically combines several Transformer models
using the Zimmermann-Zysno operator [15], a fuzzy fusion method developed by German researchers
Hans-Jürgen Zimmermann and Peter Zysno. This operator is notable for its ability to aggregate probabilistic
value judgments in a non-additive manner, balancing between strict conjunctions (AND) and permissive
disjunctions (OR) through a compensation parameter (γ). Unlike conventional averaging methods, the
Zimmermann-Zysno operator preserves each model’s relative uncertainty and dynamically adjusts their
influence in the final prediction, which is critical for domains with subjective language such as tourism
[15], [16].</p>
      <p>This article is structured with a section presenting the unique corpus of over 200,000 TripAdvisor
reviews used in the competition, followed by a detailed description of the proposed Transformer
ensemble system. Subsequently, experimental results validating the approach’s effectiveness are analyzed.
The paper concludes with a discussion of the broader implications of this research and future directions
for automated analysis of tourism-related content.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description and Dataset</title>
      <p>The Rest-Mex 2025 dataset represents a significant advancement in the analysis of tourist perceptions,
focusing exclusively on Mexico’s Magical Towns. This corpus contains 208,051 reviews, equivalent to
70% of the total available data, while the remaining 30% is reserved for evaluation. Each entry includes
six key fields: the review title, the full review text, sentiment polarity (on a scale from 1 to 5), the
evaluated Magical Town, the corresponding region, and the type of establishment (hotel, restaurant, or
tourist attraction).</p>
      <p>The polarity distribution shows a characteristic pattern of this type of analysis, with a marked
predominance of positive reviews. This distribution is presented in Table 2.</p>
      <p>As shown in Table 2, the most positive reviews (class 5) account for 65.64% of the dataset, while the
negative classes (1 and 2) together represent only 10,937 instances (5.26%). This natural imbalance in
tourist review data poses significant challenges for predictive modeling.</p>
      <p>On the other hand, the distribution of establishment types is presented in Table 3.
Restaurants appear as the most reviewed category (41.68%), highlighting the importance of the
gastronomic experience in Mexico’s Magical Towns. Tourist attractions and hotels follow with 69,921
and 51,410 reviews respectively.</p>
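The figures reported for Tables 2 and 3 can be cross-checked with a few lines of arithmetic; the restaurant count is not stated directly in the text, so it is derived here from the corpus total (variable names are illustrative):

```python
# Cross-check of the corpus statistics reported in Section 2.
TOTAL_REVIEWS = 208_051
attractions = 69_921
hotels = 51_410

# Restaurants are reported only as a share (41.68%); the remaining
# reviews after attractions and hotels should match that share.
restaurants = TOTAL_REVIEWS - attractions - hotels
print(restaurants)                                  # 86720
print(round(100 * restaurants / TOTAL_REVIEWS, 2))  # 41.68

# The negative classes (polarity 1 and 2) total 10,937 instances,
# which the text reports as 5.26% of the corpus.
print(round(100 * 10_937 / TOTAL_REVIEWS, 2))       # 5.26
```

Both derived percentages agree with the figures quoted from Tables 2 and 3.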
      <p>The geographic distribution reveals a notable concentration in coastal and colonial destinations, as
seen in Table 4.</p>
      <p>This strategic selection of 40 towns allows the study of specific regional patterns, ranging from the
Yucatán Peninsula to the central highlands, which is particularly valuable for Subtask 3 that requires
comparative analysis across regions. Destinations such as Valladolid (11,637), Bacalar (10,822), and
Palenque (9,512) represent the southeast, while Valle de Bravo (5,959) and Teotihuacán (5,810) cover
the country’s central area.</p>
      <p>One of the key innovations of this edition is the precise geographic labeling, which enables detailed
analysis by locality and region. The corpus is encoded in UTF-8 to preserve the richness of the Spanish
language.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation Metrics</title>
      <p>The participating systems are evaluated using standard metrics such as Precision, Recall, and F1-score,
applied to each proposed subtask. For sentiment polarity classification, the per-class F-measure is used,
where F_c(s) represents the F1-score for class c obtained by system s, with C = {1, 2, 3, 4, 5} as the set
of possible categories.</p>
      <p>The polarity resolution (Res_polarity) is computed as the average of F-measures across all classes, using
the following formula:</p>
      <p>Res_polarity(s) = (1 / |C|) · Σ_{c ∈ C} F_c(s)   (1)</p>
      <p>For thematic resolution (Res_type), performance is assessed in three specific categories: Attraction (A),
Hotel (H), and Restaurant (R), by averaging their respective F-measures:</p>
      <p>Res_type(s) = (1 / 3) · (F_A(s) + F_H(s) + F_R(s))   (2)</p>
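As a minimal sketch of how these resolutions can be computed, the following snippet implements the per-class F-measure and its average; the function and variable names are illustrative, not taken from the competition's official scorer:

```python
def per_class_f1(y_true, y_pred, classes):
    """F1-score F_c for each class c, as used in the Res formulas."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

def resolution(y_true, y_pred, classes):
    """Res(s): the mean of the per-class F1-scores over the class set."""
    f1 = per_class_f1(y_true, y_pred, classes)
    return sum(f1.values()) / len(classes)

# Polarity uses the classes {1..5}; thematic resolution would use the
# three establishment categories instead. Toy labels for illustration:
y_true = [5, 5, 4, 1, 3, 5, 2]
y_pred = [5, 4, 4, 1, 3, 5, 2]
print(round(resolution(y_true, y_pred, [1, 2, 3, 4, 5]), 3))  # 0.893
```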
    </sec>
    <sec id="sec-4">
      <title>4. System Description</title>
      <p>Sentiment analysis applied to the tourism domain can be approached as a text classification task. This
technique is part of Natural Language Processing (NLP) and aims to determine the category to which a
given text fragment belongs [17].</p>
      <p>The most common methodology for solving text classification problems is based on supervised
learning. In this paradigm, a structured dataset is used, composed of pairs of elements: on the one hand,
the inputs (texts), and on the other, the outputs (corresponding labels or categories). The goal is to train
a model capable of automatically predicting the category of a new text, even if it was not part of the
training data. It is important to note that the possible classifications are limited to the labels present in
the initial dataset.</p>
      <sec id="sec-4-1">
        <title>4.1. Pre-processing</title>
        <p>Sentiment analysis requires pre-processing to prepare the text data and facilitate the extraction of
relevant information from reviews. While pre-processing tasks vary depending on the objective, certain
steps are crucial and indispensable for training Transformer models, whereas others can be omitted.</p>
        <p>
          Transformer models have been trained on millions of texts that retain information such as
capitalization and punctuation. Therefore, it is unnecessary to convert text to lowercase or remove punctuation
marks, as the model has learned to leverage these features to generate high-quality representations.
Similarly, removing stop words (such as articles and prepositions) is also unnecessary, since they are
already included in the training corpus and are properly handled by the tokenizer [
          <xref ref-type="bibr" rid="ref3">18</xref>
          ].
        </p>
        <p>The first step in pre-processing is removing special characters, as they are not understood by the
models. Then, for the model training phase, the text was converted into a specific format interpretable
by the language model.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Model Training and Ensemble</title>
        <p>Text classification is performed using an ensemble of Transformer models, where predictions are
optimally combined via the Zimmermann-Zysno operator [15]. This operator is a nonlinear fuzzy
fusion method that aggregates probabilistic judgments from multiple models, solving the problem of
weighting each model’s relative influence on the final decision. Mathematically, the operator is defined
for each class k as:</p>
        <p>E_k(μ_1, . . . , μ_n) = ( Π_{i=1}^{n} μ_i )^{1−γ} · ( 1 − Π_{i=1}^{n} (1 − μ_i) )^{γ}   (3)</p>
        <p>where μ_i ∈ [0, 1] represents the probability assigned by the i-th model to class k, and γ ∈ [0, 1] is
the compensation parameter that regulates the balance between conjunction (t-norm) and disjunction
(t-conorm) operations. The term γ can also be computed adaptively as:</p>
        <p>γ = T(μ_1, . . . , μ_n) / ( T(μ_1, . . . , μ_n) + T(1 − μ_1, . . . , 1 − μ_n) ),   (4)</p>
        <p>with T(μ_1, . . . , μ_n) = Π_{i=1}^{n} μ_i.</p>
        <p>This formulation allows the operator to act as an adaptive hybrid. When γ → 0, conjunction
dominates (equivalent to the product of probabilities), requiring high consensus among models to
assign confidence to a prediction. When γ → 1, disjunction prevails (similar to a compensated sum),
where partial disagreement between models does not invalidate the prediction. For intermediate values
(γ ≈ 0.5), the operator balances both extremes through a weighted geometric mean, ideal for balanced
distributions.</p>
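The operator itself reduces to a few lines of Python; the following is a sketch that follows the definition above, not the team's exact implementation:

```python
from math import prod

def zimmermann_zysno(probs, gamma):
    """Compensatory Zimmermann-Zysno aggregation of the probabilities
    assigned to one class by n models.

    probs : probabilities mu_i in [0, 1], one per model
    gamma : compensation parameter in [0, 1]
            gamma -> 0: conjunctive (product, strict consensus)
            gamma -> 1: disjunctive (compensated sum, permissive)
    """
    conj = prod(probs)                      # t-norm: product
    disj = 1 - prod(1 - p for p in probs)   # t-conorm: algebraic sum
    return conj ** (1 - gamma) * disj ** gamma

# gamma = 0 reduces to the product; gamma = 1 to the probabilistic sum.
print(round(zimmermann_zysno([0.7, 0.6, 0.8], 0.5), 2))  # 0.57
```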
        <p>In practice, the ensemble operates on the probabilistic outputs of each Transformer model for the
C possible classes (e.g., polarities 1-5 or establishment types). Each model m generates a vector
P_m = [p_m1, p_m2, . . . , p_mC], where p_mk ∈ [0, 1] is the probability assigned to class k and
Σ_{k=1}^{C} p_mk = 1.</p>
        <p>The aggregation process consists of three stages:
1. Per-class computation: For each class k, the Zimmermann-Zysno operator is applied to the
probabilities p_1k, p_2k, . . . , p_nk from the n models, yielding an aggregated probability vector E =
[E_1, E_2, . . . , E_C].
2. Normalization: To ensure Σ_{k=1}^{C} E_k^(norm) = 1, softmax normalization is applied:
E_k^(norm) = exp(E_k) / Σ_{j=1}^{C} exp(E_j).   (5)
3. Final prediction: The selected class is arg max_k E_k^(norm).</p>
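The three stages can be sketched as follows, reproducing the fusion for a single review; the names are illustrative:

```python
from math import prod, exp

def zz(probs, gamma):
    """Zimmermann-Zysno aggregation for one class (Section 4.2)."""
    return prod(probs) ** (1 - gamma) * (1 - prod(1 - p for p in probs)) ** gamma

def ensemble_predict(model_outputs, gamma):
    """Three-stage fusion: per-class ZZ aggregation, softmax
    normalization, and arg-max prediction.

    model_outputs : one probability vector over the C classes per model
    """
    n_classes = len(model_outputs[0])
    # 1. Per-class computation over the n models.
    agg = [zz([m[k] for m in model_outputs], gamma) for k in range(n_classes)]
    # 2. Softmax normalization so the fused vector sums to 1.
    exps = [exp(a) for a in agg]
    norm = [e / sum(exps) for e in exps]
    # 3. Final prediction: the arg-max class.
    return norm, max(range(n_classes), key=norm.__getitem__)

# Worked example from the text: three models, two classes, gamma = 0.5.
outputs = [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]
norm, winner = ensemble_predict(outputs, gamma=0.5)
print([round(p, 2) for p in norm], winner)  # [0.61, 0.39] 0
```

Running this on the paper's worked example recovers the normalized vector ≈ [0.61, 0.39] and predicts the first class (A).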
        <p>The interpretation of γ is crucial in the applied context. For instance, a low value (γ = 0.2) favors
fuzzy intersection (AND), useful when individual model accuracy is high but their errors are independent,
reducing false positives at the cost of discarding minority predictions. An intermediate value (γ = 0.5)
implements an optimal trade-off between sensitivity and specificity, recommended for uniformly
distributed data. A high value (γ = 0.8) prioritizes fuzzy union (OR), capturing weak but consistent
signals from at least one model, which is relevant in tasks with high semantic noise, such as polarity
classification in tourism reviews, where models may diverge in subjective interpretations but provide
complementary information.</p>
        <p>Consider three models (M_1, M_2, M_3) and two classes (A and B) with γ = 0.5, where the original
outputs are P_1 = [0.7, 0.3], P_2 = [0.6, 0.4] and P_3 = [0.8, 0.2]. The aggregation for class A yields:</p>
        <p>E_A = (0.7 · 0.6 · 0.8)^{0.5} · (1 − (1 − 0.7)(1 − 0.6)(1 − 0.8))^{0.5} = (0.336)^{0.5} · (0.976)^{0.5} ≈ 0.57,</p>
        <p>while for class B we obtain E_B ≈ 0.13. After softmax normalization, the final vector E^(norm) ≈ [0.61, 0.39]
predicts class A. This example illustrates how γ controls the weighting between agreements and
disagreements. This approach explains why high γ values (e.g., γ = 0.8) may be optimal for minority
classes or scenarios with semantic noise, where partially consistent signals should be preserved. The
solution workflow is shown in Figure 1.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Training and Evaluation</title>
        <p>To train the models, a representative subset of the available data was used: 15% of the opinions for the
sentiment polarity classification (Task 1) and establishment type identification (Task 2), and 40% for the
detection of the specific “Pueblo Mágico” (Task 3). This decision was primarily due to computational
constraints, as training Transformer models on full datasets requires significant computational capacity,
both in hardware and processing time.</p>
        <p>
          The training and ensemble process consists of three fundamental stages. In the first stage, pre-trained
Spanish Transformer models, including BETO [13], ALBERT [
          <xref ref-type="bibr" rid="ref3">18</xref>
          ], BERTIN [14], and ELECTRA [
          <xref ref-type="bibr" rid="ref4">19</xref>
          ], were
selected and fine-tuned using the aforementioned data subsets. Each model was individually adapted to
the three tasks. In the second stage, probabilistic predictions were generated for the test reviews by
each model. Finally, in the third stage, these predictions were combined using the Zimmermann-Zysno
operator, optimizing the γ parameter to balance precision and robustness.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Evaluation and Discussion</title>
        <p>The experimental evaluation of the solution was carried out on the set of Spanish-language tourist
reviews provided by the Rest-Mex 2025 competition. Each review contains a unique ID, a title, and the
opinion text. It is worth noting that this dataset is one of the few available in Spanish for evaluating
this type of solution. The results were computed using the equations described in the previous section.</p>
        <p>
          The evaluation process considers four pre-trained Transformer models: BETO [13], ALBERT [
          <xref ref-type="bibr" rid="ref3">18</xref>
          ],
BERTIN [14], and ELECTRA [
          <xref ref-type="bibr" rid="ref4">19</xref>
          ]. A fine-tuning process was applied to each model. In addition, each
model was trained on the dataset reviews for 2 epochs using a batch size of 16. The choice of this
batch size balances several factors: very small batches cause fluctuations in the gradient updates, while
very large batches, although they accelerate convergence, can lead to overfitting and harm the model’s
generalization.
        </p>
        <p>Table 5 shows the hyperparameter configuration used in the experiments.</p>
        <p>To evaluate the performance of the Zimmermann–Zysno operator, experiments were conducted using
different values of the γ parameter (0.2, 0.5, and 0.8), which allows adjusting the balance between
strict conjunctions and permissive disjunctions in the prediction fusion process. These values were
selected to explore a representative range, spanning from a more conservative approach (γ = 0.2)
to a more flexible one (γ = 0.8). Additionally, an ensemble based on average voting across the four
Transformer models (BETO, ALBERT, BERTIN, and ELECTRA) was implemented as a baseline, in order
to contrast the proposed method’s performance with traditional model combination techniques. The
detailed results of this comparison are presented in Table 6, thus enabling a quantitative evaluation of
the Zimmerman–Zysno approach in the context of tourism sentiment analysis.</p>
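As an illustration of how the γ sweep and the average-voting baseline can be compared, the sketch below fuses toy probability vectors from four hypothetical models; the numbers are invented purely for illustration, and the actual experiments used the Rest-Mex validation data and the competition metrics:

```python
from math import prod, exp

def zz_fuse(model_outputs, gamma):
    """Fuse per-model probability vectors with the Zimmermann-Zysno
    operator followed by softmax normalization (Section 4.2)."""
    C = len(model_outputs[0])
    agg = [prod(m[k] for m in model_outputs) ** (1 - gamma)
           * (1 - prod(1 - m[k] for m in model_outputs)) ** gamma
           for k in range(C)]
    exps = [exp(a) for a in agg]
    return [e / sum(exps) for e in exps]

def avg_vote(model_outputs):
    """Average-voting baseline: the mean of the model probabilities."""
    C = len(model_outputs[0])
    return [sum(m[k] for m in model_outputs) / len(model_outputs)
            for k in range(C)]

def accuracy(fused, labels):
    """Share of reviews whose arg-max class matches the gold label."""
    return sum(max(range(len(v)), key=v.__getitem__) == y
               for v, y in zip(fused, labels)) / len(labels)

# Toy validation set: four models' outputs per review plus gold labels
# (hypothetical numbers, for illustration only).
val = [([[0.6, 0.4], [0.2, 0.8], [0.7, 0.3], [0.55, 0.45]], 0),
       ([[0.3, 0.7], [0.4, 0.6], [0.45, 0.55], [0.2, 0.8]], 1)]
labels = [y for _, y in val]

for gamma in (0.2, 0.5, 0.8):
    acc = accuracy([zz_fuse(m, gamma) for m, _ in val], labels)
    print(f"gamma={gamma}: accuracy={acc:.2f}")
print("avg vote:", accuracy([avg_vote(m) for m, _ in val], labels))
```

In the real experiments, the scoring function would be the per-task Res metrics from Section 3 rather than plain accuracy.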
        <p>The results demonstrate that the Zimmermann–Zysno operator with γ = 0.8 achieves the best overall
performance across most key metrics, particularly in polarity classification and the identification of
Mexican Magical Towns. It outperforms both the classic average-voting ensemble and other
configurations of the operator. Although average voting showed a slight advantage in the task of establishment
type classification, the versatility of the Zimmermann–Zysno approach with γ = 0.8 is reflected in its
robustness in handling data imbalance and capturing complex linguistic nuances. In Table 7, we compare
the performance of our best solution against the top result reported in the Rest-Mex 2025 competition.</p>
        <p>Although our solution shows competitive performance in the classification of establishment types,
there is a significant gap in the polarity and geographical location tasks, especially in the identification
of Pueblos Mágicos. These differences may be attributed to limitations in the size of the data subset
used for model training. Table 8 presents a comparison between our best-performing solution and the
average result of the Rest-Mex 2025 competition. This comparison provides context for assessing our
approach in relation to other participating systems, highlighting its competitive advantages.</p>
        <p>The results show that our solution consistently outperforms the average of the competition across all
evaluated metrics, with particularly notable improvements in the classification of establishment types
and the overall Track Score. This superior performance validates the effectiveness of the
Zimmermann-Zysno operator for integrating multiple Transformer models, especially in tasks where tourism-related
language presents high variability. The positive differences in polarity and geographic location reinforce
that our approach achieves a balance between generalization and adaptation to specific domains, even
under limited computational resources.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>This work explored the potential of Transformer models combined through the Zimmerman–Zysno
operator for sentiment analysis in tourist reviews related to Mexico’s Magical Towns. The proposed
solution proved effective, outperforming the average performance of the Rest-Mex 2025 competition
across all evaluated metrics—particularly in the classification of establishment types and in the overall
Track Score.</p>
      <p>Although a gap remains compared to the top-performing result of the competition—especially in
the identification of Pueblos Mágicos—the Zimmermann–Zysno approach with γ = 0.8 stood out for its
ability to handle data imbalance and capture complex linguistic nuances. This validates its applicability
in highly subjective domains such as tourism.</p>
      <p>This work not only contributes to the advancement of Natural Language Processing for Spanish in
specific contexts, but also offers a reproducible framework for future opinion analysis applications.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This work has been partially supported by the National Program of Science and Technology PN223LH004:
Automatics, Robotics, Artificial Intelligence, of the Ministry of Science, Technology and Environment,
under grant project PN223LH004-038.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>We declare that the present manuscript has been written entirely by the authors and that no generative
artificial intelligence tools were used in its preparation, drafting, or editing.</p>
    </sec>
    <sec id="sec-8">
      <title>Online Resources</title>
      <p>The results and official rankings of the shared task can be accessed through the following link:
• Rest-Mex 2025 Results</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>A survey of opinion mining and sentiment analysis</article-title>
          ,
          <source>in: Mining text data</source>
          , Springer,
          <year>2012</year>
          , pp.
          <fpage>415</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Sparks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Browning</surname>
          </string-name>
          ,
          <article-title>The impact of online reviews on hotel booking intentions and perception of trust</article-title>
          ,
          <source>Tourism Management</source>
          <volume>32</volume>
          (
          <year>2011</year>
          )
          <fpage>1310</fpage>
          -
          <lpage>1323</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>How to fine-tune bert for text classification?</article-title>
          ,
          <source>in: China National Conference on Chinese Computational Linguistics</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>194</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Sentiment analysis of commodity reviews based on albert-lstm</article-title>
          ,
          <source>Journal of Physics: Conference Series</source>
          <volume>1651</volume>
          (
          <year>2020</year>
          )
          <fpage>012022</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>