=Paper=
{{Paper
|id=Vol-3858/paper2
|storemode=property
|title=Prediction Model of National Visitors to the Moche Route in Peru based on Time Series and Neural Networks
|pdfUrl=https://ceur-ws.org/Vol-3858/paper2.pdf
|volume=Vol-3858
|authors=Oscar Serquén,Roger Alarcón,Jessie Bravo,Carlos Valdivia,Janet Aquino
|dblpUrl=https://dblp.org/rec/conf/ithgc/SerquenAJVL23
}}
==Prediction Model of National Visitors to the Moche Route in Peru based on Time Series and Neural Networks==
Oscar Serquén, Roger Alarcón, Jessie Bravo, Carlos Valdivia, Janet Aquino

Digital Transformation Research Group, Pedro Ruiz Gallo National University, Juan XXIII 395, Lambayeque, Perú

===Abstract===
The pandemic affected every economic sector worldwide, and tourism was among the hardest hit. The institutions involved therefore need to take urgent action to reactivate it, and innovation and digital transformation based on disruptive technologies play an important role. The objective of this study is to use machine learning based on neural networks and time series to predict the influx of national visitors to the Moche Route in Peru, as a contribution to the use of artificial intelligence for the social and economic development of the region. A methodology of four stages was developed: (1) data collection, (2) model analysis, (3) model development, and (4) model evaluation. Open-access data for the period from January 2011 to December 2019 were used. A recurrent predictive process, using an algorithm based on time series and neural networks, was applied to estimate the data for the pandemic years; finally, the operation of the model and the proximity of its predictions to the real data were evaluated. In conclusion, the model presents optimal results for all the tourist attractions of the Moche Route, demonstrating its predictive effectiveness and giving the entities in charge of the tourism sector a tool for planning tourist itineraries and the resources needed to cope with future demand.

Keywords: Time series, neural networks, prediction, machine learning, tourism

===1. Introduction===
The COVID-19 pandemic had an unprecedented impact on the tourism industry worldwide, paralyzing travel and considerably reducing tourist arrivals.
The reactivation of tourism is now a very important issue for countries and their authorities, who must adequately manage tourism demand and volume. This reactivation relies on information technology, which is changing traditional tourism, allowing an information-based tourism industry to develop and generating innovation [1]. Likewise, [2] highlights the growing use of new technologies and artificial intelligence (AI) techniques in the tourism sector, which improve the speed, creativity and knowledge of the service with the aim of increasing tourist satisfaction.

In Peru, tourism is expected to contribute 2.5% of national GDP in 2023 [3], with a projected 2.2 million foreign visitors and 34.3 million trips by domestic tourists. Data on the income generated by domestic tourism from 2012 to 2022 show that it has not yet recovered to its pre-pandemic level [4]. One of the most visited tourist routes is the Moche Route, which integrates the regions of La Libertad and Lambayeque and offers archaeological, natural, cultural and landscape attractions [5]. These regions of the northern coast of Peru were home to some of the most important pre-Columbian civilizations (Moche, Chimú and Sicán), and the route is promoted by the National Strategic Tourism Plan 2025 [6].

Similarly, in July 2023 the Peruvian state enacted Law No. 31814, which promotes the use of AI within the framework of the national digital transformation process in order to foster the economic and social development of the country [7]. An important advance for AI is open data, an area in which, according to the United Nations E-Government Survey 2022, Peru ranks first in Latin America [8].

ITHGC 2023: IV International Tourism, Hospitality & Gastronomy Congress, October 25–27, 2023, Lima, Peru

Email: aserquen@unprg.edu.pe (O. Serquén); ralarcong@unprg.edu.pe (R. Alarcón); jbravo@unprg.edu.pe (J. Bravo); cvaldivias@unprg.edu.pe (C. Valdivia); jaquino@unprg.edu.pe (J. Aquino)

ORCID: 0000-0001-9968-493X (O. Serquén); 0000-0002-2895-9120 (R. Alarcón); 0000-0001-6841-2536 (J. Bravo); 0000-0002-2895-9120 (C. Valdivia); 0000-0003-0536-3882 (J. Aquino)

© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

====1.1. Tourism and digital transformation====
In the era of digital transformation, AI has revolutionized numerous sectors. As [9, 10] point out, forecasting future trends is of utmost importance for managers and decision makers in different sectors, and tourism is no exception. The complex characteristics of tourist arrival series, such as seasonality, randomness and non-linearity, mean that forecasting tourist arrivals remains a difficult task [11].

====1.2. Predictive models in tourism====
Predictive models are AI components that serve as analytical tools, using historical data to predict future values. According to [12], several models exist for predicting tourists or visitors, among them Random Forest (RF), neural networks (NN) and support vector machines (SVM). However, no single technique or algorithm delivers consistent forecasts; the results depend on the model and technique applied, the number of observations and the characteristics of the dataset. The authors of [13] note that NNs have been shown to be particularly accurate in univariate time series forecasting settings. The study in [14] analyzes and forecasts tourism demand in Vietnam using a multilayer perceptron artificial NN. For data covering the COVID-19 pandemic, dummy variables were used to model the impact of the pandemic and the expected number of tourists.
Other techniques are also used, as stated in [15], where the analysis of historical tourism volume and convolutional neural network models also made it possible to estimate tourism demand during the COVID-19 pandemic.

Taking the above into account, the main objective of this research is to use machine learning based on neural networks and time series to predict the influx of national visitors to the Moche Route in Peru, as a contribution to the use of artificial intelligence for the social and economic development of the country.

===2. Materials and methods===
Figure 1 describes the methodology applied in this research, which consists of four main stages: (1) data collection, (2) model analysis, (3) model development, and (4) model evaluation. The Python version 3 programming language and Google Colab storage were used for processing.

Figure 1: Applied Research Methodology

====2.1. Data collection====
The data used in the proposed model were divided into two dimensions, as described in Table 1, and cover the period from January 2011 to December 2019. For the years 2020 to 2022, the period of the COVID-19 pandemic, no data were recorded; their treatment is explained in the development of the proposed model.

Table 1: Variables used in the prediction of the model.

{| class="wikitable"
! Dimension !! Variable !! Description !! Type
|-
| Temporal || DATE || Year and month of visit || Numerical
|-
| Arrival of National Tourists || TOTAL_NATIONAL || Number of domestic tourists visiting tourist attractions on a monthly basis || Numerical
|}

The Temporal dimension represents the year and month within the evaluated period. The Arrival of National Tourists dimension consists of data extracted from the Open Data Platform of the Government of Peru [16], which records the number of national tourists who visited the tourist attractions of the Moche Route in Peru on a monthly basis.
The tourist attractions of the Moche Route are located in the departments of La Libertad and Lambayeque, in the northern part of Peru, and are detailed in Table 2.

Table 2: Tourist attractions on the Moche Route.

{| class="wikitable"
! Location !! Tourist Attraction !! Type
|-
| La Libertad || Huaca Arco Iris Archaeological Complex || Archaeological site
|-
| La Libertad || Huaca del Sol y la Luna Archaeological Complex || Archaeological site
|-
| La Libertad || Huaca el Brujo Archaeological Complex || Archaeological site
|-
| La Libertad || Chan Chan Site Museum || Museum
|-
| Lambayeque || Brüning National Archaeological Museum || Museum
|-
| Lambayeque || Huaca Chotuna Site Museum - Chornancap || Museum
|-
| Lambayeque || Huaca Rajada Site Museum – Sipan || Museum
|-
| Lambayeque || Tucume Site Museum || Museum
|-
| Lambayeque || Sican National Museum || Museum
|-
| Lambayeque || Royal Tombs of Sipan Museum || Museum
|-
| Lambayeque || Pomac Forest Historical Sanctuary || Museum
|}

====2.2. Model Analysis====
In preprocessing, the data were first scaled to the range between -1 and 1, a transformation that can be reversed so that the original values are not lost. The series was then rearranged into 72 rows of 36 elements each, which serve as input data to the model. This is consistent with sliding a 36-month window over the 108 monthly observations from 2011 to 2019, which yields 108 − 36 = 72 windows, each followed by the month to be predicted. Using 36 input elements gave better results than larger or smaller window sizes. The pre-processed data keep their time series structure and were transformed into the form required by the selected algorithm.

The prediction algorithm is a time-series neural network, an artificial intelligence method that teaches computers to process data in a way similar to the human brain. The network is of the feedforward type (unidirectional, or forward-propagating) and is made up of three layers: a dense layer, a flatten layer and a dense layer. The hyperbolic tangent activation function was applied in the dense layers [17].

====2.3. Model Development====
Model development begins with the data division stage: of the 72 existing rows, 60 were selected for training and 12 for model validation. The algorithm predicts national visitors in blocks of 12 months, taking as input the information from the previous 36 months. This process is repeated recurrently: the data collected from 2011 to 2019 were used to predict the year 2020; the predicted data for 2020 were then added as input to predict 2021; the results for 2020 and 2021 again served as input to predict 2022; and the same process was carried out to complete the prediction for 2023.

====2.4. Evaluation of the model====
The model was evaluated with two metrics: the mean absolute error (MAE) and the mean squared error (MSE). MAE is the mean of the absolute differences between the actual and the predicted values, and MSE is the mean of the squared errors, that is, of the squared differences between the actual and the predicted values. Both were useful for examining the accuracy of the prediction model. For both metrics, the lower the value, the better the prediction model.

===3. Results and Discussion===
The neural network model, shown in Figure 2, has an input layer of 36 neurons fed with the number of monthly visitors in blocks of 36 months, scaled during data preparation. The network includes a hidden layer, which computes the prediction and produces the output layer, corresponding to the prediction for month 37, still in scaled form. This process is repeated, taking the generated output as a new input to the neural network.
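The preprocessing and the recurrent prediction loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the monthly series is synthetic, min-max scaling is assumed as the way to reach the range between -1 and 1, and a seasonal-naive rule stands in for the trained neural network so that the example stays self-contained.

```python
import math
from typing import Callable, List, Sequence, Tuple

WINDOW = 36  # months of history used as input (value taken from the text)


def scale_to_unit_range(values: Sequence[float]) -> Tuple[List[float], Tuple[float, float]]:
    """Min-max scale values to [-1, 1]; also return (min, max) so the scaling can be undone."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values], (lo, hi)


def unscale(scaled: Sequence[float], bounds: Tuple[float, float]) -> List[float]:
    """Invert scale_to_unit_range to recover values in the original units."""
    lo, hi = bounds
    return [(s + 1.0) / 2.0 * (hi - lo) + lo for s in scaled]


def make_windows(series: Sequence[float], window: int = WINDOW):
    """Slide a window over the series: each row holds `window` months, the target is the next month."""
    count = len(series) - window
    rows = [list(series[i:i + window]) for i in range(count)]
    targets = [series[i + window] for i in range(count)]
    return rows, targets


def forecast_recursively(history: Sequence[float],
                         predict_next: Callable[[Sequence[float]], float],
                         steps: int = 12,
                         window: int = WINDOW) -> List[float]:
    """Predict `steps` months ahead, feeding each prediction back in as new input."""
    extended = list(history)
    for _ in range(steps):
        extended.append(predict_next(extended[-window:]))
    return extended[len(history):]


def seasonal_naive(window_values: Sequence[float]) -> float:
    """Placeholder predictor (NOT the paper's network): repeat the value from 12 months earlier."""
    return window_values[-12]


# Synthetic stand-in for 108 monthly counts (January 2011 to December 2019); not real data.
monthly = [1000 + 300 * math.sin(2 * math.pi * m / 12) + 2 * m for m in range(108)]

scaled, bounds = scale_to_unit_range(monthly)
rows, targets = make_windows(scaled)               # 72 rows of 36 elements
train_rows, val_rows = rows[:60], rows[60:]        # 60 rows for training, 12 for validation

pred_2020 = forecast_recursively(scaled, seasonal_naive)              # 12 months ahead
pred_2021 = forecast_recursively(scaled + pred_2020, seasonal_naive)  # reuse the predictions
visitors_2020 = unscale(pred_2020, bounds)         # back to visitor counts
```

Swapping `seasonal_naive` for the prediction function of a trained model leaves the loop unchanged, since the loop only requires a function that maps the last 36 scaled values to the next one.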
Figure 2: Developed neural network model

For the evaluation, a graphical and numerical comparison was made between the real data and the data predicted by the model for the entire Moche Route, separated by region. Figure 3 shows the comparison between the years 2019 and 2022 for the 4 tourist attractions in the La Libertad region. The predicted data follow the trend and fit the actual data. It is also worth noting that, as the prediction advances from year to year, the agreement between the prediction and the actual data improves substantially.

Figure 3: Comparison of Actual Values with Predicted Values - La Libertad.

Figures 4 and 5 show the comparison of the actual and predicted data for the 7 tourist attractions in the Lambayeque region. For 4 attractions the values produced by the neural network model fit the data very well, while for 3 attractions the predicted data differ significantly from the actual data, so the model does not fit these values as well.

Figure 4: Comparison of Actual Values with Predicted Values - Lambayeque.

Figure 5: Comparison of Actual Values with Predicted Values - Lambayeque.

MAE and MSE were the performance measures used to evaluate the results obtained for the 11 locations of the Moche Route. Small values of these residual statistics reflect better goodness of fit; that is, values close to zero are considered a good prediction. The accuracy of the predictions generated by the model was evaluated using a training set and a test set. A 5-year {Xt} training set (estimation period) and a 4-year {Xt} test set (validation period) were considered and used for month-to-month prediction.
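The two error metrics used in this evaluation can be sketched as follows. This is an illustrative example only: the function names and the four sample values are assumptions made for the sketch and are not taken from the study's data. For n pairs of actual values y and predictions ŷ, MAE is the mean of |y − ŷ| and MSE is the mean of (y − ŷ)².

```python
from typing import Sequence


def _check_lengths(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Both series must be non-empty and of equal length."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE: mean of the absolute differences between actual and predicted values."""
    _check_lengths(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """MSE: mean of the squared differences between actual and predicted values."""
    _check_lengths(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical scaled values in [-1, 1], made up for illustration.
actual = [0.10, -0.20, 0.40, 0.00]
predicted = [0.20, -0.10, 0.10, 0.20]

mae = mean_absolute_error(actual, predicted)  # approximately 0.175
mse = mean_squared_error(actual, predicted)   # approximately 0.0375
```

Because the errors are computed on data scaled to the range between -1 and 1, both metrics are unitless, and MSE penalizes large individual errors more heavily than MAE.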
Table 3 presents the evaluation of the model on both metrics for each tourist attraction. In 2020, 3 of the 11 tourist attractions have very high MSE values (highlighted in bold): Huaca el Brujo, Tucume Site Museum and Huaca Chotuna Site Museum - Chornancap, which could indicate forecasts below expectations. In 2021 this is reduced to 2 tourist attractions with high values in this metric, again indicating forecasts below expectations. In 2022 and 2023, however, all attractions have small values, indicating a better fit of the model's predictions. Finally, for 2023 the Huaca del Sol y de la Luna, the Brüning National Archaeological Museum and the Huaca Arco Iris present the lowest values, which indicates that the data for these attractions fit the model's predictions very well. The remaining tourist attractions also show good predictions. Similar results can be seen for the MAE metric.

Table 3: Evaluation results using MSE and MAE as metrics.
{| class="wikitable"
! rowspan="2" | Tourist Attraction !! colspan="4" | MSE !! colspan="4" | MAE
|-
! 2020 !! 2021 !! 2022 !! 2023 !! 2020 !! 2021 !! 2022 !! 2023
|-
| Huaca Arco Iris Archaeological Complex || 0.0573 || 0.0493 || 0.0411 || 0.0400 || 0.1775 || 0.1650 || 0.1550 || 0.1465
|-
| Huaca el Brujo Archaeological Complex || '''0.1039''' || 0.0697 || 0.0527 || 0.0586 || 0.2160 || 0.1987 || 0.1547 || 0.1788
|-
| Huaca del Sol y la Luna Archaeological Complex || 0.0620 || 0.0629 || 0.0479 || 0.0287 || 0.1863 || 0.1775 || 0.1534 || 0.1182
|-
| Chan Chan Site Museum || 0.0928 || 0.1093 || 0.0742 || 0.0702 || 0.2149 || 0.2347 || 0.1882 || 0.1939
|-
| Brüning National Archaeological Museum || 0.0982 || 0.0809 || 0.0515 || 0.0395 || 0.2251 || 0.1899 || 0.1605 || 0.1454
|-
| Royal Tombs of Sipan Museum || 0.0503 || 0.0504 || 0.0492 || 0.0410 || 0.1568 || 0.1529 || 0.1574 || 0.1453
|-
| Sican National Museum || 0.0889 || 0.0888 || 0.0951 || 0.0908 || 0.1804 || 0.1890 || 0.1820 || 0.2082
|-
| Tucume Site Museum || '''0.1017''' || 0.0748 || 0.0745 || 0.0582 || 0.2355 || 0.1973 || 0.1853 || 0.1749
|-
| Huaca Rajada Site Museum – Sipan || 0.0465 || 0.0487 || 0.0537 || 0.0488 || 0.1596 || 0.1578 || 0.1529 || 0.1506
|-
| Huaca Chotuna Site Museum - Chornancap || '''0.1345''' || 0.2172 || 0.0955 || 0.0921 || 0.2280 || 0.3243 || 0.2115 || 0.1922
|-
| Pomac Forest Historical Sanctuary || 0.0680 || 0.0756 || 0.0673 || 0.0576 || 0.2019 || 0.1976 || 0.1869 || 0.1660
|}

Based on the forecasts for the 11 tourist attractions, the two attractions with the best prediction results are described below. Figure 6 shows the forecast of national visitors to the Brüning National Archaeological Museum. The visitor trend is maintained over the years: the number of visitors increases from September to November, while from December to June it tends to decrease.

Figure 6: Forecast of national visitors at the Brüning National Archaeological Museum.

Figure 7 shows the forecast of national visitors to the Huaca del Sol y de la Luna archaeological site, where the visitor trend is maintained between 2021 and 2023. In addition, July and August maintain high values in all years, while April and May have the lowest values.
Figure 7: Forecast of national visitors in the Huaca del Sol y de la Luna Complex.

The predictions obtained by applying computer science to tourism, through artificial intelligence, make it possible to understand large sets of tourism data and to analyze them with machine learning techniques in order to identify patterns and future trends in tourism demand. This supports the optimal allocation of resources, improves the experience of tourists and contributes to more informed decision making at the business and government level.

===4. Conclusions===
This study presents a predictive model, based on neural networks and time series, that predicts visitors to the Moche Route for the years 2020 to 2023 by applying a recurrent predictive process. In this process, the MSE and MAE metrics improved in each year evaluated. In 2020 only three tourist attractions had predictions below expectations: Huaca el Brujo, Túcume Site Museum and Huaca Chotuna - Chornancap Site Museum. By 2021 this was reduced to two tourist attractions, and for 2022 and 2023 all the attractions present small values in the metrics, indicating a better fit of the predictions of the applied model. The prediction model reached optimal values on the MAE and MSE evaluation metrics, yielding meaningful estimates of the expected demand of national tourists for the tourist attractions of the Moche Route.

This research is framed as a contribution to Law No. 31814 on the use of Artificial Intelligence in the Peruvian state, since it gives the entities in charge of the tourism sector a tool for planning tourist itineraries and the resources needed to meet future demand.

===References===
[1] Kong, Y.: Real-time processing system and Internet of Things application in the cultural tourism industry development. Soft Computing. 27, 10347–10357 (2023).
https://doi.org/10.1007/s00500-023-08304-8.

[2] Dangwal, A., Kukreti, M., Angurala, M., Sarangal, R., Mehta, M., Chauhan, P.: A Review on the Role of Artificial Intelligence in Tourism. Presented at the Proceedings of the 17th INDIACom; 2023 10th International Conference on Computing for Sustainable Global Development, INDIACom 2023 (2023).

[3] El Peruano: Titular del Mincetur: Actividad turística contribuirá este año con 2.5% al PBI nacional [Entrevista], https://elperuano.pe/noticia/214558-titular-del-mincetur-actividad-turistica-contribuira-este-ano-con-25-al-pbi-nacional-entrevista.

[4] SIT-MINCETUR: Sistema de Inteligencia Turística, https://www.mincetur.gob.pe/centro_de_Informacion/mapa_interactivo/index.html.

[5] Bravo, J., Alarcón, R., Valdivia, C., Serquén, O.: Application of Machine Learning Techniques to Predict Visitors to the Tourist Attractions of the Moche Route in Peru. Sustainability (Switzerland). 15, (2023). https://doi.org/10.3390/su15118967.

[6] PENTUR: Plan Estratégico Nacional de Turismo del Perú-PENTUR, https://www.gob.pe/institucion/mincetur/informes-publicaciones/22123-plan-estrategico-nacional-de-turismo-del-peru-pentur.

[7] El Peruano: Ley que promueve el uso de la Inteligencia Artificial en favor del desarrollo económico y social del país, http://busquedas.elperuano.pe/dispositivo/NL/2192926-1.

[8] Vilches, C.: Biblioguias: Desde el gobierno digital hacia un gobierno inteligente: UN E-Government Survey, https://biblioguias.cepal.org/gobierno-digital/un-egovernment-survey.

[9] Salehi, S.: Employing a Time Series Forecasting Model for Tourism Demand Using ANFIS. Journal of Information and Organizational Sciences. 46, 157–172 (2022). https://doi.org/10.31341/jios.46.1.9.

[10] Kirtil, İ.G., Aşkun, V.: Artificial Intelligence in Tourism: A Review and Bibliometrics Research. Advances in Hospitality and Tourism Research (AHTR). 9, 205–233 (2021). https://doi.org/10.30519/ahtr.801690.

[11] Liang, X., Wu, Z.: Forecasting tourist arrivals using dual decomposition strategy and an improved fuzzy time series method. Neural Comput & Applic. 35, 7161–7183 (2023). https://doi.org/10.1007/s00521-021-06671-7.

[12] De Jesus, N.M., Samonte, B.R.: AI in Tourism: Leveraging Machine Learning in Predicting Tourist Arrivals in Philippines using Artificial Neural Network. International Journal of Advanced Computer Science and Applications. 14, 816–823 (2023). https://doi.org/10.14569/IJACSA.2023.0140393.

[13] Semenoglou, A.-A., Spiliotis, E., Assimakopoulos, V.: Data augmentation for univariate time series forecasting with neural networks. Pattern Recognition. 134, (2023). https://doi.org/10.1016/j.patcog.2022.109132.

[14] Nguyen, L.Q., Fernandes, P.O., Teixeira, J.P.: Analyzing and Forecasting Tourism Demand in Vietnam with Artificial Neural Networks. Forecasting. 4, 36–50 (2022). https://doi.org/10.3390/forecast4010003.

[15] Wu, B., Wang, L., Zeng, Y.-R.: Interpretable tourism demand forecasting with temporal fusion transformers amid COVID-19. Applied Intelligence. 53, 14493–14514 (2023). https://doi.org/10.1007/s10489-022-04254-0.

[16] datosTurismo, http://datosturismo.mincetur.gob.pe/appdatosTurismo/Content1.html.

[17] Rivas-Asanza, W., Mazon-Olivo, B., Mejia, F.: Capítulo 1: Generalidades de las redes neuronales artificiales. Presented June 29 (2018).