Aspect-based Sentiment Analysis for Improving Attractiveness in Shrinking Areas

Raffaele Manna¹,²,*,†, Giulia Speranza¹,²,†, Maria Pia di Buono¹,²,† and Johanna Monti²,†

¹ Dahlia srl, via Duomo 219, Naples, 80139, Italy
² University of Naples "L’Orientale", via Chiatamone 61/62, Naples, 80121, Italy

Abstract
In this paper we present the motivations, the methodology and the data used to develop a platform aimed at improving the information about peripheral and shrinking areas in order to foster their attractiveness. As a case study, we select the internal area of the Ufita Valley in Irpinia (Campania, Italy). Through maps and statistics, the platform shows insights on the cultural attractions in the area of interest on the basis of an aspect-based sentiment analysis model trained on Google reviews. The platform, addressed to local administrations, is intended as a tool for obtaining an overview of public sentiment towards cultural sites, understanding strengths and weaknesses, as well as for supporting governance and intervention policies for these sites.

Keywords
Shrinking Areas, Cultural Tourism, Local Administrations, ABSA

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
† These authors contributed equally.
Email: rmanna@dahliasrl.it (R. Manna); gsperanza@dahliasrl.it (G. Speranza); mpdibuono@unior.it (M. P. di Buono); jmonti@unior.it (J. Monti)
ORCID: 0009-0006-6285-8557 (R. Manna); 0000-0002-9249-492X (G. Speranza); 0000-0001-7284-6556 (M. P. di Buono); 0000-0002-4563-5988 (J. Monti)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Shrinking areas, as reported in Grasland et al. [1], are internal and rural regions affected by depopulation, demographic decline and a rise in the proportion of elderly people, remoteness of public services and scarcity of infrastructures. Shrinking areas are also characterised by geo-morphological fragility and poor accessibility, which in some cases also cause economic impoverishment [2, 3]. Furthermore, the high percentage of emigration among young generations towards larger centres, the absence of job opportunities and employment, and the abandonment of buildings, houses and land are also causing the disappearance of traditions, customs, local knowledge and artisan expertise. The shrinkage is caused by a multitude of natural, political and economic factors common to many similar areas around Europe [4, 5]. In order to face the problem and propose innovative and effective solutions, local governments and public administrations should work towards a common strategic plan and the establishment of a collection of actions and activities together with stakeholders, social groups, citizens, companies and organisations [4].

Compared to larger centres, governance in shrinking areas is indeed penalised by a lack of financial and institutional capacities to improve and promote cultural tourism, and it depends heavily on external resources to cope with the different problems these areas may face [4, 5, 6].

In order to support public administrations and institutions in their local governance, a data-driven decision-making approach could prove effective [7]. Several types of data, from reports to surveys, reviews and social media, can be automatically leveraged to empower administrations to make informed choices, guide strategic decisions, discover new insights, identify weaknesses and strengths, enhance public services and optimise resource allocation and investments. Effective data analysis can indeed be used to derive essential information for making forecasts, computing statistics, anticipating trends and discovering areas of improvement, thus transforming raw and disconnected data into rich knowledge to be interpreted and reused wisely.

In this paper, we present the development of the KiNESIS Project platform to support local administrations in improving the attractiveness of the shrinking internal area of the Ufita Valley in Irpinia (Campania Region, Italy), with a specific focus on cultural sites and tourism. The paper is organised as follows. Section 2 offers an overview of the project; Section 3 delves into existing research on the topic, providing context for our approach. Section 4 outlines the specific methodology we applied. Following this, Section 5 discusses the conclusions and outlines potential directions for future research.

2. The KiNESIS Project

The KNowledgE alliance for Social Innovation in Shrinking villages Project (KiNESIS)¹ is a project co-funded by the Erasmus+ Programme of the European Union and coordinated and managed by the University of Naples "L’Orientale". It gathers together many academic institutions, stakeholders and organisations across Europe, in particular in Italy, Spain, Germany, The Netherlands and Estonia. The KiNESIS Project addresses the topic of international cooperation focused on shrinking areas with the aim of promoting and fostering ideas, developing and sharing best practices, projects, workforce, productivity and attractiveness. The Project’s objectives are to revitalise depopulated, shrinking and marginalised areas by stimulating entrepreneurship and entrepreneurial skills; to create local living laboratories to promote social inclusion and entrepreneurial development; to experiment with new, innovative and multidisciplinary approaches in teaching and learning; and to facilitate the exchange, flow and co-creation of knowledge at a local and global level. Over the years, several activities have been carried out as part of the KiNESIS Project, such as co-participation tables, workshops and conferences, training sessions and summer schools, internships and Erasmus+ student exchanges, hackathons and fairs, the publication of handbooks, reports, scientific documents and best practices, and the creation and dissemination of promotion materials.

¹ https://www.kinesis-network.eu/homesite/1/1/home-page.html

3. Related Works

The cultural tourism sector is increasingly driven by the use of data-driven approaches [8, 9]. In this context, data from user-generated content platforms allows for the collection of increasingly up-to-date and real-time insights capable of identifying trends to guide decisions around economic activities [10]. Specifically with regard to tourism-related economic activities, in recent years, methods using Natural Language Processing (NLP) techniques have been applied to hotel reviews to extract from user-generated content the sentiment and perceptions of users in relation to various categories relevant to the structure and the services it offers [11, 12]. In addition, NLP and topic modeling methods have been applied to user-generated content in reference to quality dimensions in the museum field [13]. Other studies have investigated the attractiveness of Italian cities using user-generated content. Specifically, users’ behaviour has been measured to identify the annual trend of photographic activity in cities [14]. User-generated content (UGC) on social media and review platforms related to tourist attractions represents a valuable source of information for guiding decisions towards more informed economic growth in regions that potentially benefit from cultural tourism [15]. However, information from UGC has often been analysed by considering only user ratings or by focusing on large tourist hubs such as Italian art cities [14].

In the following sections, we propose a study on the analysis and extraction of information related to specific categories for different cultural sites in areas at risk of depopulation. Specifically, we show the development of a model capable of extracting different dimensions of intervention regarding cultural sites to guide public administrations in potential investments for the maintenance and enhancement of cultural attractions present in the territory. In this context, some studies show how investments in the tourism sector can be beneficial for the growth of inland areas at risk of depopulation [16].

4. Methodology

Within the KiNESIS Project, with the aim of supporting local administrations and institutions in improving the attractiveness of the shrinking internal area of the Ufita Valley (Italy), we developed a user-friendly visualisation platform built on an Aspect-Based Sentiment Analysis classification system for cultural attractions and cultural sites. This section discusses the methodology adopted for training the Aspect-Based Sentiment Analysis (ABSA) classification system for cultural attractions and sites of potential tourist interest in the Irpinia area. Specifically, Section 4.1 describes the data collected and used to train the ABSA model; Section 4.2 presents the model architecture and the elicited outputs and, finally, the integration into the data analysis visualisation platform.

4.1. Data and Exploratory Data Analysis

The textual data used for training the ABSA model were collected from the Google Maps platform. Specifically, data on reviews of cultural attractions in Irpinia were collected. The names of the attractions were collected using the resources offered by Sistema Irpinia². Sistema Irpinia is an interactive platform that promotes sites of historical, artistic, architectural, cultural, environmental and food and wine heritage of Irpinia. This platform contains 408 cultural sites.

² https://sistemairpinia.provincia.avellino.it/it/node/4

The reviews related to the cultural attractions present on Sistema Irpinia were therefore extracted from Google Maps. A total of 9504 reviews were extracted. Of these, about 4% are reviews that do not convey any textual information, showing only the user rating expressed on Google Maps through the assignment of stars. For ABSA model training purposes, these latter reviews were removed. Before training the ABSA model, an exploratory analysis was conducted on the information conveyed by the data extracted in the manner previously described. This analysis focused on: 1) the number of cultural sites present in each municipality in the Irpinia area; 2) the type of cultural attraction; and 3) the accessibility of the type of cultural site. The largest number of cultural sites present and represented on the Google Maps platform belongs to two municipalities (Rocca San Felice and Bonito) in the Ufita Valley, a consortium of municipalities participating in the KiNESIS Project. Specifically, the municipality of Rocca San Felice is represented by 14 cultural sites, while Bonito has 13 cultural sites.

Type                          #
Churches                      160
Historical buildings          65
Castles                       48
Other places                  36
Archeological area            24
Religious complexes           24
Castles - historic palaces    8
Total                         365

Table 1: Type of cultural sites

Table 1 shows the number of cultural sites of each type extracted from the Sistema Irpinia platform for which reviews were found. The most represented type is churches (160), followed by historical buildings (65), castles (48), and the ‘other place’ type (36), which represents noble residences. The least represented, although the most extensive in terms of spatial extent, are archaeological areas and religious complexes.

For each cultural site that falls into one of these types, categories/aspects were assigned to the reviews using the distant supervision method [17, 18] in combination with the information from the overall rating score given by the review stars. Specifically, this method allows an annotated dataset to be built with little or no human intervention. Rule-based heuristics are used to produce labelled data, and these labelled data are then used to train a model. The rule-based heuristic consists of lists of words, primarily adjectives and adverbs, that can signal positive and negative characteristics related to the aspects identified as salient for describing the conditions of a cultural site. In the case of the ABSA model, four categories/aspects were identified that are able to describe the conditions of cultural sites: 1) accessibility; 2) signage; 3) appearance of the place; and 4) overall score.

Specifically, ‘Accessibility’ refers both to the availability of opening hours for the public and to the provision of equipment to make the visit more accessible to a wide range of visitors, while ‘Signage’ refers both to road signs that allow the site to be reached and to the presence of any explanatory totems within cultural sites. Consider, for example, the following review extracted from Google Maps in reference to the Goleto Abbey located in Sant’Angelo dei Lombardi:

    L’abbazia del Goleto è ancora CHIUSA PER LAVORI DI RISTRUTTURAZIONE che dovrebbero terminare il 4 giugno 2024. Spero venga rispettata la data di consegna dei lavori perché è sempre un piacere visitare il complesso. L’abbazia è spettacolare e mi aspetto che al termine della ristrutturazione, lo sarà ancor di più. Complimenti alla ditta dell’architetto X. (The Goleto Abbey is still CLOSED FOR RESTORATION WORK which should be completed on June 4, 2024. I hope the deadline for the work will be respected as it is always a pleasure to visit the complex. The abbey is spectacular and I expect it to be even more so after the renovation is complete. Congratulations to the firm of architect X.)

The extracted aspects are related to ‘Accessibility’ and ‘Appearance of the place’, as well as an overall score that shows the ratio for all aspects. As far as the ‘Accessibility’ aspect is concerned, the identified text portion is ‘CLOSED FOR RESTORATION WORK’, while for the ‘Appearance of the place’ aspect the identified text portion is ‘The abbey is spectacular’.

The application of the distant supervision method allowed us to annotate the data at our disposal with minimal human intervention. Subsequently, an exploratory analysis of the annotated data was carried out. Figure 1, for example, shows the frequencies related to the accessibility of the cultural sites present in the dataset. As previously mentioned, the ‘Accessibility of cultural sites’ refers to two different functions. In the case of Figure 1, accessibility is shown in terms of availability for visits. For example, in Figure 1, 64 sites are available only during the hours dedicated to religious functions.

Figure 1: Availability of visits for cultural sites.

Figure 2: Availability of Archeological Areas.

Figure 3: Overall score - Positive.
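The rule-based labelling step of the distant supervision method described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue lists in `ASPECT_CUES`, the function name `weak_labels` and the English keywords are all hypothetical, since the paper does not publish its word lists and the actual ones target Italian reviews. The sketch only shows the general idea of matching aspect-specific cue words in a review and keeping the star rating alongside each match as an additional supervision signal.

```python
import re

# Hypothetical cue lists, for illustration only: the paper does not publish
# its word lists, and the real ones target Italian reviews.
ASPECT_CUES = {
    "Accessibility": {
        "positive": ["open", "free"],
        "negative": ["closed", "inaccessible"],
    },
    "Signage": {
        "positive": ["signposted"],
        "negative": ["unmarked"],
    },
    "Appearance of the place": {
        "positive": ["spectacular", "beautiful"],
        "negative": ["neglected", "abandoned"],
    },
}


def weak_labels(review_text, stars):
    """Label a review with (aspect, polarity, span) via keyword matching."""
    labels = []
    for aspect, cues in ASPECT_CUES.items():
        for polarity, words in cues.items():
            for word in words:
                # Whole-word, case-insensitive match on the original text.
                pattern = r"\b" + re.escape(word) + r"\b"
                for match in re.finditer(pattern, review_text, re.IGNORECASE):
                    labels.append({
                        "aspect": aspect,
                        "polarity": polarity,
                        "span": match.group(0),
                        "stars": stars,  # star rating kept as extra supervision
                    })
    return labels


review = "The abbey is still CLOSED for restoration work. The abbey is spectacular."
for label in weak_labels(review, stars=4):
    print(label["aspect"], label["polarity"], label["span"])
```

On this shortened version of the Goleto Abbey review, the sketch flags ‘CLOSED’ as a negative ‘Accessibility’ cue and ‘spectacular’ as a positive ‘Appearance of the place’ cue. Labels produced this way are noisy by construction, which is why they serve as training data for a model rather than as final output.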
In Figure 1, the ‘To complete’ label, with 52 cases, instead refers to closed cultural sites that are not available for visits or are undergoing renovation work.

As previously mentioned, the dataset also includes the type of cultural site among its characteristics. This allows for more granular information in relation to the different aspects identified. For example, Figure 2 shows the frequencies of availability for visits only for cultural sites of the ‘Archaeological Areas’ type. In this case, we can note that in 2 cases it was reported that the archaeological area is open all year round.

In addition, the dataset also includes user reviews of cultural sites in the form of star ratings. These were used in conjunction with the identified aspects to balance the overall score. Specifically, the textual spans related to the aspects of ‘Accessibility’, ‘Appearance of the Place’, and ‘Signage’ were identified. A sentiment and emotion lexicon³ was then applied to these spans to add a score associated with each aspect. The score from the lexicon was used in conjunction with the user’s rating score in the review. These lexicon scores for each aspect identified in the text, together with the scores assigned by the user, were used as supervision labels to train the ABSA model. Figure 3 shows the textual spans extracted for different aspects in relation to the positive overall score.

³ The sentiment and emotion lexicon used comes from this repository: https://saifmohammad.com/WebPages/nrc-vad.html and the lexicon for the Italian language was used. Specifically, the NRC Valence, Arousal, and Dominance (VAD) Lexicon was used. This resource includes a list of more than 20,000 words and their valence, arousal, and dominance scores. These scores represent the emotional qualities of the words.

Specifically, Figure 3 shows textual spans related to: Accessibility with the span ‘free site’; Signage with ‘guides available’; and Appearance of the place with textual spans related to the beauty and spaciousness of the cultural sites.

Figure 4: Overall score - Negative.

Figure 4 shows the negative textual spans: for ‘Appearance of the place’, judgements on the state of neglect of some cultural sites; for ‘Signage’, reports of missing or damaged explanatory panels; and for ‘Accessibility’, comments on the lack of services or on structural deficiencies.

4.2. ABSA Model and Platform

This section outlines the fine-tuning process of the chosen model and the implementation of the platform prototype to be made available to public administrations for effectively directing policies towards cultural sites with potential appeal for cultural tourism. The platform’s objective is to provide insights to inform decisions on which aspects of a specific cultural site to focus on in order to prepare it for the influx of tourists.

In this context, we applied ABSA to analyze and classify user reviews of cultural sites and attractions in the Ufita Valley. We employed the XLM-Roberta-base model⁴, a multilingual pre-trained transformer-based language model [19], for the ABSA task. The model was fine-tuned using the dataset of user reviews described in Section 4.1. The dataset was divided into training (60%) and testing (40%) sets. The fine-tuning process involved optimising the model parameters to minimise the loss function, which measured the discrepancy between the predicted and actual sentiment labels. We employed a multi-task learning approach, training the model to simultaneously perform two tasks:

⁴ https://huggingface.co/FacebookAI/xlm-roberta-base

• Aspect Category Classification: Classifying each textual span into its corresponding aspect category (e.g., ‘Accessibility’, ‘Signage’, ‘Appearance’).
• Overall Sentiment Score Assignment: Assigning a sentiment score (ranging from 1 to 10) to each textual span based on the sentiment expressed towards the corresponding aspect.

The overall sentiment score for each review was calculated as the average of the individual aspect scores, weighted by the number of mentions of each aspect. Additionally, the overall score was adjusted based on the user’s overall judgement of the review (positive, negative, or neutral). The fine-tuned model achieves an F1-score of 78% in extracting the correct textual spans for each aspect and assigning the correct score. This value represents the model’s average performance.

The platform developed to provide public administrations with insights into cultural sites features two main data visualisation modes: a map and cultural site-specific statistics, as illustrated in Figure 5. The map is populated with markers corresponding to the coordinates of cultural sites and displays the weighted average of the scores for each identified aspect of the cultural site. This approach visualises and considers the frequencies of the different scores (ranging from 1 to 7, negative to positive) for each review of the cultural site. The map provides a quick overview of the sentiment associated with the identified aspects for each cultural site. Additionally, the platform offers a second data visualisation tool that focuses more on the specific cultural site. For the statistics, the user can select the name of the cultural site and the aspect of interest and view the corresponding information. For example, Figure 6 shows the information related to Abbazia del Goleto in relation to the ‘Appearance of the place’.

Figure 5: ABSA Platform - Map.

Figure 6: ABSA Platform - Statistics.

In this case, when selecting the aspect to view for a specific cultural site, the user will find the extracted information presented in two charts: a pie chart and a bar chart. The first shows the sentiment scores associated with the selected aspect, while the second shows the top-n words (sorted by frequency of use) extracted from the reviews and associated with a particular aspect. Indeed, in Figure 6, we can observe that the majority of reviews express a very favourable opinion of the Goleto Abbey, as more than half (54%) have a very high ‘Appearance of the place’ score (9) and use words such as "evocative," "historical," "charming," and "beautiful view."

5. Conclusion and Future Works

In this paper, we present the implementation of a platform capable of extracting and classifying sentiment indices for various aspects from online reviews of cultural sites. Specifically, the ABSA model and the platform prototype were implemented within the KiNESIS project, which aims to investigate methods for mitigating the effects of ongoing depopulation in rural areas across Europe. In this context, the platform is proposed to public administrations as a tool to support their policies regarding cultural tourism sites of potential interest. Indeed, the platform can be a valuable tool for obtaining an overview of public sentiment towards cultural sites, as well as a tool for directing active intervention policies for the maintenance of specific aspects related to cultural sites. The platform is currently under development and is only being tested for the Ufita Valley area and for Italian-language reviews. Additionally, thanks to the KiNESIS Project, activities and data collection have already begun to extend the training and testing phase to the Oldambt region (The Netherlands) in collaboration with the Dutch partner of the KiNESIS Project. In this context, therefore, the ABSA model and platform will be tuned and made available in other languages to analyse data from other European regions.

Acknowledgments

The authors gratefully acknowledge the financial support provided by the Erasmus+ Programme of the European Union for the KiNESIS Project (Grant Agreement 621651-EPP-1-2020-1-ITEPPKA2-KA).

References

[1] C. Grasland, R. Ysebaert, B. Corminboeuf, N. Gaubert, N. Lambert, I. Salmon, M. Baron, S. Baudet-Michel, E. Ducom, D. Rivière, et al., Shrinking regions: a paradigm shift in demography and territorial development, Ph.D. thesis, Parlement Européen; Direction Générale des politiques internes de l’Union . . . , 2008.
[2] T. Amodio, Territories at risk of abandonment in italy and hypothesis of repopulation, Belgeo. Revue belge de géographie (2022).
[3] S. De Rubertis, Dinamiche insediative in italia: spopolamento dei comuni rurali, Perspectives on rural development 2019 (2019) 71–96.
[4] A. Haase, G.-J. Hospers, S. Pekelsma, D. Rink, Shrinking areas: Front-runners in innovative citizen participation, The Hague: European Urban Knowledge Network, 2012.
[5] D. Rink, P. Rumpel, O. Slach, C. Cortese, A. Violante, P. C. Bini, A. Haase, V. Mykhnenko, B. Nadolu, C. Couch, et al., Governance of shrinkage: Lessons learnt from analysis for urban planning and policy, Leipzig: Helmholtz Centre for Environmental Research (2012).
[6] D. Rink, A. Haase, K. Großmann, M. Bernt, C. Couch, M. Cocks, A. Violante, C. Cortese, P. Calza Bini, How shrinkage and local governance are interrelated across urban europe: a comparative view, 2011.
[7] R. Matheus, M. Janssen, D. Maheshwari, Data science empowering the public: Data-driven dashboards for transparent and accountable decision-making in smart cities, Government Information Quarterly 37 (2020) 101284.
[8] T. Kalvet, M. Olesk, M. Tiits, J. Raun, Innovative tools for tourism and cultural tourism impact assessment, Sustainability 12 (2020) 7470.
[9] M. T. Cuomo, D. Tortora, P. Foroudi, A. Giordano, G. Festa, G. Metallo, Digital transformation and tourist experience co-design: Big social data for planning cultural tourism, Technological Forecasting and Social Change 162 (2021) 120345.
[10] M. P. A. Austin, P. Austin, M. M. Marini, A. Sanchez, C. Simpson-Bell, J. Tebrake, Using the Google Places API and Google Trends data to develop high frequency indicators of economic activity, International Monetary Fund, 2021.
[11] Y. Liu, T. Teichert, M. Rossi, H. Li, F. Hu, Big data for big insights: Investigating language-specific drivers of hotel satisfaction with 412,784 user-generated reviews, Tourism Management 59 (2017) 554–563.
[12] S. U. S. Chebolu, F. Dernoncourt, N. Lipka, T. Solorio, Survey of aspect-based sentiment analysis datasets, arXiv preprint arXiv:2204.05232 (2022).
[13] D. Agostino, M. Brambilla, S. Pavanetto, P. Riva, The contribution of online reviews for quality evaluation of cultural tourism offers: The experience of italian museums, Sustainability 13 (2021) 13340.
[14] S. Giglio, F. Bertacchini, E. Bilotta, P. Pantano, Using social media to identify tourism attractiveness in six italian cities, Tourism management 72 (2019) 306–312.
[15] A. Torre, H. Scarborough, Reconsidering the estimation of the economic impact of cultural tourism, Tourism Management 59 (2017) 621–629.
[16] M. H. Guimarães, L. C. Nunes, A. P. Barreira, T. Panagopoulos, Residents’ preferred policy actions for shrinking cities, Policy Studies 37 (2016) 254–273.
[17] A. Go, R. Bhayani, L. Huang, Twitter sentiment classification using distant supervision, CS224N project report, Stanford 1 (2009) 2009.
[18] A. Giannakopoulos, D. Antognini, C. Musat, A. Hossmann, M. Baeriswyl, Dataset construction via attention for aspect term extraction with distant supervision, in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, 2017, pp. 373–380.
[19] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116. arXiv:1911.02116.
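As a supplementary illustration of the aggregation described in Section 4.2, the following Python sketch computes a review-level score as the average of the aspect scores weighted by the number of mentions of each aspect. The function name `overall_score` and the example values are assumptions made for illustration. The further adjustment based on the user's overall judgement is deliberately left out, because the paper does not specify how it is applied.

```python
def overall_score(aspect_scores, mentions):
    """Mention-weighted mean of aspect scores.

    aspect_scores: dict mapping aspect name -> sentiment score
    mentions: dict mapping aspect name -> number of mentions in the review
    Returns None when no scored aspect is mentioned at all.
    """
    total_mentions = sum(mentions.get(aspect, 0) for aspect in aspect_scores)
    if total_mentions == 0:
        return None
    weighted_sum = sum(
        score * mentions.get(aspect, 0)
        for aspect, score in aspect_scores.items()
    )
    return weighted_sum / total_mentions


# Hypothetical review: one mention of accessibility, three of appearance.
scores = {"Accessibility": 3, "Appearance of the place": 9}
counts = {"Accessibility": 1, "Appearance of the place": 3}
print(overall_score(scores, counts))  # (3*1 + 9*3) / 4 = 7.5
```

Weighting by mentions means that an aspect a reviewer returns to several times pulls the overall score further than one mentioned in passing.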