Paving the Way for Personalized Museum Tours in the Metaverse

Alex Falcon1,*, Beatrice Portelli1,2, Ali Abdari1,2, and Giuseppe Serra1
1 University of Udine, Italy
2 University of Naples Federico II, Italy

Abstract
Museums play a central role in the preservation and communication of human history. With the advent of powerful and accessible Virtual Reality technologies, new Metaverse Museums have started being developed, creating new possibilities for discovering and experiencing knowledge from all over the world. In anticipation of this technology becoming more widely adopted, we must prepare tools to aid future visitors in finding and navigating the museums and exhibitions which are most relevant to their current interests. In this light, Deep Learning methods could be of great use for modeling Metaverse museums, retrieving the most relevant ones, and creating personalized tours of the artifacts contained within them. In this paper, we present our research project, “Personalized Museum Tours in the Metaverse”, led by the Artificial Intelligence Laboratory of Udine (AILAB Udine), detailing its objectives, proposed methodology, and future directions.

Keywords
Metaverse, Cross-modal Understanding, Multimedia, Virtual Museums

IRCDL 2024: 20th conference on Information and Research science Connecting to Digital and Library science, February 22–23, 2024, Bressanone, Brixen, IT
* Corresponding author.
Email: falcon.alex@spes.uniud.it (A. Falcon); portelli.beatrice@spes.uniud.it (B. Portelli); abdari.ali@spes.uniud.it (A. Abdari); giuseppe.serra@uniud.it (G. Serra)
ORCID: 0000-0002-6325-9066 (A. Falcon); 0000-0001-8887-616X (B. Portelli); 0000-0002-4482-0479 (A. Abdari); 0000-0002-4269-4501 (G. Serra)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction and Motivation

Museums are important institutions in our society, which are open to the public and conserve, acquire, exhibit, and communicate testimonies of humanity and its environment. As society changes and transforms over time, museums have also renewed themselves, adapting their objectives, roles, and the way they interact with the public. In particular, nowadays it is not unusual for them to leverage and incorporate digital technologies to offer a more immersive experience for the users, for example through downloadable guides, QR codes next to the artworks to access detailed descriptions, and even online exhibitions. The peak expression of this evolution would be a completely Digitized Museum, or a Museum in the Metaverse, accessible through Augmented Reality (AR) technologies or even built completely in Virtual Reality (VR).

Figure 1: A still shot from two Metaverse museums. (left) VR Museum of Fine Arts. (right) Smithsonian American Art Museum “Beyond The Walls”.

There are several reasons why Metaverse Museums could be a beneficial evolution of this institution: wider access to history and knowledge from all over the world; a reduction in carbon emissions caused by people physically traveling to the museum, especially for long-distance travels; a safer way to visit exhibitions in case of healthcare crises (e.g., the recent COVID-19 pandemic); a more accessible way to visit exhibitions for people with physical impairments or other restrictions that prevent them from safely visiting museums in person.
Additionally, a museum in the Metaverse allows for more flexibility and interactivity in the contents presented to the visitors. For example, it could contain recreations of historical places and artifacts that no longer exist (e.g., project Rekrei¹), and provide access to places too delicate for physical interaction. Consider the Lascaux Cave in France, closed due to the paintings’ deterioration from light, air changes, and carbon dioxide from visitors’ breath².

However, if Metaverse Museums were to become more common and accessible, it could be difficult for users to choose among all the possible interesting exhibitions and navigate the immense quantity of artifacts and information. This is an area where Artificial Intelligence (AI) and Deep Learning (DL) techniques can be leveraged to alleviate the cognitive load on the user and enhance the discoverability of the artworks.

Our project aims to develop a multimedia system for the creation of personalized Metaverse museum tours based on the user’s interests, formulated through a textual query. Realizing such a system entails an in-depth analysis of heterogeneous types of data, including visual (3D scenes, paintings, statues, etc.) and textual (artifact descriptions and user queries). Notably, this project deals with many innovative aspects: the analysis of Metaverse museums and their artifacts, the estimation of their relevance to the user interests, and the automatic generation of personalized tours, blending the artifacts’ descriptions with the desires of the user and vast amounts of external knowledge.

¹ https://rekrei.org/about
² https://archeologie.culture.gouv.fr/lascaux/en/preservation-lascaux-hill

Figure 2: Overview of the proposed system. (1) The user formulates a query. (2) The Query Understanding module encodes its semantic contents with a compact representation. (3) The Metaverse Analysis & Query-Artifact Relevance module encodes and ranks the museums according to their relevance to the query. (4) The Personalized Tour Creation module generates a brief tour of each relevant museum.

2. Personalized Metaverse Museum Tour Creation: Project Description

The research project stems from the need for tools to understand, represent, and navigate complex scenes such as digitized Metaverse museums. The final objective is to develop a system to create personalized Metaverse museum tours, given a user query. Figure 2 presents an overview of the proposed system. The user interacts with the system by formulating a query in natural language. The Query Understanding module analyzes the user query and encodes it into a compact representation that preserves its semantics. This representation then serves as guidance for the Metaverse Analysis & Query-Artifact Relevance module, which processes all the available Metaverse museums and ranks them according to their relevance to the user query.
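To make the data flow of Figure 2 concrete, the following is a minimal skeleton of how the three modules could be wired together. All names, type signatures, and the implied embedding and ranking logic are illustrative assumptions made for this sketch, not the project's actual implementation; each stub would be backed by the techniques discussed in the following subsections.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified data structures: the real system would rely on
# learned encoders (Sections 2.1-2.2) and an LLM-based generator (Section 2.3).

@dataclass
class Museum:
    name: str
    artifacts: List[str]          # textual descriptions of the contained artifacts

@dataclass
class RankedMuseum:
    museum: Museum
    score: float                  # estimated relevance to the query
    top_artifacts: List[str]      # artifacts most relevant to the query


def understand_query(query: str) -> List[float]:
    """Query Understanding: map the user query to a compact semantic vector."""
    raise NotImplementedError     # e.g., a sentence-level encoder


def rank_museums(query_vec: List[float], museums: List[Museum], k: int) -> List[RankedMuseum]:
    """Metaverse Analysis & Query-Artifact Relevance: score and rank the museums."""
    raise NotImplementedError     # e.g., artifact-level similarities aggregated per museum


def create_tour(query: str, ranked: RankedMuseum) -> str:
    """Personalized Tour Creation: generate a textual tour of one museum."""
    raise NotImplementedError     # e.g., prompt an LLM with the query and top artifacts


def personalized_tours(query: str, museums: List[Museum], k: int = 3) -> List[str]:
    query_vec = understand_query(query)
    top_k = rank_museums(query_vec, museums, k)
    return [create_tour(query, rm) for rm in top_k]
```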
After a subset of the most relevant museums is identified, the Personalized Tour Creation module outputs for each of them a suggested itinerary covering the relevant artifacts and explaining how they relate to the user desires.

2.1. Query Understanding: Modelling the User Query

The advancements in natural language processing made in the past few years enabled interactive applications to understand the semantics of user-generated queries. These include word-level techniques, such as word2vec [1], GloVe [2], and fastText [3], which compute a vector representation for each word, hence using words as the unit of meaning; and sentence-level techniques, such as BERT [4], T5 [5], and GPT [6], which use sentences as the unit of meaning. In practice, to have a better understanding of the user query and its implications, in our project we will employ a combination of both word- and sentence-level techniques.

Notably, the user queries may be objective and ask for specific authors or art schools, but they may also be much more subjective and focus on feelings, or contain very general and imprecise prompts (e.g., “happy people playing on a beach” or “mysterious doctors performing an autopsy”). Therefore, it becomes crucial to integrate these two types of information into the training procedure. To achieve this goal, we plan to do the following: for each painting, we consider both an objective description of its visual contents and one or more subjective descriptions capturing the emotions that it sparked in the observer. Then, during training, the knowledge from both types of descriptions is integrated into the representations of the visual artifacts and the user queries. Another tool for relating subjective and objective descriptions is offered by models trained for formality style transfer [7], as they paraphrase a sentence from one style to another while preserving its semantic meaning. They could be used either to generate training data or to transform the user query and use it to match more artifacts.

2.2. Metaverse Analysis & Query-Artifact Relevance: Modelling the Museum and the Cross-modal Interactions

To understand the contents of a Metaverse museum and relate them to the user query, both local (fine-grained, artifact-level) and global (coarse, museum- or room-level) information are needed. A representation is then computed for each museum by aggregating these two sources of information. Finally, the museums are ranked according to their relevance to the user query, obtaining the final output of this module.

Local Information. To provide the subsequent modules with precise knowledge of the relevant artifacts present in the room/museum, the artifacts must first be identified and analyzed. To automate this process, several well-known and widely used object detectors from the vision community can be leveraged, such as YOLO [8] and Faster R-CNN [9]. Once the artifacts are localized, multiple methodologies can be followed to obtain the needed local information. A first approach represents each object in the scene with its categorical and spatial attributes (e.g., size, position, and rotation in the 3D space), as done in [10]. Although these attributes can create a representation with discriminative qualities, there are two limitations: first, the obtained representation lacks visual grounding; second, it will not lie in the same embedding space as the query, hence requiring a finetuning step to align the visual and textual spaces.
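As an illustration of this first, attribute-based approach, the following is a minimal sketch of how a detected artifact could be encoded through its categorical and spatial attributes and how a naive room-level descriptor could be obtained. The category vocabulary, the feature layout, and the mean-pooling aggregation are assumptions made for the example, not the exact formulation of [10].

```python
from dataclasses import dataclass
from typing import List
import numpy as np

# Hypothetical category vocabulary; a real system would derive it from the data.
CATEGORIES = ["painting", "statue", "vase", "tapestry", "other"]

@dataclass
class DetectedArtifact:
    category: str            # predicted by an object detector (e.g., YOLO)
    position: np.ndarray     # (x, y, z) center in the room, in meters
    size: np.ndarray         # (width, height, depth), in meters
    rotation_y: float        # yaw angle around the vertical axis, in radians


def attribute_vector(a: DetectedArtifact) -> np.ndarray:
    """Concatenate a one-hot category encoding with the spatial attributes."""
    one_hot = np.zeros(len(CATEGORIES))
    one_hot[CATEGORIES.index(a.category)] = 1.0
    return np.concatenate([one_hot, a.position, a.size, [a.rotation_y]])


def room_representation(artifacts: List[DetectedArtifact]) -> np.ndarray:
    """Naive room-level descriptor: mean-pool the artifact attribute vectors.
    As noted in the text, this representation is not aligned with the textual
    query space, so a finetuning step would still be needed to compare the two."""
    return np.mean([attribute_vector(a) for a in artifacts], axis=0)
```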
Another possible approach uses a jointly trained vision-language model (e.g., CLIP [11]) to extract a representation for each artifact. Since such an approach learns a joint vision-language embedding space, the artifact representation lies in the same space as its own description, making it easier to understand whether it is relevant to the query. We followed a similar approach with positive results in [12], where we developed a system to rank digital apartments according to a textual query.

Global Information. To obtain the global information, a naive solution could aggregate the pieces of local information obtained in the previous step, e.g., by using a Variational Autoencoder [10] to represent a 3D scene through its objects, as done in our previous work [13] on Metaverse scene retrieval. Also, knowledge graphs can aggregate the local information by discovering relationships among the artifacts in a room (or museum) [14, 15]. An alternative method, diverging from the aggregation of local information, involves the use of 3D Convolutional Neural Networks (CNNs, e.g., [16, 17]) to capture finer-grained spatial relations between the artifacts. Spatial attention plays a key role in reducing the amount of redundancy in the learned representations due to the presence of large empty spaces such as walls, halls, etc. However, the obtained representations would not lie in the same embedding space as the queries, hence requiring further finetuning to align the two spaces.

Query Relevance and Ranking. The final step performed by this module consists of ranking the museums according to their relevance to the user query. Two important aspects should be considered during this step. First, a relevance score should be computed for each artifact, since only a few of them are likely to be highly relevant to the query, and those few will impact the final relevance of the museum as well as become the key elements of the personalized tour. Second, the relevance value for the museum is influenced by several global aspects, including the number of relevant artifacts per room (or within the museum), and how the relevance values are distributed (e.g., whether there are many low-relevance artifacts or only a few highly relevant ones). Moreover, further constraints could be added, such as ranking the museums based on how strongly the relevant artifacts are clustered within the museum’s rooms. The output of this step is a ranking of the museums, where for each museum a few artifacts are highlighted as the most relevant ones for the query.

2.3. Personalized Tour Creation: Presenting Suggestions to the User

For each of the museums selected by the previous module, the Personalized Tour Creation module creates a textual summary of the relevant artifacts and of why they are relevant to the user query and interests. To put together this information, we will use a Large Language Model (LLM) such as ChatGPT, or open-source alternatives such as LLaMA [18] and the recent Phi-2 [19]. These models are impressive tools to automatically create coherent and well-written summaries. Moreover, since they are pretrained on large amounts of textual data from the Web, they also have external knowledge of the artifacts, which may be useful for adding more insights to the personalized tour. Finally, due to the models’ prompt-based nature, the tour can be personalized by creating specialized prompts.
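As a minimal sketch of this prompt-based personalization, the snippet below shows how a tour prompt could be assembled from the user query and the artifacts selected by the ranking module; the template wording and the helper function are illustrative assumptions, and the resulting string would then be submitted to the chosen LLM (ChatGPT, LLaMA, Phi-2, etc.) through its own interface.

```python
from typing import List

def build_tour_prompt(user_query: str, artifacts: List[str]) -> str:
    """Assemble a prompt asking the LLM to motivate the relevance of the selected
    artifacts with respect to the user query. The wording is illustrative only."""
    artifact_list = ", ".join(artifacts)
    return (
        f"Create a summary of why the artifacts {artifact_list} are relevant to "
        f"the user interests defined by '{user_query}'. For each artifact, briefly "
        f"explain its relevance and mention one notable feature of the artwork or "
        f"of its author's style."
    )

# Example usage, mirroring the prompt discussed in the text below:
prompt = build_tour_prompt(
    user_query="I want to see the graceful Holy Mary being announced the birth of Jesus by the angel",
    artifacts=[
        "The Annunciation by Leonardo da Vinci",
        "The Annunciation by Benvenuto Tisi",
        "The Annunciation by Sandro Botticelli",
    ],
)
print(prompt)  # the string would then be sent to the selected LLM
```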
For instance, the summary obtained by prompting ChatGPT with “create a summary of why the artifacts The Annunciation by Leonardo da Vinci, The Annunciation by Benvenuto Tisi, and The Annunciation by Sandro Botticelli are relevant to the user interests defined by ‘I want to see the graceful Holy Mary being announced the birth of Jesus by the angel’” contains both the motivation for the relevance (e.g., “The angel Gabriel’s announcement to Mary is depicted with a sense of divine tranquility, emphasizing Mary’s graceful response to the news”) and some key features of the artifact or the style pursued by the author (e.g., “Da Vinci’s meticulous attention to detail and his ability to convey emotions through facial expressions make this artwork a captivating representation of the Annunciation”).

2.4. Dataset Creation

At the time of writing, there are no public datasets containing a collection of Metaverse museums. Therefore, we decided to create a new dataset as a resource for us and future researchers. In particular, the dataset should contain a large number of digital museums, where each room contains multiple artifacts, including paintings, 3D sculptures, and other items of interest. Moreover, to enable fine-grained cross-modal applications and to estimate the relevance of each item to the user query, detailed descriptions of the artifacts are needed. As mentioned in the previous sections, both subjective and objective textual descriptions are crucial to include in the dataset, due to the wide range of possible user queries. To obtain a large number of digital museums, we resort to the creation of a synthetic dataset. This process involves several steps: first, collecting a diverse set of artifacts, including 2D and 3D art; second, obtaining meaningful descriptions for the artifacts, both objective and subjective; third, creating each 3D museum with multiple rooms, each containing several artifacts.

Metaverse Museums Creation. We are currently developing an automatic approach to create 3D museums, each containing multiple rooms with a variable number of sculptures and/or paintings. The items are either attached to the walls or placed in key positions in the room (e.g., corners or center of the room). The items will be grouped in the rooms based on topic, artist, theme, or other criteria, in order to create realistic thematic exhibitions.

2D Artwork Collection. To collect painting data and descriptions, we consider several public datasets, focusing on the type of annotations. Image captioning techniques like mPLUG [20] and datasets such as ArtCap [21] provide objective descriptions. However, they both lack expert insights into interpretation and historical context, as well as the subjective feelings expressed by observers. For the insights, SemArt [22] offers contextual details and artistic comments on over 21k paintings from the Web Gallery of Art (WGA). ArtEmis [23] includes 400k non-expert annotations about the emotions felt while observing fine art, while ArtELingo [24] adds multilingual annotations and insights into cultural variations in emotional responses to the same artwork.

3D Artwork Collection. While there are many popular datasets for 3D objects (e.g., ShapeNet [25]), large open datasets of 3D statues and sculptures are not readily available. Recently, Objaverse [26] and Objaverse-XL [27] were released, which contain over 800k and 10M annotated 3D objects, respectively. Among these objects, there are also artistic statues and sculptures which are relevant for our purposes.
Unfortunately, the textual annotations are sparse and do not contain an extensive description of the item’s visual appearance or any subjective comments. To address this, captioning techniques could be used to obtain objective descriptions, whereas artistic comments could be scraped (e.g., from the WGA when possible).

3. Related Work

3.1. Research on Applications for the Metaverse

A wide range of applications has been designed for the Metaverse, ranging from entertainment to virtual shopping [28, 29] and apartment scouting [12], from education to healthcare [30, 31] and industrial applications [32, 33]. Although some Metaverse museums are available online (e.g., the Museum of Other Realities³ and The Vordun Museum⁴), research on them is still in its infancy [34, 35] and many topics on their automatic understanding are widely unexplored. For instance, the task of retrieving multimedia-enriched scenarios by means of textual queries was only recently introduced [13]. In contrast, in this project we aim to create personalized Metaverse museum tours in an automatic way.

³ https://www.museumor.com/
⁴ https://secondlife.com/destination/vordun-museum-and-gallery

3.2. Retrieving and Ranking Artistic Artifacts

The task of retrieving and ranking artifacts according to their relevance to a user query is similar to the more general text-to-image retrieval task, which has seen great improvements in recent years, leading to the development of high-performing tools [11]. Conde and Turgutlu [36] developed a direct transposition of CLIP to the art domain, a pretrained model that can be used to describe images containing artworks or retrieve artworks given a description. Other interesting approaches include an unsupervised method [37] to discover the relationships among different artworks and schools based only on visual features, and several approaches employing knowledge graphs to also capture the underlying relationships between other artistic attributes, such as style, movement, subject, and period [14, 15].

3.3. Creating a Personalized Museum Tour

Suggesting a museum tour or a city itinerary to tourists is a challenging topic which has sparked interest in the research community since before the advent of deep learning. Early approaches for creating personalized demo tours included three main steps: first, a user model is created by asking the users to rate artworks; second, an art recommendation step is performed based on the user model and a set of predefined topics; third, a tour creation tool selects a fixed number of artworks from those recommended in the previous step [38, 39]. More recent literature on this topic often focuses on improving or optimizing the itinerary based on several constraints, e.g., travel costs [40] or visit duration [41].

Three main differences set our work apart from the previous ones. First, the art recommendation part is done automatically through cross-modal analysis, letting the user inform the model about their interests via text, therefore avoiding the imposition of predefined topics. This approach is suitable for modeling user interests, whether they are described using technical terms or a simpler lexicon. It also helps in learning how to contextualize the user query within the artwork visuals and textual descriptions.
Moreover, our procedure does not require every artwork to be tagged with a set of predefined topics, a process that demands expert knowledge to create a minimal yet comprehensive set of tags and extensive human effort from experts to tag every artwork featured in a virtual museum, therefore limiting scalability. Second, the different context (the Metaverse) makes some of the previously investigated constraints less relevant, such as optimizing travel costs or even visit duration, since the user can visit the museum during different sessions and across several days from the comfort of their home. Third, our aim is not only to provide a list of artifacts to the user, but also to inform them, creating a brief summary explaining why the user should visit a certain museum, while specifying a selection of the most relevant artifacts. We envision LLMs as the main tool to create a coherent tour and to provide contextual information related to the artifacts. Nonetheless, external knowledge could also be integrated from knowledge bases, similarly to [42], which combines LLMs and Wikipedia to generate in-depth descriptions of artworks, including background information about the history of the painting.

4. Conclusions

In this paper, we presented a research project detailing the first steps to realize an innovative multimedia system for creating personalized museum tours in the Metaverse. This project is motivated by the increasing attention drawn to the Metaverse and its vast possibilities in terms of interactive and immersive experiences, and by the importance of spreading knowledge on fine arts and cultural heritage across the globe. We provided an overview of the proposed system, and an in-depth description of its components and how they relate to state-of-the-art methodologies in vision-language understanding. Given the lack of suitable public datasets for experimentation, we also provided a detailed explanation of the data collection process that is currently being carried out. We hope that in the near future the dataset will be a valuable resource for researchers and practitioners working at the intersection of Metaverse and multimedia understanding, and that this project will spark interest in the computer science and cultural heritage communities, fostering collaborative endeavors.

Acknowledgements

This work was supported by the PRIN 2022 “MUSMA” - CUP G53D23002930006 - “Funded by EU - Next-Generation EU – M4 C2 I1.1”, and by the Department Strategic Plan (PSD) of the University of Udine–Interdepartmental Project on Artificial Intelligence (2020-25).

References

[1] Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: Proceedings of ICML, PMLR, 2014, pp. 1188–1196.
[2] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in: Proceedings of EMNLP, 2014, pp. 1532–1543.
[3] A. Joulin, É. Grave, P. Bojanowski, T. Mikolov, Bag of tricks for efficient text classification, in: Proceedings of EACL, 2017, pp. 427–431.
[4] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of NAACL-HLT, 2019, pp. 4171–4186.
[5] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, The Journal of Machine Learning Research 21 (2020) 5485–5551.
[6] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., Improving language understanding by generative pre-training (2018).
[7] S. Rao, J. Tetreault, Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer, in: Proceedings of NAACL-HLT, 2018, pp. 129–140.
[8] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of IEEE CVPR, 2016, pp. 779–788.
[9] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems 28 (2015).
[10] H. Yang, Z. Zhang, S. Yan, H. Huang, C. Ma, Y. Zheng, C. Bajaj, Q. Huang, Scene synthesis via uncertainty-driven attribute synchronization, in: Proceedings of IEEE/CVF ICCV, 2021, pp. 5630–5640.
[11] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., Learning transferable visual models from natural language supervision, in: Proceedings of ICML, 2021, pp. 8748–8763.
[12] A. Abdari, A. Falcon, G. Serra, FArMARe: a furniture-aware multi-task methodology for recommending apartments based on the user interests, in: Proceedings of IEEE/CVF ICCV, 2023, pp. 4293–4303.
[13] A. Abdari, A. Falcon, G. Serra, Metaverse retrieval: Finding the best metaverse environment via language, in: Proceedings of 1st International Workshop on Deep Multimodal Learning for Information Retrieval, 2023, pp. 1–9.
[14] G. Castellano, V. Digeno, G. Sansaro, G. Vessio, Leveraging knowledge graphs and deep learning for automatic art analysis, Knowledge-Based Systems 248 (2022) 108859.
[15] N. Garcia, B. Renoust, Y. Nakashima, ContextNet: representation and exploration for painting classification and retrieval in context, International Journal of Multimedia Information Retrieval 9 (2020) 17–30.
[16] J. Hou, B. Graham, M. Nießner, S. Xie, Exploring data-efficient 3d scene understanding with contrastive scene contexts, in: Proceedings of IEEE/CVF CVPR, 2021, pp. 15587–15597.
[17] S. Peng, K. Genova, C. Jiang, A. Tagliasacchi, M. Pollefeys, T. Funkhouser, et al., OpenScene: 3d scene understanding with open vocabularies, in: Proceedings of IEEE/CVF CVPR, 2023, pp. 815–824.
[18] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al., Llama 2: Open foundation and fine-tuned chat models, arXiv preprint arXiv:2307.09288 (2023).
[19] M. Abdin, J. Aneja, S. Bubeck, C. César Teodoro Mendes, W. Chen, A. Del Giorno, R. Eldan, S. Gopi, S. Gunasekar, M. Javaheripi, P. Kauffmann, Y. T. Lee, Y. Li, A. Nguyen, G. de Rosa, O. Saarikivi, A. Salim, S. Shah, M. Santacroce, H. Singh Behl, A. Tauman Kalai, X. Wang, R. Ward, P. Witte, C. Zhang, Y. Zhang, Phi-2: The surprising power of small language models, https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/, 2023. Accessed: February 12, 2024.
[20] C. Li, H. Xu, J. Tian, W. Wang, M. Yan, B. Bi, J. Ye, H. Chen, G. Xu, Z. Cao, et al., mPLUG: Effective and efficient vision-language learning by cross-modal skip-connections, in: Proceedings of EMNLP, 2022, pp. 7241–7259.
[21] Y. Lu, C. Guo, X. Dai, F.-Y. Wang, ArtCap: A dataset for image captioning of fine art paintings, IEEE Transactions on Computational Social Systems (2022).
[22] N. Garcia, G. Vogiatzis, How to read paintings: semantic art understanding with multimodal retrieval, in: Proceedings of ECCV Workshops, 2018.
[23] P. Achlioptas, M. Ovsjanikov, K. Haydarov, M. Elhoseiny, L. J. Guibas, ArtEmis: Affective language for visual art, in: Proceedings of IEEE/CVF CVPR, 2021, pp. 11569–11579.
[24] Y. Mohamed, M. Abdelfattah, S. Alhuwaider, F. Li, X. Zhang, K. Church, M. Elhoseiny, ArtELingo: A million emotion annotations of WikiArt with emphasis on diversity over language and culture, in: Proceedings of EMNLP, 2022, pp. 8770–8785.
[25] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al., ShapeNet: An information-rich 3d model repository, arXiv preprint arXiv:1512.03012 (2015).
[26] M. Deitke, D. Schwenk, J. Salvador, L. Weihs, O. Michel, E. VanderBilt, L. Schmidt, K. Ehsani, A. Kembhavi, A. Farhadi, Objaverse: A universe of annotated 3d objects, in: Proceedings of IEEE/CVF CVPR, 2023, pp. 13142–13153.
[27] M. Deitke, R. Liu, M. Wallingford, H. Ngo, O. Michel, A. Kusupati, A. Fan, C. Laforte, V. Voleti, S. Y. Gadre, E. VanderBilt, A. Kembhavi, C. Vondrick, G. Gkioxari, K. Ehsani, L. Schmidt, A. Farhadi, Objaverse-XL: A universe of 10m+ 3d objects, arXiv preprint arXiv:2307.05663 (2023).
[28] A. Dawson, et al., Data-driven consumer engagement, virtual immersive shopping experiences, and blockchain-based digital assets in the retail metaverse, Journal of Self-Governance and Management Economics 10 (2022) 52–66.
[29] W. Song, Y. Gong, Y. Wang, Vtonshoes: Virtual try-on of shoes in augmented reality on a mobile device, in: IEEE ISMAR, 2022, pp. 234–242.
[30] H. Laaki, Y. Miche, K. Tammi, Prototyping a digital twin for real time remote control over mobile networks: Application of remote surgery, IEEE Access 7 (2019) 20325–20336.
[31] Y. Liu, L. Zhang, Y. Yang, L. Zhou, L. Ren, F. Wang, R. Liu, Z. Pang, M. J. Deen, A novel cloud-based framework for the elderly healthcare services using digital twin, IEEE Access 7 (2019) 49088–49101.
[32] M. H. Farhat, X. Chiementin, F. Chaari, F. Bolaers, M. Haddar, Digital twin-driven machine learning: ball bearings fault severity classification, Measurement Science and Technology 32 (2021) 044006.
[33] A. Siyaev, G.-S. Jo, Towards aircraft maintenance metaverse using speech interactions with virtual objects in mixed reality, Sensors 21 (2021) 2066.
[34] Z. Gao, T. C. Braud, et al., VR-driven museum opportunities: digitized archives in the age of the metaverse, Artnodes (2023) 1–14.
[35] M. C. Longo, R. Faraci, Next-generation museum: A metaverse journey into the culture, Sinergie Italian Journal of Management 41 (2023) 147–176.
[36] M. V. Conde, K. Turgutlu, CLIP-Art: Contrastive pre-training for fine-grained art classification, in: Proceedings of IEEE/CVF CVPR, 2021, pp. 3956–3960.
[37] G. Castellano, E. Lella, G. Vessio, Visual link retrieval and knowledge discovery in painting datasets, Multimedia Tools and Applications 80 (2021) 6599–6616.
[38] L. Aroyo, N. Stash, Y. Wang, P. Gorgels, L. Rutledge, Chip demonstrator: Semantics-driven recommendations and museum tour generation, in: International Semantic Web Conference, Springer, 2007, pp. 879–886.
[39] Y. Wang, N. Stash, R. Sambeek, Y. Schuurmans, L. Aroyo, G. Schreiber, P. Gorgels, Cultivating personalized museum tours online and on-site, Interdisciplinary Science Reviews 34 (2009) 139–153.
[40] J. L. Sarkar, A. Majumder, A new point-of-interest approach based on multi-itinerary recommendation engine, Expert Systems with Applications 181 (2021) 115026.
[41] P. Yochum, L. Chang, T. Gu, M. Zhu, An adaptive genetic algorithm for personalized itinerary planning, IEEE Access 8 (2020) 88147–88157.
[42] Z. Bai, Y. Nakashima, N. Garcia, Explain me the painting: Multi-topic knowledgeable art description generation, in: Proceedings of IEEE/CVF ICCV, 2021, pp. 5422–5432.