The CHROME Manifesto: integrating multimodal data into Cultural Heritage Resources

Francesco Cutugno
Università degli Studi di Napoli "Federico II"
cutugno@unina.it

Felice Dell'Orletta
Istituto di Linguistica Computazionale del CNR - Pisa
ItaliaNLP Lab - www.italianlp.it
felice.dellorletta@ilc.cnr.it

Isabella Poggi
Università di Roma3
isabella.poggi@uniroma3.it

Renata Savy
Università degli Studi di Salerno
rsavy@unisa.it

Antonio Sorgente
Istituto di Scienze Applicate e Sistemi Intelligenti del CNR - Pozzuoli
antonio.sorgente@cnr.it

Abstract

English. The CHROME Project aims at collecting a wide portfolio of digital resources oriented to technological applications in Cultural Heritage (henceforth CH). Contributions towards this objective come from computer scientists, psychologists, architects, and computational linguists, who together form an interdisciplinary team. We are collecting and analyzing texts, spoken materials, architectural surveys, and human motion videos, and we are attempting to integrate these data in a multidimensional platform based on multi-level annotation systems, import into game engines, and virtualization techniques. As a case study, we chose to work on the magic journey along three Charterhouses located in the Campania region: S. Martino in Naples, S. Lorenzo in Padula (Salerno) and S. Giacomo in Capri.

Italiano (English translation). The CHROME project (Cultural Heritage Resources Orienting Multimodal Experiences, PRIN 2015 MIUR) aims to collect a wide range of digital resources to be used in technological applications that improve the fruition of cultural heritage (CH). Computer scientists, psychologists, architects and linguists contribute to this objective in an interdisciplinary way, collecting texts, speech recordings, architectural surveys, videos and human motion capture. These data are then integrated into a platform in which multidimensional annotation can be carried out; they are also used for the virtualization of three-dimensional environments and for porting into gaming environments.

1 Introduction

The CHROME project was born with the intention of creating a framework and a methodology to collect, represent and analyze cultural heritage contents and to present them through artificial agents whose behavior is inspired by an accurate analysis of expert guides, museum curators and tour operators. These gatekeepers are the professional figures who possess a significant amount of knowledge about how people should be guided in the exploration of cultural contents. In this sense, they act as mediators between cultural heritage and visitors, using a set of communication strategies, both verbal and non-verbal, aimed at maintaining a high level of engagement and delivering high-quality content.

The overall experience of accessing cultural heritage is greatly enriched by these professional figures: their knowledge and experience, therefore, should not be overlooked when designing artificial agents oriented to cultural heritage presentation. As this knowledge is primarily based on experience collected in the field, the CHROME project aims at recording the performance of gatekeepers in a sensible environment so that a formal analysis of their behavior can be documented and studied. The result of this process (see Fig. 1), conducted jointly by humanities scholars and computer scientists, will lead to the formalization of a model describing the behaviors adopted by gatekeepers when presenting cultural heritage. This model will then be used to control a humanoid robot designed to follow similar presentation strategies. With this aim in mind, the main goals of the project are to: collect and provide the scientific community with reference datasets to study human-human interaction during the presentation of cultural heritage by professionals; investigate the structure of the texts contained in the collected corpus in order to produce automatic approaches supporting text generation for oral presentations in the cultural heritage domain; provide a reference computational model to support the development of artificial agents exhibiting coherent and engaging behavioural strategies. In addition to the degree of orality of the assembled presentations, special attention will be paid to non-verbal aspects. Specifically, CHROME will concentrate on enriching the presentation with consistent prosody and gestures. Finally, another goal is to evaluate the impact of these agents in simplifying access to cultural heritage and attracting visitors to cultural sites.

To achieve these goals, five research groups are involved in the CHROME project, covering different scientific and humanistic disciplines that complement each other. The team is highly interdisciplinary and is formed of linguists (with specific competences in prosody, pragmatics, paralinguistics, and non-verbal behavior analysis), computational linguists and computer scientists (with skills in Artificial Intelligence and Human Machine Interaction). The teams involved in the project are:

• UrbanEco (Naples, Federico II): an interdisciplinary team formed by computer scientists, architects and linguists, aiming at collecting 3D architectural surveys and speech and gesture corpora. UrbanEco is also designing multimodal interaction systems. A sub-partner linked to this unit is the "Polo Museale della Campania - MiBaCT", the local section of the Italian Cultural Ministry, which manages more than 30 museums in our region;
• ILC (Pisa, CNR) will develop systems for automatically extracting and organizing linguistic and domain knowledge from domain-specific corpora;
• UniSa (University of Salerno) will analyze texts and will address the prosodic analysis of spoken material for the purposes of speech synthesis;
• ISASI (Pozzuoli, CNR) will address the challenge of CH question answering and language generation for the realization of interaction models in natural language;
• RomaTre (Roma, University RomaTre) will address multimodal communication and gesture analysis.

As a case study, we chose to work on the magic journey along three Charterhouses located in the Campania region: S. Martino in Naples, S. Lorenzo in Padula (Salerno) and S. Giacomo in Capri. All the texts, the architectural surveys and the audio-video recordings, in other words all the digital resources that we have collected and will collect, and that we describe in the next sections, concern these wonderful sites.

2 The Challenge

An interesting aspect of the CHROME project is that it tackles several methodological and technological challenges.

A first challenge regards the role of gatekeepers in shaping the visitors' experience. Communication in museums is considered an important issue, even though museum specialists have been reproached for not doing enough in this field (Antinucci, 2014), with some exceptions. Many advancements have been made in understanding museum visitors' needs and in looking for new ways of communication to improve the experience of visiting museums. Investigations into the visitors' psychological approach (Dufresne-Tassé & Lefebvre, 1995) helped museologists to develop methods not only to exhibit artefacts but also to give them meaning by providing further explanations. Museum experts may therefore know visitors better, and they are ready to be helped by technology (Cataldo, 2011).

Another important aim regards the extraction of concepts and expressive forms from texts. Natural Language Processing technologies are crucial in the process of converting textual documents into knowledge resources. New techniques for the automatic acquisition of linguistic knowledge from texts are needed. Terminology extraction is a central field of research for a number of applications, such as Ontology Learning and Text Mining. Different methodologies have been proposed so far to automatically extract domain terminology from texts. Term extraction systems make use of various degrees of linguistic filtering and of statistical measures ranging from raw frequency to Information Retrieval measures such as TF-IDF (Salton & Buckley, 1988), up to more sophisticated methods such as the C-value/NC-value method (Frantzi & Ananiadou, 1999) or the contrastive approach (Bonin et al., 2010).

Another important issue we are going to address is the analysis of social behaviors in dissemination contexts. The specificities of guided tours have been investigated by Mondada (2013), who studies the distribution of knowledge among guides. This stresses the need to adapt to different people during visits, while the relevance of a user model is pointed out by the literature in gesture and Conversation Analysis. Concerning the use of words and iconic gestures in didactic explanations to children and to expert and novice adults, their adaptation to the Speaker's Recipient Design and their efficacy for comprehension, Campisi & Özyürek (2013) show that people use more words when addressing adults, but wider and more informative gestures when addressing children. Also, precision has been defined as providing details on the topic of one's discourse (Vincze et al., 2014), while vagueness is how blurred the boundaries of one's ideas or discourse are.

Spoken text analysis and prosodic analysis and synthesis will also be addressed. Advanced uses of parametric speech synthesis, such as focus/prominence generation by prosodic modification or expressive prosody modelling, have been tested in some research projects (e.g. ALIZ-E). Pushing forward the prosodic analysis of gatekeepers' performance can improve the knowledge needed to synthesize natural specialized speech.

Finally, the technologies that mediate access to digital cultural heritage will be considered. In order to dynamically assemble and present narratives, a formalism is necessary to represent the different aspects of cultural stories as reported by gatekeepers (e.g. Mele & Sorgente, 2013). By providing semantically annotated multimedia materials and contents obtained by collecting a documentary basis, it is possible to use mash-up techniques to dynamically assemble contents and synchronize them with the available media.

Fig. 1 The CHROME interdisciplinary chart

3 CHROME methodology

CHROME is a cross-disciplinary project focused on combining computational linguistics and behavior analysis methods with expertise in museology to formalise computational models of gatekeepers (see Fig. 1). The main result of this research will be the Gatekeeper Computational Model (GCM), used to generate engaging presentations of cultural heritage. The project is organized in three main phases. The data collection phase foresees the recording of gatekeepers presenting cultural contents, together with surveying activities to collect reference texts and annotated 3D models. During data analysis, these resources will be annotated and examined to obtain the GCM. Activities will compare oral expressions with expressions found in texts in order to automatically select fragments that can compose the final presentation, together with gesture and prosody synthesis. The annotation of 3D models will make it possible to connect the presentation to the automatic selection of auxiliary material. The demonstrator implementation will serve to validate the GCM, to disseminate the research results and to estimate the impact of the approach in a real environment.

The methodology proposed in the CHROME project targets the following objectives:

• O1. Provide reference datasets to study human-human interaction during the presentation of cultural heritage.
• O2. Survey written contents for cultural heritage dissemination and compare these with the multimodal materials collected in the framework of the CHROME project.
• O3. Provide a reference Gatekeeper Computational Model (GCM) to support the development of artificial agents mimicking the ability of expert guides to select and organize contents and to apply proper verbal and non-verbal behaviour.
• O4. Evaluate the impact of dissemination-oriented, multimodal behavioral models on the capability of artificial agents to simplify access to digital cultural heritage and attract visitors to cultural sites.

4 The present status

At the time of writing (July 2018) we are at month 16 of 36. Up to now we have collected and analysed a large amount of data on the Campania Charterhouses: texts, audio, video and 3D reconstructions.

4.1 Charterhouses Text

For the three Campania Charterhouses (S. Martino, S. Lorenzo and S. Giacomo), we have collected 102 texts that belong to different document types. In particular, these texts are divided among the following categories: Scientific texts; Specialized catalogues; Dissemination catalogues; Specialized guides; Certified web material; Dissemination kits.

4.2 Textual Analysis

Some lexical and semantic analyses have already been conducted on part of these texts. The main ones concerned: i) Domain vocabulary extraction; ii) Event annotation: some texts have been annotated with additional semantic information according to an event-based reference formalism. In particular, the formalism adopted is CSWL (Cultural Story Web Language) (Sorgente et al., 2016). The purpose of this approach is to have a semantic level that will allow us to define an information retrieval that is not based on text search alone; iii) AAT concepts recognition: the Art & Architecture Thesaurus (AAT) (Getty, 2018) is a structured vocabulary containing around 40,000 concepts and descriptions related to fine art, architecture, decorative arts, archival materials and material culture. In this step the aim is to link the concepts found in the charterhouse texts to this vocabulary.

4.3 Digital photogrammetry

The architects' group has completed the digital survey by aerial photogrammetry, performed by UAV and laser scanner, of the 3 main charterhouse buildings and of many interiors.

4.4 Video recording of touristic guide

Three of four tourist guides have been video recorded during tours in the S. Martino Charterhouse while describing its artistic features, each followed by an audience of four visitors. Cameras are pointed at the guide and at the audience. Speech is recorded with three microphones: one headset worn by the guide and two placed in the field, equidistant at about one meter from the guide, and pointing towards the visitors as well. Speech analyses on this material consist of:

• Orthographic level: transcription of words, pauses, filled pauses, false starts;
• Phonetic level: phonetic transcription and annotation of coarticulation phenomena, speech quality analysis;
• Syllabic level: annotation of syllables, speech fluency and speech rate analysis;
• Intonation level: pitch movements in relationship with the segmental level, emphasizing patterns, speech style;
• Textual level: analysis of sentences, text structure, and communicative goals;
• Multimodal behavior level: annotation of gestures, face and gaze, including physical description, semantic analysis, classification in terms of textual, emotional and interactional functions.

The tool chosen for annotating the speech and video material is ELAN¹. In each video portion the guide's gestures and body communication will be annotated in terms of the communicative functions they serve. The annotation will thus make it possible to distinguish the styles of the guides: e.g. a very "technical" guide will more frequently use gestures and body communication aimed at describing the artwork or the author, while a "friendly" guide's body behaviors will often be aimed at creating syntony with tourists.

¹ https://tla.mpi.nl/tools/tla-tools/elan/

5 Summarizing

CHROME aims at formalizing data collection and annotation paradigms for architectural heritage; in particular, the annotation regards texts, video, audio and gestures. From the annotated data, we will: i) perform correlation analysis to identify cross-domain patterns and link them to communicative goals; ii) describe how an expert presenter relates to the physical environment while she is describing it; iii) identify which communicative strategies can be mimicked by an artificial agent with the available technology. Possible domains of simulation will be deictic and iconic gestures, face and gaze behaviour; iv) implement a final demonstrator adopting the formalized strategies to generate dynamic presentations for the attending visitors.

6 Acknowledgments

This work is funded by the Italian PRIN project Cultural Heritage Resources Orienting Multimodal Experience (CHROME) #B52F15000450001.

References

Antinucci F. (2014). Comunicare nel museo. Laterza, Milano.

Bonin F., Dell'Orletta F., Montemagni S., Venturi G. (2010). A Contrastive Approach to Multi-word Extraction from Domain-specific Corpora. In: LREC'10 – Seventh International Conference on Language Resources and Evaluation (Valletta, Malta, 17-23 May 2010). Proceedings, pp. 3222-3229.

Campisi E., Özyürek A. (2013). Iconicity as a communicative strategy: Recipient design in multimodal demonstrations for adults and children. Journal of Pragmatics (47), pp. 14-27.

Cataldo L. (2011). Dal Museum Theatre al Digital Storytelling. Franco Angeli, Milano.

Dufresne-Tassé C., Lefebvre A. (1995). Psychologie du visiteur du musée. Hurtubise, Montréal.

Frantzi K., Ananiadou S. (1999). The C-value / NC-value domain independent method for multi-word term extraction.

Getty AAT: About the AAT. http://www.getty.edu/research/tools/vocabularies/aat/. Accessed April 2018.

Mele F., Sorgente A. (2013). OntoTimeFL – A Formalism for Temporal Annotation and Reasoning for Natural Language Text. New Challenges in Distributed Information Filtering and Retrieval, Studies in Computational Intelligence 439, pp. 151-170.

Mondada L. (2013). Displaying, contesting and negotiating epistemic authority in social interaction: Descriptions and questions in guided visits. Discourse Studies 15, pp. 597-626.

Salton G., Buckley C. (1988). Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, 24(5), pp. 513-523.

Sorgente A., Calabrese A., Coda G., Vanacore P., Mele F. (2016). Building multimedia dialogues annotating heterogeneous resources. In Artificial Intelligence for Cultural Heritage, chapter 3, pages 49-82. Cambridge Scholars Publishing.

Vincze L., Poggi I., D'Errico F. (2014). Precision in Gestures and Words. Ricerche di Pedagogia e Didattica – Journal of Theories and Research in Education 9, 1. Communicating certainty and uncertainty: Multidisciplinary perspectives on epistemicity in everyday life.
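
Illustrative note on term weighting. Section 2 mentions TF-IDF as one of the statistical measures used by term extraction systems. The following is a minimal sketch of that measure, not the CHROME pipeline: the weighting variant (relative term frequency multiplied by log(N/df)) is only one common choice, and the function name `tf_idf` and the toy corpus are assumptions invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = (count / doc length) * log(N / df),
    where N is the number of documents and df the number of
    documents containing the term.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, invented for illustration (not CHROME data).
corpus = [
    "the cloister of the charterhouse".split(),
    "the church of the monastery".split(),
    "the cloister garden".split(),
]
scores = tf_idf(corpus)

# Terms of the first document, highest weight first.
ranked = sorted(scores[0], key=scores[0].get, reverse=True)
print(ranked)
```

In this sketch a word occurring in every document (here "the") receives weight zero, while a word confined to a single document ("charterhouse") is ranked above one shared by two documents ("cloister"). Methods such as C-value/NC-value or the contrastive approach cited in Section 2 add linguistic filtering and domain contrast on top of this kind of frequency evidence.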