The CHROME Manifesto: integrating multimodal data into Cultural Heritage Resources

Francesco Cutugno, Università degli Studi di Napoli “Federico II”, cutugno@unina.it
Felice Dell’Orletta, Istituto di Linguistica Computazionale del CNR - Pisa, ItaliaNLP Lab - www.italianlp.it
Isabella Poggi, Università di Roma3, isabella.poggi@uniroma3.it
Renata Savy, Università degli Studi di Salerno, rsavy@unisa.it
Antonio Sorgente, Istituto di Scienze Applicate e Sistemi Intelligenti del CNR - Pozzuoli

Abstract

English. The CHROME Project aims at collecting a wide portfolio of digital resources oriented to technological applications in Cultural Heritage (henceforth CH). Contributions to this objective come from computer scientists, psychologists, architects, and computational linguists, who form an interdisciplinary team. We are collecting and analyzing texts, spoken materials, architectural surveys, and human motion videos, and we are attempting to integrate these data in a multidimensional platform based on multilevel annotation systems, import into game engines, and virtualization techniques. As a case study we chose to work on a magical journey through three Charterhouses located in the Campania region: S. Martino in Naples, S. Lorenzo in Padula (Salerno) and S. Giacomo in Capri.

Italiano (translated). The CHROME project (Cultural Heritage Resources Orienting Multimodal Experiences, PRIN 2015 MIUR) aims to collect a wide range of digital resources to be used in technological applications that improve the enjoyment of cultural heritage (CH). Computer scientists, psychologists, architects and linguists contribute to this objective in an interdisciplinary way, collecting texts, speech recordings, architectural surveys, videos and human motion capture. These data are then integrated into a platform in which multidimensional annotation can be carried out, and they are also used for the virtualization of three-dimensional environments and for porting to gaming environments.

1 Introduction

The CHROME project was born with the intention of creating a framework and methodology to collect, represent and analyze cultural heritage contents and present them through artificial agents whose behavior is inspired by accurate analysis of expert guides, museum curators and tour operators. These gatekeepers are professional figures who possess a significant amount of knowledge about how people should be guided in the exploration of cultural contents. In this sense, they act as mediators between cultural heritage and visitors, using a set of verbal and non-verbal communication strategies aimed at maintaining a high level of engagement and delivering high-quality content.

The overall experience of accessing cultural heritage is greatly enriched by these professional figures: their knowledge and experience should therefore not be overlooked when designing artificial agents oriented to cultural heritage presentation. As this knowledge is primarily based on experience gathered in the field, the CHROME project aims at recording the performance of gatekeepers in a sensible environment so that formal analysis of their behavior can be documented and studied. The result of this process (see Fig. 1), conducted jointly by humanities scholars and computer scientists, will lead to the formalization of a model describing the behaviors adopted by gatekeepers when presenting cultural heritage. This model will then be used to control a humanoid robot designed to follow similar presentation strategies. With this aim in mind, the main goals of the project are to: collect and provide the scientific community with reference datasets to study human-human interaction during the presentation of cultural heritage by professionals; investigate the structure of the texts contained in the collected corpus in order to produce automatic approaches supporting text generation for oral presentations in the cultural heritage domain; and provide a reference computational model to support the development of artificial agents exhibiting coherent and engaging behavioural strategies. In addition to the degree of orality of the assembled presentations, special attention will be paid to non-verbal aspects. Specifically, CHROME will concentrate on enriching the presentation with consistent prosody and gestures. Finally, another goal is to evaluate the impact of these agents in simplifying access to cultural heritage and attracting visitors to cultural sites.

To achieve these goals, five research groups are involved in the CHROME project, covering different scientific and humanistic disciplines that complement each other. The team is highly interdisciplinary and is formed of linguists (with specific competences in prosody, pragmatics, paralinguistics, and non-verbal behavior analysis), computational linguists and computer scientists (with skills in Artificial Intelligence and Human Machine Interaction). The teams involved in the project are:
• UrbanEco (Naples, Federico II), an interdisciplinary team formed by computer scientists, architects and linguists, aiming at collecting 3D architectural surveys and speech and gesture corpora. UrbanEco is also designing multimodal interaction systems; a sub-partner linked to this unit is the “Polo Museale della Campania - MiBaCT”, the local section of the Italian Cultural Ministry, which manages more than 30 museums in our region;
• ILC (Pisa, CNR) will develop systems for automatically extracting and organizing linguistic and domain knowledge from domain-specific corpora;
• UniSa (University of Salerno) will analyze texts and will address the prosodic analysis of spoken material aimed at speech synthesis;
• ISASI (Pozzuoli, CNR) will address the challenge of CH question answering and language generation for the realization of interaction models in natural language;
• RomaTre (Roma, University RomaTre) will address multimodal communication and gesture analysis.

As a case study we chose to work on a magical journey through three Charterhouses located in the Campania region: S. Martino in Naples, S. Lorenzo in Padula (Salerno) and S. Giacomo in Capri. All the texts, the architectural surveys and the audio-video recordings, in other words all the digital resources that we have collected and will collect and that we describe in the next sections, concern these wonderful sites.

2 The Challenge

An interesting aspect of the CHROME project is that it tackles several methodological and technological challenges.

A first challenge regards the role of gatekeepers in shaping visitors’ experience. Communication in museums is considered an important issue, even though museum specialists have been reproached for not doing enough in this field (Antinucci, 2014), with some exceptions. Much progress has been made in understanding museum visitors’ needs and in looking for new ways of communication to improve the experience of visiting museums. Investigations of visitors’ psychological approach (Dufresne-Tassé C. & Lefebvre A., 1995) helped museologists to develop possible methods not only to exhibit artefacts but also to give them sense, providing further explanations. As a result, museum experts may know visitors better, and they are ready to be helped by technology (Cataldo L., 2011).

Moreover, another important aim regards the extraction of concepts and expressive forms from texts. Natural Language Processing technologies are crucial in the process of converting textual documents into knowledge resources. New techniques for the automatic acquisition of linguistic knowledge from texts are needed. Terminology extraction is a central field of research for a number of applications, such as Ontology Learning and Text Mining. Different methodologies have been proposed so far to automatically extract domain terminology from texts. Term extraction systems make use of various degrees of linguistic filtering and of statistical measures ranging from raw frequency to Information Retrieval measures such as TF-IDF (Salton et al.,
1988), up to more sophisticated methods such as the C-NC Value method (Frantzi et al., 1999) or the contrastive approach (Bonin et al., 2010).

Another important issue we are going to address is the analysis of social behaviors in dissemination contexts. The specificities of guided tours have been investigated by Mondada (2013), who studies the distribution of knowledge among guides. This stresses the need to adapt to different people during visits, while the relevance of a user model is pointed out by the literature on gesture and Conversational Analysis. Concerning the use of words and iconic gestures in didactic explanations to children and to expert and novice adults, their adaptation to the Speaker’s Recipient Design and their efficacy for comprehension, Campisi & Özyürek (2013) show that people use more words when addressing adults, but wider and more informative gestures for children. Also, precision was defined as providing details on the topic of one’s discourse (Vincze et al., 2014), while vagueness is how blurred the boundaries of one’s ideas or discourse are.

Fig. 1 The CHROME interdisciplinary chart

Spoken text analysis and prosodic analysis and synthesis will also be addressed. Advanced use of parametric speech synthesis, such as focus/prominence generation by prosodic modification or expressive prosody modelling, has been tested in some research projects (e.g. ALIZ-E). Pushing forward prosodic analysis of gatekeepers’ performance can improve the knowledge needed to synthesize natural specialized speech.

Finally, the technologies to mediate access to digital cultural heritage will be considered. In order to dynamically assemble and present narratives, a formalism to represent different aspects of cultural stories as reported by gatekeepers (e.g. Mele & Sorgente, 2013) is necessary. By providing semantically annotated multimedia materials and contents obtained by collecting a documental basis, it is possible to use mash-up techniques to dynamically assemble contents and synchronize them with the available media.

3 CHROME methodology

CHROME is a cross-disciplinary project focused on combining computational linguistics and behavior analysis methods with expertise in museology to formalise computational models of gatekeepers (see Fig. 1). The main result of this research will be the Gatekeeper Computational Model (GCM) to generate engaging presentations of cultural heritage. The project is organized in three main phases. The data collection phase foresees the recording of gatekeepers presenting cultural contents and surveying activities to collect reference texts and annotated 3D models. During data analysis, these resources will be annotated and examined to obtain the GCM. Activities will compare oral expressions with expressions found in texts to automatically select fragments that can compose the final presentation together with gestures and prosody synthesis. Annotation of the 3D models will make it possible to connect the presentation to the automatic selection of auxiliary material. Demonstrator implementation will
serve for the validation of the GCM, to disseminate the research results and to estimate the impact of the approach in a real environment.

The methodology proposed in the CHROME project targets the following objectives:
• O1. Provide reference datasets to study human-human interaction during the presentation of cultural heritage.
• O2. Survey written contents for cultural heritage dissemination and compare these with the multimodal materials collected in the framework of the CHROME project.
• O3. Provide a reference Gatekeeper Computational Model (GCM) to support the development of artificial agents mimicking the ability of expert guides to select and organize contents and to apply proper verbal and non-verbal behaviour.
• O4. Evaluate the impact of dissemination-oriented, multimodal behavioral models on the capability of artificial agents to simplify access to digital cultural heritage and attract visitors to cultural sites.

4 The present status

At the time of writing this paper (July 2018) we are at month 16 of 36. Up to now we have collected and analysed a large amount of data on the Campania Charterhouses: texts, audio, video and 3D reconstructions.

4.1 Charterhouses Text

For the three Campania Charterhouses (S. Martino, S. Lorenzo and S. Giacomo), we have collected 102 texts that belong to different document types. In particular, these texts are divided among the following categories: Scientific texts; Specialized catalogues; Dissemination catalogues; Specialized guides; Certified web material; Dissemination kits.

4.2 Textual Analysis

Starting from these texts, some lexical and semantic analyses have already been conducted on part of them. The main ones concerned: i) Domain vocabulary extraction; ii) Event annotation: some texts are annotated with additional semantic information according to an event-based reference formalism. In particular, the formalism adopted is CSWL (Cultural Story Web Language) (Sorgente et al., 2016). The purpose of this approach is to have a semantic level that will allow us to define information retrieval that is not based on text search alone; iii) AAT concepts recognition: the Art & Architecture Thesaurus (AAT) (Getty, 2018) is a structured vocabulary containing around 40,000 concepts and descriptions related to fine art, architecture, decorative arts, archival materials and material culture. In this step the aim is to link the concepts found in the charterhouse texts to this vocabulary.

4.3 Digital photogrammetry

The architects' group has completed the aerial digital photogrammetry survey, performed by UAV and laser scanner, of the 3 main charterhouse buildings and of many interiors.

4.4 Video recording of tourist guides

Three of four tourist guides have been video recorded during tours in the S. Martino Charterhouse while describing the artistic features, each one followed by an audience of four visitors. Cameras are pointed at the guide and at the audience. Speech is recorded with three microphones: one headset worn by the guide and two in the field, at about one meter equidistant from the guide and also pointing to the visitors. Speech analysis of this material consists of:
• Orthographic level: transcription of words, pauses, filled pauses, false starts;
• Phonetic level: phonetic transcription and annotation of coarticulation phenomena, speech quality analysis;
• Syllabic level: annotation of syllables, speech fluency and speech rate analysis;
• Intonation level: pitch movements in relationship with the segmental level, emphasizing patterns, speech style;
• Textual level: analysis of sentences, text structure, and communicative goals;
• Multimodal behavior level: annotation of gestures, face and gaze, including physical description, semantic analysis, and classification in terms of textual, emotional and interactional functions.

The tool chosen for annotating the speech and video material is ELAN. In each video portion the guide’s gestures and body communication will be annotated in terms of the communicative functions they serve. The annotation will thus make it possible to distinguish the styles of the guides: e.g. a very “technical” guide will use gestures and body communication more frequently aimed at describing the artwork or the author, while a “friendly” guide’s body behaviors will often be aimed at creating rapport with tourists.
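As an illustration of how the annotation levels listed above could be held together, the following minimal Python sketch stores each level as a tier of time-aligned, labelled intervals and derives two simple measures: a speech rate from the syllable tier, and the gestures that overlap a given word. The names Interval, Tier and speech_rate, the toy timings and the labels are assumptions made for this example only. The sketch does not reproduce ELAN's file format or API, nor the annotation scheme actually adopted in CHROME, and the speech rate definition (syllables divided by the time from first onset to last offset, pauses included) is one possible choice among several.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Interval:
    """A labelled stretch of the recording, with times in seconds."""
    start: float
    end: float
    label: str


@dataclass
class Tier:
    """One annotation level (e.g. syllables, words, gestures)."""
    name: str
    intervals: List[Interval] = field(default_factory=list)

    def overlapping(self, start: float, end: float) -> List[Interval]:
        """Return the intervals that share some time with [start, end]."""
        return [iv for iv in self.intervals if iv.start < end and iv.end > start]


def speech_rate(syllables: Tier) -> float:
    """Syllables per second, from first onset to last offset (pauses included)."""
    first = min(iv.start for iv in syllables.intervals)
    last = max(iv.end for iv in syllables.intervals)
    return len(syllables.intervals) / (last - first)


# Hypothetical toy annotation of the phrase "questo chiostro" ("this cloister").
syllables = Tier("syllabic", [
    Interval(0.0, 0.2, "que"), Interval(0.2, 0.5, "sto"),
    Interval(0.9, 1.2, "chio"), Interval(1.2, 1.6, "stro"),
])
words = Tier("orthographic", [
    Interval(0.0, 0.5, "questo"), Interval(0.9, 1.6, "chiostro"),
])
gestures = Tier("multimodal", [
    Interval(0.8, 1.7, "deictic: describing the artwork"),
])

rate = speech_rate(syllables)
for word in words.intervals:
    # Gestures co-occurring with this word, looked up by time overlap.
    labels = [g.label for g in gestures.overlapping(word.start, word.end)]
    print(word.label, "->", labels)
print(f"speech rate: {rate:.2f} syllables/s")
```

Keeping every level on a shared time axis is what allows cross-level queries of this kind, for instance counting how often a guide's gestures with a descriptive function coincide with domain terms.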

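The TF-IDF weighting cited in Section 2 among the statistical measures used by term extraction systems, and relevant to the domain vocabulary extraction step of Section 4.2, can be sketched in a few lines of Python. The version below uses the classic weighting tf × log(N/df), which is only one of several variants. The function names tf_idf and top_terms and the three toy documents are assumptions made for illustration; this is not the extraction pipeline used in the project, which may also rely on linguistic filtering and on the C-NC Value or contrastive methods mentioned above.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Score every term of every document with tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)  # raw term frequency in this document
        scores[name] = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return scores


def top_terms(term_scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(term_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Invented toy corpus, tokenized by whitespace.
docs = {
    "guide": "the cloister of the charterhouse has a baroque cloister".split(),
    "catalogue": "the catalogue lists the paintings of the charterhouse".split(),
    "news": "the museum opens on monday".split(),
}

scores = tf_idf(docs)
print(top_terms(scores["guide"], 1))
print(scores["guide"]["the"])
```

A word that occurs in every document, such as "the", receives a score of zero, while a word that is frequent in one document and absent elsewhere rises to the top of that document's ranking.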