<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The CHROME Manifesto: integrating multimodal data into Cultural Heritage Resources</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Antonio Sorgente Istituto di Scienze Applicate e Sistemi Intelligenti del CNR - Pozzuoli</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Isabella Poggi Università di Roma3</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Renata Savy Università degli Studi di Salerno</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>English. The CHROME Project aims at collecting a wide portfolio of digital resources oriented to technological applications in Cultural Heritage (henceforth CH). The contributions towards this objective come from computer scientists, psychologists, architects, and computational linguists, who form an interdisciplinary team. We are collecting and analyzing texts, spoken materials, architectural surveys, and human motion videos, and we are attempting to integrate these data in a multidimensional platform based on multilevel annotation systems, import into game engines, and virtualization techniques. As a case study, we chose to work on a journey through three Charterhouses located in the Campania region: S. Martino in Naples, S. Lorenzo in Padula (Salerno) and S. Giacomo in Capri.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The CHROME project (Cultural
Heritage Resources Orienting
Multimodal Experiences, PRIN 2015 MIUR)
aims to collect a wide range of digital
resources for use in technological applications
that improve access to cultural heritage (CH).
Computer scientists, psychologists, architects and
linguists contribute to this goal in an
interdisciplinary effort, collecting texts, speech
recordings, architectural surveys, videos and human
motion capture data. These data are then
integrated into a platform that supports
multidimensional annotation; they are also used
to virtualize three-dimensional environments
and to port them into gaming environments.</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>The CHROME project was born with the
intention of creating a framework and methodology to
collect, represent and analyze cultural heritage
contents and present them through artificial
agents whose behavior is inspired by accurate
analysis of expert guides, museum curators and
tour operators. These gatekeepers are those
professional figures possessing a significant amount
of knowledge concerning how people should be
guided in the exploration of cultural contents. In
this sense, they act as mediators between cultural
heritage and visitors by using a set of
communication strategies, both verbal and non-verbal,
aimed at maintaining a high level of engagement
and delivering high-quality content.</p>
      <p>The overall experience of accessing cultural
heritage is greatly enriched by these professional
figures: their knowledge and experience,
therefore, should not be overlooked when designing
artificial agents oriented to cultural heritage
presentation. As this knowledge is primarily
based on experience gathered in the field, the
CHROME project aims at recording the
performance of gatekeepers in a sensible environment
so that their behavior can be formally
documented and studied. The result of this
process (see Fig. 1), conducted jointly by humanities
scholars and computer scientists, will be the
formalization of a model describing the behaviors
adopted by gatekeepers when presenting cultural
heritage. This model will then be used to control a
humanoid robot designed to follow similar
presentation strategies. With this aim in mind, the
main goals of the project are to: collect and
provide the scientific community with reference
datasets to study human-human interaction during
the presentation of cultural heritage by
professionals; investigate the structure of the texts
contained in the collected corpus in order to produce
automatic approaches supporting text generation
for oral presentations in the cultural heritage domain;
provide a reference computational model to
support the development of artificial agents exhibiting
coherent and engaging behavioural strategies. In
addition to the degree of orality of the assembled
presentations, special attention will be paid
to non-verbal aspects. Specifically, CHROME
will concentrate on enriching the presentation
with consistent prosody and gestures. Finally,
another goal is to evaluate the impact of these
agents in simplifying access to cultural heritage
and attracting visitors to cultural sites.</p>
      <p>To achieve these goals, five research
groups are involved in the CHROME project,
covering scientific and humanistic
disciplines that complement each other. The team
is highly interdisciplinary and consists of
linguists (with specific competences in prosody,
pragmatics, paralinguistics, and non-verbal
behavior analysis), computational linguists and
computer scientists (with skills in Artificial
Intelligence and Human-Machine Interaction). The
teams involved in the project are:
• UrbanEco (Naples – Federico II), an
interdisciplinary team of computer scientists,
architects and linguists that collects 3D
architectural surveys and speech and gesture
corpora. UrbanEco is also designing multimodal
interaction systems; a sub-partner linked to this
unit is the “Polo Museale della Campania
MiBaCT”, the local section of the Italian
Cultural Ministry, which manages more than 30
museums in our region;
• ILC (Pisa – CNR) will develop systems for
automatically extracting and organizing
linguistic and domain knowledge from
domain-specific corpora;
• UniSa (University of Salerno) will analyze
texts and will address the prosodic
analysis of spoken material aimed at speech
synthesis;
• ISASI (Pozzuoli, CNR) will address the
challenge of CH question answering and language
generation for building interaction
models in natural language;
• RomaTre (Roma, University RomaTre) will
address multimodal
communication and gesture analysis.</p>
      <p>As a case study, we chose to work on a
journey through three Charterhouses located in the
Campania region: S. Martino in Naples, S.
Lorenzo in Padula (Salerno) and S. Giacomo in
Capri. All the digital resources that we have
collected and will collect, namely the texts, the
architectural surveys and the audio-video
recordings described in the next sections, concern
these sites.</p>
    </sec>
    <sec id="sec-3">
      <title>The Challenge</title>
      <p>The CHROME project tackles several
methodological and technological
challenges.</p>
      <p>
        A first challenge concerns the role of
gatekeepers in shaping the visitors’ experience.
Communication in museums is considered an
important issue, even though museum specialists have
been criticized for not doing enough in this field
        <xref ref-type="bibr" rid="ref1">(Antinucci, 2014)</xref>
        , with some exceptions. Much progress has been
made in understanding museum visitors’ needs and
in finding new ways of communicating that
improve the experience of visiting museums.
Investigations of visitors’ psychological approach
        <xref ref-type="bibr" rid="ref5">(Dufresne-Tassé C. &amp; Lefebvre A., 1995)</xref>
        helped
museologists to develop methods not
only to exhibit artefacts but also to give them
meaning by providing further explanations. As a
result, museum experts know visitors better and
are ready to be supported by technology
        <xref ref-type="bibr" rid="ref4">(Cataldo L.,
2011)</xref>
        .
      </p>
      <p>
        A second challenge concerns the
extraction of concepts and expressive forms from
texts. Natural Language Processing technologies
are crucial in the process of converting textual
documents into knowledge resources, and new
techniques for the automatic acquisition of linguistic
knowledge from texts are needed. Terminology
extraction is a central field of research for a
number of applications, such as Ontology
Learning and Text Mining. Different methodologies
have been proposed so far to automatically
extract domain terminology from texts. Term
extraction systems make use of various degrees of
linguistic filtering and of statistical measures
ranging from raw frequency to Information
Retrieval measures such as TF-IDF
        <xref ref-type="bibr" rid="ref11">(Salton et al.,
1988)</xref>
        , up to more sophisticated methods such as
the C-value/NC-value method
        <xref ref-type="bibr" rid="ref8">(Frantzi et al., 1999)</xref>
        or the
contrastive approach
        <xref ref-type="bibr" rid="ref2">(Bonin et al., 2010)</xref>
        .
      </p>
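      <p>As a minimal illustration of the simplest statistical measure mentioned above, the following sketch computes TF-IDF weights for a toy corpus. It is not the extraction pipeline used in CHROME: the function name tf_idf, the whitespace tokenization and the three example sentences are assumptions made for illustration, and the weighting (relative term frequency multiplied by log(N/df)) is only one common variant among several.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Toy corpus, invented for illustration (not project data).
docs = [
    "the cloister of the charterhouse".split(),
    "the church of the charterhouse".split(),
    "the museum in the city".split(),
]
scores = tf_idf(docs)
```
]]></preformat>
      <p>In this toy corpus a word that appears in every document, such as "the", receives weight zero, while "cloister", which occurs in a single document, is ranked above "charterhouse", which occurs in two.</p>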
      <p>
        Another important issue we will
address is the analysis of social behaviors in
dissemination contexts. The specificities of guided tours
have been investigated by
        <xref ref-type="bibr" rid="ref7">(Mondada, 2013)</xref>
        , who
studies the distribution of knowledge among
guides. This stresses the need to adapt to
different people during visits, while the relevance of a
user model is pointed out by the literature on gesture
and Conversation Analysis. Concerning the use
of words and iconic gestures in didactic
explanations to children and to expert and novice adults,
their adaptation to the speaker’s recipient
design and their efficacy for comprehension,
        <xref ref-type="bibr" rid="ref3">(Campisi &amp; Özyürek, 2013)</xref>
        show that people use
more words when addressing adults, but wider
and more informative gestures when addressing children. In addition,
precision has been defined as providing details on the
topic of one’s discourse
        <xref ref-type="bibr" rid="ref12">(Vincze et al., 2014)</xref>
        ,
while vagueness concerns how blurred the
boundaries of one’s ideas or discourse are.
      </p>
      <p>Spoken text analysis, together with prosodic analysis
and synthesis, will also be addressed. Advanced
uses of parametric speech synthesis, such as
focus/prominence generation by prosodic
modification or expressive prosody modelling, have been
tested in some research projects (e.g. ALIZ-E).
Pushing prosodic analysis of
gatekeepers’ performance further can improve the knowledge
needed to synthesize natural specialized speech.</p>
      <p>
        Finally, the technologies that mediate access
to digital cultural heritage will be considered. In
order to dynamically assemble and present
narratives, a formalism is needed to represent the different aspects
of cultural stories (e.g.
        <xref ref-type="bibr" rid="ref6">(Mele &amp; Sorgente, 2013)</xref>
        )
as reported by gatekeepers. By
providing semantically annotated multimedia
materials and contents obtained by collecting a
document base, it is possible to use mash-up
techniques to dynamically assemble contents and
synchronize them with the available media.
      </p>
    </sec>
    <sec id="sec-4">
      <title>CHROME methodology</title>
      <p>CHROME is a cross-disciplinary project focused
on combining computational linguistics and
behavior analysis methods with expertise in
museology to formalise computational models of
gatekeepers (see Fig. 1). The main result of this
research will be the Gatekeeper Computational
Model (GCM), used to generate engaging
presentations of cultural heritage. The project is
organized in three main phases. The data collection
phase involves recording gatekeepers as they
present cultural contents, together with surveying
activities to collect reference texts and annotated 3D models.
During data analysis, these resources will be
annotated and examined to obtain the GCM.
Activities will compare oral expressions with
expressions found in texts to automatically select
fragments that can compose the final
presentation, together with gesture and prosody
synthesis. The annotation of 3D models will make it
possible to connect the presentation to the automatic
selection of auxiliary material. The demonstrator
implementation will serve to validate the GCM, to
disseminate the research results and to estimate the impact
of the approach in a real environment.</p>
      <p>The methodology proposed in the CHROME
project targets the following objectives:
• O1. Provide reference datasets to study
human-human interaction during the presentation of
cultural heritage.
• O2. Survey written contents for cultural
heritage dissemination and compare these with the
multimodal materials collected in the
framework of the CHROME project.
• O3. Provide a reference Gatekeeper
Computational Model (GCM) to support the development of
artificial agents that mimic the ability of expert
guides to select and organize contents and
to apply proper verbal and non-verbal behaviour.
• O4. Evaluate the impact of
dissemination-oriented, multimodal behavioral models on the
capability of artificial agents to simplify access
to digital cultural heritage and attract visitors to
cultural sites.</p>
    </sec>
    <sec id="sec-5">
      <title>The present status</title>
      <p>At the time of writing (July 2018),
we are at month 16 of 36. So far we have
collected and analysed a large amount of data on the Campania
Charterhouses: texts, audio, video and 3D
reconstructions.</p>
      <sec id="sec-5-1">
        <title>Charterhouse Texts</title>
        <p>For the three Campania Charterhouses (S.
Martino, S. Lorenzo and S. Giacomo), we have
collected 102 texts that belong to different
document types. In particular, these texts are divided
among the following categories: Scientific texts;
Specialized catalogues; Dissemination
catalogues; Specialized guides; Certified web
material; Dissemination kits.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Textual Analysis</title>
        <p>
          Some lexical and semantic analyses have
already been conducted on
part of these texts. The main ones concern: i)
Domain vocabulary extraction; ii) Event annotation:
some texts are annotated with semantic
information according to an event-based reference
formalism. In particular, the formalism adopted is
CSWL (Cultural Story Web Language)
          <xref ref-type="bibr" rid="ref10">(Sorgente et al., 2016)</xref>
          . The purpose of this approach
is to provide a semantic level that will allow us to
define information retrieval that is not based only on
text search; iii) AAT concepts recognition: the
Art &amp; Architecture Thesaurus (AAT)
          <xref ref-type="bibr" rid="ref9">(Getty,
2018)</xref>
          is a structured vocabulary containing
around 40,000 concepts and descriptions related
to fine art, architecture, decorative arts, archival
materials and material culture. In this step the
aim is to link the concepts found in the Charterhouse
texts to this vocabulary.
        </p>
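        <p>The concept-linking step can be pictured with a small sketch. The code below is only an assumption about how such a lookup might work, not the method adopted in the project: it performs a greedy longest-match search of token sequences in a vocabulary. The function name link_concepts, the three labels and the identifiers "concept-001" to "concept-003" are invented placeholders and are not real AAT records.</p>
        <preformat><![CDATA[
```python
def link_concepts(tokens, vocabulary, max_len=3):
    """Greedy longest-match lookup of token n-grams in a vocabulary.

    Returns (start, end, concept_id) tuples, with `end` exclusive.
    """
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in vocabulary:
                matches.append((i, i + n, vocabulary[candidate]))
                i += n
                break
        else:
            i += 1  # no vocabulary entry starts at this token
    return matches


# Placeholder vocabulary: labels and IDs are invented, not real AAT records.
vocabulary = {
    "cloister": "concept-001",
    "bell tower": "concept-002",
    "tower": "concept-003",
}
tokens = "The bell tower overlooks the cloister".split()
links = link_concepts(tokens, vocabulary)
```
]]></preformat>
        <p>Trying longer sequences first means that "bell tower" is linked as a single concept instead of matching the shorter entry "tower". A real system would also need lemmatization and disambiguation, which this sketch leaves out.</p>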
      </sec>
      <sec id="sec-5-3">
        <title>Digital photogrammetry</title>
        <p>The architects’ group has completed the
digital survey, carried out through aerial photogrammetry
by UAV and through laser scanning, of the three main
Charterhouse buildings and of many interiors.</p>
      </sec>
      <sec id="sec-5-4">
        <title>Video recording of tour guides</title>
        <p>Three of four tour guides have been video
recorded during tours of the S. Martino
Charterhouse while describing its artistic features,
each accompanied by an audience of four visitors.
Cameras are pointed at the guide and at the
audience; speech is recorded with three
microphones: one headset worn by the guide and
two placed in the field, each about one meter from
the guide and also pointing towards the visitors.
Analyses of this material cover the following levels:
• Orthographic level: Transcription of words,
pauses, filled pauses, false starts;
• Phonetic level: Phonetic transcription and
annotation of coarticulation phenomena, Speech
quality analysis;
• Syllabic level: Annotation of syllables, Speech
fluency and speech rate analysis;
• Intonation level: Pitch movements in
relationship with the segmental level, Emphasizing
patterns, speech style;
• Textual level: analysis of sentences, text
structure, and communicative goals;
• Multimodal behavior level: annotation of
gestures, face and gaze, including physical
description, semantic analysis, classification in
terms of textual, emotional and interactional
functions.</p>
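        <p>To make the syllabic level more concrete, the sketch below shows one possible way of deriving rate measures from time-aligned annotation tiers. It is an illustration under stated assumptions, not the project's analysis code: the Interval class, the function syllables_per_second and all timings are invented, and pauses are assumed to lie inside the annotated span.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One time-aligned annotation on a tier (times in milliseconds)."""
    start_ms: int
    end_ms: int
    label: str


def syllables_per_second(syllables, pauses=()):
    """Syllable rate over the annotated span; pass `pauses` to exclude them."""
    if not syllables:
        return 0.0
    span_ms = max(s.end_ms for s in syllables) - min(s.start_ms for s in syllables)
    pause_ms = sum(p.end_ms - p.start_ms for p in pauses)
    return len(syllables) / ((span_ms - pause_ms) / 1000)


# Invented timings, for illustration only.
syllable_tier = [
    Interval(0, 200, "il"),
    Interval(200, 450, "chio"),
    Interval(450, 700, "stro"),
    Interval(1200, 1500, "gran"),
    Interval(1500, 1750, "de"),
]
pause_tier = [Interval(700, 1200, "pause")]

# Rate including pauses, and rate over speaking time only.
speech_rate = syllables_per_second(syllable_tier)
articulation_rate = syllables_per_second(syllable_tier, pause_tier)
```
]]></preformat>
        <p>Keeping syllables and pauses on separate tiers mirrors the multi-level annotation listed above: the same function gives the overall speech rate when pauses are included and the articulation rate when they are subtracted.</p>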
        <p>The tool chosen for annotating the speech and
video material is ELAN
(https://tla.mpi.nl/tools/tla-tools/elan/). In each video portion,
the guide’s gestures and body communication
will be annotated in terms of the communicative
functions they serve. The annotation will thus
make it possible to distinguish the styles of the guides: e.g.
a very “technical” guide will more frequently use gestures and
body communication aimed at
describing the artwork or the author, while a
“friendly” guide’s body behaviors will often be
aimed at building rapport with tourists.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Summarizing</title>
      <p>CHROME aims at formalizing data collection
and annotation paradigms for architectural
heritage; in particular, the annotation covers texts,
video, audio and gestures. From the annotated
data, we will: i) perform correlation analysis to
identify cross-domain patterns and link them to
communicative goals; ii) describe how an expert
presenter relates to the physical environment
while describing it; iii) identify which
communicative strategies can be mimicked by an
artificial agent with the available technology
(possible domains of simulation are deictic
and iconic gestures, face and gaze behaviour); iv)
implement a final demonstrator adopting the
formalized strategies to generate dynamic
presentations for the attending visitors.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work is funded by the Italian PRIN project
Cultural Heritage Resources Orienting
Multimodal Experience (CHROME)
#B52F15000450001.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Antinucci F.</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Comunicare nel museo</article-title>
          .
          <source>Laterza Milano</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Bonin F.</given-names>
            ,
            <surname>Dell'Orletta F</surname>
          </string-name>
          .,
          <string-name>
            <surname>Montemagni</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venturi</surname>
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>A Contrastive Approach to Multi-word Extraction from Domain-specific Corpora</article-title>
          . In: LREC'10 - Seventh International Conference on Language Resources and Evaluation (Valletta, Malta, 17-23 May
          <year>2010</year>
          ). Proceedings, pp.
          <fpage>3222</fpage>
          -
          <lpage>3229</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Campisi E.</given-names>
            and
            <surname>Özyürek</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Iconicity as a communicative strategy: Recipient design in multimodal demonstrations for adults and children</article-title>
          .
          <source>Journal of Pragmatics (47)</source>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>27</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Cataldo L.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Dal Museum Theatre al Digital Storytelling</article-title>
          . Franco Angeli Milano
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Dufresne-Tassé</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lefebvre</surname>
            <given-names>A.</given-names>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>Psychologie du visiteur du musée</article-title>
          .
          <source>Hurtubise Montréal</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Mele</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sorgente</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>OntoTimeFL - A Formalism for Temporal Annotation and Reasoning for Natural Language Text</article-title>
          .
          <article-title>New Challenges in Distributed Information Filtering and Retrieval</article-title>
          ,
          <source>Studies in Computational Intelligence</source>
          <volume>439</volume>
          , pp.
          <fpage>151</fpage>
          -
          <lpage>170</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Mondada</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Displaying, contesting and negotiating epistemic authority in social interaction: Descriptions and questions in guided visits</article-title>
          .
          <source>Discourse Studies 15</source>
          , pp.
          <fpage>597</fpage>
          -
          <lpage>626</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Frantzi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ananiadou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>The C-value / NC Value domain independent method for multi-word term extraction</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Getty</surname>
            <given-names>AAT</given-names>
          </string-name>
          :
          <article-title>About the AAT</article-title>
          . http://www.getty.edu/research/tools/vocabularies/aat/. Accessed April 2018
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Sorgente A.</given-names>
            ,
            <surname>Calabrese</surname>
          </string-name>
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Coda</surname>
          </string-name>
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Vanacore</surname>
          </string-name>
          <string-name>
            <given-names>P.</given-names>
            , and
            <surname>Mele</surname>
          </string-name>
          <string-name>
            <surname>F.</surname>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Building multimedia dialogues annotating heterogeneous resources</article-title>
          .
          <source>In Artificial Intelligence for Cultural Heritage, chapter 3</source>
          , pages
          <fpage>49</fpage>
          -
          <lpage>82</lpage>
          . Cambridge Scholars Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Salton G.</given-names>
            ,
            <surname>Buckley</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          (
          <year>1988</year>
          ).
          <article-title>Term-Weighting Approaches in Automatic Text Retrieval</article-title>
          .
          <source>Information Processing and Management</source>
          ,
          <volume>24</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Vincze L.</given-names>
            ,
            <surname>Poggi</surname>
          </string-name>
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>D'Errico</surname>
          </string-name>
          <string-name>
            <surname>F.</surname>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Precision in Gestures and Words. Ricerche di Pedagogia e Didattica - Journal of Theories and Research in Education 9, 1. Communicating certainty and uncertainty: Multidisciplinary perspectives on epistemicity in everyday life</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>