=Paper= {{Paper |id=Vol-1621/paper4 |storemode=property |title=Cultural Heritage Presentations with a Humanoid Robot Using Implicit Feedback |pdfUrl=https://ceur-ws.org/Vol-1621/paper4.pdf |volume=Vol-1621 |authors=Antonio Origlia,Antonio Rossi,Maria Laura Chiacchio,Francesco Cutugno |dblpUrl=https://dblp.org/rec/conf/avi/OrigliaRCC16 }} ==Cultural Heritage Presentations with a Humanoid Robot Using Implicit Feedback== https://ceur-ws.org/Vol-1621/paper4.pdf
      Cultural heritage presentations with a humanoid robot
                      using implicit feedback

                    Antonio Origlia                                 Antonio Rossi                  Maria Laura Chiacchio
           Dept. of Electrical Engineering                Dept. of Electrical Engineering        Dept. of Electrical Engineering
            and Information Technology,                    and Information Technology,            and Information Technology,
                University of Naples                           University of Naples                   University of Naples
                    “Federico II”                                  “Federico II”                          “Federico II”
                    Naples, Italy                                  Naples, Italy                          Naples, Italy
                   antonio.origlia@unina.it                    rossi.antonio.84@gmail.com          marialaurachiacchio@gmail.com

                                                               Francesco Cutugno
                                                          Dept. of Electrical Engineering
                                                           and Information Technology,
                                                               University of Naples
                                                                   “Federico II”
                                                                   Naples, Italy
                                                                     cutugno@unina.it


ABSTRACT
In recent years, there has been an increasing interest towards cultural heritage in the
field of ICT applications. To design efficient communication strategies, the knowledge
possessed by art historians, with expertise in mediating access to cultural heritage, has
become a valuable resource. In this work, we present a human-robot interaction setup where
people actively choose how much information they would like to access concerning the
available topics. To provide engaging presentations, a humanoid robot exhibiting a general
behaviour based on a human presenter was used and a mathematical model to keep track of
content navigation was designed. Monitoring the evolution of the interactive session makes
it possible to estimate users' general interest towards the available contents. Our results
show that people were very satisfied by the interaction experience and that automatically
detected interests were consistent with those of the users. Both subjective and objective
metrics were used to validate the approach.

CCS Concepts
•Human-centered computing → Collaborative interaction;

Keywords
Cultural heritage presentation, human-robot interaction, implicit feedback

Copyright 2016 for this paper by its authors. Copying permitted for private and academic
purposes.

1.    INTRODUCTION
   Communication in museums is considered an important issue even if museum specialists
have sometimes been reproached for not doing enough in this field. Much progress has been
made in recent years in understanding museum visitors' needs and in developing new
interactive ways to address them. Among many others, investigations about visitors'
psychological approach [?] helped museologists to develop possible methods not only to
exhibit artifacts but also to give them meaning, providing further explanations. So museum
experts may be ready to take an active role in the development of technologies to support
visitors' experiences [?]. Artificial agents can help visitors obtain information about
cultural heritage in a pleasant way. This is because they use interaction paradigms that do
not require users to deviate too much from everyday communication. Of course, an artificial
agent must rely on written content as Natural Language Generation techniques are still
experimental. To obtain a resource enabling an artifact to talk to a user, it is important
to start from texts as close as possible to spoken language, as previous investigation in
psychology suggests, too [?]. In order to obtain such a resource, we compiled our reference
database starting from speech transcriptions. We collected speech material of a human expert
presenting works of art and converted it into a form an artificial agent can use in a
presentation task. There are a number of examples of robots being used in cultural heritage
dissemination. Among existing systems for interactive edutainment with robots, ROBOTINHO [?]
provides multimodal interaction through spoken presentations, facial expressions and
gestures. Its hands convey greetings, deictic and other spatial information, while gaze
establishes joint attention. Also, the robot presented in [?] attracts visitors' attention
through verbal and nonverbal action. In these works, the authors concentrated on the
development of social strategies to obtain a believable artificial presenter. In interactive
approaches, user feedback has been used to obtain user models for recommendation systems and
personalisation (e.g. [?]). Most of these systems, however, use explicit
feedback, typically rating, to recommend items. Research in the Information Retrieval field
has highlighted that explicit feedback poses a significant problem: the obtrusiveness of the
approach [?]. This problem becomes critical in edutainment setups.
   In this paper, we describe how the robot's behavioural strategies and discourse structure
were designed on the basis of orally delivered cultural heritage presentations. From the
transcription of these recordings, a series of information nodes have been identified and
organized in a tree structure representing general concepts, topic sharing and deepening
levels. This structure is then automatically populated with information describing the
feedback strength the system should consider, depending on how data are explored. An
interactive task using a humanoid robot and a tablet interface is used for validation.

2.     ROBOT BEHAVIOUR DESIGN
   To correctly design the robot's behaviour during presentations, we collected a corpus of
audio-visual reference material to study how a human expert delivers the contents related to
the considered works of art. In this case, one of the authors (Maria Laura Chiacchio)
performed the presentations. The total recorded material consists of two and a half hours.
In this work, we concentrate on reproducing speech and gestures in a humanoid robot, but the
full dataset contains material for future analyses.
   Although every attempt to classify art may be controversial, the present research
required wide categories that could be representative of the most famous styles of European
Art. In each category, among many famous artists and masterpieces, the paintings were chosen
considering their importance for the category itself. Moreover, it was also important to
avoid very well-known masterpieces, because their fame could influence the users' choice to
go further with the exploration. So, for example, instead of considering the Gioconda by
Leonardo da Vinci, another of his paintings has been presented, which is a famous work as
well but not a universally recognized icon.
   The high-quality recorded speech was automatically transcribed and manually corrected to
remove disfluencies and filled pauses. Punctuation and Speech Synthesis Markup Language
(SSML) tags were also added to support the generation of synthetic speech, obtained using
the MIVOQ¹ engine. To make the robot move consistently, a set of gestural strategies
accompanying the presentations were observed and used to control the robot's behaviour:

     • When the presentation refers to general aspects of the painting, the presenter looks
       at it to attract the listener's attention to the work of art;

     • When the presentation refers to a specific detail of the painting, the presenter
       points at the area where the detail is found. This is because it is not always
       straightforward, for the listener, to identify the specific point where the
       presentation subject can be found. Using gestures is more immediate and precise than
       using spoken instructions about where to look;

     • When the presentation refers to a specific area of the painting, the strategy is
       similar to the one used in the preceding point. However, a different gesture,
       encircling the area of interest, has been identified as different from simple
       pointing;

     • When the subject of the presentation is not directly related to the painting (e.g. it
       refers to the author's life) the presenter looks at the audience. This signals the
       absence of direct correlates of the presentation in the painting.

¹ www.mivoq.it

3.      COLLECTING IMPLICIT FEEDBACK
   The general structure of the data is organized into an XML tree where two kinds of nodes
are available: data nodes and abstract nodes. The tree is composed of the following
elements:

     • an abstract root node;

     • a number of abstract Category nodes grouping paintings belonging to the same art
       movement;

     • a number of data Item nodes containing a short presentation of the painting
       considered in the corresponding subtree;

     • for each Item node, three abstract children nodes are considered to group data
       concerning Style, Iconography and Author;

     • for each of these abstract nodes, there is a subtree of data Info nodes containing
       SSML. In this subtree, a child-parent relationship indicates that the child node
       contains a deepening of the content provided in the parent node. Sibling nodes
       provide different insights about the topic covered by their parent.

[Figure 1 diagram, levels from top to bottom: Root; Cat. ... Cat.; Item ... Item; Style,
Icon., Auth.; Info ... Info; ...]
Figure 1: General structure of the XML tree representing contents organization.

   A graphical summary of the data structure is presented in Figure 1. In order to evaluate
the interest of a user towards a specific category, it is necessary to assign a measure
of importance to each node in the subtree rooted in each provided category. The concept is
that interest is maximal when all the representative items of the category are fully
explored by the user. Since we assume that requesting a topic deepening brings more insight
into the user's interests, it is important to keep track of the structure of each considered
subtree. Specifically, we consider the resolution $R(T)$ of a generic tree $T$, which
corresponds to the number of internal branches, i.e. branches that do not connect a node to
a leaf [?]. We also define the number of nodes included in a tree $T$ as $N(T)$ and the
informative potential $I_p(T)$ as the potential interest the user may implicitly express by
exploring the tree $T$. Abstract nodes entirely distribute the amount of informative
potential they receive among their subtrees. Data nodes retain a certain part of the
informative potential before passing the remaining amount to the subtrees rooted in their
children. We will refer to the amount of informative potential retained by the $n$-th data
node as its informative content $I_c(n)$. Also, we will refer to the amount of informative
potential distributed to children nodes as its informative residual $I_r(n)$. $I_c(n)$ is
computed by retaining the fraction of informative potential that would be assigned to the
node, should the potential be equally distributed among the nodes of the tree rooted in it,
as shown in Equation 1

\[ I_c(n) = \frac{I_p(T_n)}{N(T_n)} \tag{1} \]

   where $T_n$ is the tree rooted in the $n$-th node. Consequently, $I_r(n)$ is computed as
shown in Equation 2

\[ I_r(n) = I_p(T_n) - I_c(n) \tag{2} \]

   The informative potential of the tree rooted in a Category node is assigned a value of 1:
this potential has to be distributed among the subtrees containing Item nodes. As different
Items have different amounts of information and different structural organization, it is
necessary to distribute the informative potential among the subtrees in such a way that the
interest value of parenthood is more powerful than the one of siblinghood. The strategy to
distribute the informative residual to the subtrees depends on two factors: the number of
nodes included in a subtree and the resolution of the subtree itself. One half of the
informative residual is weighted by the ratio between the resolution of the considered
subtree and the sum of the resolutions of the considered subtree and of the trees rooted in
the siblings of its root. The other half is weighted by the ratio between the number of
nodes in the considered subtree and the total number of nodes in the subtree itself and in
the trees rooted in its siblings. $I_p(T)$ is therefore computed as in Equation 3

\[ I_p(T) = \frac{I_r(p)}{2} \cdot \left( W_r(T) + \frac{N(T)}{\sum_{i=1}^{N_s} N(T_i)} \right) \tag{3} \]

   where $W_r(T)$ represents the portion of informative residual that is assigned to the
subtree $T$ on the basis of its resolution. Of course, if the sibling nodes are all roots of
subtrees with resolution equal to 0, the entire informative residual is assigned depending
solely on the number of nodes, as shown in Equation 4

\[ W_r(T) = \begin{cases}
  \dfrac{R(T)}{\sum_{i=1}^{N_s} R(T_i)}, & \text{if } \sum_{i=1}^{N_s} R(T_i) \geq 1 \\[1ex]
  \dfrac{N(T)}{\sum_{i=1}^{N_s} N(T_i)}, & \text{otherwise}
\end{cases} \tag{4} \]

   where $N_s$ is the number of siblings of the root of $T$ plus 1 (the root itself), the
trees $T_i$ are $T$ and the trees rooted in those siblings, and $p$ is the parent node of
the root of $T$.
   This approach is motivated by the way paintings are presented by the virtual agent: first
of all, it provides access to the content of the root node of the considered Item, which
contains generic data about the presented work of art. When the introduction is finished,
the user is allowed to choose whether to continue navigating the tree or to move to another
Item. In general, after providing the contents of each accessed node, the user is given
three choices:

     • if at least one child node is available, the user can request a Deepen move. This
       will make the virtual agent access the contents of the current node's first child;

     • if at least one unvisited sibling is available, the user can request a Continue move,
       which will make the virtual agent access the contents of the first such sibling of
       the current node;

     • the user can always choose to Exit to the next Item.

   Given the value of the parenthood relationship, a Deepen move can be considered to be
totally informed, as the user is aware that the chosen action will bring more information
about the topic they just listened to. This can be considered a strong indication of
interest for the topic. A Continue move is less informed because, while the user is aware
that the next topic will still concern the current painting, they cannot predict the exact
topic. The virtual agent automatically moves to the next Item if the current tree is
exhausted. We consider the total interest score to be the sum of the informative content of
the visited nodes.

4.     RESULTS
   In order to evaluate the overall quality of the presented method and interest detection
in art categories, we used a human-robot interactive setup made of an Aldebaran Nao unit
paired with a tablet. The robot uses a synthetic voice interpreting SSML data to mimic the
expressive style of the recorded expert. It also controls the tablet to show the paintings
and to zoom in on areas of interest tied to specific nodes of the XML presentation tree. The
tablet interface allows users to issue Continue, Deepen or Next moves. A group of 30 users
was recruited and provided with written instructions describing the interface. The subjects
were people from the university with competence in computer science. The categories used for
the experiments are Renaissance, Impressionism and Avant-garde. An average user session
lasted between 20 and 30 minutes. Users were asked to explore the presentations and look for
a painting they would be interested in discussing. After the interaction session was over,
users chose and commented on a single painting and evaluated, on a scale of 1 to 5, the
quality of the presentation offered by Nao. Asking users to pick a single painting instead
of a specific category is meant to obfuscate the goal of the experiment and to simplify the
task for non-expert
users. It makes more sense, for a user, to select an interesting item in a set they just
explored as they may be unaware of the art movement the item belongs to. The performance
measure is then given by the agreement rate between the category selected by Nao, which is
kept hidden from the users at this stage, and the category to which the chosen painting
belongs. To collect an explicit judgement on this, at the end of the experiment users were
informed about which category scored the highest value and they were provided with a brief,
written description of the category that was considered to be of their potential interest.
Users were then asked to evaluate the automatic choice on a scale of 1 to 5.
   First of all, we consider the explicit judgement given by the users. The obtained
distribution of scores for presentation quality and category selection is shown in Figure 2.
Results show that the subjects assigned a significantly high score to the overall experience
offered by Nao, validating the transfer of the expert's basic behaviour to the robot. This
is important as we can safely discard the possibility that indications of non-interest were
caused by the robot. Explicit judgement for category selection is also high, on average. A
single, very low, outlying score was observed from a person who declared that he “did not
like art in general”.

Figure 2: Distribution of scores for presentation quality and category selection.

   To evaluate the agreement rate between the categories of the paintings chosen by the
subjects and the categories selected by Nao, we consider Weighted Cohen's Kappa. In general,
this measure evaluates the agreement rate between two annotators. In our case, we also need
to weight errors differently given the relationship between the considered categories. We do
this by specifying the relative distances between the categories and then applying squared
weighting as a function of the temporal ordering from Renaissance to Avant-garde. In more
detail, Impressionism and Avant-garde are close as the former lays the basis for the latter.
Impressionism is closer to Renaissance than Avant-garde is, but confusing Impressionism with
Renaissance still cannot be considered as light an error as the Impressionism/Avant-garde
confusion. Obviously, Renaissance and Avant-garde are two completely different art movements
and confusing them is considered a severe error. The kappa value we obtained with this setup
is 0.604 which, given the general reference table provided in [?] and reported in Table 1,
indicates that the agreement lies on the boundary between moderate and good.

                 Kappa value      Agreement
                 < 0.20           Poor
                 0.21 - 0.40      Fair
                 0.41 - 0.60      Moderate
                 0.61 - 0.80      Good
                 0.81 - 1.00      Very good

Table 1: Kappa value and the corresponding interpretation of agreement.

5.   CONCLUSIONS
   We have investigated how, during human-robot interaction, tracking the way users explore
a structured document concerning works of art helps to obtain implicit feedback. We
concentrate on user requests that are differently informed with respect to the
predictability of the associated piece of information and check if the proposed measures are
consistent with reported user interests. Evaluation is performed with an explicit and an
implicit measure to take into account potential interest overestimates users may provide,
the topic being works of art. Results show that the experience was positively evaluated and
that the agreement on the degree of interest is promising. While in this work we consider a
classification based on artistic movements, the procedure applies to other viewpoints, too.
Also, a graph-based representation allows multiple classifications to co-exist and will be
covered in future work.
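
   As an illustration of the scoring model of Section 3, the distribution of informative
potential (Equations 1 to 4) and the resulting interest score can be sketched in Python as
follows. This is a minimal sketch under stated assumptions, not the implementation used in
the experiments: the `Node` class, the function names and the toy tree are hypothetical, and
$N(T)$ is assumed to count every node of the tree, abstract nodes and the root included.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    is_data: bool = False               # abstract nodes retain nothing
    children: List["Node"] = field(default_factory=list)
    potential: float = 0.0              # I_p(T_n), set by distribute()
    content: float = 0.0                # I_c(n), set by distribute()


def n_nodes(t: Node) -> int:
    """N(T): number of nodes in the tree rooted in t (assumed to include t)."""
    return 1 + sum(n_nodes(c) for c in t.children)


def resolution(t: Node) -> int:
    """R(T): number of branches that do not connect a node to a leaf."""
    return sum((1 if c.children else 0) + resolution(c) for c in t.children)


def distribute(node: Node, potential: float) -> None:
    """Assign I_p and I_c to every node of the tree rooted in `node`."""
    node.potential = potential
    node.content = potential / n_nodes(node) if node.is_data else 0.0   # Eq. 1
    residual = potential - node.content                                 # Eq. 2
    if not node.children:
        return
    total_n = sum(n_nodes(c) for c in node.children)
    total_r = sum(resolution(c) for c in node.children)
    for child in node.children:
        n_share = n_nodes(child) / total_n
        w_r = resolution(child) / total_r if total_r >= 1 else n_share  # Eq. 4
        distribute(child, residual / 2 * (w_r + n_share))               # Eq. 3


def interest(visited: List[Node]) -> float:
    """Total interest score: sum of the informative content of visited nodes."""
    return sum(n.content for n in visited)


# Toy category with a made-up structure (not the data used in the paper).
s1a = Node("style-detail", is_data=True)
s1 = Node("style-info", is_data=True, children=[s1a])
a1 = Node("author-info", is_data=True)
item_a = Node("item-A", is_data=True, children=[
    Node("Style", children=[s1]),
    Node("Author", children=[a1]),
])
item_b = Node("item-B", is_data=True)
category = Node("Category", children=[item_a, item_b])

distribute(category, 1.0)   # a Category subtree carries a potential of 1

everything = [item_a, s1, s1a, a1, item_b]
print(f"all data nodes visited:  {interest(everything):.3f}")
print(f"item-A then style-info:  {interest([item_a, s1]):.3f}")
print(f"item-A then author-info: {interest([item_a, a1]):.3f}")
```

   In this toy tree the branch that can be deepened further (style-info) receives more
potential than the flat one (author-info), so visiting it contributes more to the interest
score, which is the intended effect of weighting parenthood above siblinghood.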
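
   The weighted agreement measure of Section 4 can be illustrated in a similar way. The
sketch below computes Cohen's kappa with squared-distance disagreement weights; the function
name, the numeric positions assigned to the three categories and the label lists are all
hypothetical, since the paper states only the relative ordering of the distances, and the
example data are invented rather than taken from the experiment.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_kappa(a: Sequence[str], b: Sequence[str],
                   position: Dict[str, float]) -> float:
    """Cohen's kappa with squared-distance disagreement weights."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    span = max(position.values()) - min(position.values())

    def weight(x: str, y: str) -> float:
        # 0 for agreement, 1 for the two most distant categories
        return ((position[x] - position[y]) / span) ** 2

    n = len(a)
    pairs = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # Weighted disagreement actually observed between the two label lists.
    observed = sum(weight(x, y) * c for (x, y), c in pairs.items()) / n
    # Weighted disagreement expected by chance from the marginal counts.
    expected = sum(weight(x, y) * count_a[x] * count_b[y]
                   for x in position for y in position) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when only one category occurs")
    return 1.0 - observed / expected


# Hypothetical positions: Impressionism sits closer to Avant-garde
# than to Renaissance, as described in the text.
POSITION = {"Renaissance": 0.0, "Impressionism": 2.0, "Avant-garde": 3.0}

R, I, A = "Renaissance", "Impressionism", "Avant-garde"
chosen = [R, R, I, I, A, A]          # invented user choices
light_error = [R, R, I, A, A, A]     # one Impressionism/Avant-garde confusion
severe_error = [R, A, I, I, A, A]    # one Renaissance/Avant-garde confusion

kappa_light = weighted_kappa(chosen, light_error, POSITION)
kappa_severe = weighted_kappa(chosen, severe_error, POSITION)
print(f"light confusion:  {kappa_light:.3f}")
print(f"severe confusion: {kappa_severe:.3f}")
```

   With these invented labels a single Renaissance/Avant-garde confusion lowers the score
far more than a single Impressionism/Avant-garde confusion, which is the behaviour the
squared weighting is meant to produce.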