Cultural heritage presentations with a humanoid robot using implicit feedback

Antonio Origlia (antonio.origlia@unina.it), Antonio Rossi (rossi.antonio.84@gmail.com), Maria Laura Chiacchio (marialaurachiacchio@gmail.com), Francesco Cutugno (cutugno@unina.it)
Dept. of Electrical Engineering and Information Technology, University of Naples “Federico II”, Naples, Italy

ABSTRACT
In recent years, there has been an increasing interest towards cultural heritage in the field of ICT applications. To design efficient communication strategies, the knowledge possessed by art historians, with expertise in mediating access to cultural heritage, has become a valuable resource. In this work, we present a human-robot interaction setup where people actively choose how much information they would like to access concerning the available topics. To provide engaging presentations, a humanoid robot exhibiting a general behaviour based on a human presenter was used, and a mathematical model to keep track of content navigation was designed. Monitoring the evolution of the interactive session makes it possible to estimate users’ general interest towards the available contents. Our results show that people were very satisfied with the interaction experience and that automatically detected interests were consistent with the users’. Both subjective and objective metrics were used to validate the approach.

CCS Concepts
• Human-centered computing → Collaborative interaction;

Keywords
Cultural heritage presentation, human-robot interaction, implicit feedback

Copyright 2016 for this paper by its authors. Copying permitted for private and academic purposes.

1. INTRODUCTION
Communication in museums is considered an important issue, even if museum specialists have sometimes been reproached for not doing enough in this field. Many advances have been made in recent years in understanding museum visitors’ needs and developing new interactive ways to address them. Among many others, investigations about visitors’ psychological approach [?] helped museologists to develop possible methods not only to exhibit artifacts but also to give them a sense, providing further explanations. Museum experts may therefore be ready to take an active role in the development of technologies to support visitors’ experiences [?].

Artificial agents can help visitors obtain information about cultural heritage in a pleasant way. This is because they use interaction paradigms that do not require users to deviate too much from everyday communication. Of course, an artificial agent must rely on written content, as Natural Language Generation techniques are still experimental. To obtain a resource enabling an artifact to talk to a user, it is important to start from texts as close as possible to spoken language, as previous investigation in psychology also suggests [?]. In order to obtain such a resource, we compiled our reference database starting from speech transcriptions. We collected speech material of a human expert presenting works of art and converted it into a form an artificial agent can use in a presentation task.

There are a number of examples of robots being used in cultural heritage dissemination. Among existing systems for interactive edutainment with robots, ROBOTINHO [?] provides multimodal interaction through spoken presentations, facial expressions and gestures. Its hands convey greetings, deictic and other spatial information, while gaze establishes joint attention. Also, the robot presented in [?] attracts visitors’ attention through verbal and nonverbal action. In these works, the authors concentrated on the development of social strategies to obtain a believable artificial presenter.

In interactive approaches, user feedback has been used to obtain user models for recommendation systems and personalisation (e.g. [?]). Most of these systems use explicit feedback, typically ratings, to recommend items. However, research in the Information Retrieval field highlighted that explicit feedback poses a significant problem: the obtrusiveness of the approach [?]. This problem becomes critical in edutainment setups.

In this paper, we describe how the robot’s behavioural strategies and discourse structure were designed on the basis of orally delivered cultural heritage presentations. From the transcription of these recordings, a series of information nodes have been identified and organized in a tree structure representing general concepts, topic sharing and deepening levels. This structure is then automatically populated with information describing the feedback strength the system should consider, depending on how data are explored. An interactive task using a humanoid robot and a tablet interface is used for validation.

2. ROBOT BEHAVIOUR DESIGN
To correctly design the robot’s behaviour during presentations, we collected a corpus of audio-visual reference material to study how a human expert delivers the contents related to the considered works of art. In this case, one of the authors (Maria Laura Chiacchio) performed the presentations. The total recorded material consists of two hours and a half. In this work, we concentrate on reproducing speech and gestures in a humanoid robot, but the full dataset contains material for future analyses.

Figure 1: General structure of the XML tree representing contents organization. The tree descends from the Root to Category (Cat.) nodes, then to Item nodes, then to the Style, Iconography (Icon.) and Author (Auth.) nodes, and finally to Info nodes.
Considering that every attempt to classify art may be controversial, it was nevertheless necessary for the present research to look for wide categories that could be representative of the most famous styles of European Art. In each category, among many famous artists and masterpieces, the paintings were chosen considering their importance towards the category itself. Moreover, it was important to avoid very well-known masterpieces, because their fame could influence the users' choice to go further with the exploration. So, for example, instead of considering the Gioconda by Leonardo da Vinci, another of his paintings has been presented, which is a famous work as well but not a universally recognized icon.

The high-quality recorded speech was automatically transcribed and manually corrected to remove disfluencies and filled pauses. Punctuation and Speech Synthesis Markup Language (SSML) tags were also added to support the generation of synthetic speech, obtained using the MIVOQ engine (www.mivoq.it). To make the robot move consistently, a set of gestural strategies accompanying the presentations was observed and used to control the robot’s behaviour:

• When the presentation refers to general aspects of the painting, the presenter looks at it to attract the listener’s attention to the work of art;

• When the presentation refers to a specific detail of the painting, the presenter points at the area where the detail is found. This is because it is not always straightforward, for the listener, to identify the specific point where the presentation subject can be found. Using gestures is more immediate and precise than using spoken instructions about where to look;

• When the presentation refers to a specific area of the painting, the strategy is similar to the one used in the preceding point. However, a different gesture, encircling the area of interest, has been identified as different from simple pointing;

• When the subject of the presentation is not directly related to the painting (e.g. it refers to the author’s life), the presenter looks at the audience. This signals the absence of direct correlates of the presentation in the painting.

3. COLLECTING IMPLICIT FEEDBACK
The general structure of the data is organized into an XML tree where two kinds of nodes are available: data nodes and abstract nodes. The tree is composed of the following elements:

• an abstract root node;

• a number of abstract Category nodes grouping paintings belonging to the same art movement;

• a number of data Item nodes containing a short presentation of the painting considered in the relative subtree;

• for each Item node, three abstract children nodes grouping data concerning Style, Iconography and Author;

• for each of these abstract nodes, a subtree of data Info nodes containing SSML. In this subtree, a child-parent relationship indicates that the child node contains a deepening of the content provided in the parent node. Sibling nodes provide different insights about the topic covered by their parent.

A graphical summary of the data structure is presented in Figure 1. In order to evaluate the interest of a user towards a specific category, it is necessary to assign a measure of importance to each node in the subtree rooted at each provided category. The concept is that interest is maximal when all the representative items of the category are fully explored by the user. Since we assume that requesting a topic deepening brings more insight into the user’s interests, it is important to keep track of the structure of each considered subtree. Specifically, we consider the resolution $R(T)$ of a generic tree $T$, which corresponds to the number of internal branches, that is, branches that do not connect a node to a leaf [?]. We also define the number of nodes included in a tree $T$ as $N(T)$ and the informative potential $I_p(T)$ as the potential interest the user may implicitly express by exploring the tree $T$.

Abstract nodes entirely distribute the amount of informative potential they receive among their subtrees. Data nodes retain a certain part of the informative potential before passing the remaining amount to the subtrees rooted in their children. We will refer to the amount of informative potential retained by the $n$-th data node as its informative content $I_c(n)$. Also, we will refer to the amount of informative potential distributed to children nodes as its informative residual $I_r(n)$. $I_c(n)$ is computed by retaining the fraction of informative potential that would be assigned to the node, should the potential be equally distributed among the nodes of the tree rooted in it, as shown in Equation 1

$$I_c(n) = \frac{I_p(T_n)}{N(T_n)} \tag{1}$$

where $T_n$ is the tree rooted in the $n$-th node. Consequently, $I_r(n)$ is computed as shown in Equation 2

$$I_r(n) = I_p(T_n) - I_c(n) \tag{2}$$

The informative potential of the tree rooted in a Category node is assigned a value of 1: this potential has to be distributed among the subtrees containing Item nodes. As different Items have different amounts of information and different structural organization, it is necessary to distribute the informative potential among the subtrees in such a way that the interest value of parenthood is more powerful than the one of siblinghood. The strategy to distribute the informative residual to the subtrees depends on two factors: the number of nodes included in a subtree and the resolution of the subtree itself. Half the score depends on the former while the other half depends on the latter. One half of the informative residual is weighted by the ratio between the resolution of the considered subtree and the sum of the resolutions of the considered subtree and of the trees rooted in the siblings of its root. The other half is weighted by the ratio between the number of nodes in the considered subtree and the total number of nodes in the subtree itself and in its siblings. $I_p(T)$ is therefore computed as in Equation 3

$$I_p(T) = \frac{I_r(p)}{2} \cdot \left( W_r(T) + \frac{N(T)}{\sum_{i=1}^{N_s} N(T_i)} \right) \tag{3}$$

where $W_r(T)$ represents the portion of informative residual that is assigned to the subtree $T$ on the basis of its resolution. Of course, if the sibling nodes are all roots of subtrees with resolution equal to 0, the entire informative residual is assigned depending solely on the number of nodes, as shown in Equation 4

$$W_r(T) = \begin{cases} \dfrac{R(T)}{\sum_{i=1}^{N_s} R(T_i)}, & \text{if } \sum_{i=1}^{N_s} R(T_i) \geq 1 \\[2ex] \dfrac{N(T)}{\sum_{i=1}^{N_s} N(T_i)}, & \text{otherwise} \end{cases} \tag{4}$$

where $N_s$ is the number of siblings of the root of $T$ plus 1 (the root itself), the trees $T_i$ are $T$ and the trees rooted in the siblings of its root, and $p$ is the parent node of the root of $T$.

This approach is motivated by the way paintings are presented by the virtual agent: first of all, it provides access to the content of the root node of the considered Item, which contains generic data about the presented work of art. When the introduction is finished, the user is allowed to choose whether to continue navigating the tree or to move to another Item. In general, after providing the contents of each accessed node, the user is given three choices:

• if at least one child node is available, the user can request a Deepen move. This will make the virtual agent access the contents of the first child of the current node;

• if at least one unvisited sibling is available, the user can request a Continue move, which will make the virtual agent access the contents of the first unvisited sibling of the current node;

• the user can always choose to Exit to the next Item.

Given the value of the parenthood relationship, a Deepen move can be considered to be totally informed, as the user is aware that the chosen action will bring more information about the topic just listened to. This can be considered a strong indication of interest for the topic. A Continue move is less informed because, while the user is aware that the topic will still concern the current painting, she cannot predict the exact topic. The virtual agent automatically moves to the next Item if the current tree is exhausted. We consider the total interest score to be the sum of the informative content of the visited nodes.

4. RESULTS
In order to evaluate the overall quality of the presented method and interest detection in art categories, we used a human-robot interactive setup made of an Aldebaran Nao unit paired with a tablet. The robot uses a synthetic voice interpreting SSML data to mimic the expressive style of the recorded expert. It also controls the tablet to show the paintings and to zoom in on areas of interest tied to specific nodes of the XML presentation tree. The tablet interface allows users to issue Continue, Deepen or Next moves. A group of 30 users was recruited and provided with written instructions describing the interface. The subjects were people from the university with competence in computer science. The categories used for the experiments are Renaissance, Impressionism and Avant-garde. An average user session lasted between 20 and 30 minutes.

Users were asked to explore the presentations and look for a painting they would be interested in discussing. After the interaction session was over, users chose and commented on a single painting and evaluated, on a scale of 1 to 5, the quality of the presentation offered by Nao. Asking users to pick a single painting instead of a specific category is meant to obfuscate the goal of the experiment and to simplify the task for non-expert users. It makes more sense, for a user, to select an interesting item in a set they just explored, as they may be unaware of the art movement the item belongs to. The performance measure is then given by the agreement rate between the category selected by Nao, which is kept hidden from the users at this stage, and the category to which the chosen painting belongs. To collect an explicit judgement on this, at the end of the experiment users were informed about which category scored the highest value and were provided with a brief, written description of the category that was considered to be of their potential interest. Users were then asked to evaluate the automatic choice on a scale of 1 to 5.

First of all, we consider the explicit judgement given by the users. The obtained distribution of scores for presentation quality and category selection is shown in Figure 2. Results show that the subjects assigned a significantly high score to the overall experience offered by Nao, validating the transfer of the expert’s basic behaviour to the robot. This is important, as we can safely discard the possibility that indications of non-interest were caused by the robot. Explicit judgement for category selection is also high, on average. A single, very low, outlying score was observed from a person who declared that he “did not like art in general”.

Figure 2: Distribution of scores for presentation quality and category selection.

To evaluate the agreement rate between the categories of the paintings chosen by the subjects and the categories selected by Nao, we consider Weighted Cohen’s Kappa. In general, this measure evaluates the agreement rate between two annotators. In our case, we also need to weight errors differently given the relationship between the considered categories. We do this by specifying the relative distances between the categories and then applying squared weighting as a function of the temporal ordering from Renaissance to Avant-garde. In more detail, Impressionism and Avant-garde are close, as the former lays the basis for the latter. Impressionism is closer to Renaissance than Avant-garde is, but confusing Impressionism with Renaissance still cannot be considered an error as light as the Impressionism/Avant-garde confusion. Obviously, Renaissance and Avant-garde are two completely different art movements and confusing them is considered a severe error. The kappa value we obtained with this setup is 0.604 which, given the general reference table provided in [?] and reported in Table 1, indicates that the agreement lies on the boundary between moderate and good.

Kappa value      Agreement
< 0.20           Poor
0.21 − 0.40      Fair
0.41 − 0.60      Moderate
0.61 − 0.80      Good
0.81 − 1.00      Very good

Table 1: Kappa value and the corresponding interpretation of agreement.

5. CONCLUSIONS
We have investigated how, during human-robot interaction, tracking the way users explore a structured document concerning works of art helps to obtain implicit feedback. We concentrate on user requests that are differently informed with respect to the predictability of the associated piece of information and check if the proposed measures are consistent with reported user interests. Evaluation is performed with an explicit and an implicit measure to take into account potential interest overestimates users may provide, the topic being works of art. Results show that the experience was positively evaluated and that the agreement on the degree of interest is promising. While in this work we consider a classification based on artistic movements, the procedure applies to other viewpoints, too. Also, a graph-based representation allows multiple classifications to co-exist and will be covered in future work.
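As an illustration of the model in Section 3, the distribution of informative potential (Equations 1 to 4) and the resulting interest score can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the `Node` class, the function names and the toy tree are hypothetical, N(T) is assumed to count both abstract and data nodes, and abstract nodes are assumed to retain no informative content.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    name: str
    is_data: bool                      # data nodes retain content, abstract nodes do not
    children: List["Node"] = field(default_factory=list)
    potential: float = 0.0             # I_p(T_n), filled in by distribute()
    content: float = 0.0               # I_c(n), filled in by distribute()


def size(t: Node) -> int:
    """N(T): number of nodes in the tree rooted at t (assumed to count every node)."""
    return 1 + sum(size(c) for c in t.children)


def resolution(t: Node) -> int:
    """R(T): number of internal branches, i.e. edges whose child is not a leaf."""
    return sum((1 if c.children else 0) + resolution(c) for c in t.children)


def distribute(t: Node, potential: float) -> None:
    """Assign the informative potential to the tree rooted at t, then recurse."""
    t.potential = potential
    # Eq. 1 for data nodes; abstract nodes pass everything on (assumption: I_c = 0).
    t.content = potential / size(t) if t.is_data else 0.0
    residual = potential - t.content                     # Eq. 2
    if not t.children:
        return
    total_nodes = sum(size(c) for c in t.children)
    total_res = sum(resolution(c) for c in t.children)
    for c in t.children:
        if total_res >= 1:                               # Eq. 4, first case
            w_r = resolution(c) / total_res
        else:                                            # Eq. 4, second case
            w_r = size(c) / total_nodes
        distribute(c, residual / 2 * (w_r + size(c) / total_nodes))  # Eq. 3


def walk(t: Node) -> Iterator[Node]:
    """Yield every node of the tree rooted at t, parents before children."""
    yield t
    for c in t.children:
        yield from walk(c)


def interest(visited: List[Node]) -> float:
    """Total interest score: sum of the informative content of the visited nodes."""
    return sum(n.content for n in visited)


def info(name: str, *children: Node) -> Node:
    return Node(name, True, list(children))


def group(name: str, *children: Node) -> Node:
    return Node(name, False, list(children))


# Hypothetical toy category with two Items; item_a has a deeper Style subtree.
item_a = info("item_a",
              group("style", info("s1", info("s1.1")), info("s2")),
              group("iconography", info("i1")),
              group("author", info("a1")))
item_b = info("item_b",
              group("style", info("s1")),
              group("iconography", info("i1")),
              group("author", info("a1")))
category = group("category", item_a, item_b)
distribute(category, 1.0)   # a Category tree receives an informative potential of 1
```

In this toy tree every leaf is a data node, so the informative content of all nodes sums to the potential assigned to the Category, and a session that visits every node reaches the maximal interest score. Calling `interest` on the subset of nodes actually visited during a session gives the corresponding partial score.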
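The weighted agreement measure of Section 4 can be illustrated in the same way. The sketch below computes Cohen's kappa with squared-distance disagreement weights over categories placed on a temporal axis. The positions assigned to the three categories and the label lists are hypothetical values chosen for illustration only: the paper does not report the distances it used, and the lists are not the study's data.

```python
from typing import Dict, Sequence


def weighted_kappa(a: Sequence[str], b: Sequence[str],
                   positions: Dict[str, float]) -> float:
    """Cohen's kappa with squared-distance disagreement weights.

    a, b      : labels given by the two annotators, in the same order
    positions : placement of each category on an axis; the weight of a
                confusion is the squared difference of the two positions
    """
    if len(a) != len(b) or not a:
        raise ValueError("both annotators must label the same non-empty set")
    n = len(a)
    labels = list(positions)

    def weight(x: str, y: str) -> float:
        return (positions[x] - positions[y]) ** 2

    # Observed weighted disagreement, averaged over the labelled items.
    observed = sum(weight(x, y) for x, y in zip(a, b)) / n
    # Weighted disagreement expected by chance, from the marginal frequencies.
    freq_a = {label: a.count(label) / n for label in labels}
    freq_b = {label: b.count(label) / n for label in labels}
    expected = sum(freq_a[x] * freq_b[y] * weight(x, y)
                   for x in labels for y in labels)
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected


# Hypothetical positions: Impressionism sits closer to Avant-garde than to Renaissance.
POSITIONS = {"Renaissance": 0.0, "Impressionism": 2.0, "Avant-garde": 3.0}

# Illustrative labels only (not the experimental data).
chosen = ["Renaissance", "Impressionism", "Avant-garde", "Impressionism"]
detected = ["Renaissance", "Avant-garde", "Avant-garde", "Impressionism"]
kappa = weighted_kappa(chosen, detected, POSITIONS)
```

With unequal spacing between the positions, confusing Renaissance with Avant-garde is penalised more heavily than confusing Impressionism with Avant-garde, which mirrors the ordering of error severity described in Section 4.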