User Emotion Detection via Taxonomy Management: An Innovative System

Alfredo Cuzzocrea (1), Giovanni Pilato (2), and Edoardo Fadda (3)

(1) University of Calabria, Rende, Italy, alfredo.cuzzocrea@unical.it
(2) ICAR-CNR, Palermo, Italy, giovanni.pilato@cnr.it
(3) Politecnico di Torino, Torino, Italy, edoardo.fadda@polito.it

Abstract. Catching the attention of a new acquaintance and empathizing with her can improve the social skills of a robot. For this reason, we illustrate here the first step towards a system that can be used by a social robot in order to "break the ice" with a new acquaintance. During a training phase, the robot acquires a sub-symbolic coding of the main concepts expressed in tweets about the IAB Tier-1 categories. This knowledge is then used to identify the interests of the new acquaintance, in particular those that arouse a joyful sentiment in her. The analysis is carried out alongside general small talk; once it is finished, the robot can propose to talk about something that catches the attention of the user, hopefully arousing in her a mix of feelings involving surprise and joy and therefore triggering an engagement between the user and the social robot.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by its editors. SEBD 2020, June 21-24, 2020, Villasimius, Italy.

1 Introduction

Engagement is one of the most basic and important phases in interactions between human beings. In recent years there has been a growing interest in this topic throughout human-machine interaction (HMI) and related fields [6]. Researchers have highlighted that engagement is a very complex phenomenon, including both cognitive and affective components: it should involve attention and enjoyment [3][15]. Here we use the term to refer to the "starting, or intention to start, an interaction". In particular, we focus on the fact that, when making new acquaintances, the first impression is very important: finding common interests to talk about as soon as possible allows an empathetic interaction to start between two persons, with all that this implies. Given these premises, in order to trigger both attention and enjoyment it would be useful to design a social robotic system that tries to find the topics of interest of a new acquaintance while attempting to understand what might raise a sentiment of joy, so as to catch the empathetic attention of the user. As a matter of fact, knowledge of the user's topics of interest and "joyful" subjects can guide the first stages of a conversational interaction, allowing the robot to facilitate the engagement of a friendly interaction instead of the classical trivial interaction between a robot and a human user.

To reach this goal, the robot can access the social network data of the new acquaintance and coarsely profile her/his interests, gathering useful information to start a conversation that is possibly interesting for the user.

Social networks represent a great place, maybe the best, to gather information about people's opinions, since they are generally used to express personal thoughts and to discuss specific subjects with other people [12][30]. These opinions are very useful to understand and classify the emotions associated with an event, a product, a person, etc., and to analyze their trend [21][22][13].
In this paper we illustrate the design of a system which can be used by a social robot in order to "break the ice" between the robot and a new acquaintance. First of all, the robot acquires knowledge through the construction of prototypes describing each entry of the IAB Taxonomy. The system needs a training phase in which the fundamental concepts representing the Tier-1 categories of the IAB v2.0 taxonomy, induced through the data-driven construction of a conceptual space using the Latent Semantic Analysis (LSA) procedure and through a set of topics derived with the Latent Dirichlet Allocation (LDA) methodology, are mapped into a semantic space. A set of tweets is retrieved for each word describing each entry of the IAB Taxonomy. For each IAB entry, a set of words describing the conceptual axes of two "conceptual spaces", induced from the set of tweets associated with that entry, is built. Each conceptual axis is therefore described by a specific "bag of words" which constitutes its description. Each axis is then coded as a vector in a semantic space built through LSA and associated with the specific IAB entry. At the end of the procedure, each entry of the IAB taxonomy is associated with a set of vectors in the built semantic space, one for the label of each fundamental axis of the category.

In addition, a system which is able to detect a pattern of basic Ekman emotions in a given text [24] is trained.

Once the system is trained, during a general conversation with a new acquaintance the robot asks for the user's Twitter ID and, while the conversation continues, it retrieves the most recent tweets of the user. Each tweet is then encoded as a vector in a semantic space. The semantic similarity between each tweet and each vector representing an entry of the IAB taxonomy is computed, and the highest value of similarity is retained. The above procedure makes it possible to associate a tweet of the user with a pattern of IAB categories; furthermore, for each tweet a vector of Ekman fundamental emotions is computed. This leads to the selection of the Tier-1 categories of the IAB taxonomy that are of interest to the user and that arouse a specific emotion in her. In our case we have chosen the "joy" emotion, which is the most desirable when a person meets another human being for the first time. The goal is to engage a conversation, somehow polarizing it on topics that catch the attention of the user, trying to establish an empathetic relationship. With some extensions, this approach is related to adaptive metaphors, like those developed in other scientific contexts (e.g., [5]).

2 The System

The proposed system is composed of a set of modules interacting in order to catch the attention of the user. The modularity of the proposed architecture makes it suitable for implementation on top of a Cloud infrastructure (e.g., [10]). The system has a training phase, shown in Fig. 1, in which a semantic space S is induced from Twitter data, and a joyful-topic-detection process, illustrated in Fig. 2, which exploits the Twitter ID of the user in order to retrieve her posts and tries to catch the interests of the user that somehow arouse a "joy" emotion.

Fig. 1. Training process

2.1 The IAB Taxonomy

The IAB (Interactive Advertising Bureau) Tech Lab Content Taxonomy is a concise taxonomy which is also an international standard to map contextual business categories [17][18]. The latest release of the taxonomy, namely version 2.0, was published in November 2017 and counts 698 entries distributed over 29 Tier-1 classes. This taxonomy is particularly suited to be used by companies in the market, since it is standardized and industry-neutral. These characteristics can be effectively exploited for profiling a user's interests.

2.2 Tweets retrieval module

The dataset under analysis is retrieved by using the Twitter APIs with the default access level. The default access level gives a random sample of the stream of publicly available tweets.

Fig. 2. Joyful-topic-detection process

For our approach, we use only the tweet text content, which is preprocessed before being exploited to build a data-driven conceptual space. Stop-words are filtered out, and links are removed before processing the text since they often hide off-topic posts or even spam. Abnormal sequences of characters are discarded. The retrieval module can be used either to retrieve tweets satisfying a query composed of keywords or to download the most recent tweets of a given Twitter user ID.
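The paper does not specify an implementation of this preprocessing step, so the following is only a minimal sketch of the kind of cleaning described above (link removal, stop-word filtering, discarding abnormal character sequences). The regular expressions, the tiny stop-word list, and the function name are illustrative assumptions; in practice a full stop-word list (e.g., NLTK's) and the tweet text returned by the Twitter APIs would be used.

import re

# Tiny illustrative stop-word list; a full list (e.g., NLTK's) would be used in practice.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

URL_RE = re.compile(r"https?://\S+")     # links often hide off-topic posts or spam
REPEAT_RE = re.compile(r"(.)\1{3,}")     # abnormal runs of repeated characters
TOKEN_RE = re.compile(r"[a-z]+")         # keep plain alphabetic tokens only

def preprocess_tweet(text):
    """Return the cleaned bag of tokens for the text content of a single tweet."""
    text = URL_RE.sub(" ", text.lower())  # remove links
    text = REPEAT_RE.sub(r"\1", text)     # collapse abnormal character sequences
    tokens = TOKEN_RE.findall(text)       # tokenize
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess_tweet("Loooove my new car!!! https://t.co/xyz #cars"))
# -> ['love', 'my', 'new', 'car', 'cars']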
2.3 LSA-based Descriptors

The Latent Semantic Analysis (LSA) technique is a well-known methodology that is capable of giving a coarse sub-symbolic encoding of word semantics [20] and of simulating several human cognitive phenomena [19]. The LSA procedure is based on an M × N term-document occurrence matrix A, whose generic element represents the number of times a term is present in a document. Let K be the rank of A. The factorization named Singular Value Decomposition (SVD) holds for the matrix A:

A = U Σ V^T    (1)

Let R be an integer > 0 with R < N, let U_R be the M × R matrix obtained from U by suppressing the last N − R columns, let Σ_R be the matrix obtained from Σ by suppressing the last N − R rows and the last N − R columns, and let V_R be the N × R matrix obtained from V by suppressing the last N − R columns. Then:

A_R = U_R Σ_R V_R^T    (2)

A_R is an M × N matrix of rank R, and it is the best rank-R approximation of A (among the M × N matrices) with respect to the Frobenius metric. The i-th row of the matrix U_R may be considered as representative of the i-th word. The columns of the U_R matrix represent the R independent dimensions of the space S. Each j-th dimension is weighted by the corresponding value σ_j of Σ_R. Furthermore, each j-th dimension can be tagged by considering the words having the highest values of |u_ij|. This makes it possible to interpret the space S as a "conceptual" space, according to the procedure illustrated in [1][25].
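As a concrete illustration of the construction above, here is a minimal NumPy sketch: a truncated SVD of a toy term-document matrix, with each of the R dimensions tagged by the words having the largest |u_ij|. The toy vocabulary, the matrix, and all function names are illustrative assumptions, not the actual data or code of the system.

import numpy as np

def conceptual_space(A, vocab, R, top_k=3):
    """Truncated SVD of the M x N term-document matrix A (eq. 2).

    Returns the R-dimensional word coordinates U_R * Sigma_R and, for each
    of the R dimensions, the words with the largest |u_ij| used as its tag.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_R, s_R = U[:, :R], s[:R]                       # keep the first R dimensions
    tags = [[vocab[i] for i in np.argsort(-np.abs(U_R[:, j]))[:top_k]]
            for j in range(R)]
    return U_R * s_R, tags

# Toy example: 6 terms x 4 documents, R = 2.
vocab = ["car", "engine", "race", "movie", "actor", "plot"]
A = np.array([[3., 2., 0., 0.],
              [2., 1., 0., 0.],
              [1., 2., 0., 0.],
              [0., 0., 3., 1.],
              [0., 0., 2., 2.],
              [0., 0., 1., 3.]])
word_vectors, tags = conceptual_space(A, vocab, R=2)
print(tags)  # one axis tagged by car/engine/race words, the other by movie/actor/plot words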
2.4 LDA-based Descriptors

In recent years a Bayesian probabilistic model of text corpora, namely Latent Dirichlet Allocation (LDA), has been proposed with the aim of finding topics in documents [2] by associating a set of words with each topic, thus obtaining a rough representation of a textual corpus. One of the main advantages of LDA is that, like LSA, the approach is completely unsupervised. The only quantity that LDA requires to be set a priori is the number N of topics to extract. Latent topics are discovered through the identification of sets of words in the corpus that often occur together within documents. LDA is based on a generative process consisting of these two steps:

– For each topic n = 1, 2, ..., N, a distribution φ^(n) ∼ Dirichlet(β) is drawn; φ^(n) is a discrete probability distribution over a fixed vocabulary constituting the n-th topic distribution, and β is the hyperparameter of the symmetric Dirichlet distribution.
– For each document d_k of the document corpus, a distribution θ_dk ∼ Dirichlet(α) over the available topics is drawn from a symmetric Dirichlet distribution specific to the document d_k; θ_dk is a low-dimensional coding of d_k in the topic space. For each word w_i belonging to the document d_k, z_i ∼ Discrete(θ_dk) and w_i ∼ Discrete(φ^(z_i)) are drawn, where z_i is the topic index for w_i.

The above process leads to the following distribution:

p(w, z, θ, φ | α, β) = p(φ | β) p(θ | α) p(z | θ) p(w | φ_z)    (3)

where z, θ, φ are the latent variables of interest. In LDA the posterior inference is given by:

p(θ, φ, z | w, α, β) = p(θ, φ, z, w | α, β) / p(w | α, β)    (4)

which represents the learning of the latent variables given the observed data. The above posterior is usually approximated through variational inference or Gibbs sampling, as reported in the literature [14][2][29].
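The paper does not state which LDA implementation is used; as an illustration only, the sketch below runs inference for the generative model above with the gensim library on a toy corpus of already-preprocessed token lists (see Section 2.2). The corpus, the number of topics, and the variable names are assumptions.

from gensim import corpora
from gensim.models import LdaModel

# Toy corpus of already-preprocessed tweets (token lists).
docs = [
    ["car", "engine", "race", "track"],
    ["engine", "car", "driver", "race"],
    ["movie", "actor", "plot", "scene"],
    ["actor", "movie", "award", "plot"],
]

dictionary = corpora.Dictionary(docs)            # fixed vocabulary
corpus = [dictionary.doc2bow(d) for d in docs]   # bag-of-words coding of each document

N = 2  # the only quantity to fix a priori: the number of topics
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=N,
               alpha="symmetric", eta="symmetric",  # symmetric Dirichlet priors (alpha, beta)
               passes=20, random_state=0)

# phi^(n): each topic's word distribution, summarised by its top words.
for n in range(N):
    print(n, lda.show_topic(n, topn=4))

# theta_d: the low-dimensional coding of a document in the topic space.
print(lda.get_document_topics(corpus[0]))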
2.5 Emotion Detection Module

This module deals with the detection of emotions in tweets. For the emotional labeling of tweets, we have considered the six Ekman basic emotions: anger, disgust, fear, joy, sadness and surprise [16], exploiting an emotion lexicon obtained from the WordNet-Affect Lexicon, as described in [26][27], and adopting a procedure that has been illustrated in [24], which we briefly recap below.

The methodology is based on LSA and starts from the fact that any text d can be mapped into a data-driven "conceptual" space, in the sense illustrated above, by computing a vector d whose i-th component is the number of times the i-th word of the vocabulary, corresponding to the i-th row of U_R, appears in d. This leads to the mapping of the text as:

d_R = d^T U_R Σ_R^-1    (5)

The emotional lexicon has been split into six lists, each one associated with one of the basic Ekman emotions {anger, disgust, fear, joy, sadness, surprise}. For a fixed emotion e, a set of 300 artificial sentences has been built by using five randomly selected words belonging to the list related to e. This procedure has been repeated for each list associated with a fundamental Ekman emotion, leading to a set of 1800 artificial sentences. Furthermore, all the 1542 words of the lexicon have been considered. Each one of the 3342 (i.e., 1542 + 1800) texts b associated with an emotion e has been mapped into the data-driven "conceptual" space induced by the truncated SVD of eq. (2). The above procedure leads to a cloud of 3342 vectors that are used to map a tweet from the conceptual space to the emotional space.

In particular, we have six sets E_anger, E_disgust, ..., E_surprise of vectors constituting the sub-symbolic coding of the words belonging to the lexicon for a particular emotion, together with their artificial sentences. The generic vector belonging to one of the sets will be denoted in the following as b_i^(e), where e ∈ {"anger", "disgust", "fear", "joy", "sadness", "surprise"} and i is the index that identifies the i-th vector b_i^(e) in the set E_e. Specifically, b_i^(e) is computed as:

b_i^(e) = b^T U_R Σ_R^-1    (6)

where b is, each time, the vector computed starting from one of the 3342 textual artifacts b according to the procedure illustrated at the beginning of this section. Analogously, the textual content t of any tweet can be mapped into the data-driven "conceptual" space by computing a vector t whose i-th component is the number of times the i-th word of the vocabulary, corresponding to the i-th row of U_R, appears in t. This leads to the mapping of the tweet as:

t_R = t^T U_R Σ_R^-1    (7)

Once the tweet t is mapped into the "conceptual" space as a vector t_R, it is possible to compute its emotional fingerprint by exploiting the vectors b_i^(e), which act as "beacons" for the vector t_R, helping to find its position inside the conceptual space. In particular, given t_R, for each set E_e the following weight is computed:

w_e = max_i cos(t_R, b_i^(e))    (8)

Once all six w_e weights have been computed, the vector f_t, associated with the vector t_R and consequently with the tweet t, is calculated as:

f_t = [ w_anger, w_disgust, ..., w_surprise ] / sqrt(Σ_e w_e^2)    (9)

The vector f_t finally constitutes the emotional fingerprint of the tweet t in the emotional space. The emotional space is therefore a six-dimensional hypersphere where all tweets can be mapped and grouped. We call the fingerprint f_t an "emoxel", by analogy with the knoxel in the conceptual space paradigm [4].
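The fingerprint computation of eqs. (7)-(9) is compact enough to be sketched directly; the code below assumes that the space (U_R, Σ_R) and the six beacon sets E_e are already available from the training phase, and all names are illustrative.

import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

def fold_in(counts, U_R, sigma_R):
    """Map a bag-of-words count vector into the conceptual space, eq. (7)."""
    return counts @ U_R / sigma_R            # equivalent to t^T U_R Sigma_R^-1

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def emotional_fingerprint(t_R, beacons):
    """Emoxel f_t of a tweet vector t_R, eqs. (8)-(9).

    `beacons` maps each emotion e to the set E_e of vectors b_i^(e).
    """
    w = np.array([max(cosine(t_R, b) for b in beacons[e]) for e in EMOTIONS])
    return w / np.sqrt(np.sum(w ** 2))       # unit vector on the 6-d hypersphere

# Usage (with a space and beacon sets built during training):
# t_R = fold_in(tweet_counts, U_R, sigma_R)
# print(dict(zip(EMOTIONS, emotional_fingerprint(t_R, beacons))))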
2.6 Conversational Engine

The conversational engine exploits a speech-recognition module which makes use of the Google speech recognition APIs; after the speech-to-text task has been performed, the recognized string is sent to a dialogue manager. A set of question-answer rules is set up in the conversational engine in order to start a conversation that leads to the detection of the user's interests by transparently invoking the most adequate procedures which analyze the social network posts of the new acquaintance. The conversational agent engine allows for a natural human-robot interaction.

The conversational module is based on a Rivescript engine. Rivescript is a simple scripting language for realizing chatbots and other conversational entities. We have chosen this kind of engine because of the following interesting features: it is a plain-text, line-based scripting language, simple to learn, quick to type, and easy to read and maintain [23]. The syntax required to build a Rivescript "knowledge base" is very simple: question-answer pairs are encoded in plain text; it is easy to write a set of rules that can be combined to build effective conversational agents; its core library is focused on rendering responses, and it is straightforward to write custom modules and scripts; last but not least, it is an Open Source tool released under the MIT license [28]. The choice of such an engine allows us to easily connect it to other kinds of robots or other kinds of services. As a matter of fact, the conversational engine is invoked through a REST service and the answer is delivered to the user after its processing.

A Rivescript knowledge base is made up of Trigger/Reply pairs. Triggers are identified by a "+" sign, while Replies are denoted by a "-" sign. For example:

+ hi
- Hello there, my name is SocialRobot, please could you tell me your twitter ID?

The above pair makes it possible that, whenever the user says "Hi", the conversational engine replies with "Hello there, my name is SocialRobot, please could you tell me your twitter ID?".

At the beginning of the conversation, a specific Rivescript Topic is activated. Topics are logical groupings of triggers. When the conversation is bound to a topic, what the user says can only match triggers that belong to the activated topic [23]. This initial topic is aimed at entertaining a general conversation while the robot peeks at the tweets of the user, trying to roughly identify the subjects that interest the user and those that specifically trigger joyful emotions.

Once the predominant subject has been identified, the robot activates another Rivescript Topic, which is of particular interest for the user, trying to establish an empathetic engagement with the new acquaintance. As an example, suppose that the system finds that, among the different higher-level categories of the IAB taxonomy, the user is particularly interested in the "Automobiles" topic and that some of her tweets show the "joy" emotion for that topic: the conversation will then be switched to the "Automobiles" Topic and specific sentences will be uttered by the robot in order to catch the user's attention and empathy, like "Great! With my superpowers I can see that you like automobiles. I like the brand automobiles! Which one do you prefer?".
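As an illustration of how the trigger/reply pair above and the topic switch could be wired together, the following sketch uses the rivescript Python module; the paper does not state which Rivescript binding or loading mechanism is actually used, so the module choice, the wildcard trigger, and the user name are assumptions.

from rivescript import RiveScript

bot = RiveScript()
bot.stream("""
+ hi
- Hello there, my name is SocialRobot, please could you tell me your twitter ID?

> topic automobiles
  + *
  - Great! With my superpowers I can see that you like automobiles. Which one do you prefer?
< topic
""")
bot.sort_replies()

user = "new_acquaintance"
print(bot.reply(user, "hi"))

# Once the joyful-topic-detection process has selected "Automobiles",
# the corresponding Rivescript Topic is activated for this user.
bot.set_uservar(user, "topic", "automobiles")
print(bot.reply(user, "I am back"))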
3 Conclusions and future works

We have presented a preliminary work on a system that tries to catch the attention of a new acquaintance with the aim of establishing a first engagement with the user. The system uses both LSA and LDA descriptors, as well as an emotion detection module, to reach this goal. A conversational engine guides the initial process and carries on the conversation. Many aspects still have to be enhanced, starting from a more fine-grained classification, which should also be fast and reliable, the selection of specific entities that can catch the attention of the user in a more effective manner, and the automatic generation of conversational statements starting from the user's tweets. Other lines of research shall consider privacy-preservation issues (e.g., [8, 11]), as well as complex web intelligence solutions (e.g., [7, 9]).

References

1. Agostaro, F., Augello, A., Pilato, G., Vassallo, G., Gaglio, S.: A conversational agent based on a conceptual interpretation of a data driven semantic space. Lecture Notes in Artificial Intelligence, vol. 3673, no. 2, pp. 381–392, 2005
2. Blei, D., Ng, A., Jordan, M.: Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003
3. Brethes, L., Menezes, P., Lerasle, F., Hayet, J.: Face tracking and hand gesture recognition for human–robot interaction. In: IEEE International Conference on Robotics and Automation, vol. 2, pp. 1901–1906, IEEE, 2004
4. Chella, A., Frixione, M., Gaglio, S.: A cognitive architecture for robot self consciousness. Artificial Intelligence in Medicine, 44(2): 147–154, 2008
5. Cannataro, M., Cuzzocrea, A., Pugliese, A.: A Probabilistic Approach to Model Adaptive Hypermedia Systems. In: 1st Int. Workshop on Web Dynamics, in conjunction with ICDT 2001, 2001
6. Corrigan, L. J., Peters, C., Küster, D., Castellano, G.: Engagement Perception and Generation for Social Robots and Virtual Agents. In: Toward Robotic Socially Believable Behaving Systems - Volume I, Intelligent Systems Reference Library 105, pp. 29–51, Springer, 2016
7. Cuzzocrea, A.: Combining multidimensional user models and knowledge representation and management techniques for making web services knowledge-aware. Web Intell. Agent Syst. 4(3): 289–312, 2006
8. Cuzzocrea, A., Bertino, E.: Privacy Preserving OLAP over Distributed XML Data: A Theoretically-Sound Secure-Multiparty-Computation Approach. J. Comput. Syst. Sci. 77(6): 965–987, 2011
9. Cuzzocrea, A., De Maio, C., Fenza, G., Loia, V., Parente, M.: OLAP analysis of multidimensional tweet streams for supporting advanced analytics. In: ACM SAC 2016, pp. 992–999, 2016
10. Cuzzocrea, A., Fortino, G., Rana, O.: Managing Data and Processes in Cloud-Enabled Large-Scale Sensor Networks: State-of-the-Art and Future Research Directions. In: 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013, pp. 583–588, 2013
11. Cuzzocrea, A., Russo, V.: Privacy Preserving OLAP and OLAP Security. In: Encyclopedia of Data Warehousing and Mining, pp. 1575–1581, 2009
12. D'Avanzo, E., Pilato, G.: Mining social network users opinions' to aid buyers' shopping decisions. Computers in Human Behavior, Elsevier, Vol. 51, pp. 1284–1294, 2014
13. D'Avanzo, E., Pilato, G., Lytras, M. D.: Using twitter sentiment and emotions analysis of google trends for decisions making. Program, Vol. 51, Issue 3, 2017
14. Darling, W. M.: A theoretical and practical implementation tutorial on topic modeling and Gibbs sampling. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 642–647, 2011
15. Delaherche, E., Dumas, G., Nadel, J., Chetouani, M.: Automatic measure of imitation during social interaction: a behavioral and hyperscanning-EEG benchmark. Pattern Recognition Letters, 2014
16. Ekman, P., Friesen, W. V.: Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17:124, 1971
17. Interactive Advertising Bureau (IAB) Contextual Taxonomy, http://www.iab.net/ Retrieved December 2017
18. Kanagasabai, R., Veeramani, A., Ngan, L. D., Yap, G. E., Decraene, J., Nash, A. S.: Using Semantic Technologies to Mine Customer Insights in Telecom Industry. In: International Semantic Web Conference (Industry Track), 2014
19. Landauer, T. K., Dumais, S. T.: A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, vol. 104(2), pp. 211–240, 1997
20. Landauer, T. K., Foltz, P. W., Laham, D.: An introduction to latent semantic analysis. Discourse Processes, vol. 25, pp. 259–284, 1998
21. Liu, B.: Sentiment Analysis and Subjectivity. In: Indurkhya, N., Damerau, F. J. (eds.), Handbook of Natural Language Processing, pp. 627–665, CRC Press, 2010
22. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment Classification using Machine Learning Techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, pp. 79–86, Association for Computational Linguistics, 2002
23. Petherbridge, N.: Artificial Intelligence Scripting Language - Rivescript.com (online)
24. Pilato, G., D'Avanzo, E.: Data-driven Social Mood Analysis through the Conceptualization of Emotional Fingerprints. Procedia Computer Science, 2018 (in press)
25. Santilli, S., Nota, L., Pilato, G.: The use of latent semantic analysis in the positive psychology: A comparison with twitter posts. In: 2017 IEEE 11th International Conference on Semantic Computing (ICSC), pp. 494–498, IEEE, 2017
26. Strapparava, C., Mihalcea, R.: SemEval-2007 task 14: Affective text. In: Proceedings of the 4th International Workshop on Semantic Evaluations, pp. 70–74, Association for Computational Linguistics, 2007
27. Strapparava, C., Mihalcea, R.: Learning to identify emotions in text. In: SAC '08, Proceedings of the 2008 ACM Symposium on Applied Computing, 2008
28. Siddharth, G., Borkar, D., De Mello, C., Patil, S.: An E-Commerce Website based Chatbot. (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 6(2), pp. 1483–1485, 2015
29. Teh, Y. W., Newman, D., Welling, M.: A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In: NIPS, Vol. 6, pp. 1378–1385, 2006
30. Terrana, D., Augello, A., Pilato, G.: Facebook users relationships analysis based on sentiment classification. In: Proceedings of the 2014 IEEE International Conference on Semantic Computing (ICSC), pp. 290–296, 2014