<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Micro Facial Expressions for More Inclusive User Interfaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessio Ferrato</string-name>
          <email>ale.ferrato@stud.uniroma3.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carla Limongelli</string-name>
          <email>limongel@dia.uniroma3.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Mezzini</string-name>
          <email>mauro.mezzini@uniroma3.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Sansonetti</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Education, Roma Tre University</institution>
          ,
          <addr-line>Viale del Castro Pretorio 20, 00185 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Engineering, Roma Tre University</institution>
          ,
          <addr-line>Via della Vasca Navale 79, 00146 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Current image/video acquisition and analysis techniques allow for not only the identification and classification of objects in a scene but also more sophisticated processing. For example, there are video cameras today able to capture micro facial expressions, namely, facial expressions that occur in a fraction of a second. Such micro expressions can provide useful information to define a person's emotional state. In this article, we propose to use these features to collect useful information for designing and implementing increasingly effective interactive technologies. In particular, micro facial expressions could be used to develop interfaces capable of fostering the social and cultural inclusion of users belonging to different realities and categories. The preliminary experimental results obtained by recording the reactions of individuals while observing artworks demonstrate the existence of correlations between the action units (i.e., the single components of muscular movement into which facial expressions can be broken down) and the emotional reactions of a sample of users, as well as correlations within some homogeneous groups of testers.</p>
      </abstract>
      <kwd-group>
        <kwd>User interfaces</kwd>
        <kwd>User modeling</kwd>
        <kwd>Emotion recognition</kwd>
        <kwd>Computer vision</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
<sec id="sec-1">
      <title>1. Introduction and Background</title>
      <p>Systems capable of identifying a user's emotional state starting from her behavior are becoming more and more popular [<xref ref-type="bibr" rid="ref1">1</xref>]. Among these, Automatic Facial Expression Recognition systems play a prominent role [2]. Facial expressions can be defined as facial changes in response to a person's internal emotional states, intentions, or social communications [3]. This research topic is certainly not new if we consider that Darwin in 1872 had already addressed the subject [4]. Since then, there have been several attempts by behavioral scientists to conceive methods and models for the automatic analysis of facial expressions in image sequences [5, 6]. These studies have laid the foundations for the realization of computer systems able to help us understand this natural form of communication among human beings (e.g., see [7, 8, 9, 10]). Such systems, although very efficient, are inevitably affected by context, culture, gender, and so on [11, 12, 13]. In this article, we propose the analysis of micro facial expressions as a possible solution to these problems. Micro facial expressions are facial expressions that occur in a fraction of a second. They can provide accurate information about a person's actual emotional state.</p>
    </sec>
<sec id="sec-2">
      <title>2. Kinesics</title>
      <p>Kinesics is the science that studies body language. According to the anthropologist Ray Birdwhistell, who coined this term in 1952, this science allows us to interpret a person's thoughts, feelings, and emotions by analyzing her facial expressions, gestures, posture, gaze, and movements of the legs and arms [21]. Birdwhistell's theories were highly regarded over the years, and it is well known that mere verbal communication represents only a small part of the message that allows two individuals to convey information to each other. According to the 7-38-55 Rule developed by Albert Mehrabian in the 1970s [22], communication takes place in three ways: the content (what is communicated), the tone (how it is communicated), and the body language (posture, expressions, etc.). The digits that appear in the rule name indicate the percentage relevance of these ways: 7% the content of the message, 38% the tone of the voice, and 55% the body language.</p>
      <sec id="sec-2-1">
        <title>2.1. Facial Expressions (FACS)</title>
        <p>The kinesic system of signification and signaling includes the movements of the body, face, and eyes [23]. Facial expressions manifest the intentions of the subject based on the context and, depending on it, there are facial expressions that differ substantially, also giving the listener the possibility to understand the state of mind of her interlocutor. In 1978, Paul Ekman and Wallace V. Friesen, building on the study previously developed by the Swedish anatomist Carl-Herman Hjortsjö [24], proposed the Facial Action Coding System (FACS) [23], an anatomically accurate system to describe all visually distinguishable facial movements.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Action Units (AUs)</title>
      <p>The FACS decoding system explores facial expressions
by breaking them down into the smallest fundamental
units, the action units (AUs), giving each one a meaning.</p>
<p>Ekman and Friesen cataloged 44 AUs describing changes in facial expressions and 14 AUs mapping changes in the eye gaze direction and the head orientation. The AUs play a fundamental role in the recognition of emotions, movements, and attitudes, not only of the face but also of the body, allowing us to analyze the state of mind of the subject. The combination of the AUs enables us to map the four main emotions, namely, happiness, sadness, anger, and fear [25].</p>
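<p>As a concrete illustration of this mapping, the following Python sketch (ours, not taken from [25]) encodes EMFACS-style prototype AU sets for the four main emotions and checks which prototypes are fully active; the exact AU sets are an assumption for illustration:</p>
          <preformat>
# Minimal illustrative sketch: prototype AU combinations for the four main
# emotions. The AU sets are common EMFACS-style textbook prototypes, assumed
# here for illustration; FACS numbers the action units (e.g., AU6 = Cheek
# Raiser, AU12 = Lip Corner Puller).
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},           # Cheek Raiser + Lip Corner Puller
    "sadness":   {1, 4, 15},        # Inner Brow Raiser + Brow Lowerer + Lip Corner Depressor
    "anger":     {4, 5, 7, 23},     # Brow Lowerer + Upper Lid Raiser + Lid Tightener + Lip Tightener
    "fear":      {1, 2, 4, 5, 20},  # Brow Raisers + Brow Lowerer + Upper Lid Raiser + Lip Stretcher
}

def match_emotions(active_aus):
    """Return the emotions whose whole prototype AU set is active."""
    return [emotion for emotion, aus in EMOTION_PROTOTYPES.items()
            if aus.issubset(active_aus)]

print(match_emotions({4, 6, 12}))  # ['happiness']
          </preformat>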
    </sec>
</sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data Collection</title>
      <p>The research questions underlying the experimental analysis we performed are the following: is there a correlation between the micro facial expressions of an observer and her degree of appreciation (i.e., rating) of an artwork? Is it possible to identify correlations shared by specific categories of users? To answer these questions, it was necessary to collect the data that could allow us to verify our initial assumptions.</p>
      <sec id="sec-3-1">
        <title>3.1. The Development of a Data Collection System</title>
        <p>At the beginning of our research activity, we had planned real experimentation in a suitable place to verify our hypotheses, for example, a museum. Unfortunately, the limitations imposed by the COVID-19 pandemic did not allow us to follow this road. Consequently, to collect data it was necessary to develop an online application. First of all, we developed a website (https://www.raccoltadati.tk/) that had mainly two functions. The first function was to simulate a visit sharing the same characteristics as a visit to a real museum. For this purpose, we selected some artworks from those exhibited at the National Gallery of Modern and Contemporary Art (https://lagallerianazionale.com/en/) in Rome, Italy. The selection was made in such a way as to show the user works as different as possible. The second function was to collect information about the visitor. In particular, we were interested in acquiring data relating to her demographic profile, degree of appreciation of the work displayed at that time, and resulting micro facial expressions. Specifically, participants were shown eight artworks and asked to rate each of them on a five-point Likert scale. Meanwhile, the participants were recorded through the webcam of their device while viewing each artwork. Demographic information was collected through a final questionnaire. The demographic data relating to the users who participated in the experimental trials are shown in Table 1. The participants were 73, almost equally distributed between females and males, and aged mostly between 21 and 29. Most participants had a high school diploma and were mainly university students. Once the dataset was collected, it was necessary to process the recorded videos using facial recognition software. We employed two different software tools for this purpose: OpenFace (https://github.com/TadasBaltrusaitis/OpenFace), an open-source toolkit capable of performing action unit analysis, and iMotions (https://imotions.com/), a proprietary software.</p>
        <p>[Table 1. Demographics of the 73 users involved in the experimental trials.]</p>
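<p>As a sketch of this processing step (our reconstruction of the pipeline, not the authors' code): OpenFace's FeatureExtraction tool writes one CSV per video, with per-frame AU intensity columns such as AU01_r, so the average intensity of each action unit over a clip can be computed with a few lines of Python:</p>
        <preformat>
# Hedged sketch of the per-video aggregation step; the output directory
# "openface_out/" is an assumption for illustration.
import glob

import pandas as pd

def average_au_intensities(csv_path):
    """Average each AU intensity column over all frames of one OpenFace CSV."""
    frames = pd.read_csv(csv_path)
    frames.columns = frames.columns.str.strip()  # OpenFace pads column names
    au_cols = [c for c in frames.columns
               if c.startswith("AU") and c.endswith("_r")]
    return frames[au_cols].mean()

# One row of averaged AU intensities per recorded (participant, artwork) video.
features = pd.DataFrame({path: average_au_intensities(path)
                         for path in glob.glob("openface_out/*.csv")}).T
print(features.head())
        </preformat>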
</sec>
    </sec>
    <sec id="sec-4">
      <title>4. Data Analysis</title>
      <p>Let us now analyze the results returned by the two analysis software tools. Table 2 shows the average values, standard deviations, and the minimum and maximum values, calculated on the whole dataset. First of all, we can observe that the iMotions software returns more information than OpenFace and that the two software tools sometimes analyze the same micro expressions. The mean of the individual action units is often less than the standard deviation. At the same time, the minimum values differ highly from the maximum values. These results, therefore, indicate the tendency of visitors to assume a neutral expression for most of the time, except in rare moments. The attention score, namely, the attention shown by the visitor while observing the artwork, is noteworthy. Its average value is very close to the maximum; we can, hence, conclude that most testers kept their level of attention high during the virtual visit. Table 3 shows the value of Spearman's correlation coefficient between the ratings assigned by the testers to the individual works and the average score obtained by the features for each video. We can immediately notice a high correlation value between ratings and eye closure. The same thing happens for perceived sadness. The negative value of these correlations indicates that a high value of the feature corresponds to a low rating attributed to the work. We then verified whether there were any correlations shared by some categories of testers. More specifically, we grouped the data based on gender, the rating attributed to the artwork, and the number of recognized artworks. Table 4 reports the values returned by OpenFace. We note a positive correlation between the rating and the cheek raise action unit.</p>
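<p>The shape of this analysis can be sketched in Python as follows (a minimal sketch under assumed column names such as "rating", "gender", and "Cheek Raise"; scipy's spearmanr computes the coefficient and its p-value):</p>
      <preformat>
# Minimal sketch of the correlation analysis; the CSV layout is an assumption.
import pandas as pd
from scipy.stats import spearmanr

data = pd.read_csv("ratings_and_features.csv")  # one row per (tester, artwork)
feature_cols = [c for c in data.columns
                if c not in ("tester", "gender", "rating")]

# Overall correlations between ratings and features, as in Table 3.
for col in feature_cols:
    rho, p = spearmanr(data["rating"], data[col])
    stars = "**" if p &lt;= 0.01 else ("*" if p &lt;= 0.05 else "")
    print(f"{col}: {rho:+.2f}{stars}")

# Correlations within homogeneous groups of testers (here, by gender).
for gender, group in data.groupby("gender"):
    rho, _ = spearmanr(group["rating"], group["Cheek Raise"])
    print(gender, f"{rho:+.2f}")
      </preformat>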
<p>[Table 3. Spearman's correlation coefficients between the testers' ratings and the average value of each feature (AU &amp; Emotions: Inner Brow Raise, Outer Brow Raise, Brow Lower, Upper Lid Raise, Cheek Raise, Lid Tighten, Nose Wrinkle, Upper Lip Raise, Lip Corner Puller, Dimpler, Lip Corner Depressor, Chin Raise, Lip Stretch, Mouth Open, Jaw Drop, Blink, Lip Suck, Lip Press, Lip Pucker, Eye Closure, Eye Widen, Smile, Smirk, Engagement, Attention, Anger, Sadness, Disgust, Joy, Surprise, Fear, Contempt); * and ** denote statistically significant values.]</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Works</title>
<p>The ultimate goal of our research activities was to verify whether micro facial expressions can be exploited to create interfaces that adapt differently depending on the characteristics of the active user. If so, it would be possible to foster cultural and social inclusion between individuals from different backgrounds and belonging to different categories, including disadvantaged and at-risk categories as well as vulnerable people. In particular, from the experimental results, it emerged that it is possible to identify correlations between micro facial expressions and the degree of appreciation of an object, specifically an artwork. It is also possible to identify correlations within some homogeneous groups of testers.</p>
<p>Our experimental analysis is very simplified and also suffers from numerous limitations. Among others:</p>
      <list list-type="bullet">
        <list-item><p>it was performed in a specific domain, namely that of cultural heritage;</p></list-item>
        <list-item><p>the micro facial expressions were collected in response to a specific stimulus, that is, the vision of an artwork;</p></list-item>
        <list-item><p>the data was collected through a virtual and not live experimentation;</p></list-item>
        <list-item><p>the sample of users was very limited;</p></list-item>
        <list-item><p>the sample of users was mostly made up of university students, so it was anything but heterogeneous.</p></list-item>
      </list>
<p>A much more extensive and rigorous experimental analysis is therefore needed, including further categories of users, scenarios (e.g., [26, 27, 28]), and information (e.g., [29]). Only in this way could we indeed draw definitive conclusions on the existence of correlations between micro facial expressions and categories of testers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
<ref id="ref1"><mixed-citation>[1] X. Alameda-Pineda, E. Ricci, N. Sebe, Multimodal behavior analysis in the wild: An introduction, in: X. Alameda-Pineda, E. Ricci, N. Sebe (Eds.), Multimodal Behavior Analysis in the Wild, Computer Vision and Pattern Recognition, Academic Press, 2019, pp. 1–8.</mixed-citation></ref>
      <ref id="ref2"><mixed-citation>[2] B. T. Hung, L. M. Tien, Facial expression recognition with CNN-LSTM, in: R. Kumar, N. H. Quang, V. Kumar Solanki, M. Cardona, P. K. Pattnaik (Eds.), Research in Intelligent and Computing in Engineering, Springer Singapore, Singapore, 2021, pp. 549–560.</mixed-citation></ref>
      <ref id="ref3"><mixed-citation>[3] Y. Tian, T. Kanade, J. F. Cohn, Facial expression recognition, in: S. Z. Li, A. K. Jain (Eds.), Handbook of Face Recognition, Springer London, London, 2011.</mixed-citation></ref>
      <ref id="ref4"><mixed-citation>[4] C. Darwin, The Expression of the Emotions in Man and Animals, John Murray, London, 1872.</mixed-citation></ref>
      <ref id="ref8"><mixed-citation>[8] H. Gunes, M. Piccardi, Automatic temporal segment detection and affect recognition from face and body display, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39 (2008) 64–84.</mixed-citation></ref>
      <ref id="ref9"><mixed-citation>[9] J. Lien, T. Kanade, J. Cohn, C. Li, Detection, tracking, and classification of action units in facial expression, Robotics and Autonomous Systems 31 (2000).</mixed-citation></ref>
      <ref id="ref10"><mixed-citation>[10] Y.-L. Tian, T. Kanade, J. F. Cohn, Recognizing action units for facial expression analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 97–115.</mixed-citation></ref>
      <ref id="ref11"><mixed-citation>[11] J. Carroll, J. Russell, Do facial expressions signal specific emotions? Judging emotion from the face in context, Journal of Personality and Social Psychology 70 (1996) 205–218.</mixed-citation></ref>
      <ref id="ref12"><mixed-citation>[12] J. Russell, Culture and the categorization of emotions, Psychological Bulletin 110 (1991) 426–450.</mixed-citation></ref>
      <ref id="ref13"><mixed-citation>[13] J. Russell, Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies, Psychological Bulletin 115 (1994).</mixed-citation></ref>
      <ref id="ref14"><mixed-citation>[14] G. Sansonetti, Point of interest recommendation based on social and linked open data, Personal and Ubiquitous Computing 23 (2019) 199–214.</mixed-citation></ref>
      <ref id="ref15"><mixed-citation>[15] H. A. M. Hassan, G. Sansonetti, F. Gasparetti, A. Micarelli, Semantic-based tag recommendation in scientific bookmarking systems, in: Proceedings of the 12th ACM Conference on Recommender Systems, ACM, New York, NY, USA, 2018, pp. 465–469.</mixed-citation></ref>
      <ref id="ref16"><mixed-citation>[16] A. Fogli, G. Sansonetti, Exploiting semantics for context-aware itinerary recommendation, Personal and Ubiquitous Computing 23 (2019) 215–231.</mixed-citation></ref>
      <ref id="ref17"><mixed-citation>[17] M. Chang, G. D'Aniello, M. Gaeta, F. Orciuoli, D. Sampson, C. Simonelli, Building ontology-driven tutoring models for intelligent tutoring systems using data mining, IEEE Access 8 (2020) 48151–48162.</mixed-citation></ref>
      <ref id="ref18"><mixed-citation>[18] G. D'Aniello, M. Gaeta, F. Orciuoli, G. Sansonetti, F. Sorgente, Knowledge-based smart city service system, Electronics 9 (2020).</mixed-citation></ref>
      <ref id="ref19"><mixed-citation>[19] G. Sansonetti, F. Gasparetti, A. Micarelli, F. Cena, C. Gena, Enhancing cultural recommendations through social and linked open data, User Modeling and User-Adapted Interaction 29 (2019) 121–159.</mixed-citation></ref>
      <ref id="ref20"><mixed-citation>[20] M. Mezzini, C. Limongelli, G. Sansonetti, C. De Medio, Tracking museum visitors through convolutional object detectors, in: Adjunct Publication of UMAP '20, ACM, New York, NY, USA, 2020, pp. 352–355.</mixed-citation></ref>
      <ref id="ref21"><mixed-citation>[21] R. L. Birdwhistell, Kinesics and Context: Essays on Body Motion Communication, University of Pennsylvania Press, 2010.</mixed-citation></ref>
      <ref id="ref22"><mixed-citation>[22] A. Mehrabian, M. Wiener, Decoding of inconsistent communications, Journal of Personality and Social Psychology 6 (1967) 109–114.</mixed-citation></ref>
      <ref id="ref23"><mixed-citation>[23] P. Ekman, W. Friesen, Facial Action Coding System, Consulting Psychologists Press, 1978.</mixed-citation></ref>
      <ref id="ref24"><mixed-citation>[24] C.-H. Hjortsjö, Man's Face and Mimic Language, Studentlitteratur, 1969.</mixed-citation></ref>
      <ref id="ref25"><mixed-citation>[25] C. G. Kohler, T. Turner, N. M. Stolar, W. B. Bilker, C. M. Brensinger, R. E. Gur, R. C. Gur, Differences in facial expressions of four universal emotions, Psychiatry Research 128 (2004) 235–244.</mixed-citation></ref>
      <ref id="ref26"><mixed-citation>[26] S. Caldarelli, D. F. Gurini, A. Micarelli, G. Sansonetti, A signal-based approach to news recommendation, in: CEUR Workshop Proceedings, volume 1618, CEUR-WS.org, Aachen, Germany, 2016.</mixed-citation></ref>
      <ref id="ref27"><mixed-citation>[27] M. Onori, A. Micarelli, G. Sansonetti, A comparative analysis of personality-based music recommender systems, in: CEUR Workshop Proceedings, volume 1680, CEUR-WS.org, Aachen, Germany, 2016.</mixed-citation></ref>
      <ref id="ref28"><mixed-citation>[28] D. Valeriani, G. Sansonetti, A. Micarelli, A comparative analysis of state-of-the-art recommendation techniques in the movie domain, Lecture Notes in Computer Science 12252 (2020) 104–118.</mixed-citation></ref>
      <ref id="ref29"><mixed-citation>[29] M. Saneiro, O. Santos, S. Salmeron-Majadas, J. Boticario, Towards emotion detection in educational scenarios from facial expressions and body movements through multimodal approaches, The Scientific World Journal (2014).</mixed-citation></ref>
    </ref-list>
  </back>
</article>