<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Micro Facial Expressions for More Inclusive User Interfaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessio Ferrato</string-name>
          <email>ale.ferrato@stud.uniroma3.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carla Limongelli</string-name>
          <email>limongel@dia.uniroma3.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Mezzini</string-name>
          <email>mauro.mezzini@uniroma3.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Sansonetti</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Education, Roma Tre University</institution>
          ,
          <addr-line>Viale del Castro Pretorio 20, 00185 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Engineering, Roma Tre University</institution>
          ,
          <addr-line>Via della Vasca Navale 79, 00146 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Current image/video acquisition and analysis techniques allow for not only the identification and classification of objects in a scene but also more sophisticated processing. For example, there are video cameras today able to capture micro facial expressions, namely, facial expressions that occur in a fraction of a second. Such micro expressions can provide useful information to define a person's emotional state. In this article, we propose to use these features to collect useful information for designing and implementing increasingly effective interactive technologies. In particular, micro facial expressions could be used to develop interfaces capable of fostering the social and cultural inclusion of users belonging to different realities and categories. The preliminary experimental results obtained by recording the reactions of individuals while observing artworks demonstrate the existence of correlations between the action units (i.e., the single components of muscular movement into which facial expressions can be broken down) and the emotional reactions of a sample of users, as well as correlations within some homogeneous groups of testers.</p>
      </abstract>
      <kwd-group>
        <kwd>User interfaces</kwd>
        <kwd>User modeling</kwd>
        <kwd>Emotion recognition</kwd>
        <kwd>Computer vision</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
<sec id="sec-1">
      <title>1. Introduction and Background</title>
      <p>Systems capable of identifying a user's emotional state starting from her behavior are becoming more and more popular [<xref ref-type="bibr" rid="ref1">1</xref>]. Among these, Automatic Facial Expression Recognition systems play a prominent role [2]. Facial expressions can be defined as facial changes in response to a person's internal emotional states, intentions, or social communications [3]. This research topic is certainly not new if we consider that Darwin in 1872 had already addressed the subject [4]. Since then, there have been several attempts by behavioral scientists to conceive methods and models for the automatic analysis of facial expressions in image sequences [5, 6]. These studies have laid the foundations for the realization of computer systems able to help us understand this natural form of communication among human beings (e.g., see [7, 8, 9, 10]). Such systems, although very efficient, are inevitably affected by context, culture, gender, and so on [11, 12, 13]. In this article, we propose the analysis of micro facial expressions as a possible solution to these problems. Micro facial expressions are facial expressions that occur in a fraction of a second. They can provide accurate information about a person's actual emotional state.</p>
    </sec>
<sec id="sec-2">
      <title>2. Kinesics</title>
      <p>Kinesics is the science that studies body language. According to the anthropologist Ray Birdwhistell, who coined this term in 1952, this science allows us to interpret a person's thoughts, feelings, and emotions by analyzing her facial expressions, gestures, posture, gaze, and movements of the legs and arms [21]. Birdwhistell's theories were highly regarded over the years, and it is well known that mere verbal communication represents only a small part of the message that allows two individuals to convey information to each other. According to the 7-38-55 Rule developed by Albert Mehrabian in the 1970s [22], communication takes place in three ways: the content (what is communicated), the tone (how it is communicated), and the body language (posture, expressions, etc.). The digits that appear in the rule name indicate the percentage relevance of these ways: 7% the content of the message, 38% the tone of the voice, and 55% the body language.</p>
      <sec id="sec-2-1">
        <title>2.1. Facial Expressions (FACS)</title>
        <p>The kinesic system of signification and signaling includes the movements of the body, face, and eyes [23]. Facial expressions manifest the intentions of the subject based on the context and, depending on it, there are facial expressions that differ substantially, also giving the listener the possibility to understand the state of mind of her interlocutor. In 1978, Paul Ekman and Wallace V. Friesen, building on the study previously developed by the Swedish anatomist Carl-Herman Hjortsjö [24], proposed the Facial Action Coding System (FACS) [23], an anatomically accurate system to describe all visually distinguishable facial movements.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Action Units (AUs)</title>
      <p>The FACS decoding system explores facial expressions
by breaking them down into the smallest fundamental
units, the action units (AUs), giving each one a meaning.</p>
<p>Ekman and Friesen cataloged 44 AUs describing changes in facial expressions and 14 AUs mapping changes in the eye gaze direction and the head orientation. The AUs play a fundamental role in the recognition of emotions, movements, and attitudes, not only of the face but also of the body, allowing us to analyze the state of mind of the subject. The combination of the AUs enables us to map the four main emotions, namely, happiness, sadness, anger, and fear [25].</p>
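<p>As a concrete illustration of this mapping, the following Python sketch (ours, not taken from [25]) encodes EMFACS-style prototype AU sets for the four main emotions and checks which prototypes are fully active; the exact AU sets are an assumption for illustration:</p>
          <preformat>
# Minimal illustrative sketch: prototype AU combinations for the four main
# emotions. The AU sets are common EMFACS-style textbook prototypes, assumed
# here for illustration; FACS numbers the action units (e.g., AU6 = Cheek
# Raiser, AU12 = Lip Corner Puller).
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},           # Cheek Raiser + Lip Corner Puller
    "sadness":   {1, 4, 15},        # Inner Brow Raiser + Brow Lowerer + Lip Corner Depressor
    "anger":     {4, 5, 7, 23},     # Brow Lowerer + Upper Lid Raiser + Lid Tightener + Lip Tightener
    "fear":      {1, 2, 4, 5, 20},  # Brow Raisers + Brow Lowerer + Upper Lid Raiser + Lip Stretcher
}

def match_emotions(active_aus):
    """Return the emotions whose whole prototype AU set is active."""
    return [emotion for emotion, aus in EMOTION_PROTOTYPES.items()
            if aus.issubset(active_aus)]

print(match_emotions({4, 6, 12}))  # ['happiness']
          </preformat>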
    </sec>
</sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data Collection</title>
      <p>The research questions underlying the experimental analysis we performed are the following: is there a correlation between the micro facial expressions of an observer and her degree of appreciation (i.e., rating) of an artwork? Is it possible to identify correlations shared by specific categories of users? To answer these questions, it was necessary to collect the data that could allow us to verify our initial assumptions.</p>
      <sec id="sec-3-1">
        <title>3.1. The Development of a Data Collection System</title>
        <p>At the beginning of our research activity, we had planned real experimentation in a suitable place to verify our hypotheses, for example, a museum. Unfortunately, the limitations imposed by the COVID-19 pandemic did not allow us to follow this road. Consequently, to collect data it was necessary to develop an online application. First of all, we developed a website (https://www.raccoltadati.tk/) that had mainly two functions. The first function was to simulate a visit sharing the same characteristics as a visit to a real museum. For this purpose, we selected some artworks from those exhibited at the National Gallery of Modern and Contemporary Art (https://lagallerianazionale.com/en/) in Rome, Italy. The selection was made in such a way as to show the user works as different as possible. The second function was to collect information about the visitor. In particular, we were interested in acquiring data relating to her demographic profile, degree of appreciation of the work displayed at that time, and resulting micro facial expressions. Specifically, participants were shown eight artworks and asked to rate each of them on a five-point Likert scale. Meanwhile, the participants were recorded through the webcam of their device while viewing each artwork. Demographic information was collected through a final questionnaire. The demographic data relating to the users who participated in the experimental trials are shown in Table 1. The participants were 73, almost equally distributed between females and males, and aged mostly between 21 and 29. Most participants had a high school diploma and were mainly university students. Once the dataset was collected, it was necessary to process the recorded videos using facial recognition software. We employed two different software tools for this purpose: OpenFace (https://github.com/TadasBaltrusaitis/OpenFace), an open-source toolkit capable of performing action unit analysis, and iMotions (https://imotions.com/), a proprietary software.</p>
        <p>[Table 1. Demographics of the 73 users involved in the experimental trials.]</p>
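<p>As a sketch of this processing step (our reconstruction of the pipeline, not the authors' code): OpenFace's FeatureExtraction tool writes one CSV per video, with per-frame AU intensity columns such as AU01_r, so the average intensity of each action unit over a clip can be computed with a few lines of Python:</p>
        <preformat>
# Hedged sketch of the per-video aggregation step; the output directory
# "openface_out/" is an assumption for illustration.
import glob

import pandas as pd

def average_au_intensities(csv_path):
    """Average each AU intensity column over all frames of one OpenFace CSV."""
    frames = pd.read_csv(csv_path)
    frames.columns = frames.columns.str.strip()  # OpenFace pads column names
    au_cols = [c for c in frames.columns
               if c.startswith("AU") and c.endswith("_r")]
    return frames[au_cols].mean()

# One row of averaged AU intensities per recorded (participant, artwork) video.
features = pd.DataFrame({path: average_au_intensities(path)
                         for path in glob.glob("openface_out/*.csv")}).T
print(features.head())
        </preformat>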
</sec>
    </sec>
    <sec id="sec-4">
      <title>4. Data Analysis</title>
      <p>Let us now analyze the results returned by the two analysis software tools. Table 2 shows the average values, standard deviations, and the minimum and maximum values, calculated on the whole dataset. First of all, we can observe that the iMotions software returns more information than OpenFace and that the two software tools sometimes analyze the same micro expressions. The mean of the individual action units is often less than the standard deviation. At the same time, the minimum values differ highly from the maximum values. These results, therefore, indicate the tendency of visitors to assume a neutral expression for most of the time, except in rare moments. The attention score, namely, the attention shown by the visitor while observing the artwork, is noteworthy. Its average value is very close to the maximum; we can, hence, conclude that most testers kept their level of attention high during the virtual visit. Table 3 shows the value of Spearman's correlation coefficient between the ratings assigned by the testers to the individual works and the average score obtained by the features for each video. We can immediately notice a high correlation value between ratings and eye closure. The same thing happens for perceived sadness. The negative value of these correlations indicates that a high value of the feature corresponds to a low rating attributed to the work. We then verified whether there were any correlations shared by some categories of testers. More specifically, we grouped the data based on gender, the rating attributed to the artwork, and the number of recognized artworks. Table 4 reports the values returned by OpenFace. We note a positive correlation between the rating and the cheek raise action unit.</p>
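<p>The shape of this analysis can be sketched in Python as follows (a minimal sketch under assumed column names such as "rating", "gender", and "Cheek Raise"; scipy's spearmanr computes the coefficient and its p-value):</p>
      <preformat>
# Minimal sketch of the correlation analysis; the CSV layout is an assumption.
import pandas as pd
from scipy.stats import spearmanr

data = pd.read_csv("ratings_and_features.csv")  # one row per (tester, artwork)
feature_cols = [c for c in data.columns
                if c not in ("tester", "gender", "rating")]

# Overall correlations between ratings and features, as in Table 3.
for col in feature_cols:
    rho, p = spearmanr(data["rating"], data[col])
    stars = "**" if p &lt;= 0.01 else ("*" if p &lt;= 0.05 else "")
    print(f"{col}: {rho:+.2f}{stars}")

# Correlations within homogeneous groups of testers (here, by gender).
for gender, group in data.groupby("gender"):
    rho, _ = spearmanr(group["rating"], group["Cheek Raise"])
    print(gender, f"{rho:+.2f}")
      </preformat>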
<p>[Table 3. Spearman's correlation coefficients between the testers' ratings and the average value of each feature (AU &amp; Emotions: Inner Brow Raise, Outer Brow Raise, Brow Lower, Upper Lid Raise, Cheek Raise, Lid Tighten, Nose Wrinkle, Upper Lip Raise, Lip Corner Puller, Dimpler, Lip Corner Depressor, Chin Raise, Lip Stretch, Mouth Open, Jaw Drop, Blink, Lip Suck, Lip Press, Lip Pucker, Eye Closure, Eye Widen, Smile, Smirk, Engagement, Attention, Anger, Sadness, Disgust, Joy, Surprise, Fear, Contempt); * and ** denote statistically significant values.]</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Works</title>
<p>The ultimate goal of our research activities was to verify whether micro facial expressions can be exploited to create interfaces that adapt differently depending on the characteristics of the active user. If so, it would be possible to foster cultural and social inclusion between individuals from different backgrounds and belonging to different categories, including disadvantaged and at-risk categories as well as vulnerable people. In particular, from the experimental results, it emerged that it is possible to identify correlations between micro facial expressions and the degree of appreciation of an object, specifically an artwork. It is also possible to identify correlations within some homogeneous groups of testers.</p>
<p>Our experimental analysis is very simplified and also suffers from numerous limitations. Among others:</p>
      <list list-type="bullet">
        <list-item><p>it was performed in a specific domain, namely that of cultural heritage;</p></list-item>
        <list-item><p>the micro facial expressions were collected in response to a specific stimulus, that is, the vision of an artwork;</p></list-item>
        <list-item><p>the data was collected through a virtual and not live experimentation;</p></list-item>
        <list-item><p>the sample of users was very limited;</p></list-item>
        <list-item><p>the sample of users was mostly made up of university students, so it was anything but heterogeneous.</p></list-item>
      </list>
<p>A much more extensive and rigorous experimental analysis is therefore needed, including further categories of users, scenarios (e.g., [26, 27, 28]), and information (e.g., [29]). Only in this way could we indeed draw definitive conclusions on the existence of correlations between micro facial expressions and categories of testers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
<ref id="ref1"><mixed-citation>[1] X. Alameda-Pineda, E. Ricci, N. Sebe, Multimodal behavior analysis in the wild: An introduction, in: X. Alameda-Pineda, E. Ricci, N. Sebe (Eds.), Multimodal Behavior Analysis in the Wild, Computer Vision and Pattern Recognition, Academic Press, 2019, pp. 1–8.</mixed-citation></ref>
      <ref id="ref2"><mixed-citation>[2] B. T. Hung, L. M. Tien, Facial expression recognition with CNN-LSTM, in: R. Kumar, N. H. Quang, V. Kumar Solanki, M. Cardona, P. K. Pattnaik (Eds.), Research in Intelligent and Computing in Engineering, Springer Singapore, Singapore, 2021, pp. 549–560.</mixed-citation></ref>
      <ref id="ref3"><mixed-citation>[3] Y. Tian, T. Kanade, J. F. Cohn, Facial expression recognition, in: S. Z. Li, A. K. Jain (Eds.), Handbook of Face Recognition, Springer London, London, 2011.</mixed-citation></ref>
      <ref id="ref4"><mixed-citation>[4] C. Darwin, The Expression of the Emotions in Man and Animals, John Murray, London, 1872.</mixed-citation></ref>
      <ref id="ref8"><mixed-citation>[8] H. Gunes, M. Piccardi, Automatic temporal segment detection and affect recognition from face and body display, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39 (2008) 64–84.</mixed-citation></ref>
      <ref id="ref9"><mixed-citation>[9] J. Lien, T. Kanade, J. Cohn, C. Li, Detection, tracking, and classification of action units in facial expression, Robotics and Autonomous Systems 31 (2000).</mixed-citation></ref>
      <ref id="ref10"><mixed-citation>[10] Y.-L. Tian, T. Kanade, J. F. Cohn, Recognizing action units for facial expression analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 97–115.</mixed-citation></ref>
      <ref id="ref11"><mixed-citation>[11] J. Carroll, J. Russell, Do facial expressions signal specific emotions? Judging emotion from the face in context, Journal of Personality and Social Psychology 70 (1996) 205–218.</mixed-citation></ref>
      <ref id="ref12"><mixed-citation>[12] J. Russell, Culture and the categorization of emotions, Psychological Bulletin 110 (1991) 426–450.</mixed-citation></ref>
      <ref id="ref13"><mixed-citation>[13] J. Russell, Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies, Psychological Bulletin 115 (1994).</mixed-citation></ref>
      <ref id="ref14"><mixed-citation>[14] G. Sansonetti, Point of interest recommendation based on social and linked open data, Personal and Ubiquitous Computing 23 (2019) 199–214.</mixed-citation></ref>
      <ref id="ref15"><mixed-citation>[15] H. A. M. Hassan, G. Sansonetti, F. Gasparetti, A. Micarelli, Semantic-based tag recommendation in scientific bookmarking systems, in: Proceedings of the 12th ACM Conference on Recommender Systems, ACM, New York, NY, USA, 2018, pp. 465–469.</mixed-citation></ref>
      <ref id="ref16"><mixed-citation>[16] A. Fogli, G. Sansonetti, Exploiting semantics for context-aware itinerary recommendation, Personal and Ubiquitous Computing 23 (2019) 215–231.</mixed-citation></ref>
      <ref id="ref17"><mixed-citation>[17] M. Chang, G. D'Aniello, M. Gaeta, F. Orciuoli, D. Sampson, C. Simonelli, Building ontology-driven tutoring models for intelligent tutoring systems using data mining, IEEE Access 8 (2020) 48151–48162.</mixed-citation></ref>
      <ref id="ref18"><mixed-citation>[18] G. D'Aniello, M. Gaeta, F. Orciuoli, G. Sansonetti, F. Sorgente, Knowledge-based smart city service system, Electronics 9 (2020).</mixed-citation></ref>
      <ref id="ref19"><mixed-citation>[19] G. Sansonetti, F. Gasparetti, A. Micarelli, F. Cena, C. Gena, Enhancing cultural recommendations through social and linked open data, User Modeling and User-Adapted Interaction 29 (2019) 121–159.</mixed-citation></ref>
      <ref id="ref20"><mixed-citation>[20] M. Mezzini, C. Limongelli, G. Sansonetti, C. De Medio, Tracking museum visitors through convolutional object detectors, in: Adjunct Publication of UMAP '20, ACM, New York, NY, USA, 2020, pp. 352–355.</mixed-citation></ref>
      <ref id="ref21"><mixed-citation>[21] R. L. Birdwhistell, Kinesics and Context: Essays on Body Motion Communication, University of Pennsylvania Press, 2010.</mixed-citation></ref>
      <ref id="ref22"><mixed-citation>[22] A. Mehrabian, M. Wiener, Decoding of inconsistent communications, Journal of Personality and Social Psychology 6 (1967) 109–114.</mixed-citation></ref>
      <ref id="ref23"><mixed-citation>[23] P. Ekman, W. Friesen, Facial Action Coding System, Consulting Psychologists Press, 1978.</mixed-citation></ref>
      <ref id="ref24"><mixed-citation>[24] C.-H. Hjortsjö, Man's Face and Mimic Language, Studentlitteratur, 1969.</mixed-citation></ref>
      <ref id="ref25"><mixed-citation>[25] C. G. Kohler, T. Turner, N. M. Stolar, W. B. Bilker, C. M. Brensinger, R. E. Gur, R. C. Gur, Differences in facial expressions of four universal emotions, Psychiatry Research 128 (2004) 235–244.</mixed-citation></ref>
      <ref id="ref26"><mixed-citation>[26] S. Caldarelli, D. F. Gurini, A. Micarelli, G. Sansonetti, A signal-based approach to news recommendation, in: CEUR Workshop Proceedings, volume 1618, CEUR-WS.org, Aachen, Germany, 2016.</mixed-citation></ref>
      <ref id="ref27"><mixed-citation>[27] M. Onori, A. Micarelli, G. Sansonetti, A comparative analysis of personality-based music recommender systems, in: CEUR Workshop Proceedings, volume 1680, CEUR-WS.org, Aachen, Germany, 2016.</mixed-citation></ref>
      <ref id="ref28"><mixed-citation>[28] D. Valeriani, G. Sansonetti, A. Micarelli, A comparative analysis of state-of-the-art recommendation techniques in the movie domain, Lecture Notes in Computer Science 12252 (2020) 104–118.</mixed-citation></ref>
      <ref id="ref29"><mixed-citation>[29] M. Saneiro, O. Santos, S. Salmeron-Majadas, J. Boticario, Towards emotion detection in educational scenarios from facial expressions and body movements through multimodal approaches, The Scientific World Journal (2014).</mixed-citation></ref>
    </ref-list>
  </back>
</article>