<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis of Dynamic Characteristics of Spontaneous Facial Expressions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Division of Information and Computer Sciences, Osaka Electro-Communication University</institution>
          ,
          <addr-line>18-8 Hatsucho, Neyagawa, Osaka, 572-8530</addr-line>
          ,
          <country country="JP">JAPAN</country>
        </aff>
      </contrib-group>
      <fpage>341</fpage>
      <lpage>345</lpage>
      <abstract>
        <p>The relationship between emotions elicited by film clips and spontaneous dynamic facial expressions was investigated. Participants (n = 10) watched 13 emotional film clips, and their facial responses were recorded using a motion capture system. We extracted 3-s intervals in which facial events occurred from the motion sequences. The participants were asked to self-assess their felt emotional arousal and positive and negative affect for each interval. To find the spatiotemporal components of dynamic facial expressions, we employed the multiway decomposition method PARAFAC on time sequences of facial landmark coordinates standardized via the methodologies of geometric morphometrics. The first component was a “static” factor reflecting the geometric arrangement of facial parts rather than facial movement. The second component was related to facial movement that appears slowly and then maintains a stable state over a long term. The third component was linked to movement that appears and then rapidly returns to the initial state. Local regression analysis was performed to obtain the distribution of the component scores on a two-dimensional plane: pleasure-displeasure and arousal-sleepiness. The third component was negatively correlated with arousal level.</p>
      </abstract>
      <kwd-group>
        <kwd>Dynamic Facial Expressions</kwd>
        <kwd>Spontaneous Expressions</kwd>
        <kwd>Emotion</kwd>
        <kwd>Motion Capture</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Many studies have investigated the morphological features of
facial expressions. Most research on facial expressions
has relied on the use of photographed images, or “static”
information. However, affective facial expressions are
dynamic in nature. Several researchers have claimed that
dynamically changing facial configurations play an
important role in the perception of facial patterns and the
attribution of emotional category labels
        <xref ref-type="bibr" rid="ref9">(Krumhuber,
Kappas, &amp; Manstead, 2013)</xref>
        . In particular, when
morphological information is limited or the intensity of
an expression is low, the presence of facial motion signals
helps perceivers identify the emotion
        <xref ref-type="bibr" rid="ref3 ref5">(Cunningham &amp;
Wallraven, 2009; Bould &amp; Morris, 2008; Krumhuber &amp;
Manstead, 2009)</xref>
        .
      </p>
      <p>
        Although the presence of a motion signal is important for
emotion recognition, the benefit of dynamic displays does
not seem to be attributable solely to an increase in the
amount of static information. Natural facial events do not
occur at a constant speed, as in morphing animations;
rather, they occur in a nonlinear manner. Such nonlinear
facial motion leads to more accurate emotion recognition
than linear motion animation
        <xref ref-type="bibr" rid="ref12">(Wallraven, Breidt,
Cunningham, &amp; Bülthoff, 2008)</xref>
        . Therefore, for elucidating
dynamic facial expressions, the complex spatiotemporal
information embedded in facial motion should be considered
at least as important as “static” facial information.
      </p>
      <p>
        In many studies of facial expressions, portraits of actors’
stereotypical emotion expressions or facial actions following
specific predefined patterns (e.g., FACS
        <xref ref-type="bibr" rid="ref7">(Ekman &amp; Friesen,
1978)</xref>
        ) have been employed. However, it is difficult to
capture temporal facial changes using posed expressions.
Thus, in the present study, we investigated dynamic facial
expressions based on spontaneous facial expressions elicited
by emotional film clips. Although some conventional facial
action coding techniques based on human observation are
designed to code temporal changes in facial expressions, it
is difficult to describe dynamic facial changes quantitatively
with such coding systems. To obtain dynamic facial
configurations, we employed a facial motion capture system.
      </p>
      <p>
        Facial motion capture systems have been widely used in
the film industry. Recently, such landmark-based
approaches have been applied to research on spontaneous
dynamic facial expressions
        <xref ref-type="bibr" rid="ref11 ref13">(Valstar, Gunes, &amp; Pantic, 2007;
Zhang, Yin, Cohn, Canavan, Reale, Horowitz, &amp; Girard,
2014)</xref>
        . The present study aims to extract the components
of dynamic facial expressions using a combination of the
methodologies of geometric morphometrics and the
multiway decomposition method PARAFAC
        <xref ref-type="bibr" rid="ref8">(Kroonenberg,
1983)</xref>
        , a type of modified principal component
analysis, based on three-dimensional landmark coordinate
data. Furthermore, we investigated the relationship between
the extracted facial-expression components and the
self-reported emotions of the owners of the faces.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Method</title>
      <sec id="sec-2-1">
        <title>Participants</title>
        <p>Japanese undergraduate and graduate students (n = 10: 8
men and 2 women; age: 18 to 24 years, mean age = 21.4,
SD = 1.96) participated on a voluntary basis. The
participants gave written consent to participate in the study,
and the Osaka Electro-Communication University ethics
committee approved the study.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Procedure</title>
        <p>The participants watched thirteen clips intended to elicit
differentiated emotional states, such as positive/negative
mood, anger, sadness, fear, or disgust. The clips were taken
from a database of emotion-eliciting films (Schaefer, Nils,
Sanchez, &amp; Philippot, 2010) (Table 1). Each of the clips was
shown in a Japanese-dubbed version and was approximately
2 min in length.</p>
        <p>Participants were instructed to watch all the clips
attentively, without diverting their attention from the
monitor. When participants were ready to begin the
experiment, the lighting in the room was dimmed. Each trial
started with a countdown leader (3 s), followed by
explanatory text outlining the narrative of the film (30 s)
and then the presentation of the clip. This procedure was
repeated for each film excerpt. The order of the clips
within each set was randomized. Each session
lasted approximately 40 min. The film clips were displayed
on a 12-inch monitor placed about 1m in front of the
participant. The audio of the films was played through
speakers placed in front of the participant.</p>
        <p>The sequences of 30 facial landmark coordinates were
recorded at 100 Hz using a facial motion capture system
(OptiTrack FLEX:V100R2) (Figure 1). The landmark locations
for each participant were determined from a frontal facial
image photographed before the experiment. We also set up a
video camera capturing the entire scene and the speakers,
for monitoring and as a reference for synchronizing the
facial motion sequences with the stimuli.</p>
        <p>
          To extract intervals in which facial events occurred from
the motion data, we applied an automatic motion segmentation
technique based on principal component analysis (PCA) to the
facial motion sequences
          <xref ref-type="bibr" rid="ref1">(Barbič, Safonova, Pan,
Faloutsos, Hodgins, &amp; Pollard, 2004)</xref>
          . This technique is
based on the observation that simple motions exhibit lower
dimensionality than more complex motions. We set the
segmentation parameters to k = 3 s, τ = 0.85 s, and l = 75 s.
As a result, the facial motion sequences of the 10
participants were divided into 1325 sections. Some of the
beginnings of these sections (hereafter referred to as
segmentation points) were expected to correspond to moments
just after a facial expression began. Three observers
determined visually whether each segmentation point
neighbored the starting point of a facial expression
according to the following criteria: (1) the segmentation
point must fall within the period of film presentation; (2)
the motion segmentation must be caused by changes in facial
movement, not by head movements; and (3) a segmentation
point caused solely by eye blinking is not the starting
point of a facial expression. Consequently, 98 segmentation
points were selected as analysis targets. For each selected
segmentation point, we extracted a 3-s interval of the
facial motion sequence, from 1 s before to 2 s after the
segmentation point.
        </p>
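        <p>As an illustration of this segmentation step, the following is a minimal sketch assuming NumPy, with illustrative window sizes and thresholds rather than the exact error criterion and parameters of Barbič et al. (2004): a window of motion frames counts as “simple” when a few principal components reconstruct it well, and candidate segmentation points are flagged where this stops holding. The interval extraction follows the 100-Hz sampling described above.</p>
        <preformat>
import numpy as np

def pca_residual(window, r=3):
    """Fraction of variance in `window` (frames x dims) NOT captured
    by its first r principal components."""
    centered = window - window.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return 1.0 - (s[:r] ** 2).sum() / (s ** 2).sum()

def candidate_points(motion, win=300, step=50, r=3, thresh=0.15):
    """Flag frame indices where the local dimensionality exceeds r,
    i.e. where a new, more complex facial event may have started."""
    return [start for start in range(0, len(motion) - win, step)
            if pca_residual(motion[start:start + win], r) > thresh]

def extract_interval(motion, point, fps=100):
    """3-s interval from 1 s before to 2 s after a segmentation point."""
    return motion[point - 1 * fps : point + 2 * fps]
        </preformat>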
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Facial Motion Analysis</title>
      <p>
        Each face in each frame differed in location, size, and
orientation. To standardize them, we performed a
Generalized Procrustes analysis (GPA) on the facial
landmarks of all frames of all faces
        <xref ref-type="bibr" rid="ref2 ref6">(Bookstein, 1997;
Dryden &amp; Mardia, 1998)</xref>
        . GPA is an analytical method
for the multivariate statistical analysis of landmark
locations expressed in Cartesian coordinates; it preserves
the relative spatial relationships of the landmarks
throughout the standardization. Using this method, we
standardized the three-dimensional sequential landmark
coordinates of the facial motion (Figure 2). In addition, the
landmark coordinate values were standardized for each
coordinate so that the mean equals 0 and the standard
deviation equals 1.
      </p>
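      <p>As a sketch of these two standardization steps, the following assumes NumPy and an array of landmark configurations of shape (n_frames, 30, 3); it illustrates ordinary GPA with similarity transforms (for brevity, reflections are not explicitly excluded), not the exact morphometrics pipeline used in the study.</p>
      <preformat>
import numpy as np

def align(shape, ref):
    """Procrustes-align one configuration to a reference: remove
    location and size, then apply the best-fitting rotation."""
    shape = shape - shape.mean(axis=0)
    shape = shape / np.linalg.norm(shape)
    u, _, vt = np.linalg.svd(shape.T @ ref)   # orthogonal Procrustes
    return shape @ (u @ vt)

def gpa(shapes, n_iter=10):
    """Generalized Procrustes analysis: iteratively align every
    configuration to the evolving mean shape."""
    ref = shapes[0] - shapes[0].mean(axis=0)
    ref = ref / np.linalg.norm(ref)
    for _ in range(n_iter):
        aligned = np.array([align(s, ref) for s in shapes])
        ref = aligned.mean(axis=0)
    return aligned

def zscore(coords):
    """Standardize each coordinate to mean 0 and SD 1 across frames."""
    return (coords - coords.mean(axis=0)) / coords.std(axis=0)
      </preformat>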
      <p>
        The motion data of the participants can be regarded as
three-mode data <italic>x<sub>ijk</sub></italic>
(<italic>i</italic> = 1, 2, …, <italic>I</italic>;
<italic>j</italic> = 1, 2, …, <italic>J</italic>;
<italic>k</italic> = 1, 2, …, <italic>K</italic>), where
<italic>i</italic> corresponds to the time sequence,
<italic>j</italic> corresponds to the landmark, and
<italic>k</italic> corresponds to each extracted interval. To
find the components of facial expressions from the motion
data, we performed PARAFAC analysis on the three-mode data.
PARAFAC is a generalization of PCA to higher-order arrays,
given by
      </p>
      <disp-formula>
        <tex-math>x_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} + e_{ijk}</tex-math>
      </disp-formula>
      <p>
        In the above, <italic>a<sub>if</sub></italic>,
<italic>b<sub>jf</sub></italic>, and <italic>c<sub>kf</sub></italic>
correspond to the modes of the time sequence, the landmark,
and the interval, respectively, and
<italic>e<sub>ijk</sub></italic> is the residual term.
      </p>
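      <p>A hedged sketch of this decomposition using the open-source tensorly library (the software actually used by the authors is not stated); the array dimensions are illustrative, following the 3-s intervals sampled at 100 Hz, the 30 landmarks × 3 coordinates, and the 98 extracted intervals described above.</p>
      <preformat>
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Placeholder three-mode array (time points x landmark coordinates x
# extracted intervals); the real input is the standardized landmark data.
I, J, K = 300, 90, 98          # 3 s at 100 Hz; 30 landmarks x 3 coords
X = np.random.randn(I, J, K)

weights, (A, B, C) = parafac(tl.tensor(X), rank=3)
# A (I x 3): time-sequence loadings a_if
# B (J x 3): landmark loadings b_jf
# C (K x 3): per-interval component scores c_kf
      </preformat>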
      <p>We calculated the principal component scores and
loadings up to the third factor with PARAFAC. Figure 3
illustrates the changes in each factor loading over the
time sequence. The first component was found to be a
“static” factor, indicating that it was linked not to any
facial motion but to static morphological features, that is,
the geometric arrangement of facial parts. In contrast, the
second and third components were related to facial movement.
In this study, we therefore examined the second and third
components, which were linked to facial movement.</p>
      <p>The second component score started increasing at around
60 msec, peaked at around 150 msec, and then remained
stable. Thus, the second component is considered to be
related to a facial movement that appears slowly and then
maintains a stable state. The third component was linked to
a movement that appears at around 50 msec, peaks at around
100 msec, and then rapidly returns to the initial state.</p>
      <p>To visualize the facial changes along the second and third
components, facial landmark movements were reconstructed
based on the loadings of both the time sequence and the
landmarks for each component (Figure 4). The results of the
reconstruction indicated that a higher second-component
score for an extracted interval was related to eyebrow
raising and mouth opening, whereas a lower score was related
to eyebrow lowering and mouth closing (Figure 4-A). A higher
third-component score was related to horizontal movement,
including mouth opening in a horizontal direction, whereas a
lower score was related to a movement of contracting the lips
into a rounded shape and knitting the eyebrows (Figure 4-B).</p>
      <p>[Figure 3: time-sequence loadings of COMPONENT 1, COMPONENT 2, and COMPONENT 3; y-axis: LOADING, x-axis: TIME (50 ms units).]</p>
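      <p>Under the PARAFAC model above, this reconstruction is a minimal computation: the motion attributable to component <italic>f</italic> in interval <italic>k</italic> is the rank-one term <italic>c<sub>kf</sub></italic> <italic>a<sub>f</sub></italic><italic>b<sub>f</sub></italic><sup>T</sup>. A sketch, reusing the factor matrices from the decomposition sketch above:</p>
      <preformat>
import numpy as np

def reconstruct(A, B, C, f, k):
    """Motion attributable to component f in interval k: the (I, J)
    rank-one array C[k, f] * outer(A[:, f], B[:, f])."""
    return C[k, f] * np.outer(A[:, f], B[:, f])

# e.g. the second component (index 1) in the first interval; reshaping
# the result to (I, 30, 3) recovers per-landmark 3-D displacements:
# motion = reconstruct(A, B, C, f=1, k=0).reshape(-1, 30, 3)
      </preformat>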
    </sec>
    <sec id="sec-4">
      <title>Assessment of Participants’ Affect</title>
      <p>
        Subjective emotional states during the observation of the
emotional film clips were examined using the same film clips
as in the facial motion capture session. The participants
were also the same as those in the facial motion capture
session. For each participant, we presented each film clip
and paused it at the moment the facial expression had
occurred, that is, at the segmentation point. Participants
were asked to rate their mood using the Affect Grid
technique
        <xref ref-type="bibr" rid="ref10">(Russell, Weiss, &amp; Mendelsohn, 1989)</xref>
        for each
moment at which their facial expressions had occurred. The
Affect Grid is a mood scale that requires participants to
place a mark on a 9 × 9 grid to report their current mood,
with the horizontal dimension representing affective valence
(from unpleasantness to pleasantness) and the vertical
dimension representing the degree of perceived activation
(ranging from sleepiness to high arousal). Both the valence
and arousal scores were standardized for each participant.
      </p>
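      <p>The per-participant standardization amounts to z-scoring each participant’s valence and arousal ratings; a minimal sketch assuming pandas, with illustrative column names.</p>
      <preformat>
import pandas as pd

def standardize_grid(ratings: pd.DataFrame) -> pd.DataFrame:
    """z-score valence and arousal within each participant so that
    individual rating tendencies are removed."""
    z = ratings.groupby("participant")[["valence", "arousal"]].transform(
        lambda s: (s - s.mean()) / s.std()
    )
    return ratings.assign(valence_z=z["valence"], arousal_z=z["arousal"])
      </preformat>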
      <p>
        The relationship between the facial motion components and
the subjective emotional states was investigated. It was
considered inappropriate to assume an a priori model of the
relationship between facial expressions and emotion.
Therefore, we adopted LOESS, a robust smoothing technique
based on local polynomial regression
        <xref ref-type="bibr" rid="ref4">(Cleveland,
Grosse, &amp; Shyu, 1992)</xref>
        , which can graphically demonstrate the
relationship between emotion and the facial motion components.
      </p>
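      <p>A minimal LOESS-style sketch: a local linear surface is fitted to the component scores over the valence-arousal plane with tricube weights. The fixed bandwidth stands in for the span of 3.0 reported below, and Cleveland et al.’s robustness iterations are omitted; NumPy is assumed.</p>
      <preformat>
import numpy as np

def loess2d(x, y, z, grid, h=3.0):
    """Local linear regression of z on (x, y), evaluated at each row
    of `grid`, using tricube weights with bandwidth h."""
    pts = np.column_stack([x, y])
    fitted = np.empty(len(grid))
    for i, g in enumerate(grid):
        d = np.linalg.norm(pts - g, axis=1) / h
        w = np.where(d >= 1, 0.0, (1 - d ** 3) ** 3)   # tricube kernel
        sw = np.sqrt(w)
        X = np.column_stack([np.ones(len(pts)), pts - g])  # local plane
        beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * z, rcond=None)
        fitted[i] = beta[0]            # intercept = fitted value at g
    return fitted

# Example: smooth third-component scores over the 9 x 9 Affect Grid.
# gx, gy = np.meshgrid(np.arange(1, 10), np.arange(1, 10))
# surface = loess2d(valence_z, arousal_z, scores,
#                   np.column_stack([gx.ravel(), gy.ravel()]))
      </preformat>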
      <p>The relationship between the Affect Grid scores and the
smoothed component scores of both the second and third
components is shown in Figure 5 (the span, i.e., the range of
the kernel function, was set to 3.0). Figure 5-A shows that
both negative affect and low arousal lead to higher scores on
the second component, suggesting that this slowly appearing
component of facial movement is related to the expression of
both emotional valence and arousal. For this component, both
positive and high-arousal expressions are similarly
correlated with eyebrow raising and mouth opening. On the
other hand, Figure 5-B shows that arousal level was
essentially linked to the third component, which reflects
short-term facial deformations, except in cases of strongly
unpleasant affect. For this component, a higher arousal
level corresponds to the movement of contracting the lips
into a rounded shape and knitting the eyebrows.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>This study suggests that dynamic facial expressions consist
of multiple components differing in spatiotemporal
characteristics (i.e., long-term facial deformations and
short-term facial deformations). One component was found to
be connected to facial deformations that appear slowly and
maintain a stable state over the long term, and the other was
connected to the rapid appearance and disappearance of facial
deformations. Each component corresponds to movements in
different directions and of different facial parts. The
findings of the present study indicate that dynamically
changing facial expressions can be described by synthesizing
a few components of facial movement that differ in
spatiotemporal characteristics.</p>
      <p>Moreover, we showed the connection between the
components of expressions and emotional valence and
arousal. The results suggested that long-term facial
deformation was related to both valence and arousal,
whereas short-term deformation was related solely to
arousal.</p>
      <p>The results of the study also suggest that the combination
of the motion segmentation technique, the methodology of
geometric morphometrics, and modified principal component
analysis is a valid approach for finding the components of
dynamic facial expressions.</p>
      <p>Because of the small number of participants in this study,
it is difficult to conclude that the facial components found
are stable. Future studies should examine the accuracy of the
PARAFAC-based expression model in detail with a larger
number of participants.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>A part of this study was supported by a Grant-in-Aid for
Scientific Research (No. 22730591) from the Japan Society
for the Promotion of Science.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Barbič</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Safonova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faloutsos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hodgins</surname>
            ,
            <given-names>J. K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pollard</surname>
            ,
            <given-names>N. S.</given-names>
          </string-name>
          (
          <year>2004</year>
          , May).
          <article-title>Segmenting motion capture data into distinct behaviors</article-title>
          .
          <source>In Proceedings of the 2004 Graphics Interface Conference</source>
          (pp.
          <fpage>185</fpage>
          -
          <lpage>194</lpage>
          ). Canadian Human-Computer Communications Society.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Bookstein</surname>
            ,
            <given-names>F. L.</given-names>
          </string-name>
          (
          <year>1997</year>
          ).
          <article-title>Morphometric tools for landmark data: geometry and biology</article-title>
          . Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Bould</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Morris</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Role of motion signals in recognizing subtle facial expressions of emotion</article-title>
          .
          <source>British Journal of Psychology</source>
          ,
          <volume>99</volume>
          (
          <issue>2</issue>
          ),
          <fpage>167</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Cleveland</surname>
            ,
            <given-names>W. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grosse</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Shyu</surname>
            ,
            <given-names>W. M.</given-names>
          </string-name>
          (
          <year>1992</year>
          ).
          <article-title>Local regression models</article-title>
          .
          <source>Statistical models in S</source>
          ,
          <fpage>309</fpage>
          -
          <lpage>376</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>D. W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wallraven</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2009</year>
          , September).
          <article-title>The interaction between motion and form in expression recognition</article-title>
          .
          <source>In Proceedings of the 6th symposium on applied perception in graphics and visualization</source>
          (pp.
          <fpage>41</fpage>
          -
          <lpage>44</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Dryden</surname>
            ,
            <given-names>I. L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mardia</surname>
            ,
            <given-names>K. V.</given-names>
          </string-name>
          (
          <year>1998</year>
          ).
          <article-title>Statistical shape analysis</article-title>
          (Vol.
          <volume>4</volume>
          ). Chichester: Wiley.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Ekman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Friesen</surname>
            ,
            <given-names>W. V.</given-names>
          </string-name>
          (
          <year>1978</year>
          ).
          <article-title>Facial action coding system: A technique for the measurement of facial movement</article-title>
          . Palo Alto, CA: Consulting Psychologists Press.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Kroonenberg</surname>
            ,
            <given-names>P. M.</given-names>
          </string-name>
          (
          <year>1983</year>
          ).
          <article-title>Three-mode principal component analysis: Theory and applications</article-title>
          (Vol.
          <volume>2</volume>
          ). DSWO press.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Krumhuber</surname>
            ,
            <given-names>E. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kappas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Manstead</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Effects of dynamic aspects of facial expressions: A review</article-title>
          .
          <source>Emotion Review</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>41</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Mendelsohn</surname>
            ,
            <given-names>G. A.</given-names>
          </string-name>
          (
          <year>1989</year>
          ).
          <article-title>Affect grid: a single-item scale of pleasure and arousal</article-title>
          .
          <source>Journal of personality and social psychology</source>
          ,
          <volume>57</volume>
          (
          <issue>3</issue>
          ),
          <fpage>493</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Valstar</surname>
            ,
            <given-names>M. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gunes</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Pantic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2007</year>
          , November).
          <article-title>How to distinguish posed from spontaneous smiles using geometric features</article-title>
          .
          <source>In Proceedings of the 9th international conference on Multimodal interfaces</source>
          (pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Wallraven</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breidt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>D. W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bülthoff</surname>
            ,
            <given-names>H. H.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Evaluating the perceptual realism of animated facial expressions</article-title>
          .
          <source>ACM Transactions on Applied Perception (TAP)</source>
          ,
          <volume>4</volume>
          (
          <issue>4</issue>
          ),
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>J. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Canavan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reale</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horowitz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Girard</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>BP4D-Spontaneous: a high-resolution spontaneous 3D dynamic facial expression database</article-title>
          .
          <source>Image and Vision Computing</source>
          ,
          <volume>32</volume>
          (
          <issue>10</issue>
          ),
          <fpage>692</fpage>
          -
          <lpage>706</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>