<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluating Methods for Emotion Recognition based on Facial and Vocal Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthias Ley</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Egger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sten Hanke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AIT Austrian Institute of Technology GmbH</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Center for Health and Bioresources</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of eHealth</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Applied Sciences - FH JOANNEUM GmbH</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Automated emotion recognition based on Information and Communication Technology (ICT) is currently of high interest and opens possibilities for various applications. Emotions are a key feature of communication, and the ability to recognize them may advance Human Computer Interaction. Furthermore, unobtrusive emotion recognition can be used to determine stress levels while handling machines and vehicles, and can help to reduce the risk of car crashes or accidents at work. The e-health sector benefits from emotion recognition to monitor the mental states of patients at home and to improve recovery rates through more personalized care. Emotion recognition might also be useful to predict the emotions of patients who are unable to express them, for example patients suffering from autism, Parkinson's disease or locked-in syndrome. Several methods for ICT-based emotion recognition currently exist. They vary in terms of precision, usability, application area and the number of emotions that can be detected. This paper gives an overview of methods used for emotion recognition and the current state of the art. It focuses on evaluating existing tools for emotion recognition based on facial features as well as vocal features in voice interactions. The methods were tested for their usability in the field of e-health and in applications for elderly care. A case study was conducted to elicit emotions in 20 participants by presenting them an affective video based on rated picture and sound databases. Furthermore, video recordings of the frontal face as well as spoken texts were used for testing the usability of the three projects. The results suggest that facial recognition works best for the emotion happiness, while voice recognition works best for the emotion anger.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>Copyright © 2019 by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>In: E. Calvanese Strinati, D. Charitos, I. Chatzigiannakis, P. Ciampolini, F. Cuomo, P. Di Lorenzo, D. Gavalas, S. Hanke, A. Komninos, G. Mylonas (eds.): Proceedings of the Poster and Workshop Sessions of AmI-2019, the 2019 European Conference on Ambient Intelligence, Rome, Italy, 13-11-2019, published at http://ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        The ability to recognize emotions is one of the criteria for emotional intelligence and therefore an important part
of human intelligence. Machine intelligence needs to include emotional intelligence to recognize human affective
states. Emotion recognition is therefore fundamental to advanced Human Computer Interaction. Recent
research focuses on the recognition of emotional reactions from non-verbal cues such as facial expressions, voice,
gestures and bio signals [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref2">3</xref>
        ] [2]. Ekman and Friesen defined six basic emotions, which hold across all ages
and cultures [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ]. In 1873 Wilhelm Wundt designed a novel three-dimensional emotion classification
system describing the valence, arousal and intensity of emotions on three axes [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ]. The most common model is a two-dimensional
one derived from the circumplex model, in which valence describes whether the emotion is positive or
negative and arousal whether the emotion is more passive or active. Happiness, for example, is labeled as an emotion
with both high valence and high arousal [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ]. A similar model called the Geneva Emotion Wheel is commonly used
in recent studies on emotion recognition [
        <xref ref-type="bibr" rid="ref6">7</xref>
        ]. The methods mentioned for emotion recognition vary in terms of
accuracy and the number of emotions that can be detected, so the application in which the emotion recognition will
be used needs to be considered. For Activities of Daily Living (ADL), emotion recognition based on wearable
measurement systems (e.g. smart watches) that measure physiological bio signals might be appropriate. For
applications with a strong focus on video communication (e.g. voice-video communication or interactions with
an avatar or a robot equipped with a camera), facial and voice recognition is preferable. For a detailed analysis
of emotion recognition methods with respect to the application area see [18]. This paper analyses methods for
emotion recognition based on facial and vocal features. Three toolboxes and SDKs are tested in a simple setup
to evaluate their usefulness for emotion recognition applications. The following technologies were tested: the
commercial FaceReader<sup>1</sup> software by Noldus Information Technology B.V., the Affdex SDK<sup>2</sup> by Affectiva as
well as OpenVokaturi<sup>3</sup> by Vokaturi B.V.
      </p>
      <sec id="sec-2-1">
        <title>State of the Art in Emotion Recognition</title>
      </sec>
      <sec id="sec-2-2">
        <title>Overview Emotion Recognition</title>
        <p>
          Emotion recognition can be achieved with various techniques. The right method depends on the application
area as well as on the emotions to be analysed. The accuracy achieved for a certain emotion with one method might not be
reproducible with another method. Multimodal systems are used to compensate for low accuracy rates for certain
emotions. Furthermore, emotion recognition from bio signals (e.g. from electrodermal activity, respiration, skin
temperature or electromyography) generally requires a multimodal setup, because most bio
signals are not sufficient to describe emotions on a two-dimensional scale in a unimodal setup. Electrodermal
activity, for example, only reflects the excitement level of a person (arousal). Table 1 shows an
overview of the possible accuracy, benefits and limitations of common methods for emotion recognition. Regarding
the state of the art in emotion recognition, several review papers give more detailed insight into measurement
setups and their classification accuracy [18] [19] [
          <xref ref-type="bibr" rid="ref12">20</xref>
          ] [
          <xref ref-type="bibr" rid="ref13">21</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Emotion Recognition based on Facial and Vocal Features</title>
        <p>
          Several studies use a multi-cue approach when building tools for emotion recognition. Data from facial expressions
as well as from speech are merged to increase the classification rate [
          <xref ref-type="bibr" rid="ref20">28</xref>
          ], [
          <xref ref-type="bibr" rid="ref21">29</xref>
          ], [
          <xref ref-type="bibr" rid="ref22">30</xref>
          ]. Some studies validate their methods
against lab trials in which participants are asked to pose certain emotions. It is
questionable whether validating against actors is a viable method, because the process of emotion expression is complex
and might not be reproducible when acted out. Posing might be easier for facial and vocal cues
and less accurate for bio signals, which are driven by the autonomic nervous system. More promising is
a validation against labeled data such as rated video and sound databases, which is the preferred method for a
comparable validation. For example, Shan et al. [
          <xref ref-type="bibr" rid="ref23">31</xref>
          ] validate several methods using local binary patterns in
pictures against rated databases (the Japanese Female Facial Expression (JAFFE) Database and the Cohn-Kanade
database) [
          <xref ref-type="bibr" rid="ref24">32</xref>
          ]. Kahou et al. [
          <xref ref-type="bibr" rid="ref25">33</xref>
          ] use a deep learning approach to classify seven emotions from short clips of
Hollywood movies based on visual face features. Tivatansakul et al. [
          <xref ref-type="bibr" rid="ref26">34</xref>
          ] use emotion recognition based on
facial expressions to provide better health services. Furthermore, there are emotion recognition approaches based
on vocal features as well as on the context of the message (vocabulary, syntax and usage of words). For vocal
features, Zhang et al. [35] included the Mel frequency cepstrum coefficients, pitch and formants in their analysis to
recognize six emotions.
        </p>
        <p><sup>1</sup> https://www.noldus.com</p>
        <p><sup>2</sup> https://www.affectiva.com/product/emotion-sdk/</p>
        <p><sup>3</sup> https://vokaturi.com/</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <p>The following three projects were chosen for testing the usability of facial and vocal emotion recognition. To
recognise emotions from facial features, the FaceReader by Noldus Information Technology B.V., Wageningen,
Netherlands as well as the Affdex SDK by Affectiva, Boston, Massachusetts were used. To analyse spoken texts
taken from audio recordings, the OpenVokaturi SDK by Vokaturi B.V., Amsterdam, Netherlands was used.</p>
      <sec id="sec-3-1">
        <title>Facial Recognition with FaceReader</title>
        <p>
          Four basic emotions were chosen for investigation: contentment, disgust, sadness and happiness. A study
was conducted at the AIT Austrian Institute of Technology GmbH, Wiener Neustadt, Austria as well as the
University of Applied Sciences Technikum Wien, Vienna, Austria. A total of 23 data sets were recorded from
21 participants. The age of the participants ranged from 23 to 39 years. Males, females, people with glasses,
beards and different skin colors were involved. One participant took part in an extended study to test the
reproducibility of the system. A video based on rated picture (NAPS) and sound
(DEAM) databases was shown to the participants to elicit the desired emotions [
          <xref ref-type="bibr" rid="ref7">8</xref>
          ] [
          <xref ref-type="bibr" rid="ref8">9</xref>
          ]. The video consisted of four emotional phases of
two minutes each. Between these phases, a relaxation phase ensured that the emotional effects and
the reaction of the autonomic nervous system calmed down. The frontal face of each participant was recorded
during the whole measurement protocol. The recorded videos of the participants' faces were then used for the
analysis with the FaceReader. The participants rated their emotional experience with the Self-Assessment Manikin
(SAM) [
          <xref ref-type="bibr" rid="ref9">10</xref>
          ] at the beginning of each relaxation phase. The test involved rating the participant's arousal (from
mild to intense) and valence (from unpleasant to pleasant) on a scale from one to five. MATLAB by The MathWorks
Inc., Massachusetts, United States of America, was used for the analysis and evaluation of the recorded data.
The self-assessed scores were correlated against the classification results from the FaceReader.
        </p>
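        <p>The correlation step described above can be sketched in a few lines. The analysis in this study was carried out in MATLAB and the correlation measure is not specified here, so the following Python sketch assumes a Pearson correlation. The function name pearson_r and the example values are hypothetical and are not study data.</p>
        <preformat>
```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and equally long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example values (assumption, not taken from the study):
# self-assessed valence on the one-to-five scale and a classifier score.
sam_valence = [1, 2, 3, 4, 5]
classifier_valence = [0.1, 0.2, 0.4, 0.6, 0.9]
r = pearson_r(sam_valence, classifier_valence)
print(round(r, 3))
```
        </preformat>
        <p>A coefficient close to 1 would indicate that the classifier output rises together with the self-assessed score; a rank-based measure could be substituted if the one-to-five ratings are treated as ordinal.</p>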
      </sec>
      <sec id="sec-3-2">
        <title>Facial Recognition with Affdex</title>
        <p>To test the usability of the Affdex SDK for facial emotion recognition, video material from the platform YouTube
was used. Table 2 describes the videos used for the analysis. Contrary to the previous method, this part of the
research was based on publicly available videos of people showing emotions for various reasons (mostly acting)
instead of having people react to emotional elicitation material. The videos were not chosen based on rated
databases and were not annotated by experts. It was ensured that the videos showed the frontal face
and that the lighting conditions were adequate. The length of the videos ranged from short videos (4 to 11 seconds)
to longer videos (1 to 3 minutes). One video file (ID3, table 5) was separated into eight smaller parts in order to
analyse only the relevant emotions instead of the original 30 emotions portrayed by the same actor (Breeze Woodson).
The videos were rated individually and the average classification accuracy of seven emotions was calculated.
The analysed emotions were contentment, surprise, anger, sadness, disgust, fear and joy. The Affdex SDK was
integrated into a C# based project which was able to analyse the recorded video files as well as faces
captured with a webcam in real time. The results were then exported and analysed in MATLAB.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Voice Recognition with OpenVokaturi</title>
        <p>
          Similar to section 2.2, this part of the research was not based on self-made recordings but on an annotated
dataset. For testing the OpenVokaturi SDK, an audio-visual database called RAVDESS (Ryerson Audio-Visual
Database of Emotional Speech and Song) was used [
          <xref ref-type="bibr" rid="ref10">11</xref>
          ]. The database contains recordings from 24 actors. For
the speech recordings, two statements were spoken in two intensities (normal and strong) with two repetitions each. A Python
script was written to analyse the sound files. The files first had to be corrected in the Audacity recording and editing
software [
          <xref ref-type="bibr" rid="ref11">12</xref>
          ]: they were imported and saved as 16-bit mono WAV files without meta descriptions. After
the files had been classified successfully with the OpenVokaturi SDK, the results were exported and further
analysed in MATLAB.
        </p>
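        <p>Before running a classifier on a batch of recordings, it can help to check that each file really has the 16-bit mono format described above. The following is a minimal sketch that uses only the wave module of the Python standard library. The helper names is_16bit_mono_wav and make_test_wav are hypothetical, the test files are silent dummies generated in memory, and no call into the OpenVokaturi SDK is shown.</p>
        <preformat>
```python
import io
import wave


def is_16bit_mono_wav(source):
    """Return True if the WAV file (path or file object) is 16-bit mono."""
    with wave.open(source, "rb") as wav:
        # Sample width is given in bytes, so 2 bytes correspond to 16 bits
        return wav.getnchannels() == 1 and wav.getsampwidth() == 2


def make_test_wav(channels, sample_width, frame_rate=48000, n_frames=100):
    """Build a silent in-memory WAV file for demonstration purposes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(frame_rate)
        wav.writeframes(b"\x00" * (channels * sample_width * n_frames))
    buffer.seek(0)
    return buffer


print(is_16bit_mono_wav(make_test_wav(channels=1, sample_width=2)))
print(is_16bit_mono_wav(make_test_wav(channels=2, sample_width=2)))
```
        </preformat>
        <p>Files that fail such a check could be converted before classification, which would separate format problems from other causes of failed samples.</p>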
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <sec id="sec-4-1">
        <title>FaceReader</title>
        <p>Table 3 shows the self-rated arousal and valence scores from the participants after watching the emotional phases
of the elicitation video. This result gives insight into whether the material used was appropriate to elicit the
desired emotions. Table 4 shows the average classification accuracy for one subject over three measurements
(reproducibility measurement). The participant was instructed to force facial expressions based on the experienced
emotion. Figure 1 shows the average classification accuracy for all 23 datasets. The accuracy is calculated over
the length of two minutes for each emotional phase. The participants were instructed to behave naturally.</p>
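        <p>[Figure 1: Average classification accuracy for all 23 datasets; categories: Contentment, Sad, Disgusted, Happy]</p>
      </sec>
      <sec id="sec-4-5">
        <title>Affdex</title>
        <table-wrap>
          <label>Table 2</label>
          <caption>
            <p>Videos used for the analysis.</p>
          </caption>
          <table>
            <thead>
              <tr><th>ID</th><th>Description</th><th>Emotion</th><th>Length [m:s]</th><th>Source</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>Can You Watch This Without Smiling</td><td>Happy</td><td>01:11</td><td>[14]</td></tr>
              <tr><td>2</td><td>Acting - Sad Scene</td><td>Sad</td><td>02:45</td><td>[15]</td></tr>
              <tr><td>3</td><td>100 People Show Us What It Looks Like When They Cry</td><td>Sad</td><td>02:22</td><td>[16]</td></tr>
              <tr><td>4</td><td>30 EMOTIONS - Breeze Woodson</td><td>Angry</td><td>00:09</td><td>[17]</td></tr>
              <tr><td>5</td><td>Excerpt from ID3</td><td>Disgust</td><td>00:04</td><td></td></tr>
              <tr><td>6</td><td>Excerpt from ID3</td><td>Happy</td><td>00:07</td><td></td></tr>
              <tr><td>7</td><td>Excerpt from ID3</td><td>Laughing</td><td>00:11</td><td></td></tr>
              <tr><td>8</td><td>Excerpt from ID3</td><td>Pain / Ouch</td><td>00:04</td><td></td></tr>
              <tr><td>9</td><td>Excerpt from ID3</td><td>Pouty</td><td>00:04</td><td></td></tr>
              <tr><td>10</td><td>Excerpt from ID3</td><td>Sad</td><td>00:10</td><td></td></tr>
              <tr><td>11</td><td>Excerpt from ID3</td><td>Surprise</td><td></td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Table 5 shows the results of the emotion recognition of YouTube videos with the Affdex SDK. The
prediction results for contentment and fear are not displayed for reasons of readability and because their maximum
accuracies (8.7 % for contentment, 1.7 % for fear) were exceptionally low.</p>
        <table-wrap>
          <label>Table 5</label>
          <caption>
            <p>Results of the emotion recognition of YouTube videos with the Affdex SDK.</p>
          </caption>
          <table>
            <thead>
              <tr><th>ID</th><th>Displayed emotion</th><th>Detected emotion</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>Happy</td><td>Joy</td></tr>
              <tr><td>2</td><td>Sad</td><td>Surprise</td></tr>
              <tr><td>3</td><td>Sad</td><td>Anger</td></tr>
              <tr><td>4</td><td>Angry</td><td>Sadness</td></tr>
              <tr><td>5</td><td>Disgust</td><td>Anger</td></tr>
              <tr><td>6</td><td>Happy</td><td>Joy</td></tr>
              <tr><td>7</td><td>Laughing</td><td>Joy</td></tr>
              <tr><td>8</td><td>Pain</td><td>Anger</td></tr>
              <tr><td>9</td><td>Pouty</td><td>Disgust</td></tr>
              <tr><td>10</td><td>Sad</td><td>Surprise</td></tr>
              <tr><td>11</td><td>Surprise</td><td>Surprise</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>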
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>The results of the emotion recognition through vocal features are presented in table 6. Displayed are the emotion
labels of the audio recordings based on 192 samples for each emotion, the five emotional states (neutral, happy,
sad, angry, fear) and the number of failed samples during the analysis. As with the other two methods, an
overall prediction accuracy is given.</p>
      <p>For the analysis with the FaceReader and the dataset gathered from one participant measured three times while
deliberately exaggerating the desired emotion, the best classification accuracy occurred during the phase
"Happy" with 76.9 % mean prediction accuracy (table 4). The classification accuracy for "Happy" is followed by
"Sad" (53.0 %), "Contentment" (17.2 %) and finally "Disgusted" (6.4 %). For the analysis of all 23
recordings (figure 1), the accuracy ranking of "Happy" and "Sad" is flipped compared to the previous analysis.
The overall accuracy for all participants is lower than the accuracy for one subject forcing the emotions. This is
probably because the participants were instructed to behave naturally. Furthermore, each picture
taken from the rated database to create the video was presented for 15 seconds (eight pictures, total duration of
two minutes per emotional phase). It seems natural that the participants express the emotion for a short
amount of time and then return to a neutral facial state, which results in a lower overall accuracy. However,
table 3 shows that the material used to elicit the emotions meets the requirements and that the participants
experienced the desired emotions. In a further step, it might be beneficial to analyse only the dominant emotion
over a certain time window, and to focus on the emotions happy and sad, as contentment and other emotions
result in facial features similar to a neutral expression.</p>
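      <p>The idea of analysing only the dominant emotion over a time window can be illustrated with a short sketch. It assumes that a classifier delivers one emotion label per video frame; the function names, the window size and the example labels are hypothetical and do not reproduce the FaceReader output format or any study data.</p>
      <preformat>
```python
from collections import Counter


def dominant_emotion_per_window(frame_labels, window_size):
    """Return the most frequent label in each consecutive window of frames."""
    if not window_size > 0:
        raise ValueError("window_size must be positive")
    dominant = []
    for start in range(0, len(frame_labels), window_size):
        window = frame_labels[start:start + window_size]
        # most_common(1) returns [(label, count)]; on ties the label that
        # was seen first in the window is kept
        dominant.append(Counter(window).most_common(1)[0][0])
    return dominant


def window_accuracy(frame_labels, target, window_size):
    """Share of windows whose dominant label equals the target emotion."""
    dominant = dominant_emotion_per_window(frame_labels, window_size)
    if not dominant:
        return 0.0
    return sum(label == target for label in dominant) / len(dominant)


# Hypothetical per-frame labels for a "happy" phase (assumption, not study data)
frames = ["neutral", "happy", "happy", "neutral", "neutral", "happy"]
print(dominant_emotion_per_window(frames, 3))
print(window_accuracy(frames, "happy", 3))
```
      </preformat>
      <p>With a window matched to the presentation time of one picture, a short emotional reaction followed by a return to a neutral face would be scored per window rather than averaged over the whole two-minute phase.</p>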
      <p>The usability test of the Affdex SDK was based on the analysis of unrated video material and on the sole judgement
of one person (the author). It would be appropriate to refer to a rated database in future tests. For the video
material used, it could be observed that the recognition of the emotion happy works best (assuming that joy is
closely related to happiness in terms of arousal and valence scores). Video ID 1 contains a compilation of people
laughing and being happy in general. The high prediction rate of 78 % for ID 1 (table 5) suggests that the Affdex
SDK works properly when detecting multiple faces (20 individual actors). The video covers people from
most ethnicities, eyeglass wearers and people with beards, and includes males and females. Similar to the results
with the FaceReader for three measurements on the same participant (table 4), the recognition of happy works
better than for any other emotion. Besides happiness, one sample each was correctly classified for surprise and pain
(assumed to be anger), and no sad or disgust sample was correctly classified with the Affdex SDK. The unusually
high recognition results for the emotion happy are most likely due to the unique facial characteristics of
that emotion. In conclusion, emotion recognition from video material works best for the
emotion happy across the two analysed systems.</p>
      <p>After correcting the format of the audio files, the script could still not produce results for a number of files. The
reason for this is unknown; it might be caused by different recording methods within the RAVDESS database or by the
length or quality of the audio files. Furthermore, the overall prediction accuracy is presented. Analysing
the dominant emotion for each sample instead of the mean overall prediction accuracy for each emotion might
give additional insight into which method is most applicable. Table 6 shows that even
for the sound samples labeled as happy the emotion could not be predicted correctly (angry dominates over happy
in the detection). When listening to the audio files, it seemed that some happy samples did not sound happy at all. This
leads to the assumption that the RAVDESS database might have labeled its sound files misleadingly and that
a different database might yield better results. The only positive classification result was for the sound files labeled as
angry (mean classification accuracy of 42.0 %). The results for the sound files labeled as sad were exceptionally
poor (49 failed samples out of 192, mean classification accuracy of 9.2 % for sad). This might be because
sadly spoken sentences have fewer detectable features than sentences spoken in other emotional states. In
general, it seems that the classification for happy is confused with the classification for angry.</p>
      <p>[2] K. Dai, H. J. Fell and J. MacAuslan, Recognizing emotion in speech using neural networks, in Proceedings of the 4th IASTED International Conference on Telehealth and Assistive Technologies, Telehealth/AT 2008, pp. 31-36, 2008.</p>
      <p>[13] M. Ley, Emotion recognition from facial recognition and bio signal analysis, in fulfillment of the requirements for the degree of Master of Science in Engineering, University of Applied Sciences Technikum Wien, 2019.</p>
      <p>[14] [BuzzFeedVideo], Can You Watch This Without Smiling? [Video File], Retrieved from https://www.youtube.com/watch?v=f8OmSWxF6h8, 04.04.2016.</p>
      <p>[15] [Artemis], Acting - Sad Scene - Artemis [Video File], Retrieved from https://www.youtube.com/watch?v=9qRGBRYHO0g, 24.02.2014.</p>
      <p>[16] [Cut], Crying | 100 People Show Us What It Looks Like When They Cry | Keep it 100 | Cut [Video File], Retrieved from https://www.youtube.com/watch?v=vv2qnoUfjPU, 03.01.2017.</p>
      <p>[17] [Breeze Woodson], 30 EMOTIONS | Breeze Woodson [Video File], Retrieved from https://www.youtube.com/watch?v=y2YUMPJATmg, 19.08.2016.</p>
      <p>[18] M. Egger, M. Ley, and S. Hanke, Emotion Recognition from Physiological Signal Analysis: A Review, Electron. Notes Theor. Comput. Sci., 2019.</p>
      <p>[19] J. Garcia-Garcia, V. Penichet, and M. Lozano, Emotion detection: a technology review, XVIII International Conference, 2017.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kessous</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Castellano</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Caridakis</surname>
          </string-name>
          ,
          <article-title>Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis</article-title>
          ,
          <source>J. Multimodal User Interfaces</source>
          , vol.
          <volume>3</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>33</fpage>-<lpage>48</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Maaoui</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pruski</surname>
          </string-name>
          ,
          <article-title>Emotion recognition through physiological signals for human-machine communication</article-title>
          ,
          <source>Cut. Edge Robot</source>
          ., pp.
          <fpage>317</fpage>-<lpage>333</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ekman</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. V.</given-names>
            <surname>Friesen</surname>
          </string-name>
          ,
          <article-title>Constants across cultures in the face and emotion</article-title>
          ,
          <source>J. Pers. Soc. Psychol.</source>
          , vol.
          <volume>17</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>124</fpage>-<lpage>129</lpage>
          ,
          <year>1971</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wundt</surname>
          </string-name>
          ,
          <source>Principles of physiological psychology</source>
          ,
          <year>1873</year>
          . East Norwalk, CT, US:
          <publisher-name>Appleton-Century-Crofts</publisher-name>
          ,
          <year>1948</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Posner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Russell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Peterson</surname>
          </string-name>
          ,
          <article-title>The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology</article-title>
          ,
          <source>Dev. Psychopathol.</source>
          , vol.
          <volume>17</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>715</fpage>-<lpage>734</lpage>
          , Sep.
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sacharin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Schlegel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Scherer</surname>
          </string-name>
          ,
          <source>Geneva Emotion Wheel rating study (Report), Soc. Sci. Inf</source>
          .,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Marchewka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Żurawski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jednoróg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Grabowska</surname>
          </string-name>
          ,
          <article-title>The Nencki Affective Picture System (NAPS): Introduction to a novel, standardized, wide-range, high-quality, realistic picture database</article-title>
          ,
          <source>Behav. Res. Methods</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Marchewka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Żurawski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jednoróg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Grabowska</surname>
          </string-name>
          ,
          <article-title>The Nencki Affective Picture System (NAPS): Introduction to a novel, standardized, wide-range, high-quality, realistic picture database</article-title>
          ,
          <source>Behav. Res. Methods</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <article-title>Observations: SAM: The self-assessment manikin: An efficient cross-cultural measurement of emotional response</article-title>
          ,
          <source>Journal of Advertising Research</source>
          , vol.
          <volume>35</volume>
          , no.
          <issue>6</issue>
          . Advertising Research Foundation, US, pp.
          <fpage>63</fpage>-<lpage>68</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.R.</given-names>
            <surname>Livingstone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.A.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <article-title>The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English</article-title>
          ,
          <source>PLoS ONE</source>
          , vol.
          <volume>13</volume>
          , no.
          <issue>5</issue>
          , e0196391,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mazzoni</surname>
          </string-name>
          ,
          <article-title>Audacity 2.3.2 recording and editing software</article-title>
          , copyright 1999-2019 Audacity Team, Web site: https://audacityteam.org/, free software distributed under the terms of the GNU General Public License
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jerritta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Murugappan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nagarajan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <article-title>Physiological signals based human emotion recognition: a review</article-title>
          ,
          <source>in 2011 IEEE 7th International Colloquium on Signal Processing and its Applications</source>
          , pp.
          <fpage>410</fpage>
          -
          <lpage>415</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wac</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Tsiourti</surname>
          </string-name>
          ,
          <article-title>Ambulatory assessment of affect: Survey of sensor systems for monitoring of autonomic nervous systems activation in emotion</article-title>
          ,
          <source>IEEE Trans. Affect. Comput.</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lundqvist</surname>
          </string-name>
          ,
          <article-title>Emotional responses to music: experience, expression, and physiology</article-title>
          ,
          <source>Psychology of Music</source>
          , vol.
          <volume>37</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>90</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Larsen</surname>
          </string-name>
          ,
          <article-title>Effects of positive and negative affect on electromyographic activity over zygomaticus major and corrugator supercilii</article-title>
          ,
          <source>Psychophysiology</source>
          , vol.
          <volume>40</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>776</fpage>
          -
          <lpage>785</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hazlett</surname>
          </string-name>
          ,
          <article-title>Measuring Emotional Valence during Interactive Experiences: Boys at Video Game Play</article-title>
          ,
          <source>CHI 2006 Proceedings</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Mehmood</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A novel feature extraction method based on late positive potential for emotion recognition in human brain signal patterns</article-title>
          ,
          <source>Comput. Electr. Eng.</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [26]
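          <string-name>
            <given-names>F.</given-names>
            <surname>Agrafioti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hatzinakos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Anderson</surname>
          </string-name>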
          ,
          <article-title>ECG pattern analysis for emotion detection</article-title>
          ,
          <source>IEEE Trans. Affect. Comput.</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Haag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goronzy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schaich</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Emotion Recognition Using Bio-sensors: First Steps towards an Automatic System</article-title>
          ,
          <source>Tutorial and research workshop on affective dialogue systems</source>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Muhammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Mohammed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Song</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Al-Mutib</surname>
          </string-name>
          ,
          <article-title>Audio-Visual Emotion Recognition Using Big Data Towards 5G</article-title>
          ,
          <source>Mobile Networks and Applications</source>
          , vol.
          <volume>5</volume>
          , Springer New York LLC, pp.
          <fpage>753</fpage>
          -
          <lpage>763</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zong</surname>
          </string-name>
          ,
          <article-title>Multi-cue fusion for emotion recognition in the wild</article-title>
          ,
          <source>Neurocomputing</source>
          , vol.
          <volume>309</volume>
          , Elsevier B.V., pp.
          <fpage>27</fpage>
          -
          <lpage>35</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>Multi-scale temporal modeling for dimensional emotion recognition in video</article-title>
          ,
          <source>AVEC 2014 - Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, Workshop of MM 2014</source>
          , Association for Computing Machinery, Inc, pp.
          <fpage>11</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>C.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gong</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. W.</given-names>
            <surname>McOwan</surname>
          </string-name>
          ,
          <article-title>Facial expression recognition based on Local Binary Patterns: A comprehensive study</article-title>
          ,
          <source>Image and Vision Computing</source>
          , Elsevier Ltd, pp.
          <fpage>803</fpage>
          -
          <lpage>816</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kanade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cohn</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <article-title>Comprehensive database for facial expression analysis</article-title>
          ,
          <source>in: IEEE International Conference on Automatic Face &amp; Gesture Recognition (FG)</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Kahou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bouthillier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamblin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gulcehre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michalski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Konda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jean</surname>
          </string-name>
          et al.,
          <article-title>EmoNets: Multimodal deep learning approaches for emotion recognition in video</article-title>
          ,
          <source>arXiv:1503.01800</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tivatansakul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ohkura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Puangpontip</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Achalakul</surname>
          </string-name>
          ,
          <article-title>Emotional healthcare system: Emotion detection by facial expressions using Japanese database</article-title>
          ,
          <source>2014 6th Computer Science and Electronic Engineering Conference, CEEC 2014 - Conference Proceedings</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>