<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Impact of implicit and explicit affective labeling on a recommender system's performance</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Ljubljana, Faculty of Electrical Engineering</institution>
          ,
          <addr-line>Tržaška 25, 1000 Ljubljana, Slovenia</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Affective labeling of multimedia content can be useful in recommender systems. In this paper we compare the effect of implicit and explicit affective labeling in an image recommender system. The implicit affective labeling method is based on an emotion detection technique that takes as input the video sequences of the users' facial expressions. It extracts Gabor low level features from the video frames and employs a kNN machine learning technique to generate affective labels in the valence-arousal-dominance space. We performed a comparative study of the performance of a content-based recommender (CBR) system for images that uses three types of metadata to model the users and the items: (i) generic metadata, (ii) explicitly acquired affective labels and (iii) implicitly acquired affective labels with the proposed methodology. The results showed that the CBR performs best when explicit labels are used. However, implicitly acquired labels yield a significantly better performance of the CBR than generic metadata while being an unobtrusive feedback tool.</p>
      </abstract>
      <kwd-group>
        <kwd>content-based recommender system</kwd>
        <kwd>affective labeling</kwd>
        <kwd>emotion detection</kwd>
        <kwd>facial expressions</kwd>
        <kwd>affective user modeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Problem statement and proposed solution</title>
      <p>
        Each of the two approaches to affective labeling, explicit and implicit, has its
pros and cons. The explicit approach provides unambiguous labels, but
        <xref ref-type="bibr" rid="ref14">Pantic and Vinciarelli [2009]</xref>
        argue that the truthfulness of such labels is questionable, as
users can be driven by different motives (egoistic labeling, reputation-driven
labeling and asocial labeling). Another drawback of the explicit labeling approach
is the intrusiveness of the process. On the other hand, implicit affective labeling
is completely unobtrusive and harder for the user to cheat. Unfortunately,
the accuracy of the algorithms that detect affective responses might be too low,
yielding ambiguous or inaccurate labels.
      </p>
      <p>Given the advantages of implicit labeling over explicit labeling, there is a need to
assess the impact of the low emotion detection accuracy on the performance of
recommender systems.</p>
      <p>In this paper we compare the performance of a CBR system using explicit
affective labeling vs. the proposed implicit affective labeling. The baseline
results of the CBR with explicit affective labeling are those published in Tkalčič
et al. [2010a]. The comparative results of the implicit affective labeling are
obtained using the same CBR procedure as in Tkalčič et al. [2010a] and the same
user interaction dataset [Tkalčič et al., 2010c], but with affective labels acquired
implicitly.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        As anticipated by Pant
        <xref ref-type="bibr" rid="ref3">ic and Vinciarelli [2009</xref>
        ], affective labels are supposed to
be useful in content retrieval applications. Work related to this paper is divided
in (i) the acquisition of affective labels and (ii) the usage of affective labels.
      </p>
      <p>
        The acquisition of explicit affective labels is usually performed through an
application with a graphical user interface (GUI) where users consume the
multimedia content and provide appropriate labels. An example of such an application
is the one developed by Eckhardt and P
        <xref ref-type="bibr" rid="ref3">icard [2009</xref>
        ].
      </p>
      <p>
        On the other hand, the acquisition of implicit affective labels is usually
reduced to the problem of non-intrusive emotion detection. Various modalities are
used, such as video of users’ faces, voice or physiological sensors (heartbeat,
galvanic skin res
        <xref ref-type="bibr" rid="ref11">ponse etc.) [Picard and Daily, 2005</xref>
        ]. A good overview of such
methods
        <xref ref-type="bibr" rid="ref3">is given in Zeng et al. [2009</xref>
        ]. In our work we use implicit affective
labeling from videos of users’ faces. Generally, the approach taken in related work in
automatic detection of emotions from video clips of users’ faces is composed of
three stages: (i) pre-processing, (ii) low level features extraction and (iii)
classification. Related work differ mostly in the last two stages. Bartlett et al. [2006],
Wang and Guan [2008], Zhi and
        <xref ref-type="bibr" rid="ref24">Ruan [2008</xref>
        ] used Gabor wavelets based
features for emotion detection. Beside these, which a
        <xref ref-type="bibr" rid="ref24">re mostly used, Zhi and Ruan
[2008</xref>
        ] report the usage of other facial features in related work: active appearance
models (AAM), action units, various facial points and motion units, Haar based
features and textures. Various classification schemes were used successfully in
video emotion detection. Bartlett et al. [2006] employed both the Support
Vector Machine (SVM) and AdaBoost classifie
        <xref ref-type="bibr" rid="ref24">rs. Zhi and Ruan [2008</xref>
        ] used the
knearest neighbours (k-NN) algorithm. Before using the classifier they performed
a dimensionality reduction step using the locality preserving projection (LPP)
technique. In thei
        <xref ref-type="bibr" rid="ref24">r work, Wang and Guan [2008</xref>
        ] compared four classifiers: the
Gaussian Mixture Model (GMM), the k-NN, neural networks (NN) and Fisher’s
Linear Discriminant Analysis (FLDA). The latter turned out to yield the best
performance. T
        <xref ref-type="bibr" rid="ref9">he survey Zeng et al. [2009</xref>
        ] reports the use of other classifiers
like the C4.5, Bayes Net and rule based class
        <xref ref-type="bibr" rid="ref3">ifiers. Joho et al. [2009</xref>
        ] used an
emotion detection techique that uses video sequences of users’ face expressions
to provide affective labels for video content.
      </p>
      <p>
        Another approach is to extract affective labels directly from the content
itself, without observin
        <xref ref-type="bibr" rid="ref1 ref2">g the users. Hanjalic and Xu [2005</xref>
        ] used low level features
extracted from the audio track of video clips to identify moments in video
sequences that induce high arousal in viewers.
      </p>
      <p>
        In contrast to emotion detection techniques the usage of affective labels for
information retrieval has only recently started to gain attention. Chen et al.
[2008] developed the EmoPlayer which has a similar user interface to the tool
developed by Eckhardt and P
        <xref ref-type="bibr" rid="ref3">icard [2009</xref>
        ] but with a reversed functionality: it
assists users to find specific scenes in a video sequence. Soleyman
        <xref ref-type="bibr" rid="ref3">i et al. [2009</xref>
        ]
built a collaborative filtering system that retrieves video clips based on affective
queries. Similarly, but for mus
        <xref ref-type="bibr" rid="ref3">ic content, Shan et al. [2009</xref>
        ] have developed a
system that performs emotion based quer
        <xref ref-type="bibr" rid="ref3">ies. Arapakis et al. [2009</xref>
        ] built a complete
video recommender system that detects the users’ affective state and provides
recommended content. K
        <xref ref-type="bibr" rid="ref3">ierkels and Pun [2009</xref>
        ] used physiological sensors (ECG
and EEG) to implicitly detect the emotive responses of users. Based on implicit
affective labels they observed an increase of content retrieval accuracy compared
to explicit affective labels. Tkalˇciˇc et al. [2010a] have shown that the usage of
affective labels significantly improves the performance of a recommender system
over generic labels.
2
2.1
      </p>
      <sec id="sec-2-1">
        <title>Affective modeling in CBR systems</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Emotions during multimedia item consumption</title>
      <p>In a multimedia consumption scenario a user is watching multimedia content.
During the consumption of multimedia content (images in our case), the emotive
state of the user changes continuously between different emotive states ǫj ∈ E,
as different visual stimuli hi ∈ H induce these emotions (see Fig. 1). The facial
expressions of the user are continuously monitored by a video camera for
the purpose of automatic detection of the emotion expressions.</p>
      <p>The detected emotion expressions of the users, along with the ratings given
to the content items, can be used in two ways: (i) to model the multimedia
content item (e.g. the multimedia item hi is funny - it induces laughter in most
of the viewers) and (ii) to model individual users (e.g. the user u likes images
that induce fear).</p>
      <p>(Fig. 1 omitted: the user’s emotive state ǫj ∈ E changes over time t, with transitions at the display times t(h1), t(h2), . . . of the consumed items.)</p>
      <p>Item modeling with affective metadata. We use the valence-arousal-dominance
(VAD) emotive space for describing the users’ emotive reactions to images. In the
VAD space each emotive state is described by three parameters, namely valence,
arousal and dominance. A single user u ∈ U consumes one or more content items
(images) h ∈ H. As a consequence of the image h being a visual stimulus, the
user u experiences an emotive response which we denote as er(u, h) = (v, a, d)
where v, a and d are scalar values that represent the valence, arousal and
dominance dimensions of the emotive response er. The set of users that have watched
a single item h is denoted by Uh. The emotive responses of all users Uh that
have watched the item h form the set ERh = {er(u, h) : u ∈ Uh}. We model
the image h with the item profile composed of the first two statistical
moments of the VAD values from the emotive responses ERh, which yields the
six-tuple
      <p>V = (v̄, σv, ā, σa, d̄, σd)
(1)
where v̄, ā and d̄ represent the average VAD values and σv, σa and σd
represent the standard deviations of the VAD values for the observed content item
h. An example of the affective item profile is shown in Tab. 1.</p>
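      <p>
        A minimal sketch of the item profile construction in Eq. (1), written in Python with
        NumPy. This is an illustration, not the authors’ original code; the sample standard
        deviation (ddof=1) and the example responses are assumptions.
      </p>
      <preformat>
import numpy as np

def item_profile(emotive_responses):
    """Build the six-tuple V of Eq. (1) for one item h.

    emotive_responses: array of shape (n_users, 3) holding the
    (valence, arousal, dominance) responses er(u, h) of all users
    in Uh that watched the item h.
    """
    er = np.asarray(emotive_responses, dtype=float)
    means = er.mean(axis=0)         # (v_mean, a_mean, d_mean)
    stds = er.std(axis=0, ddof=1)   # (sigma_v, sigma_a, sigma_d); ddof assumed
    # Interleave to match V = (v_mean, sigma_v, a_mean, sigma_a, d_mean, sigma_d)
    return tuple(np.column_stack([means, stds]).ravel())

# Example: three users' VAD responses to one image (made-up values)
ERh = [(0.4, -0.2, 0.1), (0.6, -0.4, 0.3), (0.5, -0.3, 0.2)]
V = item_profile(ERh)
      </preformat>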
      <p>User modeling with affective metadata. The preferences of the user are
modeled based on the explicit ratings that she/he has given to the consumed
items. The observed user u rates each viewed item either as relevant or
non-relevant. A machine learning (ML) algorithm is trained to separate relevant from
non-relevant items using the affective metadata in the item profiles as features
and the binary ratings (relevant/non-relevant) as classes. The user profile up(u)
of the observed user u is thus an ML algorithm dependent data structure. Fig.
2 shows an example of a user profile when the tree classifier C4.5 is used.</p>
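      <p>
        A sketch of this user-profile training step. The paper uses C4.5; scikit-learn’s
        CART-style DecisionTreeClassifier is used here as a stand-in, and the training
        data below is made up for illustration.
      </p>
      <preformat>
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# One row per item rated by user u: the affective item profile
# V = (v_mean, sigma_v, a_mean, sigma_a, d_mean, sigma_d)
X = np.array([
    [0.5, 0.1, -0.3, 0.2, 0.2, 0.1],    # item rated relevant
    [-0.4, 0.3, 0.5, 0.1, -0.2, 0.2],   # item rated non-relevant
    [0.6, 0.2, -0.2, 0.1, 0.3, 0.1],
    [-0.5, 0.2, 0.4, 0.3, -0.1, 0.2],
])
y = np.array([1, 0, 1, 0])  # binary ratings: 1 = relevant, 0 = non-relevant

# The trained tree is the user profile up(u): an ML-dependent data structure
up_u = DecisionTreeClassifier().fit(X, y)

# Predict the relevance of an unseen item from its affective profile
print(up_u.predict([[0.55, 0.15, -0.25, 0.15, 0.25, 0.1]]))
      </preformat>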
      <p>
        We used our implementation of an emotion detection algorithm
        <xref ref-type="bibr" rid="ref19">(see Tkalčič et al. [2010b])</xref>
        for implicit affective labeling and we compared the performance
of the CBR system that uses explicit vs. implicit affective labels.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Overview of the emotion detection algorithm for implicit affective labeling</title>
      <p>The emotion detection procedure used to give affective labels to the content
images involved three stages: (i) pre-processing, (ii) low level feature extraction
and (iii) emotion detection. We formalized the procedure with the mappings
I → Ψ → E
(2)
where I represents the frame from the video stream, Ψ represents the low
level features corresponding to the frame I and E represents the emotion
corresponding to the frame I.</p>
      <p>
        In the pre-processing stage we extracted and registered the faces from the
video frames to allow precise low level feature extraction. We used the eye tracker
developed by
        <xref ref-type="bibr" rid="ref21">Valenti et al. [2009]</xref>
        to extract the locations of the eyes. The
detection of emotions from frames in a video stream was performed by comparing
the current video frame It of the user’s face to a neutral face expression. As the
LDOS-PerAff-1 database is an ongoing video stream of users consuming
different images, we averaged all the frames to get the neutral frame. This method is
applicable when we have an unsupervised video stream of a user with different
face expressions.
      </p>
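      <p>
        A minimal sketch of the neutral-frame estimation, under the stated assumption
        that the pixel-wise average of all frames of a session approximates a neutral
        expression (OpenCV and NumPy; the file name is hypothetical).
      </p>
      <preformat>
import cv2
import numpy as np

cap = cv2.VideoCapture("session.avi")  # hypothetical session recording
acc, n = None, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float64)
    acc = gray if acc is None else acc + gray
    n += 1
cap.release()

# Pixel-wise mean over the whole stream approximates the neutral face
neutral = (acc / n).astype(np.uint8)
      </preformat>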
      <p>The low level features used in the proposed method were drawn from the
images filtered by a Gabor filter bank. We used a bank of Gabor filters with 6
different orientations and 4 different spatial sub-bands, which yielded a total of
24 Gabor filtered images per frame. The final feature vector had a total length
of 240 elements.</p>
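      <p>
        A sketch of such a filter bank with OpenCV. The exact filter parameters and the
        reduction of the 24 responses to 240 features are not specified in the text, so
        the values below, including the 10 band means taken per response, are assumptions.
      </p>
      <preformat>
import cv2
import numpy as np

def gabor_features(face, n_orient=6, n_scales=4):
    """240-element feature vector: 24 Gabor responses x 10 statistics each."""
    feats = []
    for s in range(n_scales):                  # 4 spatial sub-bands
        lambd = 4.0 * (2 ** s)                 # assumed wavelengths
        for o in range(n_orient):              # 6 orientations
            theta = o * np.pi / n_orient
            kern = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                      lambd=lambd, gamma=0.5)
            resp = cv2.filter2D(face.astype(np.float32), cv2.CV_32F, kern)
            # Assumed reduction: mean magnitude over 10 horizontal bands
            for band in np.array_split(np.abs(resp), 10):
                feats.append(band.mean())
    return np.array(feats)  # shape (240,)
      </preformat>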
      <p>The emotion detection was done by a k-NN algorithm after performing
dimensionality reduction using the principal component analysis (PCA).</p>
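      <p>
        A sketch of this classification stage with scikit-learn; the number of principal
        components and of neighbours is not given in the text and is assumed here.
      </p>
      <preformat>
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# X_train: (n_frames, 240) Gabor feature vectors; y_train: emotion classes
def make_emotion_detector(n_components=40, n_neighbors=5):  # assumed values
    return make_pipeline(PCA(n_components=n_components),
                         KNeighborsClassifier(n_neighbors=n_neighbors))

# detector = make_emotion_detector().fit(X_train, y_train)
# labels = detector.predict(X_test)
      </preformat>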
      <p>Each frame from the LDOS-PerAff-1 dataset was labeled with a six-tuple V of
the induced emotion. The six-tuple was composed of scalar values representing
the first two statistical moments in the VAD space. However, for our purposes
we opted for a coarser set of emotional classes ǫ ∈ E. We divided the whole VAD
space into 8 subspaces by thresholding each of the three first statistical moments
v̄, ā and d̄. We thus gained 8 rough classes. Among these, only 6 classes actually
contained at least one item, so we reduced the emotion detection problem to a
classification into 6 distinct classes, as shown in Tab. 2.</p>
      <preformat>
Tab. 2: The six emotion classes and their centroid values.

                                       centroid values
class    v̄         ā         d̄          v      a      d
ǫ1       v̄ &gt; 0    ā &lt; 0    d̄ &lt; 0      0.5   −0.5   −0.5
ǫ2       v̄ &lt; 0    ā &gt; 0    d̄ &lt; 0     −0.5    0.5   −0.5
ǫ3       v̄ &gt; 0    ā &gt; 0    d̄ &lt; 0      0.5    0.5   −0.5
ǫ4       v̄ &lt; 0    ā &lt; 0    d̄ &gt; 0     −0.5   −0.5    0.5
ǫ5       v̄ &gt; 0    ā &lt; 0    d̄ &gt; 0      0.5   −0.5    0.5
ǫ6       v̄ &gt; 0    ā &gt; 0    d̄ &gt; 0      0.5    0.5    0.5
      </preformat>
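      <p>
        A minimal sketch of this mapping from a mean VAD response to one of the rough
        classes and its centroid (Python; the class ordering follows Tab. 2).
      </p>
      <preformat>
# Sign patterns of (v_mean, a_mean, d_mean) for classes eps1..eps6 (Tab. 2)
CLASS_SIGNS = [(+1, -1, -1), (-1, +1, -1), (+1, +1, -1),
               (-1, -1, +1), (+1, -1, +1), (+1, +1, +1)]

def vad_class(v_mean, a_mean, d_mean):
    """Return (class index, centroid) for a thresholded mean VAD response."""
    signs = tuple(1 if x &gt; 0 else -1 for x in (v_mean, a_mean, d_mean))
    idx = CLASS_SIGNS.index(signs)   # raises ValueError for the 2 empty classes
    centroid = tuple(0.5 * s for s in signs)
    return idx + 1, centroid

print(vad_class(0.3, -0.1, 0.4))  # prints (5, (0.5, -0.5, 0.5))
      </preformat>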
      <p>Our scenario consisted of showing end users a set of still color images while
observing their facial expressions with a camera. These videos were used for implicit
affective labeling. The users were also asked to give explicit binary ratings to
the images. They were instructed to select images for their computer wallpapers.
The task of the recommender system was to select the relevant items for each
user as accurately as possible. This task falls into the find all good items category of
the recommender systems’ task taxonomy proposed by Herlocker et al. [2004].</p>
      <p>
        The set of images h ∈ H that the users were consuming had a twofold
role: (i) they were used as content items and (ii) they were used as emotion
induction stimuli for the affective labeling algorithm. We used a subset of 70
images from the IAPS dataset
        <xref ref-type="bibr" rid="ref11">[Lang et al., 2005]</xref>
        . The IAPS dataset of images
is annotated with the means and standard deviations of the emotion responses
in the VAD space, which served as the ground truth in the affective labeling
part of the experiment.
      </p>
      <p>The affective labeling algorithm described in Sec. 3.1 yielded rough classes in
the VAD space. In order to build the affective item profiles we used the classes’
centroid values (see Tab. 2) in the calculation of the first two statistical moments.
We applied the procedure from Sec. 2.2.</p>
      <p>We had 52 users taking part in our experiment (mean age 18.3 years, 15 males).</p>
    </sec>
    <sec id="sec-5">
      <title>Affective CBR system evaluation methodology</title>
      <p>The results of the CBR system were the confusion matrices of the classification
procedure that mapped the images H into one of two possible classes:
relevant or non-relevant. From the confusion matrices we calculated the recall,
precision and F measure as defined in Herlocker et al. [2004].</p>
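      <p>
        For reference, a minimal sketch of these scalar measures computed from a binary
        confusion matrix (standard definitions; the example counts are made up).
      </p>
      <preformat>
def prf_from_confusion(tp, fp, fn, tn):
    """Precision, recall and balanced F measure from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Example: 30 relevant items found, 10 false alarms, 5 misses, 25 true rejections
print(prf_from_confusion(tp=30, fp=10, fn=5, tn=25))
      </preformat>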
      <p>
        We also compared the performances of the CBR system with three types of
metadata: (i) generic metadata
        <xref ref-type="bibr" rid="ref18">(genre and watching time, as done by Tkalčič
et al. [2010a])</xref>
        , (ii) affective metadata given explicitly and (iii) affective
metadata acquired implicitly with the proposed emotion detection algorithm. For
that purpose we transferred the statistical testing of the confusion matrices into
testing for the equivalence of two estimated discrete probability distributions
        <xref ref-type="bibr" rid="ref12">[Lehmann and Romano, 2005]</xref>
        . To test the equivalence of the underlying
distributions we used the Pearson χ² test. In case of significant differences we used
the scalar measures precision, recall and F measure to see which approach was
significantly better.
      </p>
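      <p>
        A sketch of this comparison with SciPy, treating the two flattened confusion
        matrices as two estimated discrete distributions; the counts are made up for
        illustration.
      </p>
      <preformat>
import numpy as np
from scipy.stats import chi2_contingency

# Flattened 2x2 confusion matrices (tp, fp, fn, tn) of two CBR variants
explicit_counts = np.array([30, 10, 5, 25])
implicit_counts = np.array([22, 14, 13, 21])

# Pearson chi-squared test for equivalence of the underlying distributions
chi2, p, dof, expected = chi2_contingency(np.vstack([explicit_counts,
                                                     implicit_counts]))
if p &lt; 0.01:
    print("distributions differ significantly; compare precision/recall/F")
      </preformat>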
      <sec id="sec-5-1">
        <title>Results</title>
        <p>We compared the performance of the classification of items into relevant or
non-relevant through the confusion matrices in the following way: (i) explicitly
acquired affective metadata vs. implicitly acquired metadata, (ii) explicitly
acquired metadata vs. generic metadata and (iii) implicitly acquired metadata vs.
generic metadata. In all three cases the p value was p &lt; 0.01. Table 3 shows the
scalar measures precision, recall and F measure for all three approaches.
As we already reported in Tkalčič et al. [2010b], the application of the emotion
detection algorithm to spontaneous face expression videos yields low
performance. We identified three main reasons for that: (i) weak supervision in
learning, (ii) non-optimal video acquisition and (iii) non-extreme facial expressions.</p>
        <p>In supervised learning techniques there is ground truth reference data to
which we compare our model. In the induced emotion experiment the ground
truth data is weak because we did not verify whether the emotive response of
the user equals the predicted induced emotive response.</p>
        <p>Second, the acquisition of video of users’ expressions in real applications takes
place in less controlled environments. The users change their position during the
session. This results in changes in head orientation, face size and camera focus.
All these changes require a precise face tracker that
allows for fine face registration. Further difficulties are brought by various face
occlusions and changing lighting conditions (e.g. a light can be turned on or off,
the position of the curtains can be changed etc.), which confuse the face tracker.
It is important that the face registration is done precisely to allow
the detection of changes in the same areas of the face.</p>
        <p>The third reason why the accuracy drops is the fact that facial expressions in
spontaneous videos are less extreme than in posed videos. As a consequence, the
changes on the faces are less visible and are hidden in the overall noise of the face
changes. The dynamics of facial expressions depend on the emotion amplitude as
well as on the subjects’ individual differences.</p>
        <p>The comparison of the performance of the CBR with explicit vs. implicit
affective labeling shows significant differences regardless of the ML technique
employed to predict the ratings. Explicit labeling yields better CBR
performance than implicit labeling. However, another comparison, between
the implicitly acquired affective labels and generic metadata (genre and
watching time), shows that the CBR with implicit affective labels is significantly better
than the CBR with generic metadata only. Although not as good as explicit
labeling, the presented implicit labeling technique brings additional value to the
CBR system used.</p>
        <p>To the best of the authors’ knowledge, affective labels are not used in
state-of-the-art commercial recommender systems. The presented
approach makes it possible to upgrade an existing CBR system by adding the
unobtrusive video acquisition of users’ emotive responses. The results showed that the
inclusion of affective metadata, although acquired with an imperfect
emotion detection algorithm, significantly improves the quality of the selection of
recommended items. In other words, although there is a lot of noise in the
affective labels acquired with the proposed method, these labels still describe more
variance in users’ preferences than the generic metadata used in state-of-the-art
recommender systems.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Pending issues and future work</title>
      <p>The usage of affective labels in recommender systems has not reached a
production level yet. There are several open issues that need to be addressed in the
future.</p>
      <p>
        The presented work was verified on a sample of 52 users of a narrow age and
social segment and on 70 images as content items. The sample size is not big but
it is in line with sample sizes used in related work [Arapak
        <xref ref-type="bibr" rid="ref3">is et al., 2009</xref>
        , Jo
        <xref ref-type="bibr" rid="ref9">ho
et al., 2009</xref>
        , K
        <xref ref-type="bibr" rid="ref3">ierkels and Pun, 2009</xref>
        ]. Although we correctly used the statistical
tests and verified the conditions before applying the tests a repetition of the
experiment on a larger sample of users and content items would increase the
strength of the results reported.
      </p>
      <p>
        Another aspect of the sample size issue is the impact of the size on the
ML techniques used. The sample size in the emotion detection algorithm (the
kNN classifier) is not problematic. It is, however, questionable the sample size
used in the CBR. In the ten fold cross validation scheme we used 63 items for
training the model and seven for testing. Although it appears that this is small, a
comparison with other recommender system reveals that this is a common issue,
and is usually referred as the sparsity problem. It occurs when, even if there are
lots of users and lots of items, each user usually rated only few items and there
are few data to build the models u
        <xref ref-type="bibr" rid="ref11">pon [Adomavicius and Tuzhilin, 2005</xref>
        ].
      </p>
      <p>The presented work also lacks a user satisfaction study. Besides
aiming at the prediction of user ratings for unseen items, research should also
focus on the users’ satisfaction with the list of recommended items.</p>
      <p>The most important future task, however, is to improve the
emotion detection algorithms used for implicit affective labeling. Ideally,
a perfect emotion detection algorithm would yield CBR performance
identical to the CBR performance with explicit labeling.</p>
      <p>The acquisition of video of users also raises privacy issues that need to be
addressed before such a system can go into production.</p>
      <p>
        Last, but not least, we believe that implicit affective labeling should be
complemented with context modeling to provide better predictions of users’
preferences. In fact, emotional responses of users and their tendencies to seek one
kind of emotion over another, is tightly connected with the context where the
items are consumed. Several investigations started to explore the influence of
various contextual parameters, like being alone or being in company, on the
users’ pr
        <xref ref-type="bibr" rid="ref12">eferences [Adomavicius et al., 2005</xref>
        , Odi´c et al., 2010]. We will include
this information in our future affective user models.
6
      </p>
      <sec id="sec-6-1">
        <title>Conclusion</title>
        <p>We performed a comparative study of a CBR system for images that uses three
types of metadata: (i) explicit affective labels, (ii) implicit affective labels and (iii)
generic metadata. Although the results showed that the explicit labels yielded
better recommendations than implicit labels, the proposed approach significantly
improves the CBR performance over generic metadata. Because the approach is
unobtrusive, it is feasible to upgrade existing CBR systems with the proposed
solution. The presented implicit labeling technique takes as input video sequences
of users’ facial expressions and yields affective labels in the VAD emotive space.
We used Gabor filtering based low level features, PCA for dimensionality
reduction and the k-NN classifier for affective labeling.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Acknowledgement</title>
        <p>This work was partially funded by the European Commission within the FP6
IST grant number FP6-27312 and partially by the Slovenian Research Agency
ARRS. All statements in this work reflect the personal ideas and opinions of the
authors and not necessarily the opinions of the EC or ARRS.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Adomavicius</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Tuzhilin</surname>
          </string-name>
          .
          <article-title>Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>17</volume>
          (
          <issue>6</issue>
          ):
          <fpage>734</fpage>
          -
          <lpage>749</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Adomavicius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sankaranarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Tuzhilin</surname>
          </string-name>
          .
          <article-title>Incorporating contextual information in recommender systems using a multidimensional approach</article-title>
          .
          <source>ACM Transactions on Information Systems (TOIS)</source>
          ,
          <volume>23</volume>
          (
          <issue>1</issue>
          ):
          <fpage>103</fpage>
          -
          <lpage>145</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>I.</given-names>
            <surname>Arapakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Moshfeghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hannah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.M.</given-names>
            <surname>Jose</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Gardens</surname>
          </string-name>
          .
          <article-title>Integrating facial expressions into user profiling for the improvement of a multimodal recommender system</article-title>
          .
          <source>In Proc. IEEE Int'l Conf. Multimedia &amp; Expo</source>
          , pages
          <fpage>1440</fpage>
          -
          <lpage>1443</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><given-names>M.S.</given-names> <surname>Bartlett</surname></string-name>,
          <string-name><given-names>G.C.</given-names> <surname>Littlewort</surname></string-name>,
          <string-name><given-names>M.G.</given-names> <surname>Frank</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Lainscsek</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Fasel</surname></string-name>, and
          <string-name><given-names>J.R.</given-names> <surname>Movellan</surname></string-name>.
          <article-title>Automatic recognition of facial actions in spontaneous expressions</article-title>.
          <source>Journal of Multimedia</source>, <volume>1</volume>(<issue>6</issue>):<fpage>22</fpage>-<lpage>35</lpage>, <year>2006</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name><given-names>Ling</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Gen-Cai</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Cheng-Zhe</given-names> <surname>Xu</surname></string-name>,
          <string-name><given-names>Jack</given-names> <surname>March</surname></string-name>, and
          <string-name><given-names>Steve</given-names> <surname>Benford</surname></string-name>.
          <article-title>EmoPlayer: A media player for video clips with affective annotations</article-title>.
          <source>Interacting with Computers</source>, <volume>20</volume>(<issue>1</issue>):<fpage>17</fpage>-<lpage>28</lpage>, <year>January 2008</year>. doi: 10.1016/j.intcom.2007.06.003. URL http://dx.doi.org/10.1016/j.intcom.2007.06.003.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name><given-names>Micah</given-names> <surname>Eckhardt</surname></string-name> and
          <string-name><given-names>Rosalind</given-names> <surname>Picard</surname></string-name>.
          <article-title>A more effective way to label affective expressions</article-title>.
          <source>2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops</source>, pages <fpage>1</fpage>-<lpage>2</lpage>, <year>September 2009</year>. doi: 10.1109/ACII.2009.5349528. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5349528.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Alan</given-names>
            <surname>Hanjalic</surname>
          </string-name>
          and
          <string-name>
            <surname>Li-Qun Xu</surname>
          </string-name>
          .
          <article-title>Affective video content representation and modeling</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          ):
          <fpage>143</fpage>
          -
          <lpage>154</lpage>
          ,
          <year>February 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>J.L.</given-names>
            <surname>Herlocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.G.</given-names>
            <surname>Terveen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.T.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Evaluating collaborative filtering recommender systems</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>January 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.M.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valenti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Sebe</surname>
          </string-name>
          .
          <article-title>Exploiting facial expressions for affective video summarisation</article-title>
          .
          <source>In Proceeding of the ACM International Conference on Image and Video Retrieval</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name><given-names>J.J.M.</given-names> <surname>Kierkels</surname></string-name> and
          <string-name><given-names>T.</given-names> <surname>Pun</surname></string-name>.
          <article-title>Simultaneous exploitation of explicit and implicit tags in affect-based multimedia retrieval</article-title>.
          <source>In Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on</source>, pages <fpage>1</fpage>-<lpage>6</lpage>. IEEE, <year>2009</year>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name><given-names>P.J.</given-names> <surname>Lang</surname></string-name>,
          <string-name><given-names>M.M.</given-names> <surname>Bradley</surname></string-name>, and
          <string-name><given-names>B.N.</given-names> <surname>Cuthbert</surname></string-name>.
          <article-title>International affective picture system (IAPS): Affective ratings of pictures and instruction manual</article-title>.
          <source>Technical report A-6</source>, University of Florida, Gainesville, FL, <year>2005</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name><given-names>E.L.</given-names> <surname>Lehmann</surname></string-name> and
          <string-name><given-names>J.P.</given-names> <surname>Romano</surname></string-name>.
          <source>Testing Statistical Hypotheses</source>. Springer Science+Business Media, <year>2005</year>.
        </mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          Ante Odić, Matevž Kunaver, Jurij Tasič, and Andrej Košir.
          <article-title>Open issues with contextual information in existing recommender system databases</article-title>.
          <source>Proceedings of the IEEE ERK 2010</source>, A:<fpage>217</fpage>-<lpage>220</lpage>, <year>September 2010</year>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Pantic</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Vinciarelli</surname>
          </string-name>
          .
          <article-title>Implicit Human-Centered Tagging</article-title>
          .
          <source>IEEE Signal Processing Magazine</source>
          ,
          <volume>26</volume>
          (
          <issue>6</issue>
          ):
          <fpage>173</fpage>
          -
          <lpage>180</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name><given-names>Rosalind</given-names> <surname>Picard</surname></string-name> and
          <string-name><given-names>Shaundra Briant</given-names> <surname>Daily</surname></string-name>.
          <article-title>Evaluating affective interactions: Alternatives to asking what users feel</article-title>.
          <source>In CHI Workshop on Evaluating Affective Interfaces: Innovative Approaches</source>, Portland, OR, <year>April 2005</year>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name><given-names>Man-Kwan</given-names> <surname>Shan</surname></string-name>,
          <string-name><given-names>Fang-Fei</given-names> <surname>Kuo</surname></string-name>,
          <string-name><given-names>Meng-Fen</given-names> <surname>Chiang</surname></string-name>, and
          <string-name><given-names>Suh-Yin</given-names> <surname>Lee</surname></string-name>.
          <article-title>Emotion-based music recommendation by affinity discovery from film music</article-title>.
          <source>Expert Syst. Appl.</source>, <volume>36</volume>(<issue>4</issue>):<fpage>7666</fpage>-<lpage>7674</lpage>, <year>2009</year>. ISSN 0957-4174. doi: 10.1016/j.eswa.2008.09.042.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name><given-names>Mohammad</given-names> <surname>Soleymani</surname></string-name>,
          <string-name><given-names>Jeremy</given-names> <surname>Davis</surname></string-name>, and
          <string-name><given-names>Thierry</given-names> <surname>Pun</surname></string-name>.
          <article-title>A collaborative personalized affective video retrieval system</article-title>.
          <source>2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops</source>, pages <fpage>1</fpage>-<lpage>2</lpage>, <year>September 2009</year>. doi: 10.1109/ACII.2009.5349526. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5349526.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name><given-names>Marko</given-names> <surname>Tkalčič</surname></string-name>,
          <string-name><given-names>Urban</given-names> <surname>Burnik</surname></string-name>, and
          <string-name><given-names>Andrej</given-names> <surname>Košir</surname></string-name>.
          <article-title>Using affective parameters in a content-based recommender system</article-title>.
          <source>User Modeling and User-Adapted Interaction: The Journal of Personalization Research</source>, <volume>20</volume>(<issue>4</issue>), <year>2010a</year>.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name><given-names>Marko</given-names> <surname>Tkalčič</surname></string-name>,
          <string-name><given-names>Ante</given-names> <surname>Odić</surname></string-name>,
          <string-name><given-names>Andrej</given-names> <surname>Košir</surname></string-name>, and
          <string-name><given-names>Jurij</given-names> <surname>Tasič</surname></string-name>.
          <article-title>Comparison of an emotion detection technique on posed and spontaneous datasets</article-title>.
          <source>Proceedings of the IEEE ERK 2010</source>, <year>2010b</year>.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name><given-names>Marko</given-names> <surname>Tkalčič</surname></string-name>,
          <string-name><given-names>Jurij</given-names> <surname>Tasič</surname></string-name>, and
          <string-name><given-names>Andrej</given-names> <surname>Košir</surname></string-name>.
          <article-title>The LDOS-PerAff-1 Corpus of Face Video Clips with Affective and Personality Metadata</article-title>. In Michael Kipp, editor,
          <source>Proceedings of the LREC 2010 Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality</source>, <year>2010c</year>.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Valenti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yucel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Gevers</surname>
          </string-name>
          .
          <article-title>Robustifying eye center localization by head pose cues</article-title>
          .
          <source>In IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2009</year>
          . URL http://www.science.uva.nl/research/publications/2009/ValentiCVPR2009.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Yongjin</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ling</given-names>
            <surname>Guan</surname>
          </string-name>
          .
          <article-title>Recognizing human emotional state from audiovisual signals</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          ,
          <volume>10</volume>
          (
          <issue>5</issue>
          ):
          <fpage>936</fpage>
          -
          <lpage>946</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name><given-names>Zhihong</given-names> <surname>Zeng</surname></string-name>,
          <string-name><given-names>Maja</given-names> <surname>Pantic</surname></string-name>,
          <string-name><given-names>Glenn I.</given-names> <surname>Roisman</surname></string-name>, and
          <string-name><given-names>Thomas S.</given-names> <surname>Huang</surname></string-name>.
          <article-title>A survey of affect recognition methods: Audio, visual, and spontaneous expressions</article-title>.
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, <volume>31</volume>(<issue>1</issue>):<fpage>39</fpage>-<lpage>58</lpage>, <year>January 2009</year>. ISSN 0162-8828. doi: 10.1109/TPAMI.2008.52.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ruan</surname>
          </string-name>
          .
          <article-title>Facial expression recognition based on two-dimensional discriminant locality preserving projections</article-title>
          .
          <source>Neurocomputing</source>
          ,
          <volume>71</volume>
          (
          <issue>7-9</issue>
          ):
          <fpage>1730</fpage>
          -
          <lpage>1734</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>