<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Affective Computing and Bandits: Capturing Context in Cold Start Situations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Oehme</string-name>
          <email>sebastian.oehme@tum.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Linus W. Dietz</string-name>
          <email>linus.dietz@tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, Technical University of Munich</institution>
          ,
          <addr-line>Garching</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Munich School of Engineering, Technical University of Munich</institution>
          ,
          <addr-line>Garching</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>The cold start problem describes the initial phase of a collaborative recommender in which recommendation quality is low due to an insufficient number of ratings. Overcoming it is crucial because low recommendation quality impedes the system's adoption. In this paper, we propose capturing context via computer vision to improve recommender systems in the cold start phase. Computer vision algorithms can derive stereotypes such as gender or age, but also the user's emotions, without explicit interaction. We present an approach based on the statistical framework of bandit algorithms to incorporate stereotypic information and affective reactions into the recommendation. In a preliminary evaluation, a lab study with 21 participants, we already observe an improvement in the number of positive ratings. Furthermore, we report additional findings from experimenting with affective computing for recommender systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Recommender systems (RS) match items to users, so the
accuracy of recommendations depends heavily on the quality of
information the system has about both. Collaborative filtering (CF)
has frequently been used when the items’ characteristics are
unknown or costly to derive. CF systems are, however, not suited
for scenarios where the user is anonymous and interacts with the
RS only for a short period. For example, a smart display inside a
fashion store could provide recommendations, but the interaction
will be brief and tentative. In such cold start scenarios, the
literature suggests including context and stereotypes in the
recommendations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. If the weather is hot, suggest bathing
attire; a male customer will need shorts instead of a bikini.
Motivated by such a scenario, we develop an affective RS [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
based on stereotypes derived via computer vision with little user
collaboration. Our research was guided by the following questions:
RQ 1: How can stereotypic information be incorporated into a RS?
RQ 2: Can facial classification and affective reactions be a
surrogate for explicit feedback?
      </p>
      <p>In the following section, we describe the foundations of our RS:
bandit strategies and facial classification using computer vision.</p>
    </sec>
    <sec id="sec-2">
      <title>FOUNDATIONS</title>
      <p>
        Ever since Grundy [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], it has been known that stereotypic
information can be used to model users [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and thereby improve
recommendation accuracy. Driven by our research questions, we
discuss a combination of two concepts applied to recommender
systems: contextual bandits and facial classification using computer
vision.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Bandit Strategies</title>
      <p>In real-world applications, recommendations are often linked to a
reward. For example, the purpose of recommendations in a shop
is to improve revenue by suggesting products to customers that
they are more likely to buy. However, calculating the probabilities
of a successful recommendation directly is usually not possible
due to a lack of information about the customer’s taste and the
attractiveness of items.</p>
      <p>
        Bandit strategies provide a computational framework that trades
off profit maximization via items that are known to sell well against
experimentation with items whose potential is yet to be determined.
The terminology stems from the probability theory of gambling [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
A gambler at a row of one-armed bandits (slot machines) has to
decide based on incomplete knowledge: what arm to play, how often
to pull and when to play [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A bandit recommender engine seeks
to find the right balance between experimenting with new
recommendations, i.e., exploration, and exploiting items that are already
known to have a high chance of reward. A classic algorithm for
handling exploration vs. exploitation is the ε-Greedy algorithm [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
With probability ε it explores a randomly chosen arm; otherwise
it exploits the arm that is currently best. In cold
start situations, however, a bandit recommender suffers from
limitations similar to those of traditional methods such as collaborative filtering.
This can be overcome by adding context information, e.g.,
demographic information [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to augment the bandit’s choice between
exploration or exploitation with more data. These types of bandit
strategies are referred to as contextual bandits. In contrast to the
ε-Greedy algorithm, they incorporate contextual information and
are able to choose their action based on the situation. The classic
algorithm is the Contextual-ε-Greedy strategy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. At each turn, it
compares the user’s situation (e.g., location, time, social activity)
to a set of high-level ‘critical situations’. If the situation is critical,
the algorithm exploits this by showing items that are known to
be well suited and similar. Consequently, it explores other items if
the situation is not critical. It has been shown that the
Contextual-ε-Greedy algorithm generally achieves better click-through rates
than ε-Greedy algorithms or pure exploration.
      </p>
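<p>As an illustration, the classic ε-Greedy choice rule described above can be written in a few lines of Python (a minimal sketch for exposition; the arm names and reward function are hypothetical, not part of our system):</p>

```python
import random

def epsilon_greedy(expected_reward, arms, epsilon=0.1):
    """Classic ε-Greedy: explore a random arm with probability ε,
    otherwise exploit the arm with the highest expected reward."""
    if random.random() < epsilon:
        return random.choice(arms)          # exploration
    return max(arms, key=expected_reward)   # exploitation
```

<p>With ε = 0 the rule always exploits; with ε = 1 it always explores.</p>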
      <p>
        Our model extends the approach of Bouneffouf et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
likewise proceeds in discrete trials t = 1, . . . , T. At each t, the
following tasks are performed:
Task 1: Let U_t be the current user’s profile and P the set of other
known user profiles. The system compares U_t with the user profiles
in P in order to choose the most similar one, U_P:
      </p>
      <p>U_P = argmax_{U_c ∈ P} sim(U_t, U_c)
(1)</p>
      <p>Our adapted similarity metric is the weighted sum of the similarity
metrics for age, gender, and EF, the combination of emotions and
feedback. α, β, γ are the weights associated with these metrics, defined
in the following subsection:
sim(U_t, U_c) = α · sim(a_t, a_c) + β · sim(g_t, g_c) + γ · EF
(2)</p>
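<p>A minimal Python sketch of this profile matching (the dict-based profile layout and the linear age-decay helper are our own illustrative assumptions; EF is assumed to be computed separately):</p>

```python
def sim_gender(g_t, g_c):
    # binary gender similarity, sim(g_t, g_c) ∈ {0, 1}
    return 1.0 if g_t == g_c else 0.0

def sim_age(a_t, a_c, span=15):
    # hypothetical linear-decay reading of the ad-hoc age measure
    return max(0.0, 1.0 - abs(a_t - a_c) / span)

def similarity(u_t, u_c, ef, alpha=0.25, beta=0.25, gamma=0.5):
    """Weighted sum of age, gender, and emotional-feedback similarity
    between the current profile u_t and a known profile u_c."""
    return (alpha * sim_age(u_t["age"], u_c["age"])
            + beta * sim_gender(u_t["gender"], u_c["gender"])
            + gamma * ef)

def most_similar(u_t, profiles, ef_of):
    """Pick U_P as the profile in P maximizing the similarity to U_t."""
    return max(profiles, key=lambda u_c: similarity(u_t, u_c, ef_of(u_c)))
```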
      <p>
        EF, short for emotional feedback, corresponds to the sum of k
affective reactions sim_k(e_k^t, e_k^c) ∈ [0, 1], depending on equal
feedback sim_k(f_k^t, f_k^c) ∈ {0, 1} of the current user with respect to other
users’ profiles. This feedback, called reward in the bandit
terminology, can be any explicit or implicit feedback on the item, e.g.,
the user’s rating or adding the item to the shopping basket. If the
feedback differs for an item, this item’s affective reaction does not
contribute to the sum, hence it is 0. EF is normalized to the
number of items i which U_t has seen so far.
      </p>
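<p>The EF term can be sketched as follows (the dict-based profile layout is a hypothetical illustration, and cosine similarity stands in for the affective-reaction similarity sim_k(e_k^t, e_k^c)):</p>

```python
from math import sqrt

def cosine(u, v):
    # cosine similarity of two non-negative emotion vectors, in [0, 1]
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def emotional_feedback(current, other):
    """EF: affective agreement (1 + sim_e) / 2 summed over items both
    users rated, counted only where the like/dislike feedback matches,
    and normalized by the number of items i seen by the current user."""
    i = len(current["feedback"])
    if i == 0:
        return 0.0
    total = 0.0
    for item, f in current["feedback"].items():
        if other["feedback"].get(item) == f:   # sim_k(f) ∈ {0, 1}
            sim_e = cosine(current["emotions"][item], other["emotions"][item])
            total += (1.0 + sim_e) / 2.0       # maps sim_e into [0.5, 1]
    return total / i
```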
      <p>In our approach, we propose using facial classification through
computer vision to infer age, gender, and emotions as
contextual information within a contextual bandit algorithm.</p>
    </sec>
    <sec id="sec-4">
      <title>Facial Classification</title>
      <p>
        Computer vision has already been used to improve systems situated
in public places. For example, Müller et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] described a system
for digital signage. However, this and similar early approaches
were ahead of their time: due to low face-detection accuracy, the
outcomes of these experiments were not significant. Computer
vision-based approaches analyze users’ faces frame by frame via
facial recognition software during an experimental task such as
watching videos. Zhao et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] drew affective cues from changes
in users’ affective states. They used emotional changes to segment videos,
classified the videos’ categories, and then presented
recommendations. Tkalčič et al. propose a framework for affective recommender
systems in which they distinguish three phases of user
interaction: the entry, consumption, and exit stage [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The affective
cues drawn while watching content in the consumption stage are
compared to the emotional state in the entry phase. The exit stage
can simultaneously be the following entry stage when the next
item is recommended, and the looped process continues.
Affective labeling of users’ faces has been applied, e.g., to RSs [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and
commercials [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], showing promising results in terms of
accuracy and user satisfaction.
      </p>
      <p>
        The accuracy of classification and the runtime performance of
computer vision algorithms have improved over the past years, and
with YOLO [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the breakthrough to real-time object detection has
been achieved. In emotion detection, the state-of-the-art algorithms
are closed source and only available via web APIs. Prominent
vendors like Microsoft Face1, Kairos2, and Affectiva3 offer RESTful
client libraries and respective pricing models. The centralization of
this technology in a few market players that cloak their algorithms in
secrecy should be viewed with concern. Nevertheless, it should also be
mentioned that such systems improve with the size of the training
set and enable researchers to work with this technology without
hardware requirements. In our recommender system, we use the
Microsoft Face service to detect the age, gender, and emotions of our
test subjects. The Face Emotion Recognition API returns continuous
values in [0, 1] for the following emotions: anger, contempt, disgust,
fear, happiness, neutral, sadness, and surprise, at a small cost of about
€1.40 per 1000 requests.
      </p>
    </sec>
    <sec id="sec-5">
      <title>CONTEXTUAL RECOMMENDER MODEL</title>
      <p>In our RS, the items are displayed to the user successively. While
the user inspects the items, she is observed by a camera whose
imagery is continuously analyzed by computer vision. In this
section, we first present how we incorporated computer vision into
the recommendation task, followed by the experimental setup and
our findings.
1https://azure.microsoft.com/en-us/services/cognitive-services/face/
2https://www.kairos.com/emotion-analysis-api
3https://www.affectiva.com/product/emotion-sdk/
EF = Σ_k sim_k(f_k^t, f_k^c) · (1 + sim_k(e_k^t, e_k^c)) / (2i)
(3)
Task 2: Let M be the set of items, M_t the items seen by the current
user U_t, and M_P ⊆ M \ M_t the items recommended to the user U_P
but not yet shown to U_t. After retrieving M_P, the system displays the next
item m ∈ M_P to U_t while observing the user’s affective reactions
during the presentation.</p>
      <p>Task 3: After receiving the user’s reward, the algorithm refines its
item selection strategy with the new observation: user U_P gives
item m_P a binary reward. The expected reward for an item is the
average reward over the total number of ratings n.</p>
      <p>
        Our adapted Contextual-ε-Greedy recommends items as follows:
m = argmax_{m ∈ M_P} expectedReward(m) if q &gt; ε, otherwise
m = random(M \ M_t)
(4)
In Equation 4, the random variable q is responsible for the
exploration-versus-exploitation behavior. In our approach it is uniformly
distributed over [0, 1]. If q is larger than ε, the item with the highest
expected reward from M_P = {m_1, . . . , m_P } will be selected; these
are all items rated by the most similar user. For this, at least one
item that the past user rated positively and the current user has not
yet seen is required. In case all suitable items have been exploited, or
the current user is the first user and hence no other user profiles exist,
the algorithm falls back to exploration, where random(M \ M_t)
selects a random item.
      </p>
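<p>This selection rule can be sketched as follows (a hypothetical illustration; the function and list names are ours, not the prototype's):</p>

```python
import random

def select_item(expected_reward, m_p, unseen, epsilon):
    """Draw q uniformly from [0, 1]; if q > ε, exploit the best item
    among M_P (items rated by the most similar user), otherwise
    explore a random unseen item from M \\ M_t."""
    q = random.random()
    if q > epsilon and m_p:
        return max(m_p, key=expected_reward)   # exploitation
    return random.choice(unseen)               # exploration
```

<p>If M_P is empty, e.g., for the very first user, the rule degenerates to pure exploration.</p>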
      <p>To influence the original ε-Greedy algorithm with contextual
information, ε is computed from the maximum of Equation 2, the similarity
of the current user’s profile U_t to the profile U_P of the most similar
other user:
ε = 1 − max_{U_c ∈ P} sim(U_t, U_c)
(5)
The Contextual-ε-Greedy strategy is thus driven by the stereotypic
similarity of the current user to previously seen users. In this first
experiment, we used α = β = 0.25 and γ = 0.5 as weights for
Equation 2.</p>
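<p>Computing ε from the profile similarities can be sketched as follows (sim is any implementation of Equation 2; the fallback for an empty profile set is our assumption, matching the pure-exploration behavior described above):</p>

```python
def contextual_epsilon(u_t, profiles, sim):
    """ε = 1 − max similarity of U_t to any known profile.
    With no known profiles, ε = 1, i.e., pure exploration."""
    return 1.0 - max((sim(u_t, u_c) for u_c in profiles), default=0.0)
```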
      <p>Gender similarity is binary due to the output of the employed
facial classification algorithm: either it matches, or it does not,
i.e., sim(g_t, g_c) ∈ {0, 1}.</p>
      <p>
        Age similarity is fuzzier, and we have not found an established
similarity measure in the literature. Therefore, we constructed
an ad-hoc similarity measure sim(a_t, a_c) ∈ [0, 1], which considers
age differences of up to 15 years as somewhat similar [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
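<p>One possible reading of such an ad-hoc measure is a linear decay that reaches zero at a 15-year gap (a hypothetical sketch; the thesis [7] may define it differently):</p>

```python
def sim_age(a_t, a_c, span=15):
    """Hypothetical ad-hoc age similarity in [0, 1]: identical ages give
    1.0, decaying linearly to 0 for gaps of span years or more."""
    return max(0.0, 1.0 - abs(a_t - a_c) / span)
```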
      <p>Emotional similarity measures the affective response to a
displayed item in comparison to the emotional reactions of previous
users to it. As previously mentioned, today’s computer vision
algorithms are capable of detecting several emotions at once. Therefore,
emotional similarity is calculated as the cosine similarity of two
emotion vectors, as shown in Equation 6.</p>
      <p>sim(e_t, e_c) = Σ_{i=1}^{n} ē_i^t · ē_i^c / ( √(Σ_{i=1}^{n} (ē_i^t)²) · √(Σ_{i=1}^{n} (ē_i^c)²) )
(6)</p>
    </sec>
    <sec id="sec-6">
      <title>Capturing Affective Cues</title>
      <p>Microsoft Face analyzes the user’s face for age, gender, and up to
eight emotions. Experimenting with the computer vision service
before the main experiment showed that users tend to express their
emotional reactions shortly before requesting the next item and
maintain their facial expression for some time when the next item
is already shown. We call this ‘overflowing emotions’, as the user’s
emotional reaction to the previous item overflows to the current
item and is only then adjusted during the consumption and exit stage.
Since we are interested in the actual response to the item after
the content has been processed, we used the following weighted
average over all n analyzed frames as the aggregated metric to
emphasize the emotions from the exit stage.</p>
      <p>ē = Σ_{i=1}^{n} 2i · e_i / Σ_{i=1}^{n} 2i
(7)</p>
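<p>This weighted average can be written directly as code (a minimal sketch; the frame values are per-emotion scores sampled once per second):</p>

```python
def weighted_mean(frames):
    """Weight the emotion score of frame i by 2i, so frames near the
    end of the display period (the exit stage) dominate and
    'overflowing emotions' from the previous item are damped."""
    num = sum(2 * i * e for i, e in enumerate(frames, start=1))
    den = sum(2 * i for i in range(1, len(frames) + 1))
    return num / den if den else 0.0
```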
      <p>Figure 1 compares the mean value to our
proposed weighted average. Over the course of three items, the level of
observed happiness is shown in orange, for 15 frames in the case of
Item A. Since we assume that the important reaction to the content
occurs at the end of the item display period, we are quite satisfied with
our weighted mean calculation. Note that we used a sampling rate
of one analyzed frame per second.</p>
      <p>An alternative would have been to aggregate over the last p%
of the frames. While we think that our measure is more robust,
an in-depth analysis of different aggregation strategies is left for
future work. Another idea for separating successive content is to
show a neutral screen for some time before showing the next item. It is,
however, unclear what an adequate duration is for that, as users tend to
show emotions for an unknown duration and may find this delay
annoying.</p>
      <p>To evaluate our approach, we implemented an image recommender
prototype using Python. Figure 2 shows the high-level architecture:
the core part is a Flask4 web server that serves web pages with
the recommendations based on context information (age, gender,
emotions) from the computer vision service and the history of user
interactions retrieved from a PostgreSQL5 database.</p>
      <p>To answer our second research question, we compare our
variant of the Contextual-ε-Greedy with the traditional ε-Greedy in
a controlled lab experiment. The experimental procedure was the
following: the participant’s task is to rate images. Hoping to evoke
a large spectrum of emotions, we used a self-scraped data set of
3000 memes from the social web platform 9gag6, collected over the period
from January 24 to February 9, 2018. The subject is instructed to
take a seat in front of the screen with a webcam; it is pointed out
that the camera is recording and that information is being stored
according to local data privacy protection laws. She is asked to view
consecutively displayed images and provide feedback for each one
in the form of a ’like’ or ’dislike’ rating. The recommendation
engine attempts to optimize the amount of positive feedback using
our Contextual-ε-Greedy or the baseline ε-Greedy. Each subject is
shown 60 images per strategy, the strategy being our independent variable.
The order of strategies is selected at random without the subject
being aware of this.</p>
      <p>We conducted the experiment in April 2018 in Garching with 21
volunteers (11 f / 10 m) affiliated with the Technical University of
Munich. The subjects’ ages ranged from 19 to 31 years with a
mean of 24.09. The dependent variables are the users’ feedback
4http://flask.pocoo.org
5https://www.postgresql.org
6https://9gag.com
on the items, the detected affective cues from the computer vision
service, and additional information collected with a questionnaire.</p>
    </sec>
    <sec id="sec-7">
      <title>3.4 Evaluation Results</title>
      <p>In the convergence analysis of the algorithms, we observe an
improvement of accuracy over time, i.e., in the number of positive
ratings, for both recommendation strategies. To showcase this, we
fit a linear model over the algorithm convergence described in
Table 1. Over the course of 21 observations, the Contextual-ε-Greedy
starts slightly worse with 46.64% positive rewards; however, it
improves faster over time, reaching 60.7% at the end of the experiment.
Note that the difference between the strategies is not significant
and this model should not be used to predict further observations.
Clearly, 21 observations with 60 ratings each are not enough for
the bandit algorithms to converge.</p>
      <p>A closer look into the properties of the Contextual-ε-Greedy
algorithm reveals avenues for improvement. Figure 3 depicts the
similarity of a participant’s stereotypic attributes to the previous
subjects. The most similar user pair per column has the lowest ε and
was leveraged by the Contextual algorithm for recommending the
next item (cf. Equation 4). A clearly visible pattern is that matching
gender plays a dominant role in the similarity measure. Depending
on the recommended items, this could be adjusted in future studies.</p>
      <p>Further, we notice that the Microsoft Face algorithm mostly
detected two emotions. Overall, happiness and neutral make up
93.65% of the observed emotions, with neutral being the dominant
emotion. However, as seen in Table 2, positive feedback is more
likely if the affective response was happiness instead of neutral.</p>
      <p>Overall, the subjects rated 53.97% of the items positively,
although this varied considerably per user, ranging from only 3 positive
ratings up to 47 of 60. The experiment also showed that the
duration of item consumption varies, underlining the need for a dynamic
aggregation of the analyzed frames as in Equation 7.</p>
    </sec>
    <sec id="sec-8">
      <title>4 CONCLUSIONS AND FUTURE WORK</title>
      <p>Bandit algorithms provide a robust framework not only for online
advertisement, but also for personalized recommendations. The
possibility of calibrating the exploration vs. exploitation
probabilities using weighted similarity measures is an elegant way to
hybridize recommendation and active learning. Although
computer vision has not yet reached its full potential, it is
sufficiently affordable and accurate to experiment with for RS research.</p>
      <p>In this paper, we have presented an approach for recommending
images using bandit algorithms and computer vision, focusing on
improving recommendations in the cold start phase. Although our
contextual bandit algorithm was not significantly better than the
baseline, our work comprises the following contributions: (1) we
have developed a practical approach for using information from
facial classification within RSs, (2) we presented an adaptation
of the Contextual-ε-Greedy suited for incorporating stereotypic
information, (3) we developed a strategy with a weighted average
to mitigate the overflowing-emotions problem, and (4) we have
shown in a lab study that, by putting the pieces together, an
improvement of the recommendation accuracy could be achieved.
While this study was conducted with the informed consent of the
participants, the unconscious measuring of people’s emotions in
real-world applications raises serious privacy concerns.</p>
      <p>Having realized this prototype based on many assumptions, we
can highlight the path for further research: our post-mortem analysis
has shown the necessity of an evidence-based method for adjusting
the weights of the hybrid similarity measure. Having identified
the ‘overflowing emotions’ problem in sequential recommendations,
an in-depth analysis thereof would be interesting. Finally, we plan
to analyze the long-term convergence of our bandit recommender
algorithm in a larger field experiment against simpler baselines,
e.g., random items, and to investigate the accuracy of emotional
classification and its potential impact on performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Gediminas</given-names>
            <surname>Adomavicius</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Tuzhilin</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Context-Aware Recommender Systems</article-title>
          .
          <source>In Recommender Systems Handbook</source>
          . Springer,
          <fpage>191</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Mohammad</given-names>
            <surname>Yahya H. Al-Shamri</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>User Profiling Approaches for Demographic Recommender Systems</article-title>
          .
          <source>Knowledge-Based Systems</source>
          <volume>100</volume>
          (
          <year>2016</year>
          ),
          <fpage>175</fpage>
          -
          <lpage>187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Djallel</given-names>
            <surname>Bouneffouf</surname>
          </string-name>
          , Amel Bouzeghoub, and Alda Lopes Gançarski.
          <year>2012</year>
          .
          <article-title>A Contextual-Bandit Algorithm for Mobile Context-Aware Recommender System</article-title>
          .
          <source>In International Conference on Neural Information Processing</source>
          . Springer,
          <fpage>324</fpage>
          -
          <lpage>331</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Il Young</given-names>
            <surname>Choi</surname>
          </string-name>
          , Myung Geun Oh, Jae Kyeong Kim, and
          <string-name>
            <given-names>Young U.</given-names>
            <surname>Ryu</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Collaborative Filtering with Facial Expressions for Online Video Recommendation</article-title>
          .
          <source>International Journal of Information Management</source>
          <volume>36</volume>
          ,
          <issue>3</issue>
          (
          <year>2016</year>
          ),
          <fpage>397</fpage>
          -
          <lpage>402</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Juliane</given-names>
            <surname>Exeler</surname>
          </string-name>
          , Markus Buzeck, and
          <string-name>
            <given-names>Jörg</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>eMir: Digital Signs that React to Audience Emotion</article-title>
          .
          <source>In 2nd Workshop on Pervasive Advertising</source>
          .
          <fpage>38</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>John C.</given-names>
            <surname>Gittins</surname>
          </string-name>
          .
          <year>1979</year>
          .
          <article-title>Bandit Processes and Dynamic Allocation Indices</article-title>
          .
          <source>Journal of the Royal Statistical Society: Series B (Statistical Methodology) 42</source>
          ,
          <issue>2</issue>
          (
          <year>1979</year>
          ),
          <fpage>148</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Oehme</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Utilizing Facial Classification for Improving Recommender Systems</article-title>
          .
          <source>Bachelor's thesis</source>
          . Technical University of Munich.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Michael</surname>
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Pazzani</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>A Framework for Collaborative, Content-Based and Demographic Filtering</article-title>
          .
          <source>Artificial intelligence review 13</source>
          , 5 (Dec.
          <year>1999</year>
          ),
          <fpage>393</fpage>
          -
          <lpage>408</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Joseph</given-names>
            <surname>Redmon</surname>
          </string-name>
          , Santosh Divvala, Ross Girshick, and
          <string-name>
            <given-names>Ali</given-names>
            <surname>Farhadi</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>You Only Look Once: Unified, Real-Time Object Detection</article-title>
          .
          <source>In Conference on Computer Vision and Pattern Recognition (CVPR '16)</source>
          . IEEE,
          <fpage>779</fpage>
          -
          <lpage>788</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Elaine</given-names>
            <surname>Rich</surname>
          </string-name>
          .
          <year>1979</year>
          .
          <article-title>User Modeling via Stereotypes</article-title>
          .
          <source>Cognitive Science 3</source>
          ,
          <issue>4</issue>
          (Oct.
          <year>1979</year>
          ),
          <fpage>329</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Richard S.</given-names>
            <surname>Sutton</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew G.</given-names>
            <surname>Barto</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Reinforcement Learning</article-title>
          . MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Herbert</given-names>
            <surname>Robbins</surname>
          </string-name>
          .
          <year>1985</year>
          .
          <article-title>Some Aspects of the Sequential Design of Experiments</article-title>
          .
          <source>In Herbert Robbins Selected Papers</source>
          . Springer,
          <fpage>169</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Marko</given-names>
            <surname>Tkalčič</surname>
          </string-name>
          , Urban Burnik, Ante Odić, Andrej Košir, and
          <string-name>
            <given-names>Jurij</given-names>
            <surname>Tasič</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Emotion-Aware Recommender Systems-a Framework and a Case Study</article-title>
          .
          <source>In ICT Innovations 2012</source>
          . Springer,
          <fpage>141</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Marko</given-names>
            <surname>Tkalčič</surname>
          </string-name>
          , Ante Odic, Andrej Kosir, and
          <string-name>
            <given-names>Jurij</given-names>
            <surname>Tasic</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Affective Labeling in a Content-Based Recommender System for Images</article-title>
          .
          <source>IEEE Transactions on Multimedia 15</source>
          ,
          <issue>2</issue>
          (Feb.
          <year>2013</year>
          ),
          <fpage>391</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Sicheng</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hongxun</given-names>
            <surname>Yao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaoshuai</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Video Classification and Recommendation Based on Affective Analysis of Viewers</article-title>
          .
          <source>Neurocomputing</source>
          <volume>119</volume>
          (
          <year>2013</year>
          ),
          <fpage>101</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>