Affective Computing and Bandits: Capturing Context in Cold Start Situations

Sebastian Oehme
Munich School of Engineering, Technical University of Munich, Garching, Germany
sebastian.oehme@tum.de

Linus W. Dietz
Department of Informatics, Technical University of Munich, Garching, Germany
linus.dietz@tum.de

ABSTRACT
The cold start problem describes the initial phase of a collaborative recommender in which recommendation quality is low due to an insufficient number of ratings. Overcoming this phase is crucial because low recommendation quality impedes the system's adoption. In this paper, we propose capturing context via computer vision to improve recommender systems in the cold start phase. Computer vision algorithms can derive stereotypes such as gender or age, but also the user's emotions, without explicit interaction. We present an approach based on the statistical framework of bandit algorithms that incorporates stereotypic information and affective reactions into the recommendation. In a preliminary evaluation, a lab study with 21 participants, we already observe an improvement in the number of positive ratings. Furthermore, we report additional findings from experimenting with affective computing for recommender systems.

KEYWORDS
Recommender systems, affective computing, bandit algorithms

ACM Reference Format:
Sebastian Oehme and Linus W. Dietz. 2018. Affective Computing and Bandits: Capturing Context in Cold Start Situations. In Proceedings of the IntRS Workshop, Vancouver, Canada, October 2018 (IntRS'18). ACM, New York, NY, USA, 5 pages.

1 INTRODUCTION
Recommender systems (RS) match items to users; the accuracy of recommendations is therefore highly dependent on the quality of information the system has about both. Collaborative filtering (CF) has frequently been used when the items' characteristics are unknown or costly to derive. CF systems are, however, not suited for scenarios where the user is anonymous and interacts with the RS only for a short period. For example, a smart display inside a fashion store could provide recommendations, but the interaction will be brief and tentative. In such cold start scenarios, the literature suggests including context and stereotypes in the recommendations [1]. If the weather is hot, suggest bathing attire; a male customer will need shorts instead of a bikini. Motivated by this kind of scenario, we develop an affective RS [13] based on stereotypes derived via computer vision with little user collaboration. Our research was guided by the following questions:

RQ 1: How can stereotypic information be incorporated into a RS?
RQ 2: Can facial classification and affective reactions be a surrogate for explicit feedback?

In the following section, we describe the foundations of our RS: bandit strategies and facial classification using computer vision. Then, an in-depth description of the proposed approach and a preliminary evaluation in a user study follow in Section 3. Finally, we draw our conclusions and point out future work.

2 FOUNDATIONS
Ever since Grundy [10], it has been known that stereotypic information can be used to model users [2] and thereby improve recommendation accuracy. Driven by our research questions, we discuss a combination of two concepts applied to recommender systems: contextual bandits and facial classification using computer vision.

2.1 Bandit Strategies
In real-world applications, recommendations are often linked to a reward. For example, the purpose of recommendations in a shop is to improve revenue by suggesting products to customers that they are more likely to buy. However, calculating the probabilities of a successful recommendation directly is usually not possible due to a lack of information about the customer's taste and the attractiveness of items.

Bandit strategies provide a computational framework that trades off profit maximization via items that are known to sell well against experimentation with items whose potential is yet to be determined. The terminology stems from the probability theory of gambling [12]. A gambler at a row of one-armed bandits (slot machines) has to decide based on incomplete knowledge: which arm to play, how often to pull, and when to play [6]. A bandit recommender engine seeks the right balance between experimenting with new recommendations, i.e., exploration, and exploiting items that are already known to have a high chance of reward. A classic algorithm for handling exploration vs. exploitation is the ε-Greedy algorithm [11]. With probability ε, it randomly explores one of the other arms; otherwise, it exploits the arm that currently appears best.

In cold start situations, however, a bandit recommender suffers from similar limitations as traditional methods such as collaborative filtering. This can be overcome by adding context information, e.g., demographic information [8], to augment the bandit's choice between exploration and exploitation with more data. These types of bandit strategies are referred to as contextual bandits. In contrast to the ε-Greedy algorithm, they incorporate contextual information and are able to choose their action based on the situation. The classic algorithm is the Contextual-ε-Greedy strategy [3]. At each turn, it compares the user's situation (e.g., location, time, social activity) to a set of high-level 'critical situations'. If the situation is critical, the algorithm exploits this by showing items that are known to be well suited and similar. Consequently, it explores other items if the situation is not critical. It has been shown that the Contextual-ε-Greedy algorithm generally achieves better click-through rates than ε-Greedy algorithms or pure exploration.

In our approach, we propose using facial classification by means of computer vision to infer age, gender, and emotions as contextual information within a contextual bandit algorithm.

2.2 Facial Classification
Computer vision has already been used to improve systems situated in public places. For example, Müller et al. [5] described a system for digital signage. However, this and similar early approaches were ahead of their time: due to low face-detection accuracy, the outcomes of these experiments were not significant. Computer vision-based approaches analyze users' faces frame by frame via facial recognition software during an experimental task such as watching videos. Zhao et al. [15] drew affective cues from users' affection changes. They used emotional changes to segment videos, classified the videos' categories, and then presented recommendations. Tkalčič et al. propose a framework for affective recommender systems in which they distinguish between three phases of user interaction: the entry, consumption, and exit stage [13]. The affective cues drawn while watching content in the consumption stage are compared to the emotional state in the entry phase. The exit stage can simultaneously be the following entry stage when the next item is recommended, and the looped process continues. Affective labeling of users' faces has been applied, e.g., to RSs [14] and commercials [4], where it shows promising results in terms of accuracy and user satisfaction.

The accuracy of classification and the runtime performance of computer vision algorithms have improved over the past years, and with YOLO [9], the breakthrough to real-time object detection has been achieved. In emotion detection, the state-of-the-art algorithms are closed source and only available via web APIs. Prominent vendors like Microsoft Face1, Kairos2, and Affectiva3 offer RESTful client libraries and respective pricing models. The centralization of this technology in a few market players that cloak their algorithms in secrecy should be viewed with concern. Nevertheless, it should also be mentioned that such systems improve with the size of the training set and enable researchers to work with this technology without hardware requirements. In our recommender system, we use the Microsoft Face service to detect the age, gender, and emotions of our test subjects. The Face Emotion Recognition API returns continuous values in [0;1] for the following emotions: anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise, at a small cost of about €1.40 per 1000 requests.

3 CONTEXTUAL RECOMMENDER MODEL
In our RS, the items are displayed to the user successively. While the user inspects the items, she is observed by a camera whose imagery is continuously analyzed by computer vision. In this section, we first present how we incorporated computer vision into the recommendation task, followed by the experimental setup and our findings.

Our model extends the approach of Bouneffouf et al. [3] and likewise proceeds in discrete trials t = 1 . . . T. At each t, the following tasks are performed:

Task 1: Let U^t be the current user's profile and P the set of other known user profiles. The system compares U^t with the user profiles in P in order to choose the most similar one, U^P:

    U^P = argmax_{U^c ∈ P} sim(U^t, U^c)    (1)

Our adapted similarity metric is the weighted sum of the similarity metrics for age, gender, and EF, the combination of emotions and feedback. α, β, γ are the weights associated with these metrics, defined in the following subsection:

    sim(U^t, U^c) = α · sim(a^t, a^c) + β · sim(g^t, g^c) + γ · EF    (2)

EF, short for emotional feedback, corresponds to the sum of k affective reactions sim_k(e^t_k, e^c_k) ∈ [0, 1], counted only where the feedback sim_k(f^t_k, f^c_k) ∈ {0, 1} of the current user matches that of the other user's profile. This feedback, called reward in bandit terminology, can be any explicit or implicit feedback on the item, e.g., the user's rating or adding the item to the shopping basket. If the feedback differs for an item, this item's affective reaction will not contribute to the sum, hence it will be 0. EF is normalized by the number of items i which U^t has seen so far:

    EF = ( Σ_k sim_k(f^t_k, f^c_k) · (1 + sim_k(e^t_k, e^c_k)) ) / (2i)    (3)

Task 2: Let M be the set of items, M^t the items seen by the current user U^t, and M^P ⊆ M \ M^t the items recommended to the user U^P but not to U^t. After retrieving M^P, the system displays the next item m ∈ M^P to U^t while observing the user's affective reactions during the presentation.

Task 3: After receiving the user's reward, the algorithm refines its item selection strategy with the new observation: user U^P gives item m^P a binary reward. The expected reward for an item is the average reward over the total number of ratings n.

Our adapted Contextual-ε-Greedy recommends items as follows:

    m = argmax_{m ∈ M^P} expectedReward(m)    if q > ε
        random(M \ M^t)                       otherwise    (4)

In Equation 4, the random variable q is responsible for the exploration versus exploitation behavior. In our approach, it is uniformly distributed over [0, 1]. If q is larger than ε, the item with the highest expected reward from M^P = {m_1, . . . , m_P} will be selected; these are all items rated by the most similar user. For this, at least one item that is unseen by the current user and was positively rated by the past user is required. In case all suitable items have been exploited, or the current user is the first user and hence no other user profiles exist, the algorithm falls back to exploration, where random(M \ M^t) selects a random unseen item.
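As a minimal, illustrative sketch of one trial of the adapted Contextual-ε-Greedy (Equations 1, 2, and 4), the selection step could look as follows. This is not the authors' implementation: the profile dictionaries, helper names, the linear age decay, and passing the emotional feedback term as a precomputed value are all assumptions.

```python
import random

ALPHA, BETA, GAMMA = 0.25, 0.25, 0.5  # weights used in the paper's first experiment


def profile_similarity(u, v, ef):
    """Weighted profile similarity (Equation 2); `ef` is the emotional
    feedback term of Equation 3, assumed precomputed and in [0, 1]."""
    gender_sim = 1.0 if u["gender"] == v["gender"] else 0.0
    # Illustrative ad-hoc age similarity: linear decay, 0 beyond 15 years.
    age_sim = max(0.0, 1.0 - abs(u["age"] - v["age"]) / 15.0)
    return ALPHA * age_sim + BETA * gender_sim + GAMMA * ef


def choose_item(current, profiles, all_items, seen, ef, expected_reward, epsilon):
    """One trial of the adapted Contextual-epsilon-Greedy (Equations 1 and 4)."""
    unseen = [m for m in all_items if m not in seen]
    if profiles:
        # Equation 1: pick the most similar known profile.
        best = max(profiles, key=lambda p: profile_similarity(current, p, ef[p["id"]]))
        candidates = [m for m in best["rated"] if m not in seen]
        if candidates and random.random() > epsilon:
            # Exploit: highest expected reward among the similar user's items.
            return max(candidates, key=expected_reward)
    # Explore: cold start, no suitable items, or q <= epsilon.
    return random.choice(unseen)
```

A highly similar past profile steers the trial toward exploitation of that user's items, while a first-ever user deterministically falls through to random exploration.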
To influence the original ε-Greedy algorithm with contextual information, ε is computed from Equation 2 as the complement of the similarity between the current user's profile U^t and the profile U^P of the most similar other user:

    ε = 1 − max_{U^c ∈ P} sim(U^t, U^c)    (5)

A highly similar past profile thus yields a small ε and more exploitation, whereas a dissimilar one yields more exploration.

1 https://azure.microsoft.com/en-us/services/cognitive-services/face/
2 https://www.kairos.com/emotion-analysis-api
3 https://www.affectiva.com/product/emotion-sdk/

3.1 Similarity Measures
The Contextual-ε-Greedy strategy is driven by the stereotypic similarity of the current user to previously seen users. In this first experiment, we used α = β = 0.25 and γ = 0.5 as the weights in Equation 2.

Gender similarity is binary due to the output of the employed facial classification algorithm. Either it matches or it does not: sim(g^t, g^c) ∈ {0, 1}.

Age similarity is fuzzier, and we have not found an established similarity measure in the literature. Therefore, we constructed an ad-hoc similarity measure sim(a^t, a^c) ∈ [0, 1], which considers age differences of up to 15 years as somewhat similar [7].

Emotional similarity measures the affective response to a displayed item in comparison to the emotional reactions of previous users to it. As previously mentioned, today's computer vision algorithms are capable of detecting several emotions at once. Therefore, it is calculated as the cosine similarity of two emotion vectors, as can be seen in Equation 6:

    sim(e^t, e^c) = ( Σ_{i=1}^n ē^t_i · ē^c_i ) / ( √(Σ_{i=1}^n (ē^t_i)²) · √(Σ_{i=1}^n (ē^c_i)²) )    (6)

3.2 Capturing Affective Cues
Microsoft Face analyzes the user's face for age, gender, and up to eight emotions. Experimenting with the computer vision service before the main experiment showed that users tend to express their emotional reactions shortly before requesting the next item and maintain their facial expression for some time when the next item is already shown. We call this 'overflowing emotions', as the user's emotional reaction to the previous item overflows to the current item and is only then adjusted during the consumption and exit stage. Since we are interested in the actual response to the item after the content has been processed, we used the following weighted average over all n analyzed frames as the aggregated metric to emphasize the emotions from the exit stage:

    ē = ( Σ_{i=1}^n 2^i · e_i ) / ( Σ_{i=1}^n 2^i )    (7)

Figure 1 shows the comparison of the mean value with our proposed weighted average. Over the course of three items, the level of observed happiness is shown in orange for 15 frames in the case of Item A. Since we assume that the important reaction to the content comes at the end of the item display period, we are quite satisfied with our weighted mean calculation. Note that we used a sampling rate of one analyzed frame per second.

Figure 1: Overflowing Emotions. Happiness Example

An alternative would have been to aggregate over the last p% of the frames. While we think that our measure is more robust, an in-depth analysis of different aggregation strategies is left for future work. Another idea for separating successive content is to show a neutral screen for some time before showing the next item. It is, however, unclear what an adequate duration for this would be, as users tend to show emotions for an unknown length of time and may find the delay annoying.

3.3 Prototype and Experiment
To evaluate our approach, we implemented an image recommender prototype in Python. Figure 2 shows the high-level architecture: the core part is a Flask4 web server that serves web pages with the recommendations, based on the context information (age, gender, emotions) from the computer vision service and the history of user interactions retrieved from a PostgreSQL5 database.

Figure 2: Prototype System Architecture

To answer our second research question, we compare our variant of the Contextual-ε-Greedy with the traditional ε-Greedy in a controlled lab experiment. The experimental procedure was the following: The participant's task is to rate images. Hoping to evoke a large spectrum of emotions, we used a self-scraped data set of 3000 memes collected from the social web platform 9gag6 over the period from January 24 to February 9, 2018. The subject is instructed to take a seat in front of a screen with a webcam; it is pointed out that the camera is recording and that the information is being stored according to local data privacy protection laws. She is asked to view consecutively displayed images and to provide feedback for each one in the form of a 'like' or 'dislike' rating. The recommendation engine attempts to maximize the amount of positive feedback using either our Contextual-ε-Greedy or the baseline ε-Greedy. Each subject is shown 60 images per strategy, which is our independent variable. The order of the strategies is selected at random without the subject being aware of this.

We conducted the experiment in April 2018 in Garching with 21 volunteers (11 f / 10 m) affiliated with the Technical University of Munich. The subjects' ages varied between 19 and 31 years with a mean of 24.09. The dependent variables are the users' feedback to the item, the detected affective cues from the computer vision service, and additional information collected with a questionnaire.

4 http://flask.pocoo.org
5 https://www.postgresql.org
6 https://9gag.com
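The two measures underpinning the model above, the emotion-vector similarity of Equation 6 and the frame aggregation of Equation 7, can be sketched as follows. This is illustrative only: the vector layout (eight Microsoft Face scores in [0, 1]) and the reading of the frame weights in Equation 7 as 2^i, so that late, exit-stage frames dominate, are our assumptions.

```python
import math


def emotion_similarity(e_t, e_c):
    """Cosine similarity of two emotion vectors (Equation 6)."""
    dot = sum(a * b for a, b in zip(e_t, e_c))
    norm = math.sqrt(sum(a * a for a in e_t)) * math.sqrt(sum(b * b for b in e_c))
    return dot / norm if norm else 0.0


def aggregate_frames(frames):
    """Weighted average over per-frame emotion scores (Equation 7).

    Frame i gets weight 2**i, so the last frames (the exit stage)
    dominate the aggregate while early frames still contribute."""
    weights = [2 ** i for i in range(1, len(frames) + 1)]
    return sum(w * e for w, e in zip(weights, frames)) / sum(weights)
```

With one analyzed frame per second, a reaction that only appears in the final frames of an item's display period still dominates the aggregated score, which is the intended mitigation of the overflowing-emotions effect.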
3.4 Evaluation Results
In the convergence analysis of the algorithms, we observe an improvement of the accuracy over time, i.e., of the number of positive ratings, under both recommendation strategies. To showcase this, we fit a linear model over the algorithm convergence, described in Table 1. Over the course of 21 observations, the Contextual-ε-Greedy starts slightly worse with 46.64% positive rewards; however, it improves faster over time, reaching 60.7% at the end of the experiment. Note that the difference between the strategies is not significant, and this model should not be used to predict further observations. Clearly, 21 observations with 60 ratings each are not enough for the bandit algorithms to converge.

Table 1: Linear Trend Models of Rewards

Strategy               Linear Equation                    f(21)
ε-Greedy               f(x) = 0.47754  + 0.0047835 · x    0.578
Contextual-ε-Greedy    f(x) = 0.463968 + 0.0068831 · x    0.607

A closer look into the properties of the Contextual-ε-Greedy algorithm reveals avenues for improvement. Figure 3 depicts the similarity of each participant's stereotypic attributes to those of the previous subjects. The most similar user pair per column has the lowest ε and was leveraged by the Contextual algorithm for recommending the next item (cf. Equation 4). A clearly visible pattern is that the same gender plays a dominant role in the distance measure. Depending on the recommended items, this could be adjusted in future studies.

Figure 3: Values of ε Throughout the Contextual Experiment (matrix of pairwise ε values per participant, annotated with each subject's ID, age, and gender; color scale from 0.0, exploitation, to 1.0, exploration)

Further, we notice that the Microsoft Face algorithm mostly detected two emotions. Overall, happiness and neutral make up 93.65% of the observed emotions, with neutral being the dominant emotion. However, as seen in Table 2, positive feedback is more likely if the affective response was happiness instead of neutral.

Table 2: Correlation of Emotions with Rating Feedback

Feedback    happiness    neutral    other    n
positive    25.06%       68.90%     6.04%    680
negative    7.24%        86.04%     6.72%    580

Overall, the subjects rated 53.97% of the items positively, although this varied a lot per user, ranging from only 3 positive ratings up to 47 of 60. Also, the experiment showed that the duration of item consumption varies, underlining the need for a dynamic aggregation of the analyzed frames as in Equation 7.

4 CONCLUSIONS AND FUTURE WORK
Bandit algorithms provide a robust framework not only for online advertisement but also for personalized recommendations. The possibility of calibrating the exploration vs. exploitation probabilities using weighted similarity measures is an elegant way to hybridize recommendation and active learning. Although computer vision has not yet reached its full potential, it is sufficiently affordable and accurate to experiment with in RS research. In this paper, we have presented an approach for recommending images using bandit algorithms and computer vision, focusing on improving recommendations in the cold start phase. Although our contextual bandit algorithm was not significantly better than the baseline, our work comprises the following contributions: (1) We have developed a practical approach for using information from facial classification within RSs, (2) we presented an adaptation of the Contextual-ε-Greedy suited for incorporating stereotypic information, (3) we developed a weighted-average strategy to mitigate the overflowing emotions problem, and (4) we have shown in a lab study that, by putting the pieces together, an improvement of the recommendation accuracy can be achieved.

While this study was conducted with the informed consent of the participants, the unconscious measuring of people's emotions in real-world applications is critical with respect to privacy concerns. Having realized this prototype based on many assumptions, we can highlight the path for further research: Our post-mortem analysis has shown the necessity of an evidence-based method for adjusting the weights of the hybrid similarity measure. Having identified the 'overflowing emotions' problem in sequential recommendations, an in-depth analysis thereof would be interesting. Finally, we plan to analyze the long-term convergence of our bandit recommender algorithm in a larger field experiment against simpler baselines, e.g., random items, and to investigate the accuracy of emotional classification and its potential impact on performance.

REFERENCES
[1] Gediminas Adomavicius and Alexander Tuzhilin. 2015. Context-Aware Recommender Systems. In Recommender Systems Handbook. Springer, 191–226.
[2] Mohammad Yahya H. Al-Shamri. 2016. User Profiling Approaches for Demographic Recommender Systems. Knowledge-Based Systems 100 (2016), 175–187.
[3] Djallel Bouneffouf, Amel Bouzeghoub, and Alda Lopes Gançarski. 2012. A Contextual-Bandit Algorithm for Mobile Context-Aware Recommender System. In International Conference on Neural Information Processing. Springer, 324–331.
[4] Il Young Choi, Myung Geun Oh, Jae Kyeong Kim, and Young U. Ryu. 2016. Collaborative Filtering with Facial Expressions for Online Video Recommendation. International Journal of Information Management 36, 3 (2016), 397–402.
[5] Juliane Exeler, Markus Buzeck, and Jörg Müller. 2009. eMir: Digital Signs that React to Audience Emotion. In 2nd Workshop on Pervasive Advertising. 38–44.
[6] John C. Gittins. 1979. Bandit Processes and Dynamic Allocation Indices. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 42, 2 (1979), 148–177.
[7] Sebastian Oehme. 2018. Utilizing Facial Classification for Improving Recommender Systems. Bachelor's thesis. Technical University of Munich.
[8] Michael J. Pazzani. 1999. A Framework for Collaborative, Content-Based and Demographic Filtering. Artificial Intelligence Review 13, 5 (Dec. 1999), 393–408.
[9] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You Only Look Once: Unified, Real-Time Object Detection. In Conference on Computer Vision and Pattern Recognition (CVPR '16). IEEE, 779–788.
[10] Elaine Rich. 1979. User Modeling via Stereotypes. Cognitive Science 3, 4 (Oct. 1979), 329–354.
[11] Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning. MIT Press.
[12] Herbert Robbins. 1985. Some Aspects of the Sequential Design of Experiments. In Herbert Robbins Selected Papers. Springer, 169–177.
[13] Marko Tkalčič, Urban Burnik, Ante Odić, Andrej Košir, and Jurij Tasič. 2013. Emotion-Aware Recommender Systems: a Framework and a Case Study. In ICT Innovations 2012. Springer, 141–150.
[14] Marko Tkalčič, Ante Odić, Andrej Košir, and Jurij Tasič. 2013. Affective Labeling in a Content-Based Recommender System for Images. IEEE Transactions on Multimedia 15, 2 (Feb. 2013), 391–400.
[15] Sicheng Zhao, Hongxun Yao, and Xiaoshuai Sun. 2013. Video Classification and Recommendation Based on Affective Analysis of Viewers. Neurocomputing 119 (2013), 101–110.