MovieTweeters: An Interactive Interface to Improve Recommendation Novelty

Ishan Ghanmode, Delft University of Technology, Delft, The Netherlands, I.Ghanmode@student.tudelft.nl
Nava Tintarev, Delft University of Technology, Delft, The Netherlands, n.tintarev@tudelft.nl

ABSTRACT
This paper introduces and evaluates a novel interface, MovieTweeters: a movie recommendation system which incorporates social information into a traditional recommendation algorithm to generate recommendations for users. Few previous studies have investigated the influence of using social information in interactive interfaces to improve the novelty of recommendations. To address this gap, we investigate whether social information can be incorporated effectively into an interactive interface to improve recommendation novelty and user satisfaction. Our initial results suggest that such an interactive interface does indeed help users discover more novel items. We also observed that users who perceived that they discovered more novel and diverse items reported increased levels of satisfaction. Surprisingly, even though we were able to increase the system diversity of the recommendations, system diversity was negatively correlated with users' perception of the novelty and diversity of the items, highlighting the importance of improved user-centered approaches.

CCS CONCEPTS
• Human-centered computing → User interface design; • Information systems → Recommender systems

KEYWORDS
Social Recommendation Systems, Novelty, Diversity, User Satisfaction, Interactive User Interfaces

1 INTRODUCTION
Social networks such as Facebook¹ and Twitter² have emerged as some of the most popular social media platforms, allowing users to communicate and express their opinions and feedback. This online interaction between users generates social and preference information which could be effectively harnessed in recommender systems. Such social information has been found to improve algorithmic performance [28]. However, accurate recommendations do not always correspond to higher levels of user satisfaction [22, 27]. In response, researchers have proposed "beyond accuracy" metrics such as diversity and novelty [10, 26], and have worked on interfaces to improve the quality of recommendations [27].

A limited, but growing, body of literature has studied the influence of using social information in interactive interfaces to improve the novelty of recommendations. To address this gap, we investigate whether social information can be incorporated effectively into an interactive interface to improve recommendation novelty. Our contributions are as follows:

• We introduce a novel interface, MovieTweeters, which incorporates social information into a traditional recommendation algorithm. This enables users to leverage their relevant social information and discover novel (and more recent) content.
• We evaluate the system in terms of its ability to improve: a) system diversity; b) perceived novelty; and c) perceived diversity.
• We study the relationship between system and user measures. We also establish a positive impact of users' perceived quality of recommendations on their overall satisfaction.

The remainder of this paper is organized as follows. First, we present a discussion of related work in Section 2. Next, in Section 3, we introduce the MovieTweeters system, including the design choices for the interactive user interface, and discuss the underlying algorithms used in the system. In Section 4, we describe an online user experiment (N=23) in which we evaluate the system. We present our results in Section 5, briefly discuss the notable results in Section 6, and conclude in Section 7 with ideas for future work.

¹ https://www.facebook.com, accessed July 2018
² https://www.twitter.com, accessed July 2018
2 RELATED WORK
To frame the research presented in this paper, we discuss three key areas in detail. First, we discuss related work on existing social recommendation systems. Second, we focus on the importance of inspectability and control in recommendation system interfaces, and on how these interfaces affect users. Finally, we present a discussion of related work on beyond-accuracy metrics such as diversity and novelty.

2.1 Social Recommendation Systems
One definition of social recommendation is any recommendation that takes online social information as an additional input, i.e., that augments or improves existing recommendations with additional social information [13]. One of the earliest works to include social properties was ReferralWeb [12], an interactive system for searching relevant social networks on the World Wide Web. Social information can take the form of social relations, friendships, social influence, and so on. Under this definition, social recommendation systems assume that users are related when they establish social relations; the social information or social relations are then used to improve the performance of the recommendations [19]. We base our study on the former definition: we use existing social information as an additional input to improve the quality of recommendations.

2.2 Inspectability and Control in Recommendation Systems
Inspectability and control have played an emerging role in intelligent systems and, in recent years, in microblogs.

2.2.1 Inspectability. In the recommendation systems literature, inspectability is defined as the process of exposing users to the reasoning and data behind a recommendation. Inspectability of the interface also increases the user's trust in the recommendation system. The authors of [1] designed a hybrid recommendation system which allowed users to understand and control different aspects of the recommendation process, instilling both inspectability and control. The authors of [14] worked on a modified version of the system built in [1], where they considered the notion of inspectability to be similar to the concept of transparency described in [29]. Their work concluded that social recommender systems, and recommender systems in general, can indeed benefit from facilities that improve inspectability. Inspectability and control over recommendation algorithms also provide an efficient way of dealing with vast amounts of social content. In other previous work, researchers developed TwitInfo, a system for visualizing and summarizing important events on Twitter [18]. Furthermore, incorporating explanations and dynamic feedback into recommendation system interfaces has been shown to positively impact users' perception of the recommendation process. When designing our interface, inspectability formed a crucial element, allowing users to browse through the vast amount of relevant social information and understand how items were recommended to them.

2.2.2 Control. Control can be defined as the process of allowing users to interact with different recommendation system options to tweak their recommendations. Researchers have implemented different methods of control in their systems, ranging from rating items to assigning weights to item attributes. In [8], researchers developed SmallWorlds, a live Facebook application with an interactive interface used to control item predictions based on the underlying data from the Facebook API. The authors of [20] developed a collaborative recommendation system with an interactive interface that allowed users to manipulate and tune different options to generate relevant recommendations. The authors of [1, 14] allowed users to dynamically change and update their preferences during the recommendation process. In our study, we designed our interface to include control, allowing users to dynamically modify different system controls (cf. Section 3).

2.2.3 Impact of Interfaces on Users of Recommender Systems. People's opinions about the items recommended to them, and about their usability, are also directly influenced by the interface of the recommendation system in use. Researchers have studied different user interactions with recommender systems and concluded that, to design an effective interface, one must consider two points: first, which specific user needs are satisfied by the interaction, and second, which specific system features lead to the satisfaction of those needs [27]. A more user-centric approach to evaluating recommendation systems has been suggested in the ResQue model by Pu et al. [22], which aims to assess perceived qualities of recommenders such as their usability, usefulness, and the user's satisfaction.
2.3 Beyond Accuracy
Researchers have noted that accuracy is not the only criterion that determines user satisfaction [10], and several beyond-accuracy metrics have been defined to evaluate recommender systems [7]. The authors of [26] show how factors other than accuracy can make users more satisfied: users may also be interested in discovering novel products or in exploring more diverse items. In this subsection, we discuss the two main beyond-accuracy criteria that play a critical role in the evaluation in this study: novelty and diversity.

Novelty. This criterion has been defined as "new—original and of a kind which has not been seen before" [31]. A growing number of researchers take the position that novelty is one of the fundamental qualities by which a recommendation's effectiveness can be measured. Novel recommendations are recommendations of items that the user was unaware of, and a good novelty metric measures how well a recommendation system makes a user aware of previously unknown items [10].

Diversity. Diversity is a concept that has been well studied in the information retrieval literature and is generally defined as the opposite of similarity. One of the most explored approaches to diversity is item-item similarity, mostly based on item content [25]. The authors of [30] state a framework for novelty and diversity on the basis of three concepts: choice, diversity, and relevance. In [6], researchers measure the perceived diversity and overall attractiveness of a recommendation list.
3 MOVIETWEETERS SYSTEM
In this section we look in detail at the underlying design of our system, MovieTweeters. We define the following two research goals for our study. RG1: Incorporate social information within an existing traditional recommendation system and recommend new and diverse items to users. RG2: Study the relationship between beyond-accuracy metrics (novelty, diversity) and user satisfaction.

Figure 1: Overview of research goals, studying the effect of offline measures on user perceptions.

In order to study our research goals, we define the research steps shown in Figure 1. Based on the user-centric framework defined by Knijnenburg et al. [15], we define system diversity as an objective system aspect, the perceived user qualities (perceived novelty and perceived diversity) as subjective system aspects, and overall user satisfaction as a user experience aspect of our system. We first analyze how system diversity influences user perception (perceived diversity and perceived novelty) of the recommended items. Then, we analyze the impact of these perceived qualities on overall user satisfaction.

We designed our system, MovieTweeters, a web-based movie recommendation interface, to help us understand the impact of social information on the perceived quality of recommendations, and to study the relationship between the perceived quality of the recommended items and overall user satisfaction. Figure 2 shows an overview of the system. MovieTweeters consists of three main phases: the Initial Recommendations Phase, the Social Information Phase, and the Revised Recommendations Phase. All three phases were visible to the user when conducting the experiment. Next, we look into these three phases in more detail.

Figure 2: MovieTweeters: A. Initial Recommendations Phase, B. Social Information Phase, C. Revised Recommendations Phase.

3.1 Initial Recommendations Phase
The initial recommendations phase handles the main task of on-boarding new users into our system. We achieve this by first understanding the movie preferences of the new users and then generating an initial set of movie recommendations for them.

3.1.1 On-boarding New Users. Recommender systems suggest items that users may like, based upon knowledge about the user and the space of available items. However, when new users first enter the system, the system has no information about them. The process of including new users into the system is known as on-boarding. One of the most popular and direct ways to achieve this is to ask the new users to rate an initial set of items, also known as seed items.

To on-board new users and to generate the initial set of movie recommendations, we used the MovieLens 1M dataset [9], chosen for its popularity and its status as a stable benchmark dataset. The first step was to select the initial seed items. One common selection strategy is to use a popularity measure when determining the seed items: the items are ranked in decreasing order of their number of ratings. However, the MovieLens 1M dataset suffers from a long-tail distribution problem [21], meaning that some movies in the dataset have been rated very frequently (the popular movies). Methods to deal with this phenomenon have been described using diffusion theory [11] and graph-based approaches [32].

Using the popularity strategy to select seed items would select only the most frequently rated items in the dataset and ignore unpopular or new ones. This could create a bias where only popular movies are recommended. In order to avoid this bias, we selected items based on two criteria: popularity (the number of times a movie has been rated) and the ratings of the movies (as defined by the authors in [23]).
We calculated a seed score for each movie in the dataset using the following approach:

seedscore = log(popularity) × σ²(ratings)   (1)

To calculate the seed score (Equation 1), we take the logarithm (base 10) of the popularity and weight it by the variance of the ratings, which gives us a measure of how diverse the ratings have been for a particular movie.
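For concreteness, Equation 1 can be computed directly from a ratings file. The sketch below is illustrative rather than the authors' implementation; the input format, an iterable of (movie_id, rating) pairs parsed from the MovieLens 1M ratings file, is an assumption.

```python
import math
from collections import defaultdict

def seed_scores(ratings):
    """Compute Equation 1 for every movie.

    `ratings` is assumed to be an iterable of (movie_id, rating) pairs,
    e.g., parsed from the MovieLens 1M ratings file."""
    by_movie = defaultdict(list)
    for movie_id, rating in ratings:
        by_movie[movie_id].append(rating)

    scores = {}
    for movie_id, rs in by_movie.items():
        popularity = len(rs)                      # number of times rated
        mean = sum(rs) / popularity
        variance = sum((r - mean) ** 2 for r in rs) / popularity
        # seedscore = log10(popularity) * variance of the ratings
        scores[movie_id] = math.log10(popularity) * variance
    return scores
```

The seed items are then the top-scoring movies; this favors frequently rated movies whose ratings are also spread out, rather than simply the most popular ones.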
3.1.2 Recommendation Algorithm. In our system, we used an item-item collaborative filtering algorithm [24] to generate the initial set of movie recommendations for new users. As the main focus of this study is the role and impact of recommendation interfaces, we chose a collaborative filtering algorithm for the initial set of movie recommendations because of its popularity and its simplicity of use and implementation. We used the GraphLab toolkit [16] to implement it in our system.

3.1.3 Process Flow. After on-boarding new users into our system, a set of 10 initial movie recommendations is generated for them (as shown in the Initial Recommendations Phase in Figure 2). We refer to this first list of top-10 recommendations as Recommendation List 1 (RL1).

3.2 Social Information Phase
The social information phase is mainly responsible for incorporating a relevant social information dataset into our system.

3.2.1 Social Information Dataset. We used MovieTweetings [5] as our social information source. MovieTweetings comprises IMDb ratings expressed by Twitter users who have connected their IMDb accounts to their Twitter accounts.

3.2.2 Process Flow. The next task for the system user is to select the most relevant movies from the initial recommendation list (RL1). Relevant movies are defined here as the movies which the system user has already watched (consumed), or the movies which seem the most interesting to him/her. After the relevant movies are selected by the system user, the next step is to retrieve the relevant Twitter users from our pre-processed MovieTweetings dataset and display them. This is a two-step process. First, all distinct Twitter users who rated at least one of the selected relevant movies are retrieved. Second, we calculate how similar these retrieved Twitter users are to the system user. To perform this task, we first retrieve all movies rated by each Twitter user individually; for the system user, we consider all the movies he/she rated from the initial seed items. We then use cosine similarity to measure the similarity between the genre distribution of the movies consumed by the system user and that of each Twitter user. This produces a similarity score which denotes how similar a given Twitter user is to the system user.

We displayed the Twitter users in decreasing order of their cosine similarity scores (as shown in the Social Information Phase in Figure 2). A slider was also included in the system, allowing system users to segregate Twitter users based on their similarity score. System users had to select at least one preferred Twitter user; there was no limit on the maximum number of users selected.
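The Twitter-user similarity step can be sketched as follows. This is not the authors' code: we assume each user is represented by a genre-count vector built from the movies they rated, and that similarity is the cosine between these sparse vectors. All function and parameter names are illustrative.

```python
import math
from collections import Counter

def genre_distribution(movie_ids, genres_by_movie):
    """Genre-count vector over a set of consumed movies.
    `genres_by_movie` maps a movie id to its list of genre strings."""
    dist = Counter()
    for movie_id in movie_ids:
        dist.update(genres_by_movie.get(movie_id, []))
    return dist

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_twitter_users(seed_movies, movies_by_twitter_user, genres_by_movie):
    """Return (twitter_user, similarity) pairs, most similar first."""
    target = genre_distribution(seed_movies, genres_by_movie)
    scored = [(user, cosine(target, genre_distribution(movies, genres_by_movie)))
              for user, movies in movies_by_twitter_user.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```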
3.3 Revised Recommendations Phase
The revised recommendations phase is responsible for generating a revised list of movie recommendations. After the most preferred Twitter users are selected by the system user, the next step is to retrieve the list of all movies rated by these Twitter users from our pre-processed MovieTweetings dataset. We calculate the local popularity score of each movie in the list of retrieved movies, where local popularity is simply the number of occurrences of the movie in the list. We then order this list of movies in descending order of their local popularity scores. We refer to this movie list as the relevant movie list. Figure 3 shows a description of this process.

Figure 3: Selecting and ordering of retrieved relevant movies.

3.3.1 Maximal Marginal Relevance. Maximal Marginal Relevance (MMR) is a diversity-based re-ranking method used for reordering ranked lists of documents [3]. Following its success in the field of text retrieval and summarization, we adapted the MMR method and applied it to our relevant movie list. It is calculated using a weighted linear combination of relevance and diversity [3]:

MMR ≜ max_{D_i ∈ R\S} [ λ·Rel(D_i, Q) − (1 − λ)·max_{D_j ∈ S} (1 − Sim(D_i, D_j)) ]   (2)

In Equation 2, R refers to the ranked list of movies in the relevant movie list, and R\S is the set difference, i.e., the set of movies not yet selected from R. S refers to the movies which have already been selected into the re-ranked MMR list. The first half of the equation, λ·Rel(D_i, Q), addresses the relevance aspect and is calculated by comparing how similar a movie D_i from the relevant movie list is to Q, where Q for a given system user consists of the movies (from the Initial Recommendations Phase) which were selected by him/her. This similarity is calculated by comparing the genres of the movies in Q with those of D_i using cosine similarity, giving the relevance score. The second half of the equation, (1 − λ)·max_{D_j ∈ S}(1 − Sim(D_i, D_j)), addresses the diversity aspect of the MMR equation: cosine similarity is used to calculate how similar D_i is to each already-selected movie D_j based on their genre details, giving the diversity score. These two scores are combined to form the final MMR score.

Based on the MMR scores, a revised list of movie recommendations, both relevant to the system user and diverse according to their tastes, is generated and presented to them (as shown in the Revised Recommendations Phase in Figure 2). The size of the MMR list was set to 10 (top-10 recommendations). We refer to this revised list of top-10 recommendations as Recommendation List 2 (RL2).
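A greedy MMR re-ranking can be sketched as follows. Note that this sketch follows Carbonell and Goldstein's original formulation [3], which subtracts the maximum similarity to the already-selected items, max_{D_j ∈ S} Sim(D_i, D_j); Equation 2 as printed writes this term as (1 − Sim). Function names and signatures are illustrative, not the authors' implementation; λ = 0.5 matches the design decision described in Section 4.4.2.

```python
def mmr_rerank(candidates, relevance, similarity, k=10, lam=0.5):
    """Greedily build the re-ranked MMR list (cf. Equation 2).

    candidates : the relevant movie list R, e.g., ordered by local popularity
    relevance  : relevance(d) -> genre cosine similarity between movie d
                 and Q, the user's selected movies
    similarity : similarity(a, b) -> genre cosine similarity of two movies
    lam        : relevance/diversity trade-off (0.5 in our system)"""
    selected = []                   # S: movies already in the MMR list
    remaining = list(candidates)    # R \ S: movies still unselected
    while remaining and len(selected) < k:
        def mmr_score(d):
            # The first pick is scored on relevance alone.
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance(d) - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        remaining.remove(best)
        selected.append(best)
    return selected
```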
4 EXPERIMENT
We evaluated our system MovieTweeters using both offline and online metrics. We seek to answer the following questions: first, what is the impact of social information (with an interactive interface) on the quality of recommendations; and second, what is the relationship between the quality of the recommendations and user satisfaction. The offline evaluation metrics, which we define later in this section, help us analyze how adding social information as an additional input to a traditional recommendation system affects system diversity. The online evaluations help us analyze users' perceptions (perceived novelty and perceived diversity) of the recommendations and understand their satisfaction levels.

4.1 Variable Description
We describe all the dependent and independent variables which are evaluated during the course of the experiment and which form part of our hypotheses. There is one main independent variable in our experiment:

• System Diversity: In the context of this study, we define system diversity of the recommended items (movies) in terms of how different they are by genre.

Based on the independent variable defined above, we define our two dependent variables, whose effects are evaluated and tested during the course of the experiment:

• Perceived Novelty: The extent to which users receive "new" movie recommendations. Here, we evaluate whether users come across movies which they have not seen before. It is derived from the ResQue framework developed by Pu et al. [22], which assesses the quality of the recommended items.
• Perceived Diversity: The extent to which users felt that the recommended items were diverse. It is defined from the "Perceived System Qualities" stated by Pu et al. [22].

4.2 Hypotheses
Our system MovieTweeters generated two recommendation lists of movies for the users: one before (RL1) and one after (RL2) the participant's interaction with their relevant social information. We hypothesized that our system would help users discover more novel and diverse content. Our hypotheses are as follows:

Hypothesis H1: System diversity is correlated with participants' perceived diversity of the items in the two recommendation lists.

Hypothesis H2: System diversity is correlated with participants' perceived novelty of the items in the two recommendation lists.

Hypothesis H3: Users' perceived novelty increases between the two lists.

Hypothesis H4: As participants' perceived novelty increases, their user satisfaction increases as well.

Hypothesis H5: As users' perceived diversity increases, their user satisfaction increases as well.

4.3 Materials
We used two datasets to make movie recommendations in our system.

4.3.1 MovieLens 1M Dataset. To generate the initial list of recommendations (RL1), we used the MovieLens 1M dataset. Our pre-processing steps included adding the relevant IMDb IDs to the movies, making sure all movies had their relevant genre information present, and removing irrelevant fields such as the time-stamps of the ratings. Our pre-processed MovieLens 1M dataset had 964,712 ratings from 6,040 users for 2,835 movies.

4.3.2 MovieTweetings Dataset. We used MovieTweetings as our social information source, from which we made the revised set of movie recommendations (RL2). Our pre-processing steps included removing irrelevant fields such as the time-stamps of the ratings, retrieving the Twitter IDs of the users in the dataset, and removing movies with no relevant genre information. Our pre-processed MovieTweetings dataset had 606,767 ratings from 45,871 users for 27,093 movies.

4.4 Experimental Design
Keeping in mind the hypotheses concerning the impact of system diversity on user perception, and the relationship between users' perception of the recommendations and their satisfaction (Section 4.2), we study and analyze the impact of the system variable on the perceived quality attributes.

4.4.1 Evaluation Metrics. We study the impact of system diversity on perceived novelty and perceived diversity.

• System Diversity: We used Intra-List Diversity [2] to calculate the system diversity of the two generated recommendation lists (RL1 and RL2). For our study, we define it as follows (a computational sketch is given at the end of this subsection):

ILD = [ Σ_{i=1}^{n} Σ_{j=i}^{n} (1 − sim(c_i, c_j)) ] / [ n(n − 1)/2 ]   (3)

where c_1, c_2, ..., c_n are the items in a given list and n is the total number of items in the list. We used the cosine similarity measure to calculate the distance between the items.

• Perceived Novelty: For our study, we defined novelty as movies which the user has never seen before. We measured perceived novelty with the following two processes:
  – Perceived Novelty of the Recommendation Lists: Participants selected novel items from both recommendation lists (RL1 and RL2).
  – Perceived Novelty Questionnaire: We asked the participants to answer a set of three questions which helped us understand their level of perceived novelty across the two lists:
    Q1: The movies recommended in Recommendation List 1 were interesting to me.
    Q2: The movies recommended in Recommendation List 2 were interesting to me.
    Q3: The Twitter Users helped me obtain novel movie recommendations and improved the overall recommendation process.

• Perceived Diversity: We asked the participants to answer four questions about their level of perceived diversity between the two lists:
  – Q1: The list of movies in Recommendation List 2 vary from the list of movies in Recommendation List 1.
  – Q2: Most of the movies in Recommendation List 2 belong to similar genres as Recommendation List 1.
  – Q3: The movies recommended to me in Recommendation List 2 are diverse.
  – Q4: Selecting the relevant Twitter Users helped me obtain diverse movie recommendations and improved the overall recommendation process.

We also analyze the impact of perceived novelty and perceived diversity on overall user satisfaction.

• User Satisfaction: We define user satisfaction not only in terms of how satisfied users are with the quality of the recommendations, but also in terms of their experience with the inspectability, control, and overall interface aspects of our system. We asked the participants to answer a set of six questions which helped us understand their overall user satisfaction:
  – Q1: The recommendation system provided me with good movie suggestions.
  – Q2: The recommendation system helped me discover new movies.
  – Q3: The movies recommended to me are diverse.
  – Q4: The recommendation system made me more confident about my selection/decision.
  – Q5: I am convinced I will like the movies recommended to me.
  – Q6: Overall, I am satisfied with the recommendation system and the interface.
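As promised above, a direct computation of the Intra-List Diversity in Equation 3 looks like the following. This is a sketch, assuming `similarity` is the same genre-based cosine similarity used elsewhere in the system.

```python
from itertools import combinations

def intra_list_diversity(items, similarity):
    """Intra-List Diversity (Equation 3): average pairwise
    dissimilarity (1 - sim) over the n(n-1)/2 item pairs in a list."""
    n = len(items)
    if n < 2:
        return 0.0
    total = sum(1 - similarity(a, b) for a, b in combinations(items, 2))
    return total / (n * (n - 1) / 2)
```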
4.4.2 MMR Value. The value of λ adjusts the balance between the relevance and the diversity scores. In our system, we made the design decision to weight the relevance and diversity aspects of the revised list of recommended movies equally; hence, the value of λ was set to 0.5.

4.4.3 Procedure and Tasks. As described in Section 3, we divide our system into three phases:

• Initial Recommendations Phase: Participants were asked to provide basic demographic information (gender, age, movie consumption details) and to rate at least 20 movies out of 40 (the initial seed).
• Social Information Phase: Participants were asked to select their relevant list of movies from the initial recommendation list (RL1) and to select their most preferred Twitter users.
• Revised Recommendations Phase: Participants were shown the revised list of recommendations based on their selections (RL2).

After completing the main experiment, participants answered post-evaluation questionnaires which evaluated different aspects of both recommendation lists as well as their user satisfaction. Responses to the post-evaluation questionnaires were collected on a Likert scale (1-5). We followed a within-subjects experimental design in which all participants were exposed to both of their recommendation lists (RL1 and RL2) and compared the two lists.

5 RESULTS
In this section, we discuss our results regarding the impact of system diversity on the perceived quality of recommendations, and the relationship between the perceived quality of recommendations and user satisfaction.

5.1 Participants
The experiment was held in a controlled setting with 23 participants. Each session lasted 15-25 minutes. The participants were mainly Master's students at a university, with varied educational backgrounds. We had a nearly equal gender distribution, with 52.2% female and 47.8% male participants. Most of the participants were between the ages of 25-34 (60.9%). Figure 4 gives an overview of their movie consumption patterns, with most participants in the range of 1-6 movies consumed per month.

Figure 4: Movie Consumption Behavior.

5.2 Offline Evaluation
System Diversity. Using the offline evaluation metric defined in Section 4, we calculated the system diversity (Intra-List Diversity) of both generated recommendation lists (RL1 and RL2). Overall, we found a significant increase in the system diversity across the two recommendation lists (Figure 5).

Figure 5: System Diversity Scores, before (RL1) and after (RL2) the interaction.
5.3 Online Evaluation
5.3.1 Hypothesis 1: System Diversity and Perceived Diversity. To validate Hypothesis H1, we ran a Spearman's rank-order correlation test to compare the impact of the change in system diversity between the recommendation lists on participants' perceived diversity of the recommended items from both recommendation lists (RL1 and RL2). Spearman's correlation coefficient ρ measures the strength and direction of the correlation between two associated variables. We observed a ρ value of -0.54, which is significant (p = 0.007, p < 0.05), demonstrating a strong negative correlation between system diversity and the perceived diversity of the recommended items in the two recommendation lists. This rejects the null hypothesis, and our alternative hypothesis (H1) is accepted.

5.3.2 Hypothesis 2: System Diversity and Perceived Novelty. To validate Hypothesis H2, we ran a Spearman's rank-order correlation test to compare the impact of the change in system diversity between the recommendation lists on participants' perceived novelty of the recommended items from both recommendation lists (RL1 and RL2). We observed a ρ value of -0.42, which is significant (p = 0.04, p < 0.05), demonstrating a strong negative correlation between system diversity and the perceived novelty of the recommended items in the two recommendation lists. This rejects the null hypothesis, and our alternative hypothesis (H2) is accepted.

5.3.3 Hypothesis 3: Perceived Novelty Increases. To validate Hypothesis H3, we ran a Wilcoxon signed-rank test to compare the difference between the two recommendation lists (RL1, before; RL2, after) in terms of novel items. We observed a statistically significant increase in participants' perceived novelty after interacting with the relevant Twitter information (p = 0.0009, p < 0.01). Figure 6 shows the frequency of novel movies for both lists (RL1 and RL2). Analyzing these statistics and the relevant information, we infer that participants did indeed find more novel items after interacting with their relevant social information. This rejects the null hypothesis, and our alternative hypothesis (H3) is accepted.

Figure 6: Number of items marked as novel, before (RL1) and after (RL2) the interaction.

5.3.4 Hypothesis 4: Perceived Novelty and Satisfaction. To validate Hypothesis H4, we ran a Spearman's rank-order correlation test to study the impact of participants' perceived novelty on their overall user satisfaction. We observed a ρ value of 0.70, which is significant (p = 0.00016, p < 0.05), showing a strong positive correlation between perceived novelty and participants' overall user satisfaction. This rejects the null hypothesis, and our alternative hypothesis (H4) is accepted.

5.3.5 Hypothesis 5: Perceived Diversity and Satisfaction. To validate Hypothesis H5, we ran a Spearman's rank-order correlation test to study the impact of participants' perceived diversity on their overall user satisfaction. We observed a ρ value of 0.58, which is significant (p = 0.003, p < 0.05), demonstrating a strong positive correlation between perceived diversity and participants' overall user satisfaction. This rejects the null hypothesis, and our alternative hypothesis (H5) is accepted.
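For reference, the hypothesis tests reported above map directly onto standard SciPy routines. The following is a minimal sketch, assuming the per-participant measures are available as equal-length sequences; the function and argument names are illustrative, not the authors' analysis code.

```python
from scipy import stats

def run_tests(delta_ild, perceived_scores, novel_rl1, novel_rl2):
    """Statistical tests used in Section 5.3.

    delta_ild        : per-participant change in system diversity (ILD)
    perceived_scores : per-participant questionnaire scores (Likert 1-5)
    novel_rl1/rl2    : per-participant counts of items marked as novel"""
    # H1, H2, H4, H5: Spearman's rank-order correlation
    rho, p_spearman = stats.spearmanr(delta_ild, perceived_scores)
    # H3: Wilcoxon signed-rank test on the paired novelty counts
    w_stat, p_wilcoxon = stats.wilcoxon(novel_rl1, novel_rl2)
    return {"spearman": (rho, p_spearman), "wilcoxon": (w_stat, p_wilcoxon)}
```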
6 DISCUSSION AND LIMITATIONS
In this section we discuss our initial results from the experiment, as well as its limitations. According to our research goals defined in Section 3, we wanted to build an interactive interface that could assist users in discovering more novel content. Our initial results (for Hypothesis H3) suggest that we were successful in this respect, and that our interface did indeed help users discover more novel items.

While analyzing the impact of the system diversity of the recommendations on the perceived measures of quality (perceived diversity and perceived novelty), we found some surprising results (Hypotheses H1 and H2). Notably, as system diversity increased across the two lists, users' perceived diversity and perceived novelty decreased. The decrease in perceived diversity could be attributed to the following factors. Our MMR formula (λ = 0.5) balanced the revised recommendation list between relevance and diversity. This could influence participants' perception such that they were actually focused on checking the relevance aspect of the recommendations. In a post-hoc analysis, we studied the impact of popularity on users' diversity perception. Popularity for a recommendation list was calculated by taking the average of the top 3 IMDb movie ratings for that list. We observed that the popularity scores decreased across the lists. Comparing this to participants' perceived diversity, we found that as popularity decreased across the lists, perceived diversity decreased as well. We conclude that popularity also played a role in affecting participants' perceived diversity: diversification (the increase in system diversity) led to less popular movies, which made participants perceive the lists as less diverse.

We found a positive correlation between users' perceived novelty and perceived diversity and their overall user satisfaction (Hypotheses H4 and H5). Users who perceived that they discovered more novel and diverse items reported increased levels of satisfaction.

6.1 Limitations
We identify four main limitations in our study:

• An experiment comparing different recommendation algorithms could help us understand the impact these algorithms have on the perceived quality of the recommendations and on overall user satisfaction.
• Multiple experiments with different MMR (λ) values could help us understand their impact on the revised recommendation list and, ultimately, on users' perception and overall satisfaction.
• The inclusion of other movie-specific features (e.g., box office revenues, or year/era of release) could provide more insight into how two users are correlated, which would also impact the final recommendation list.
• Recent research has studied the impact of personality on the diversity needs of users [4, 17]. Our study did not consider the effects of individual traits such as personality.
Overall, our interface demonstrated that the incorporation of relevant social information within an interactive interface does indeed help users discover more novel items. It also provided valuable insight into the relationship between how users perceive the recommended items and their overall satisfaction.

7 CONCLUSION
In this paper, we introduced and evaluated a novel interface, MovieTweeters: a movie recommender system which combines social information with a traditional recommendation algorithm. This allows us to generate recommendations that are both current (since the social information is constantly updating) and novel. We conducted offline and online evaluations to test our interface. We found that incorporating social information within an interactive interface can indeed help users discover more novel content. We also observed that users who perceived that they discovered more novel and diverse items reported increased levels of user satisfaction. Even though we were able to increase the system diversity of the recommendations, system diversity was negatively correlated with users' perception of the novelty and diversity of the items.

In future work, we will study the inclusion of different recommendation algorithms along with varying MMR (λ) values. We believe this could have a significant impact on how users perceive their recommendations, and also on their overall satisfaction.
REFERENCES
[1] Svetlin Bostandjiev, John O'Donovan, and Tobias Höllerer. 2012. TasteWeights: a visual interactive hybrid recommender system. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 35–42.
[2] Keith Bradley and Barry Smyth. 2001. Improving recommendation diversity. In Proceedings of the Twelfth Irish Conference on Artificial Intelligence and Cognitive Science, Maynooth, Ireland. Citeseer, 85–94.
[3] Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 335–336.
[4] Li Chen, Wen Wu, and Liang He. 2013. How personality influences users' needs for recommendation diversity?. In CHI'13 Extended Abstracts on Human Factors in Computing Systems. ACM, 829–834.
[5] Simon Dooms, Toon De Pessemier, and Luc Martens. 2013. MovieTweetings: a movie rating dataset collected from Twitter. In Workshop on Crowdsourcing and Human Computation for Recommender Systems, CrowdRec at RecSys, Vol. 2013. 43.
[6] Bruce Ferwerda, Mark P. Graus, Andreu Vall, Marko Tkalcic, and Markus Schedl. 2017. How item discovery enabled by diversity leads to increased recommendation list attractiveness. In Proceedings of the Symposium on Applied Computing. ACM, 1693–1696.
[7] Mouzhi Ge, Carla Delgado-Battenfeld, and Dietmar Jannach. 2010. Beyond accuracy: evaluating recommender systems by coverage and serendipity. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, 257–260.
[8] Brynjar Gretarsson, John O'Donovan, Svetlin Bostandjiev, Christopher Hall, and Tobias Höllerer. 2010. SmallWorlds: visualizing social recommendations. In Computer Graphics Forum, Vol. 29. Wiley Online Library, 833–842.
[9] F. Maxwell Harper and Joseph A. Konstan. 2016. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4 (2016), 19.
[10] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. 2004. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 5–53.
[11] Masayuki Ishikawa, Peter Geczy, Noriaki Izumi, and Takahira Yamaguchi. 2008. Long tail recommender utilizing information diffusion theory. In Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01. IEEE Computer Society, 785–788.
[12] Henry Kautz, Bart Selman, and Mehul Shah. 1997. Referral Web: combining social networks and collaborative filtering. Commun. ACM 40, 3 (1997), 63–65.
[13] Irwin King, Michael R. Lyu, and Hao Ma. 2010. Introduction to social recommendation. In Proceedings of the 19th International Conference on World Wide Web. ACM, 1355–1356.
[14] Bart P. Knijnenburg, Svetlin Bostandjiev, John O'Donovan, and Alfred Kobsa. 2012. Inspectability and control in social recommenders. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 43–50.
[15] Bart P. Knijnenburg, Martijn C. Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (2012), 441–504.
[16] Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment 5, 8 (2012), 716–727.
[17] Feng Lu and Nava Tintarev. 2018. A diversity adjusting strategy with personality for music recommendation. In IntRS@RecSys.
[18] Adam Marcus, Michael S. Bernstein, Osama Badar, David R. Karger, Samuel Madden, and Robert C. Miller. 2011. TwitInfo: aggregating and visualizing microblogs for event exploration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 227–236.
[19] Paolo Massa and Paolo Avesani. 2007. Trust-aware recommender systems. In Proceedings of the 2007 ACM Conference on Recommender Systems. ACM, 17–24.
[20] John O'Donovan, Barry Smyth, Brynjar Gretarsson, Svetlin Bostandjiev, and Tobias Höllerer. 2008. PeerChooser: visual interactive recommendation. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1085–1088.
[21] Yoon-Joo Park and Alexander Tuzhilin. 2008. The long tail of recommender systems and how to leverage it. In Proceedings of the 2008 ACM Conference on Recommender Systems. ACM, 11–18.
[22] Pearl Pu, Li Chen, and Rong Hu. 2011. A user-centric evaluation framework for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems. ACM, 157–164.
[23] Al Mamunur Rashid, Istvan Albert, Dan Cosley, Shyong K. Lam, Sean M. McNee, Joseph A. Konstan, and John Riedl. 2002. Getting to know you: learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces. ACM, 127–134.
[24] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web. ACM, 285–295.
[25] Barry Smyth and Paul McClave. 2001. Similarity vs. diversity. Case-Based Reasoning Research and Development (2001), 347–361.
[26] Kirsten Swearingen and Rashmi Sinha. 2001. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, Vol. 13. 1–11.
[27] Kirsten Swearingen and Rashmi Sinha. 2002. Interaction design for recommender systems. In Designing Interactive Systems, Vol. 6. 312–334.
[28] Jiliang Tang, Xia Hu, and Huan Liu. 2013. Social recommendation: a review. Social Network Analysis and Mining 3, 4 (2013), 1113–1133.
[29] Nava Tintarev and Judith Masthoff. 2015. Explaining recommendations: Design and evaluation. In Recommender Systems Handbook. Springer, 353–382.
[30] Saúl Vargas and Pablo Castells. 2011. Rank and relevance in novelty and diversity metrics for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems. ACM, 109–116.
[31] Liang Zhang. 2013. The definition of novelty in recommendation system. Journal of Engineering Science & Technology Review 6, 3 (2013).
[32] Tao Zhou, Zoltán Kuscsik, Jian-Guo Liu, Matúš Medo, Joseph Rushton Wakeling, and Yi-Cheng Zhang. 2010. Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences 107, 10 (2010), 4511–4515.