MovieTweeters: An Interactive Interface to Improve Recommendation Novelty

Ishan Ghanmode, Delft University of Technology, Delft, The Netherlands, I.Ghanmode@student.tudelft.nl
Nava Tintarev, Delft University of Technology, Delft, The Netherlands, n.tintarev@tudelft.nl

ABSTRACT
This paper introduces and evaluates a novel interface, MovieTweeters: a movie recommendation system which incorporates social information into a traditional recommendation algorithm to generate recommendations for users. Few previous studies have investigated the influence of using social information in interactive interfaces to improve the novelty of recommendations. To address this gap, we investigate whether social information can be incorporated effectively into an interactive interface to improve recommendation novelty and user satisfaction. Our initial results suggest that such an interactive interface does indeed help users discover more novel items. We also observed that users who perceived that they discovered more novel and diverse items reported increased levels of satisfaction. Surprisingly, even though we were able to increase the system diversity of the recommendations, system diversity was negatively correlated with users' perception of the novelty and diversity of the items, highlighting the importance of improved user-centered approaches.

CCS CONCEPTS
• Human-centered computing → User interface design; • Information systems → Recommender systems

KEYWORDS
Social Recommendation Systems, Novelty, Diversity, User Satisfaction, Interactive User Interfaces

1 INTRODUCTION
Social networks such as Facebook¹ and Twitter² have emerged as some of the most popular social media platforms, allowing users to communicate and express their opinions and feedback. This online interaction between users generates social and preference information which could be effectively harnessed in recommender systems. Such social information has been found to improve algorithmic performance [28]. However, accurate recommendations do not always correspond to higher levels of user satisfaction [22, 27]. In response, researchers have proposed "beyond accuracy" metrics such as diversity and novelty [10, 26], and have worked on interfaces to improve the quality of recommendations [27].

A limited, but growing, body of literature has studied the influence of using social information in interactive interfaces to improve the novelty of recommendations. To address this gap, we investigate whether social information can be incorporated effectively into an interactive interface to improve recommendation novelty. Our contributions are as follows:

• We introduce a novel interface, MovieTweeters, which incorporates social information into a traditional recommendation algorithm. This enables users to leverage their relevant social information and discover novel (and more recent) content.
• We evaluate the system in terms of its ability to improve: a) system diversity; b) perceived novelty; and c) perceived diversity.
• We study the relationship between system and user measures. We also establish a positive impact of users' perceived quality of recommendations on their overall satisfaction.

The remainder of this paper is organized as follows. First, we present a discussion of related work in Section 2. Next, in Section 3, we introduce the MovieTweeters system, including the design choices for the interactive user interface, and discuss the underlying algorithms used in the system. In Section 4, we describe an online user experiment (N=23) in which we evaluate the system. We present our results in Section 5, briefly discuss the notable results in Section 6, and conclude in Section 7 with ideas for future work.

¹ https://www.facebook.com, accessed July 2018
² https://www.twitter.com, accessed July 2018
2 RELATED WORK
To frame the research presented in this paper, we discuss three key areas in detail. First, we discuss related work on existing social recommendation systems. Second, we focus on the importance of inspectability and control in recommendation system interfaces, and on how these interfaces affect users. Finally, we present a discussion of related work on beyond-accuracy metrics such as diversity and novelty.

2.1 Social Recommendation Systems
One definition of social recommendation is any recommendation that takes online social information as an additional input, i.e., that augments or improves existing recommendations with additional social information [13]. One of the earliest works to include social properties was ReferralWeb [12], an interactive system for searching relevant social networks on the World Wide Web. Social information can take the form of social relations, friendships, social influence, and so on. Under this definition, social recommendation systems assume that users are related when they establish social relations; the social information or social relations are then used to improve the performance of the recommendations [19]. We base our study on the former definition: we use existing social information as an additional input to improve the quality of recommendations.

2.2 Inspectability and Control in Recommendation Systems
Inspectability and control have played an emerging role in intelligent systems and, in recent years, in microblogs.

2.2.1 Inspectability. In the recommendation systems literature, inspectability is defined as the process of exposing users to the reasoning and data behind a recommendation. Inspectability of the interface also increases the user's trust in the recommendation system. The authors of [1] designed a hybrid recommendation system which allowed users to understand and control different aspects of the recommendation process, instilling both inspectability and control. The authors of [14] worked on a modified version of the system built in [1], where they considered the notion of inspectability to be similar to the concept of transparency described in [29]. Their work concluded that social recommender systems, and recommender systems in general, can indeed benefit from facilities that improve inspectability. Inspectability and control over recommendation algorithms also provide an efficient way of dealing with vast amounts of social content. In other previous work, researchers developed TwitInfo, a system for visualizing and summarizing important events on Twitter [18]. Furthermore, incorporating explanations and dynamic feedback into recommendation system interfaces has been shown to positively impact users' perception of the recommendation process. When designing our interface, inspectability formed a crucial element, allowing users to browse through the vast amount of relevant social information and understand how items were recommended to them.

2.2.2 Control. Control can be defined as the process of allowing users to interact with different recommendation system options to tweak their recommendations. Researchers have implemented different methods of control in their systems, ranging from rating items to assigning weights to item attributes. In [8], researchers developed SmallWorlds, a live Facebook application with an interactive interface used to control item predictions based on the underlying data from the Facebook API. The authors of [20] developed a collaborative recommendation system with an interactive interface that allowed users to manipulate and tune different options to generate relevant recommendations. The authors of [1, 14] allowed users to dynamically change and update their preferences during the recommendation process. In our study, we designed our interface to include control, allowing users to dynamically modify different system controls (cf. Section 3).

2.2.3 Impact of Interfaces on Users of Recommender Systems. People's opinions about the items recommended to them, and about their usability, are also directly influenced by the interface of the recommendation system in use. Researchers have studied different user interactions with recommender systems and concluded that, to design an effective interface, one must consider two points: first, which specific user needs are satisfied by the interaction, and second, which specific system features lead to the satisfaction of those needs [27]. A more user-centric approach to evaluating recommendation systems has been suggested in the ResQue model by Pu et al. [22], which aims to assess perceived qualities of recommenders such as their usability, usefulness, and the user's satisfaction.
2.3 Beyond Accuracy
Researchers have noted that accuracy is not the only criterion that determines user satisfaction [10], and several beyond-accuracy metrics have been defined to evaluate recommender systems [7]. The authors of [26] show how factors other than accuracy can make users more satisfied: users may also be interested in discovering novel products or in exploring more diverse items. In this subsection, we discuss the two main beyond-accuracy criteria that play a critical role in the evaluation in this study: novelty and diversity.

Novelty. This criterion has been defined as "new—original and of a kind which has not been seen before" [31]. A growing number of researchers take the position that novelty is one of the fundamental qualities by which a recommendation's effectiveness can be measured. Novel recommendations are recommendations of items that the user was unaware of, and a good novelty metric measures how well a recommendation system makes a user aware of previously unknown items [10].

Diversity. Diversity is a concept that has been well studied in the information retrieval literature and is generally defined as the opposite of similarity. One of the most explored approaches to diversity is item-item similarity, mostly based on item content [25]. The authors of [30] state a framework for novelty and diversity on the basis of three concepts: choice, diversity, and relevance. In [6], researchers measure the perceived diversity and overall attractiveness of a recommendation list.
3 MOVIETWEETERS SYSTEM
In this section we look in detail at the underlying design of our system, MovieTweeters. We define the following two research goals for our study. RG1: Incorporate social information within an existing traditional recommendation system and recommend new and diverse items to users. RG2: Study the relationship between beyond-accuracy metrics (novelty, diversity) and user satisfaction.

Figure 1: Overview of research goals, studying the effect of offline measures on user perceptions.

In order to study our research goals, we define the research steps shown in Figure 1. Based on the user-centric framework defined by Knijnenburg et al. [15], we define system diversity as an objective system aspect, the perceived user qualities (perceived novelty and perceived diversity) as subjective system aspects, and overall user satisfaction as a user experience aspect of our system. We first analyze how system diversity influences user perception (perceived diversity and perceived novelty) of the recommended items. Then, we analyze the impact of these perceived qualities on overall user satisfaction.

We designed our system, MovieTweeters, a web-based movie recommendation interface, to help us understand the impact of social information on the perceived quality of recommendations, and to study the relationship between the perceived quality of the recommended items and overall user satisfaction. Figure 2 shows an overview of the system. MovieTweeters consists of three main phases: the Initial Recommendations Phase, the Social Information Phase, and the Revised Recommendations Phase. All three phases were visible to the user when conducting the experiment. Next, we look into these three phases in more detail.

Figure 2: MovieTweeters: A. Initial Recommendations Phase, B. Social Information Phase, C. Revised Recommendations Phase.

3.1 Initial Recommendations Phase
The initial recommendations phase handles the main task of on-boarding new users into our system. We achieve this by first understanding the movie preferences of the new users and then generating an initial set of movie recommendations for them.

3.1.1 On-boarding New Users. Recommender systems suggest items that users may like, based upon knowledge about the user and the space of available items. However, when new users first enter the system, the system has no information about them. The process of including new users into the system is known as on-boarding. One of the most popular and direct ways to achieve this is to ask the new users to rate an initial set of items, also known as seed items.

To on-board new users and to generate the initial set of movie recommendations, we used the MovieLens 1M dataset [9], chosen for its popularity and its status as a stable benchmark dataset. The first step was to select the initial seed items. One common selection strategy is to use a popularity measure when determining the seed items: the items are ranked in decreasing order of their number of ratings. However, the MovieLens 1M dataset suffers from a long-tail distribution problem [21], meaning that some movies in the dataset have been rated very frequently (the popular movies). Methods to deal with this phenomenon have been described using diffusion theory [11] and graph-based approaches [32].

Using the popularity strategy to select seed items would select only the most frequently rated items in the dataset and ignore unpopular or new ones. This could create a bias where only popular movies are recommended. In order to avoid this bias, we selected items based on two criteria: popularity (the number of times a movie has been rated) and the ratings of the movies (as defined by the authors in [23]).
We calculated a seed score for each movie in the dataset using the following approach:

seedscore = log(popularity) × σ²(ratings)   (1)

To calculate the seed score (Equation 1), we take the logarithm (base 10) of the popularity and weight it by the variance of the ratings, which gives us a measure of how diverse the ratings have been for a particular movie.
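For concreteness, Equation 1 can be computed directly from a ratings file. The sketch below is illustrative rather than the authors' implementation; the input format, an iterable of (movie_id, rating) pairs parsed from the MovieLens 1M ratings file, is an assumption.

```python
import math
from collections import defaultdict

def seed_scores(ratings):
    """Compute Equation 1 for every movie.

    `ratings` is assumed to be an iterable of (movie_id, rating) pairs,
    e.g., parsed from the MovieLens 1M ratings file."""
    by_movie = defaultdict(list)
    for movie_id, rating in ratings:
        by_movie[movie_id].append(rating)

    scores = {}
    for movie_id, rs in by_movie.items():
        popularity = len(rs)                      # number of times rated
        mean = sum(rs) / popularity
        variance = sum((r - mean) ** 2 for r in rs) / popularity
        # seedscore = log10(popularity) * variance of the ratings
        scores[movie_id] = math.log10(popularity) * variance
    return scores
```

The seed items are then the top-scoring movies; this favors frequently rated movies whose ratings are also spread out, rather than simply the most popular ones.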
3.1.2 Recommendation Algorithm. In our system, we used an item-item collaborative filtering algorithm [24] to generate the initial set of movie recommendations for new users. As the main focus of this study is the role and impact of recommendation interfaces, we chose a collaborative filtering algorithm for the initial set of movie recommendations because of its popularity and its simplicity of use and implementation. We used the GraphLab toolkit [16] to implement it in our system.

3.1.3 Process Flow. After on-boarding new users into our system, a set of 10 initial movie recommendations is generated for them (as shown in the Initial Recommendations Phase in Figure 2). We refer to this first list of top-10 recommendations as Recommendation List 1 (RL1).

3.2 Social Information Phase
The social information phase is mainly responsible for incorporating a relevant social information dataset into our system.

3.2.1 Social Information Dataset. We used MovieTweetings [5] as our social information source. MovieTweetings comprises IMDb ratings expressed by Twitter users who have connected their IMDb accounts to their Twitter accounts.

3.2.2 Process Flow. The next task for the system user is to select the most relevant movies from the initial recommendation list (RL1). Relevant movies are defined here as the movies which the system user has already watched (consumed), or the movies which seem the most interesting to him/her. After the relevant movies are selected by the system user, the next step is to retrieve the relevant Twitter users from our pre-processed MovieTweetings dataset and display them. This is a two-step process. First, all distinct Twitter users who rated at least one of the selected relevant movies are retrieved. Second, we calculate how similar these retrieved Twitter users are to the system user. To perform this task, we first retrieve all movies rated by each Twitter user individually; for the system user, we consider all the movies he/she rated from the initial seed items. We then use cosine similarity to measure the similarity between the genre distribution of the movies consumed by the system user and that of each Twitter user. This produces a similarity score which denotes how similar a given Twitter user is to the system user.

We displayed the Twitter users in decreasing order of their cosine similarity scores (as shown in the Social Information Phase in Figure 2). A slider was also included in the system, allowing system users to segregate Twitter users based on their similarity score. System users had to select at least one preferred Twitter user; there was no limit on the maximum number of users selected.
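The Twitter-user similarity step can be sketched as follows. This is not the authors' code: we assume each user is represented by a genre-count vector built from the movies they rated, and that similarity is the cosine between these sparse vectors. All function and parameter names are illustrative.

```python
import math
from collections import Counter

def genre_distribution(movie_ids, genres_by_movie):
    """Genre-count vector over a set of consumed movies.
    `genres_by_movie` maps a movie id to its list of genre strings."""
    dist = Counter()
    for movie_id in movie_ids:
        dist.update(genres_by_movie.get(movie_id, []))
    return dist

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_twitter_users(seed_movies, movies_by_twitter_user, genres_by_movie):
    """Return (twitter_user, similarity) pairs, most similar first."""
    target = genre_distribution(seed_movies, genres_by_movie)
    scored = [(user, cosine(target, genre_distribution(movies, genres_by_movie)))
              for user, movies in movies_by_twitter_user.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```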
3.3 Revised Recommendations Phase
The revised recommendations phase is responsible for generating a revised list of movie recommendations. After the most preferred Twitter users are selected by the system user, the next step is to retrieve the list of all movies rated by these Twitter users from our pre-processed MovieTweetings dataset. We calculate the local popularity score of each movie in the list of retrieved movies, where local popularity is simply the number of occurrences of the movie in the list. We then order this list of movies in descending order of their local popularity scores. We refer to this movie list as the relevant movie list. Figure 3 shows a description of this process.

Figure 3: Selecting and ordering of retrieved relevant movies.

3.3.1 Maximal Marginal Relevance. Maximal Marginal Relevance (MMR) is a diversity-based re-ranking method used for reordering ranked lists of documents [3]. Following its success in the field of text retrieval and summarization, we adapted the MMR method and applied it to our relevant movie list. It is calculated using a weighted linear combination of relevance and diversity [3]:

MMR ≜ max_{D_i ∈ R\S} [ λ·Rel(D_i, Q) − (1 − λ)·max_{D_j ∈ S} (1 − Sim(D_i, D_j)) ]   (2)

In Equation 2, R refers to the ranked list of movies in the relevant movie list, and R\S is the set difference, i.e., the set of movies not yet selected from R. S refers to the movies which have already been selected into the re-ranked MMR list. The first half of the equation, λ·Rel(D_i, Q), addresses the relevance aspect and is calculated by comparing how similar a movie D_i from the relevant movie list is to Q, where Q for a given system user consists of the movies (from the Initial Recommendations Phase) which were selected by him/her. This similarity is calculated by comparing the genres of the movies in Q with those of D_i using cosine similarity, giving the relevance score. The second half of the equation, (1 − λ)·max_{D_j ∈ S}(1 − Sim(D_i, D_j)), addresses the diversity aspect of the MMR equation: cosine similarity is used to calculate how similar D_i is to each already-selected movie D_j based on their genre details, giving the diversity score. These two scores are combined to form the final MMR score.

Based on the MMR scores, a revised list of movie recommendations, both relevant to the system user and diverse according to their tastes, is generated and presented to them (as shown in the Revised Recommendations Phase in Figure 2). The size of the MMR list was set to 10 (top-10 recommendations). We refer to this revised list of top-10 recommendations as Recommendation List 2 (RL2).
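A greedy MMR re-ranking can be sketched as follows. Note that this sketch follows Carbonell and Goldstein's original formulation [3], which subtracts the maximum similarity to the already-selected items, max_{D_j ∈ S} Sim(D_i, D_j); Equation 2 as printed writes this term as (1 − Sim). Function names and signatures are illustrative, not the authors' implementation; λ = 0.5 matches the design decision described in Section 4.4.2.

```python
def mmr_rerank(candidates, relevance, similarity, k=10, lam=0.5):
    """Greedily build the re-ranked MMR list (cf. Equation 2).

    candidates : the relevant movie list R, e.g., ordered by local popularity
    relevance  : relevance(d) -> genre cosine similarity between movie d
                 and Q, the user's selected movies
    similarity : similarity(a, b) -> genre cosine similarity of two movies
    lam        : relevance/diversity trade-off (0.5 in our system)"""
    selected = []                   # S: movies already in the MMR list
    remaining = list(candidates)    # R \ S: movies still unselected
    while remaining and len(selected) < k:
        def mmr_score(d):
            # The first pick is scored on relevance alone.
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance(d) - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        remaining.remove(best)
        selected.append(best)
    return selected
```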
4 EXPERIMENT
We evaluated our system MovieTweeters using both offline and online metrics. We seek to answer the following questions: first, what is the impact of social information (with an interactive interface) on the quality of recommendations; and second, what is the relationship between the quality of the recommendations and user satisfaction. The offline evaluation metrics, which we define later in this section, help us analyze how adding social information as an additional input to a traditional recommendation system affects system diversity. The online evaluations help us analyze users' perceptions (perceived novelty and perceived diversity) of the recommendations and understand their satisfaction levels.

4.1 Variable Description
We describe all the dependent and independent variables which are evaluated during the course of the experiment and which form part of our hypotheses. There is one main independent variable in our experiment:

• System Diversity: In the context of this study, we define system diversity of the recommended items (movies) in terms of how different they are by genre.

Based on the independent variable defined above, we define our two dependent variables, whose effects are evaluated and tested during the course of the experiment:

• Perceived Novelty: The extent to which users receive "new" movie recommendations. Here, we evaluate whether users come across movies which they have not seen before. It is derived from the ResQue framework developed by Pu et al. [22], which assesses the quality of the recommended items.
• Perceived Diversity: The extent to which users felt that the recommended items were diverse. It is defined from the "Perceived System Qualities" stated by Pu et al. [22].

4.2 Hypotheses
Our system MovieTweeters generated two recommendation lists of movies for the users: one before (RL1) and one after (RL2) the participant's interaction with their relevant social information. We hypothesized that our system would help users discover more novel and diverse content. Our hypotheses are as follows:

Hypothesis H1: System diversity is correlated with participants' perceived diversity of the items in the two recommendation lists.

Hypothesis H2: System diversity is correlated with participants' perceived novelty of the items in the two recommendation lists.

Hypothesis H3: Users' perceived novelty increases between the two lists.

Hypothesis H4: As participants' perceived novelty increases, their user satisfaction increases as well.

Hypothesis H5: As users' perceived diversity increases, their user satisfaction increases as well.

4.3 Materials
We used two datasets to make movie recommendations in our system.

4.3.1 MovieLens 1M Dataset. To generate the initial list of recommendations (RL1), we used the MovieLens 1M dataset. Our pre-processing steps included adding the relevant IMDb IDs to the movies, making sure all movies had their relevant genre information present, and removing irrelevant fields such as the time-stamps of the ratings. Our pre-processed MovieLens 1M dataset had 964,712 ratings from 6,040 users for 2,835 movies.

4.3.2 MovieTweetings Dataset. We used MovieTweetings as our social information source, from which we made the revised set of movie recommendations (RL2). Our pre-processing steps included removing irrelevant fields such as the time-stamps of the ratings, retrieving the Twitter IDs of the users in the dataset, and removing movies with no relevant genre information. Our pre-processed MovieTweetings dataset had 606,767 ratings from 45,871 users for 27,093 movies.

4.4 Experimental Design
Keeping in mind the hypotheses concerning the impact of system diversity on user perception, and the relationship between users' perception of the recommendations and their satisfaction (Section 4.2), we study and analyze the impact of the system variable on the perceived quality attributes.

4.4.1 Evaluation Metrics. We study the impact of system diversity on perceived novelty and perceived diversity.

• System Diversity: We used Intra-List Diversity [2] to calculate the system diversity of the two generated recommendation lists (RL1 and RL2). For our study, we define it as follows (a computational sketch is given at the end of this subsection):

ILD = [ Σ_{i=1}^{n} Σ_{j=i}^{n} (1 − sim(c_i, c_j)) ] / [ n(n − 1)/2 ]   (3)

where c_1, c_2, ..., c_n are the items in a given list and n is the total number of items in the list. We used the cosine similarity measure to calculate the distance between the items.

• Perceived Novelty: For our study, we defined novelty as movies which the user has never seen before. We measured perceived novelty with the following two processes:
  – Perceived Novelty of the Recommendation Lists: Participants selected novel items from both recommendation lists (RL1 and RL2).
  – Perceived Novelty Questionnaire: We asked the participants to answer a set of three questions which helped us understand their level of perceived novelty across the two lists:
    Q1: The movies recommended in Recommendation List 1 were interesting to me.
    Q2: The movies recommended in Recommendation List 2 were interesting to me.
    Q3: The Twitter Users helped me obtain novel movie recommendations and improved the overall recommendation process.

• Perceived Diversity: We asked the participants to answer four questions about their level of perceived diversity between the two lists:
  – Q1: The list of movies in Recommendation List 2 vary from the list of movies in Recommendation List 1.
  – Q2: Most of the movies in Recommendation List 2 belong to similar genres as Recommendation List 1.
  – Q3: The movies recommended to me in Recommendation List 2 are diverse.
  – Q4: Selecting the relevant Twitter Users helped me obtain diverse movie recommendations and improved the overall recommendation process.

We also analyze the impact of perceived novelty and perceived diversity on overall user satisfaction.

• User Satisfaction: We define user satisfaction not only in terms of how satisfied users are with the quality of the recommendations, but also in terms of their experience with the inspectability, control, and overall interface aspects of our system. We asked the participants to answer a set of six questions which helped us understand their overall user satisfaction:
  – Q1: The recommendation system provided me with good movie suggestions.
  – Q2: The recommendation system helped me discover new movies.
  – Q3: The movies recommended to me are diverse.
  – Q4: The recommendation system made me more confident about my selection/decision.
  – Q5: I am convinced I will like the movies recommended to me.
  – Q6: Overall, I am satisfied with the recommendation system and the interface.
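As promised above, a direct computation of the Intra-List Diversity in Equation 3 looks like the following. This is a sketch, assuming `similarity` is the same genre-based cosine similarity used elsewhere in the system.

```python
from itertools import combinations

def intra_list_diversity(items, similarity):
    """Intra-List Diversity (Equation 3): average pairwise
    dissimilarity (1 - sim) over the n(n-1)/2 item pairs in a list."""
    n = len(items)
    if n < 2:
        return 0.0
    total = sum(1 - similarity(a, b) for a, b in combinations(items, 2))
    return total / (n * (n - 1) / 2)
```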
4.4.2 MMR Value. The value of λ adjusts the balance between the relevance and the diversity scores. In our system, we made the design decision to weight the relevance and diversity aspects of the revised list of recommended movies equally; hence, the value of λ was set to 0.5.

4.4.3 Procedure and Tasks. As described in Section 3, we divide our system into three phases:

• Initial Recommendations Phase: Participants were asked to provide basic demographic information (gender, age, movie consumption details) and to rate at least 20 movies out of 40 (the initial seed).
• Social Information Phase: Participants were asked to select their relevant list of movies from the initial recommendation list (RL1) and to select their most preferred Twitter users.
• Revised Recommendations Phase: Participants were shown the revised list of recommendations based on their selections (RL2).

After completing the main experiment, participants answered post-evaluation questionnaires which evaluated different aspects of both recommendation lists as well as their user satisfaction. Responses to the post-evaluation questionnaires were collected on a Likert scale (1-5). We followed a within-subjects experimental design in which all participants were exposed to both of their recommendation lists (RL1 and RL2) and compared the two lists.

5 RESULTS
In this section, we discuss our results regarding the impact of system diversity on the perceived quality of recommendations, and the relationship between the perceived quality of recommendations and user satisfaction.

5.1 Participants
The experiment was held in a controlled setting with 23 participants. Each session lasted 15-25 minutes. The participants were mainly Master's students at a university, with varied educational backgrounds. We had a nearly equal gender distribution, with 52.2% female and 47.8% male participants. Most of the participants were between the ages of 25-34 (60.9%). Figure 4 gives an overview of their movie consumption patterns, with most participants in the range of 1-6 movies consumed per month.

Figure 4: Movie Consumption Behavior.

5.2 Offline Evaluation
System Diversity. Using the offline evaluation metric defined in Section 4, we calculated the system diversity (Intra-List Diversity) of both generated recommendation lists (RL1 and RL2). Overall, we found a significant increase in the system diversity across the two recommendation lists (Figure 5).

Figure 5: System Diversity Scores, before (RL1) and after (RL2) the interaction.
5.3 Online Evaluation
5.3.1 Hypothesis 1: System Diversity and Perceived Diversity. To validate Hypothesis H1, we ran a Spearman's rank-order correlation test to compare the impact of the change in system diversity between the recommendation lists on participants' perceived diversity of the recommended items from both recommendation lists (RL1 and RL2). Spearman's correlation coefficient ρ measures the strength and direction of the correlation between two associated variables. We observed a ρ value of -0.54, which is significant (p = 0.007, p < 0.05), demonstrating a strong negative correlation between system diversity and the perceived diversity of the recommended items in the two recommendation lists. This rejects the null hypothesis, and our alternative hypothesis (H1) is accepted.

5.3.2 Hypothesis 2: System Diversity and Perceived Novelty. To validate Hypothesis H2, we ran a Spearman's rank-order correlation test to compare the impact of the change in system diversity between the recommendation lists on participants' perceived novelty of the recommended items from both recommendation lists (RL1 and RL2). We observed a ρ value of -0.42, which is significant (p = 0.04, p < 0.05), demonstrating a strong negative correlation between system diversity and the perceived novelty of the recommended items in the two recommendation lists. This rejects the null hypothesis, and our alternative hypothesis (H2) is accepted.

5.3.3 Hypothesis 3: Perceived Novelty Increases. To validate Hypothesis H3, we ran a Wilcoxon signed-rank test to compare the difference between the two recommendation lists (RL1, before; RL2, after) in terms of novel items. We observed a statistically significant increase in participants' perceived novelty after interacting with the relevant Twitter information (p = 0.0009, p < 0.01). Figure 6 shows the frequency of novel movies for both lists (RL1 and RL2). Analyzing these statistics and the relevant information, we infer that participants did indeed find more novel items after interacting with their relevant social information. This rejects the null hypothesis, and our alternative hypothesis (H3) is accepted.

Figure 6: Number of items marked as novel, before (RL1) and after (RL2) the interaction.

5.3.4 Hypothesis 4: Perceived Novelty and Satisfaction. To validate Hypothesis H4, we ran a Spearman's rank-order correlation test to study the impact of participants' perceived novelty on their overall user satisfaction. We observed a ρ value of 0.70, which is significant (p = 0.00016, p < 0.05), showing a strong positive correlation between perceived novelty and participants' overall user satisfaction. This rejects the null hypothesis, and our alternative hypothesis (H4) is accepted.

5.3.5 Hypothesis 5: Perceived Diversity and Satisfaction. To validate Hypothesis H5, we ran a Spearman's rank-order correlation test to study the impact of participants' perceived diversity on their overall user satisfaction. We observed a ρ value of 0.58, which is significant (p = 0.003, p < 0.05), demonstrating a strong positive correlation between perceived diversity and participants' overall user satisfaction. This rejects the null hypothesis, and our alternative hypothesis (H5) is accepted.
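For reference, the hypothesis tests reported above map directly onto standard SciPy routines. The following is a minimal sketch, assuming the per-participant measures are available as equal-length sequences; the function and argument names are illustrative, not the authors' analysis code.

```python
from scipy import stats

def run_tests(delta_ild, perceived_scores, novel_rl1, novel_rl2):
    """Statistical tests used in Section 5.3.

    delta_ild        : per-participant change in system diversity (ILD)
    perceived_scores : per-participant questionnaire scores (Likert 1-5)
    novel_rl1/rl2    : per-participant counts of items marked as novel"""
    # H1, H2, H4, H5: Spearman's rank-order correlation
    rho, p_spearman = stats.spearmanr(delta_ild, perceived_scores)
    # H3: Wilcoxon signed-rank test on the paired novelty counts
    w_stat, p_wilcoxon = stats.wilcoxon(novel_rl1, novel_rl2)
    return {"spearman": (rho, p_spearman), "wilcoxon": (w_stat, p_wilcoxon)}
```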
6 DISCUSSION AND LIMITATIONS
In this section we discuss our initial results from the experiment, as well as its limitations. According to our research goals defined in Section 3, we wanted to build an interactive interface that could assist users in discovering more novel content. Our initial results (for Hypothesis H3) suggest that we were successful in this respect, and that our interface did indeed help users discover more novel items.

While analyzing the impact of the system diversity of the recommendations on the perceived measures of quality (perceived diversity and perceived novelty), we found some surprising results (Hypotheses H1 and H2). Notably, as system diversity increased across the two lists, users' perceived diversity and perceived novelty decreased. The decrease in perceived diversity could be attributed to the following factors. Our MMR formula (λ = 0.5) balanced the revised recommendation list between relevance and diversity. This could influence participants' perception such that they were actually focused on checking the relevance aspect of the recommendations. In a post-hoc analysis, we studied the impact of popularity on users' diversity perception. Popularity for a recommendation list was calculated by taking the average of the top 3 IMDb movie ratings for that list. We observed that the popularity scores decreased across the lists. Comparing this to participants' perceived diversity, we found that as popularity decreased across the lists, perceived diversity decreased as well. We conclude that popularity also played a role in affecting participants' perceived diversity: diversification (the increase in system diversity) led to less popular movies, which made participants perceive the lists as less diverse.

We found a positive correlation between users' perceived novelty and perceived diversity and their overall user satisfaction (Hypotheses H4 and H5). Users who perceived that they discovered more novel and diverse items reported increased levels of satisfaction.

6.1 Limitations
We identify four main limitations in our study:

• An experiment comparing different recommendation algorithms could help us understand the impact these algorithms have on the perceived quality of the recommendations and on overall user satisfaction.
• Multiple experiments with different MMR (λ) values could help us understand their impact on the revised recommendation list and, ultimately, on users' perception and overall satisfaction.
• The inclusion of other movie-specific features (e.g., box office revenues, or year/era of release) could provide more insight into how two users are correlated, which would also impact the final recommendation list.
• Recent research has studied the impact of personality on the diversity needs of users [4, 17]. Our study did not consider the effects of individual traits such as personality.
Overall, our interface demonstrated that the incorporation of relevant social information within an interactive interface does indeed help users discover more novel items. It also provided valuable insight into the relationship between how users perceive the recommended items and their overall satisfaction.

7 CONCLUSION
In this paper, we introduced and evaluated a novel interface, MovieTweeters: a movie recommender system which combines social information with a traditional recommendation algorithm. This allows us to generate recommendations that are both current (since the social information is constantly updating) and novel. We conducted offline and online evaluations to test our interface. We found that incorporating social information within an interactive interface can indeed help users discover more novel content. We also observed that users who perceived that they discovered more novel and diverse items reported increased levels of user satisfaction. Even though we were able to increase the system diversity of the recommendations, system diversity was negatively correlated with users' perception of the novelty and diversity of the items.

In future work, we will study the inclusion of different recommendation algorithms along with varying MMR (λ) values. We believe this could have a significant impact on how users perceive their recommendations, and also on their overall satisfaction.
REFERENCES
[1] Svetlin Bostandjiev, John O'Donovan, and Tobias Höllerer. 2012. TasteWeights: a visual interactive hybrid recommender system. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 35–42.
[2] Keith Bradley and Barry Smyth. 2001. Improving recommendation diversity. In Proceedings of the Twelfth Irish Conference on Artificial Intelligence and Cognitive Science, Maynooth, Ireland. Citeseer, 85–94.
[3] Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 335–336.
[4] Li Chen, Wen Wu, and Liang He. 2013. How personality influences users' needs for recommendation diversity?. In CHI'13 Extended Abstracts on Human Factors in Computing Systems. ACM, 829–834.
[5] Simon Dooms, Toon De Pessemier, and Luc Martens. 2013. MovieTweetings: a movie rating dataset collected from Twitter. In Workshop on Crowdsourcing and Human Computation for Recommender Systems, CrowdRec at RecSys, Vol. 2013. 43.
[6] Bruce Ferwerda, Mark P. Graus, Andreu Vall, Marko Tkalcic, and Markus Schedl. 2017. How item discovery enabled by diversity leads to increased recommendation list attractiveness. In Proceedings of the Symposium on Applied Computing. ACM, 1693–1696.
[7] Mouzhi Ge, Carla Delgado-Battenfeld, and Dietmar Jannach. 2010. Beyond accuracy: evaluating recommender systems by coverage and serendipity. In Proceedings of the Fourth ACM Conference on Recommender Systems. ACM, 257–260.
[8] Brynjar Gretarsson, John O'Donovan, Svetlin Bostandjiev, Christopher Hall, and Tobias Höllerer. 2010. SmallWorlds: visualizing social recommendations. In Computer Graphics Forum, Vol. 29. Wiley Online Library, 833–842.
[9] F. Maxwell Harper and Joseph A. Konstan. 2016. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4 (2016), 19.
[10] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. 2004. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 5–53.
[11] Masayuki Ishikawa, Peter Geczy, Noriaki Izumi, and Takahira Yamaguchi. 2008. Long tail recommender utilizing information diffusion theory. In Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01. IEEE Computer Society, 785–788.
[12] Henry Kautz, Bart Selman, and Mehul Shah. 1997. Referral Web: combining social networks and collaborative filtering. Commun. ACM 40, 3 (1997), 63–65.
[13] Irwin King, Michael R. Lyu, and Hao Ma. 2010. Introduction to social recommendation. In Proceedings of the 19th International Conference on World Wide Web. ACM, 1355–1356.
[14] Bart P. Knijnenburg, Svetlin Bostandjiev, John O'Donovan, and Alfred Kobsa. 2012. Inspectability and control in social recommenders. In Proceedings of the Sixth ACM Conference on Recommender Systems. ACM, 43–50.
[15] Bart P. Knijnenburg, Martijn C. Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (2012), 441–504.
[16] Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment 5, 8 (2012), 716–727.
[17] Feng Lu and Nava Tintarev. 2018. A diversity adjusting strategy with personality for music recommendation. In IntRS@RecSys.
[18] Adam Marcus, Michael S. Bernstein, Osama Badar, David R. Karger, Samuel Madden, and Robert C. Miller. 2011. TwitInfo: aggregating and visualizing microblogs for event exploration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 227–236.
[19] Paolo Massa and Paolo Avesani. 2007. Trust-aware recommender systems. In Proceedings of the 2007 ACM Conference on Recommender Systems. ACM, 17–24.
[20] John O'Donovan, Barry Smyth, Brynjar Gretarsson, Svetlin Bostandjiev, and Tobias Höllerer. 2008. PeerChooser: visual interactive recommendation. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1085–1088.
[21] Yoon-Joo Park and Alexander Tuzhilin. 2008. The long tail of recommender systems and how to leverage it. In Proceedings of the 2008 ACM Conference on Recommender Systems. ACM, 11–18.
[22] Pearl Pu, Li Chen, and Rong Hu. 2011. A user-centric evaluation framework for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems. ACM, 157–164.
[23] Al Mamunur Rashid, Istvan Albert, Dan Cosley, Shyong K. Lam, Sean M. McNee, Joseph A. Konstan, and John Riedl. 2002. Getting to know you: learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces. ACM, 127–134.
[24] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web. ACM, 285–295.
[25] Barry Smyth and Paul McClave. 2001. Similarity vs. diversity. Case-Based Reasoning Research and Development (2001), 347–361.
[26] Kirsten Swearingen and Rashmi Sinha. 2001. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, Vol. 13. 1–11.
[27] Kirsten Swearingen and Rashmi Sinha. 2002. Interaction design for recommender systems. In Designing Interactive Systems, Vol. 6. 312–334.
[28] Jiliang Tang, Xia Hu, and Huan Liu. 2013. Social recommendation: a review. Social Network Analysis and Mining 3, 4 (2013), 1113–1133.
[29] Nava Tintarev and Judith Masthoff. 2015. Explaining recommendations: Design and evaluation. In Recommender Systems Handbook. Springer, 353–382.
[30] Saúl Vargas and Pablo Castells. 2011. Rank and relevance in novelty and diversity metrics for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems. ACM, 109–116.
[31] Liang Zhang. 2013. The definition of novelty in recommendation system. Journal of Engineering Science & Technology Review 6, 3 (2013).
[32] Tao Zhou, Zoltán Kuscsik, Jian-Guo Liu, Matúš Medo, Joseph Rushton Wakeling, and Yi-Cheng Zhang. 2010. Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences 107, 10 (2010), 4511–4515.