Using Visualizations to Encourage Blind-Spot Exploration

Jayachithra Kumar
Delft University of Technology
j.kumar-1@student.tudelft.nl

Nava Tintarev
Delft University of Technology
n.tintarev@tudelft.nl

IntRS Workshop, October 2018, Vancouver, Canada Jayachithra Kumar and Nava Tintarev

ABSTRACT
In this paper, we help users to better understand their consumption profiles by exposing them to their unexplored regions, thereby indirectly nudging them towards more diverse exploration. We refer to these regions as a user’s blind-spots, and we visualize them by enabling comparisons between a user’s consumption pattern and that of other users of the system. We compare the effectiveness of two visualizations, a bar-line chart and a scatterplot, for increasing a user’s intention to explore new content. The results suggest that users can understand both visualizations. Furthermore, our results confirmed that users with a higher understanding of their profile tend to explore their blind-spot categories more. This experiment is a first step towards increasing users’ awareness of their choices, as well as providing the kind of user control that encourages users to explore new types of items.

CCS CONCEPTS
• Information systems → Decision support systems; • Human-centered computing → Human computer interaction (HCI);

KEYWORDS
Visualization, Recommender Systems, Blind Spots, Filter Bubble, Scatterplot, Bar-line chart

1 INTRODUCTION
While personalized recommendations can help people to cope with the information overload problem, over time, using recommender systems can decrease the diversity of content that we consume [15], thereby limiting our exposure to novel content, views and opinions contrary to our own. Our current preferences often reflect our past preferences, and our behaviors may also interact with online filtering and ranking algorithms to further narrow our views. This phenomenon of algorithmic narrowing, or over-tailoring, is called ‘filter bubbles’ [3, 6, 16]. However, there may be design choices for recommender systems that could decrease over-tailoring. Flaxman et al. found evidence that recent technological changes both increase and decrease various aspects of over-tailoring [7].

This work addresses this possibility by helping users understand the limitations of their consumption patterns using visualizations. Specifically, we propose a novel approach for recognizing ‘blind-spots’ in user profiles (regions of the preference space that are under-represented) and describe techniques for revealing these blind-spots to users. We also study whether helping users to recognize these blind-spots has a demonstrable effect on their consumption, that is, whether this recognition encourages them to further explore the recommendation space. In the following sections, we describe the results of a user experiment to evaluate the efficacy of two different techniques for blind-spot visualization and their effect on users’ exploration of the recommendation space.

In the next section, we describe related work. This is followed by a description of the method used to generate the visualizations used in this study (Section 3). Next, in Section 4, we describe a lab study with 23 participants which investigates the relationship between understanding the visualizations and exploring blind-spot regions in a user’s profile. Section 5 outlines the main results. This is followed by a discussion of qualitative findings and a post-hoc analysis of surprising results in Section 6. We conclude with suggestions for future work in Section 7.

2 RELATED WORK
This work sits at the intersection of two important recommender systems themes: 1) the use of visualization to aid transparency and explanation, and 2) techniques for dealing with filter bubbles. One important objective of this work is to increase users’ awareness of their filter-bubble, and to improve decision making by better informing users about their consumption pattern. To help users understand their own consumption patterns, we propose an approach for visualizing user profiles. This builds on work on visualizing consumption blind-spots in movie recommender systems [18], and visualizing consumption profiles in music [11].

When it comes to mitigating filter-bubbles, there are two common responses in the literature. The first approach is to develop recommendation algorithms that are more responsive to the risks inherent in the filter bubble. This can be achieved by tuning algorithms to improve beyond-accuracy aspects (such as diversity, serendipity, coverage and novelty) in addition to the relevance of recommendations (cf. [1, 2, 4, 17, 20]), and by re-ranking recommendation lists to include diversity in an optimization function (cf. [12, 19]).

While improving recommendation diversity can go some way to coping with the filter bubble, it is far from a complete solution. For example, it does not increase user awareness of the filter bubble itself. A second approach helps users to better understand the available options (the recommendation space) so as to inform them about the compromises that are inherent in any set of recommendations, relative to a wider set of items. In this regard, the work of [14, 18] is pertinent, showing how visualization was found to increase user awareness of the filter bubble, understandability of the filtering mechanism, and a user’s sense of control.

In this paper, we address the blind-spot issue by showing the consumption behaviours of users, and highlighting blind-spots that may exist in their consumption relative to a larger user population. We further study whether, by making users aware of their blind-spots, we may be able to influence them to explore items in the under-explored parts of their catalogue.

3 METHOD
In this section, we provide a brief overview of the stages involved in the extraction and visualization of consumption patterns. With our visualization we aim to give users a holistic view of their filter-bubble by enabling them to compare their consumption pattern (user profile) with the (aggregate) consumption pattern of other users of the system (‘global’ consumption pattern or ‘global’ profile). However, in doing so, we do not aim to explain individual items to users, but rather highlight the important aspects of their profile as a whole (i.e., by grouping tracks based on genres). That way the visualization could scale better and still provide an accurate representation of global and user preferences.

In comparing global and user preferences, we not only enable comparisons between different categories, but also within the same category between user and global profiles (i.e., within the same genre, we highlight the differences between the user’s preferences and global preferences). To further emphasize significant categories, in addition to representing a range of categories, we also represent interaction between these categories (i.e., when a track belongs to more than one genre). This enables us to highlight a user’s most familiar categories, thereby increasing their trust in the visualization. In the following sections we describe the design decisions that went into the extraction of consumption data and creation of visualizations.

Figure 1 provides a brief overview of the stages involved in the extraction and visualization of consumption patterns. Steps 1 & 2 involve feature extraction and data collection respectively. Step 3 involves extracting global and local preferences using a frequent itemset mining algorithm. Once the global and local preferences are extracted, visualizations are constructed to represent this data (Step 4). The following sections describe in detail the design decisions that went into each of these stages.

Figure 1: Steps involved in the extraction and visualization of consumption pattern

3.1 Feature Selection
For visualization, we categorize tracks based on their genre tags. Genres provide a good collective representation of a user’s preferences compared to other acoustic features (such as tempo, pitch etc). Besides, users can easily relate to a genre-based categorization, since it is used in existing recommender systems like Spotify.

In addition to providing genre-level categorization between user and global profiles, it is also important for the system to be able to distinguish between items in the same genre, between user and global profiles. In order to achieve this, a second dimension is added to the visualization. To select the most representative feature we looked into the Million Song Dataset (MSD), which provides a total of 55 features for each track, and we chose the feature ‘Artist hotness’. ‘Artist hotness’ is a value (0 to 1) assigned by MSD for each artist, which corresponds to how much buzz the artist is getting right now. This value is computed algorithmically based on information derived from several undisclosed sources, including mentions in the web, mentions in music blogs, music reviews, play counts etc. In comparison to other features, artist hotness has been shown to provide a stable representation of users’ preferences [10].

3.2 Data extraction
We used the Million Song Dataset [13], which is the largest available music feature dataset, containing audio features and song and artist meta-data for a million contemporary music tracks. It is also the only dataset that provides artist hotness values for tracks.

To obtain the global consumption pattern, we used one of the complementary datasets of MSD, the ‘Taste Profile Subset’ (TPS), and merged this with the MSD dataset. The TPS dataset provides a list of tracks listened to by a number of users of last.fm, along with the play count of these tracks. We retained users who listened to at least 20 tracks. The artist hotness values of these tracks were obtained by merging TPS and MSD. Since MSD does not provide genre information for tracks, we obtained this information from a third dataset provided by tagtraum [8].

To build a user’s individual profile, we obtained a specific user’s real-time music listening pattern from Spotify. Similar to the global profile, this entails all the track preferences of the user, the genres/genre-combinations, and the artist hotness values of these tracks. We used Spotify since it is the only API that provides all three required pieces of information for research. Besides, Spotify is one of the largest music service providers, and hence it is relatively easy to find real users for evaluation.

3.3 Frequent genre-set extraction
We applied a frequent itemset mining algorithm (RElim, [5]) in order to obtain the most frequently listened genres/genre-combinations. Frequent itemset mining algorithms work by identifying all common sets of items in a given list, and are used for discovering regularities between frequently co-occurring items in large datasets. We used the ‘Recursive Elimination Algorithm’ (RElim) provided by the ‘pymining’ package for Python. For both the global and the user’s profile, this algorithm gives a set of the most frequently consumed genres/genre-combinations and their frequency values (i.e., how many times the item appears in the profile). By visualizing this information we believe that we can enable users to compare their consumption pattern with the global consumption pattern and subsequently to identify the blind-spots in their profiles.

RElim was parameterized with a minimum frequency value (minimum support) of 2. This means that all itemsets that occur fewer than two times are eliminated from the global profile and user profile. This support value was chosen to ensure a faster computation time while still preserving significant genres.

Table 1 shows the top 20 most frequent genres/genre-combinations along with their (normalized) frequencies, for the global data set. Certain genres (‘Rock’, ‘Pop’) are highly preferred globally compared to others. We also notice that certain genre-combinations are preferred more than other individual genres. For example, ‘Alternative, Rock’ has a higher frequency than Rap or Metal. For each of the top-20 genres/genre-combinations, we compute the average artist hotness values of all the tracks listened to in that genre (Table 1). For all the genres, the average artist hotness value lies close to the center (0.5), which accounts for the diverse music consumption of users.

Table 1: Top 20 frequent item-sets for global dataset with a minimum support value of 2. ‘Alt’ represents ‘Alternative’

Genres (1-10)    Frequency    Genres (11-20)    Frequency
Rock             0.308        Metal             0.029
Pop              0.108        Rock, Punk        0.029
Alt              0.075        Rock, Metal       0.028
Alt, Rock        0.071        Country           0.027
Hip-Hop          0.038        Dance             0.023
Electronic       0.036        Rock, Pop         0.023
Rap              0.032        Alt, Punk         0.021
Rap, Hip-Hop     0.032        Alt, Rock, Punk   0.021
R&B              0.032        Latin, Indie      0.020
Punk             0.029        Indie             0.020

3.4 Choice of Visualization
For our visualizations we represent the top-20 most frequent genre-sets for user and global profiles. The choice of visualizations was made based on their ability to represent all the required dimensions (i.e., genres/genre-combinations, frequency of genres and average artist hotness values for each genre, for the top 20 genres), to span across global and user profiles, and to represent all required data points. We used a scatterplot as our main visualization and we compare the performance of the scatterplot with the baseline, a bar-line chart. In this section, we describe both these charts.

3.4.1 Visualization 1: Scatterplot. A scatterplot is a type of chart in which data is represented as a collection of points, with the value of the first variable determining each point’s position along the horizontal (x-) axis, and the value of the second variable determining its position along the vertical (y-) axis. Traditional scatterplots are capable of representing only two dimensions; however, with the inclusion of visual attributes such as color, size and shape, it is possible to represent up to five dimensions.

An example scatterplot used in our study is shown in Figure 2. Here the size of a bubble represents the frequency of the itemset: the larger the bubble, the higher the frequency of the genre corresponding to that bubble. To distinguish between genres we use color hues. The horizontal position of a bubble represents its average artist hotness value and its vertical position represents whether it belongs to the global profile or the user’s profile (‘yours’ label). We also implemented a hover feature: on hovering over a bubble, the genre corresponding to the bubble gets highlighted in both the global and the user profile. This enables easy comparison between both profiles. Furthermore, on hovering over a bubble, the genre name, frequency and average artist hotness value corresponding to the bubble get displayed. From the given visualization, we can infer the following:
(1) For the given user, Pop is the most frequently consumed genre, since it corresponds to the largest bubble under the ‘yours’ category of the vertical axis.
(2) Pop is also highlighted under the global category, which means that it is also globally one of the most (but not the most) frequent genre(s).
(3) The user prefers more popular artists compared to the average user of the system, since the user’s bubbles are generally aligned more towards the right.

Figure 2: (a) Example scatterplot visualization used in the study. (b) On hovering over a bubble its corresponding genre name gets highlighted in both global and user’s profile.

3.4.2 Visualization 2: Bar-line chart. We compare the performance of the scatterplot with the baseline visualization, a bar-line chart. A bar-line chart is a combination of a bar chart and a line chart, and it can represent up to three variables. A bar chart based visualization was chosen as the baseline for the following reasons:
(1) It has been found to be the most compelling and persuasive means to convey explanations in recommender systems [9].
(2) It is used in existing recommender systems such as MovieLens¹ to represent users’ ratings across genres, and frequency of ratings (Figure 3).

¹ https://movielens.org/, retrieved June 2018

Figure 3: Example bar chart visualization used in MovieLens system to represent the distribution of ratings (retrieved June 2018).

Figure 4 shows an example of a bar-line chart. Here the horizontal axis represents the itemset name; the left vertical axis, corresponding to the line chart, represents the genre frequency, and the right vertical axis, corresponding to the bar chart, represents the average artist hotness of the genre. Unlike the scatterplot, a single bar-line chart is not capable of showing both the user’s and global data, and hence we use two separate charts. This separation might make comparison between both profiles cumbersome, but it accounts for the relative simplicity of the chart.

4 EXPERIMENT
We performed an online evaluation of our system to compare the effectiveness of the visualizations, and to study changes in users’ preferences. For ease of explanation, we divide our evaluation process into two conceptual stages: Stage 1, where we evaluate users’ understanding of the visualizations, and Stage 2, where we observe a user’s music exploration pattern after they are exposed to their blind-spots. It is important to note here that this classification is introduced solely for the purpose of better representation of concepts, and from the participant’s perspective the whole evaluation process is staged as a single experimental session. In the following sections, we explain the experimental design and research hypotheses for Stage 1 and Stage 2 in Sections 4.1 and 4.2 respectively. We then briefly describe the materials (Section 4.3) and detailed procedures (Section 4.4) involved in the study.

4.1 Stage 1: To study the understanding of visualization
4.1.1 Design. For Stage 1 of our evaluation, we used a within-subjects repeated measures design, where each participant was presented with both the scatterplot and the bar-line chart. In order to minimize order effects we performed counterbalancing by changing the order of visualizations for each participant.

4.1.2 Independent variable. For each user we show both types of visualizations (bar-line chart and scatterplot), and study the effectiveness of each of these visualizations in increasing the understanding of a user’s consumption pattern and blind-spots. Hence type of visualization is our independent variable.

4.1.3 Dependent variables.
(1) Correctness of understanding: Understandability of a visualization is measured by asking users to answer questions about information represented in the visualization. These questions test a user’s understanding of their consumption pattern and their blind-spots.
(2) Confidence: In addition to measuring users’ actual understanding, we also measure the perceived ease of understanding for both visualizations. These are self-reported confidence scores provided by the user for each question about their consumption pattern and blind-spots. They indicate how confident users are in the answers they provide.

4.1.4 Hypotheses.
• H1: Users are able to answer questions about their consumption pattern more accurately with scatterplot than with bar-line chart.
• H2: Users have more confidence in their answers about their consumption pattern with scatterplot than with bar-line chart.
• H3: Users are able to answer questions about their blind-spots more accurately with scatterplot than with bar-line chart.
• H4: Users have more confidence in their answers about their blind-spots with scatterplot than with bar-line chart.

4.2 Stage 2: To study user’s music exploration
4.2.1 Design. For Stage 2, we perform a simple correlation analysis to study the relation between a user’s understanding of their profile and their music exploration pattern.

4.2.2 Independent variable. For all users, we measure whether their understanding of their profile has an impact on their exploration of blind-spot genres. Hence a user’s correctness of understanding is the independent variable. This value is computed for each user as a dependent variable in Stage 1 (Section 4.1.3).

4.2.3 Dependent variable. Exploration factor: The exploration factor is measured for each user by computing the proportion of tracks the user has explored in their blind-spot category, and it quantifies a user’s exploration in that category.

4.2.4 Hypothesis.
• H5: Users who score higher on the questions about their blind-spots explore their blind-spot genres more.
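
The correlation analysis planned for Stage 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: it uses Spearman’s rank correlation (the measure reported in Section 5.6), the helper names `average_ranks` and `spearman` are hypothetical, tied values are given average ranks, and the per-user values are invented for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical per-user values (assumption: NOT data from the study).
understanding = [0.5, 0.75, 1.0, 0.25, 0.9]  # correctness scores from Stage 1
exploration = [0.1, 0.4, 0.6, 0.2, 0.5]      # exploration factors from Stage 2
rho = spearman(understanding, exploration)
```

With these invented values, `rho` comes out at 0.9. In practice a library routine such as `scipy.stats.spearmanr` would typically be preferred, since it also reports a p-value for the significance test.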
Figure 4: Example bar-line chart visualization used in the study. (a) All users: ‘Global’. (b) User’s individual profile: ‘Yours’.

4.3 Materials
Visualizations were designed using the D3.js JavaScript visualization library². The online interfaces for the web-based survey were developed using the Python Flask framework³.

² https://d3js.org, retrieved March 2018
³ http://flask.pocoo.org, retrieved March 2018

4.4 Procedure and Tasks
Each participant goes through six steps to complete the experiment. Participants start by providing their basic demographic information (Step 1), after which they log in with their Spotify account (Step 2). From the user’s account, we collect their top 50 tracks using Spotify’s API. We then use the frequent pattern mining algorithm (Section 3.3) on the genres of these tracks to compute the user’s top 20 frequent genres/genre-combinations, and their average artist hotness values.

In the next two steps, users are presented with each of the two visualizations (bar-line chart and scatterplot), accompanied by a set of instructions on how to read the visualization. After a minimum buffer time of 1 minute to read and understand the visualization, questionnaires are shown below the visualization for the users to answer. The questionnaire is designed in such a way that, for each visualization, it evaluates the user’s understanding of the system in all four aspects: global consumption pattern, user’s consumption pattern, user’s blind-spots and artist hotness values. More particularly, we ask users to identify the first and second most consumed genres (globally and in their profile, i.e., 2x2 = 4 questions), their first and second highest blind-spots (i.e., genres with high frequency in the global profile but not found in their profile, 2 questions) and the artist hotness values of all the genres chosen for these questions (6 questions). In order to reduce the learning effect, we split the above 12 questions, and performed counterbalancing to assign half of the questions to each chart. For each question, the user is also asked to provide their confidence in their answer on a 5-point Likert scale.

Once users have examined both visualizations, in the next step, we study the user’s music exploration pattern. We provide an interface where users can listen to music from different genres and genre combinations from their blind-spot and frequent genre categories. More specifically, users are asked to select one or more genres to listen to from these categories. Based on their chosen genres, songs are recommended using Spotify’s recommendation API. Users are asked to listen to tracks that they find interesting, and if they like any track they are asked to "add" it to their list. Our interface (Figure 6) was inspired by Spotify’s old exploration interface (Figure 5). We use color coding to differentiate the user’s frequent and blind-spot genres (green = frequent, red = blind-spots).

Figure 5: Original interface from Spotify that lets users select multiple genres

Once users have listened to and rated songs for at least five genres/genre-combinations, in the final step users fill out a post-stage assessment survey. Here users are provided with a set of questions to test their overall impression of the visualizations used in the study, with respect to their perceived ease of understanding, ease of interaction, usefulness and interest. The answers are collected on a five-point Likert scale.

5 RESULTS
In this section, we summarize the results of our online experiment with respect to our proposed hypotheses.

5.1 Participants
There were a total of 23 participants. 83% of the participants (n = 19) were male and 17% female (n = 4). Participants were aged between 19 and 35.
20 participants had a computer science background (PhD and MSc). They all had diverse music backgrounds and music consumption behavior (Figure 7).

Figure 6: Exploration interface used in the study

Figure 7: Participants’ demographics by their music listening frequency

5.2 Understandability 1: Genres
Participants were asked to identify their first and second most consumed genres. Understandability was measured by how accurately participants could identify these genres. For each answer, a score was provided based on its correctness. For example, when answering about their first most consumed genre, a score of 1 is assigned if the answer is right, a score of 0.5 is assigned if the participant provided the name of their second most consumed genre, and a score of 0 is assigned for all other wrong answers.

The average scores of all participants for identifying their first and second most consumed genres, and the artist hotness values of these genres, are given in Table 2. The differences in the mean scores are not statistically significant (Mann-Whitney U-test at p<0.05). Thus the results provide no support for hypothesis 1, which stated that participants would be able to answer questions about their consumption pattern more accurately with scatterplot than with bar-line chart.

Table 2: Mean (std) scores of all participants for identifying their first and second most consumed genres and their artist hotness values. Scores lie between 0 and 1, with 1 being the highest score representing maximum understanding. * indicates significance (p<0.05).

                              Bar-line       Scatter
1st place (frequency)         1.00 (0)       0.95 (0.14)
2nd place (frequency)         1.00 (0)       0.92 (0.29)
1st place (artist hotness)    0.87 (0.31)    1.00 (0.00)
2nd place (artist hotness)    1.00 (0)       0.92 (0.29)

5.3 Confidence 1: Genres
Participants were asked to provide their confidence values in their answers for identifying their first and second most consumed genres. The average scores are summarized in Table 3. The trends show that the participants had higher confidence with bar-line chart for identifying their first most consumed genre. For their second most consumed genre, they had higher confidence with scatterplot. However, the results of statistical tests show that the obtained scores are significant only for identification of artist hotness values of the first most consumed genre (Mann-Whitney U-test, U-value = 29, p<0.05). Hypothesis 2 predicted that participants will have higher confidence for answers about their genres with scatterplot than with bar-line chart. The trend is significant in the reverse direction and the hypothesis is consequently discarded.

Table 3: Mean (std) confidence scores of all participants for identifying their first and second most consumed genres and their artist hotness values. Scores lie between 1 and 5, with 5 representing maximum confidence. * indicates significance (p<0.05).

                              Bar-line       Scatter
1st place (frequency)         4.60 (0.49)    4.00 (0.89)
2nd place (frequency)         4.18 (0.75)    4.45 (0.96)
1st place (artist hotness)*   4.90 (0.28)    4.27 (0.64)
2nd place (artist hotness)    4.30 (0.67)    4.70 (0.49)

5.4 Understandability 2: Blind-spots
Participants were asked to identify their first and second highest blind-spots. For each answer, a score of 1 is assigned if the answer is right, a score of 0.5 is assigned if the participant provided the second best answer, and a score of 0 is assigned for all other answers.

The average scores of all participants for identifying their first and second highest blind-spots and their artist hotness values are shown in Table 4. The average scores are slightly higher for scatterplot than for bar-line chart, but the results are not statistically significant. Hence hypothesis 3, which stated that participants would be able to answer questions about their blind-spots more accurately with scatterplot than with bar-line chart, is not confirmed.

Table 4: Mean (std) scores of all participants for identifying their top 2 blind-spot genres and their artist hotness values. Scores lie between 0 and 1, with 1 representing maximum understanding. * indicates significance (p<0.05).

                              Bar-line       Scatter
1st place (frequency)         0.66 (0.49)    0.68 (0.46)
2nd place (frequency)         0.82 (0.34)    0.83 (0.39)
1st place (artist hotness)    0.66 (0.49)    0.70 (0.46)
2nd place (artist hotness)    0.81 (0.34)    0.83 (0.39)

5.5 Confidence 2: Blind-spots
Participants were asked to provide their confidence values for their answers about their first and second highest blind-spots. The average scores are summarized in Table 5. The trends show that the participants had higher confidence with scatterplot for identifying their first highest blind-spot. For their second highest blind-spot, they had higher confidence with bar-line chart. The observed trends are significant for identification of artist hotness values (Mann-Whitney U-test, U-value = 33 at p<0.05 for artist hotness of first highest blind-spot, and U-value = 16.5 at p<0.05 for artist hotness of second highest blind-spot). Hypothesis 4 predicted that participants will have higher confidence for answers about their blind-spots with scatterplot than with bar-line chart. The trend is significant in both directions, and the hypothesis is not confirmed.

Table 5: Mean (std) confidence scores of all participants for identifying their top 2 blind-spots, and their artist hotness values. Scores lie between 1 and 5, with 5 representing maximum confidence. * indicates significance (p<0.05).

                              Bar-line       Scatter
1st place (frequency)         3.80 (0.93)    4.20 (0.90)
2nd place (frequency)         4.30 (0.82)    3.90 (0.67)
1st place (artist hotness)*   4.25 (0.62)    4.80 (0.40)
2nd place (artist hotness)*   4.90 (0.30)    4.00 (0.50)

5.6 Exploration
In the exploration stage, participants were asked to explore music from their frequent and blind-spot genres/genre-combinations. Hypothesis 5 states that users who have higher understanding of their profile explore their blind-spot genres more. For each user, an exploration factor ($EF_{bs}$) was computed to quantify their exploration in their blind-spot genres:

$EF_{bs} = N_{bs} \cdot w_{bs}$,

where $N_{bs}$ is the number of genres listened to in the blind-spot category, and $w_{bs}$ is the number of tracks listened to in each of these genres. We compared this exploration factor with the user’s understanding of their consumption pattern and blind-spots (obtained from their total scores for all their answers, from the Stage 1 evaluation, Section 4.1). A positive Spearman’s correlation of 0.44 was obtained between users’ understanding of their profile and their exploration in blind-spot genres (significant at p<0.05). Thus hypothesis H5 is confirmed.

5.7 Post-hoc analysis
We did a post-hoc analysis to confirm that the positive correlation obtained between a user’s exploration in the blind-spot category and their understanding of their profile (Section 5.6) is exclusive, and not observed in the frequent and bridge (i.e., frequent + blind-spot combination) categories. The results of the correlation analysis for the frequent and bridge categories are shown in Table 6. The results show that users’ understanding of their profile has a negative correlation (p<0.05) with their exploration in the frequent category and an insignificant weak positive correlation with the bridge category. This observation implies that the positive correlation between users’ exploration and their understanding is exclusive to the blind-spot category.

Table 6: Spearman’s correlation between exploration factor in frequent ($EF_f$) and bridge ($EF_b$) genre categories, and user’s understanding of their profile. * indicates significance (p<0.05).

                          $EF_f$     $EF_b$
Actual understanding      -0.45*     0.01

6 DISCUSSION
In the first stage of our evaluation (Section 4.1) we aimed to understand the effectiveness of visualizations at conveying information about (a) the user’s consumption pattern, and (b) the user’s blind-spots. The correctness scores show that the conventional bar-line chart is better at conveying information that is explicit about users’ profiles (i.e., information about consumption pattern). For conveying information about blind-spots, or implicit information, the scatterplot obtained higher scores. But the obtained results were not significant, and therefore we did a post-hoc analysis of users’ comments on each of the visualizations. We found that a large number of the users agreed that bar-line charts were easier for getting detailed information (8 users agreed and no one disagreed), while the scatterplot was easier for comparison of their profile with the global profile (9 users agreed and no one disagreed). This reasoning supports the scores obtained for both the charts, especially for the scatterplot for the identification of blind-spots, since the ability of a chart to compare global and individual profiles is important for blind-spot recognition.

In Stage 2 of the evaluation (Section 4.2), we study the impact of users’ understanding of their profile on their intention to explore blind-spot genres. A positive correlation indicated that users who are more aware of their profile tend to explore their blind-spot genres more. Furthermore, the results of the post-hoc analysis (Section 5.7) showed that the observed positive correlation (between users’ exploration in the blind-spot category and their understanding) is exclusive to the blind-spot category, and not observed in the frequent or bridge (frequent + blind-spot combination) categories, thereby further reinforcing the finding that users with higher understanding of their profile explore their blind-spot category more.

Additionally, during exploration, we found that users show interest in mixing genres from their frequent and blind-spot categories (i.e., bridge genres) to discover new songs. The total number of genres that users explored in the bridge category is almost as high as the number of genres explored in purely frequent or blind-spot genres/genre-combinations (Table 7). This suggests that, irrespective of their understanding of their profile, users are equally inclined to combine genres from different categories. During the exploration phase, we used different color codes to distinguish between frequent and blind-spot genres. This might have stimulated an urge

Furthermore, on studying users’ exploration patterns we found that users who have more understanding of their profile also actively explore their blind-spots more. Together, our findings suggest that it is possible to break a user’s filter-bubble by increasing a user’s awareness of their choices, and providing user control to explore new item-sets.

In our future work, we will learn to detect a user’s exploration preferences and incorporate this information to refine our recommendations. Our first step will be to differentiate between content that a user is not consuming because they are not aware of it, and content that the user does not engage with because they are not interested.
We also plan to continue this work in other domains among users to combine genres from these two categories. than music, such as news recommendations. Table 7: Total number of genre/genre-combinations ex- REFERENCES [1] Zeinab Abbassi, Vahab S. Mirrokni, and Mayur Thakur. 2012. Diversity Maxi- plored by the user in different categories. mization Under Matroid Constraints. Technical Report. Department of Computer Science, Columbia University. [2] Gediminas Adomavicius and YoungOk Kwon. 2011. Improving Aggregate Rec- Frequent Blind-spot Bridge ommendation Diversity Using Ranking-Based Techniques. IEEE Transactions on 50 52 47 Knowledge and Data Engineering 24 (2011), 896–911. [3] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. 2015. Exposure to Ide- ologically Diverse News and Opinion on Facebook. Science 348, 6239 (2015), 1130–1132. [4] Derek Bridge and John Paul Kelly. 2006. Ways of Computing Diverse Collabora- 6.1 Limitations tive Recommendations. In Adaptive Hypermedia and Adaptive Web-based Systems. 41–50. In this section, we delineate the limitations and delimitations of [5] Borgelt C. 2005. Keeping things simple: finding frequent item sets by recursive our system which restrict the scope of our results. elimination. (2005). [6] Michael D Conover, Bruno Gonçalves, Alessandro Flammini, and Filippo Menczer. Firstly, when it comes to the data used in our experiment, the 2012. Partisan asymmetries in online political activity. EPJ Data Science 1, 1 global consumption data obtained from Million Song Dataset’s taste (2012), 6. [7] Seth Flaxman, Sharad Goel, and Justin M Rao. 2016. Filter bubbles, echo chambers, profile subset (TPS), is available only until the year 2011. There is and online news consumption. Public Opinion Quarterly 80, S1 (2016), 298–320. no known way to extract data beyond this time period, and hence [8] Schreiber H. 2015. Improving Genre Annotations for the Million Song Dataset. 
it is quite possible that recent changes in trends are not reflected in (2015). [9] Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. 2000. Explaining col- our global profile. Secondly, for comparison of visualizations, we laborative filtering recommendations. In ACM conference on Computer supported only compare between the scores of bar-line chart and scatterplot. cooperative work. 241–250. However, there could be other visualizations that obtain higher [10] Li D Hu Y. 2013. Evaluation on Feature Importance for Favorite Song Detection. (2013). scores than these two visualizations. Future studies could focus on [11] Yucheng Jin, Nava Tintarev, and Katrien Verbert. 2018. Effects of Individual exploring better means of representation. Traits on Diversity-aware Music Recommender User Interfaces. In UMAP. [12] Michael Jugovac, Dietmar Jannach, and Lukas Lerche. 2017. Efficient optimiza- Finally, when studying the correlation between user’s explo- tion of multiple recommendation quality factors according to individual user ration and their understanding, our study is restricted to user’s tendencies. Expert Systems with Applications 81 (2017), 321–331. exploration at that specific point during the experiment. Neither [13] Ellis DP Lanckriet GR McFee B, Bertin-Mahieux T. 2012. The million song dataset challenge. (2012). do we confirm if users continue to explore diverse music, nor do [14] Sayooran Nagulendra and Julita Vassileva. 2014. Understanding and controlling we consider the impacts of contextual factors such as user’s mood, the filter bubble through interactive visualization: a user study. In Proceedings of time of the day etc. In future work, these factors should be taken the 25th ACM conference on Hypertext and social media. ACM, 107–115. [15] Tien T Nguyen, Pik-Mai Hui, F Maxwell Harper, Loren Terveen, and Joseph A into account. Konstan. 2014. Exploring the filter bubble: the effect of using recommender systems on content diversity. 
In Proceedings of the 23rd international conference on World wide web. ACM, 677–686. 7 CONCLUSIONS AND FUTURE WORK [16] Eli Pariser. 2011. The filter bubble: What the Internet is hiding from you. Penguin Recommender systems continue to inform our beliefs and opinions Books. [17] Barry Smyth and Paul McClave. 2001. Similarity vs. Diversity. In 4th International as they influence the information we consume in the world around Conference on Case-Based Reasoning. us, ranging from the music we listen and movies we watch, to the [18] Nava Tintarev, Shahin Rostami, and Barry Smyth. 2018. Knowing the unknown: news we read and food we consume. This raises the bar in terms visualising consumption blind-spots in recommender systems. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing, SAC 2018, Pau, France, of the ethics of responsible recommendation, and if recommender April 09-13, 2018. 1396–1399. systems are to earn our trust then they must help us understand [19] Saúl Vargas and Pablo Castells. 2011. Rank and relevance in novelty and diversity metrics for recommender systems. In Proceedings of the fifth ACM conference on why certain suggestions are being made and why others are not. Recommender systems. ACM, 109–116. We have presented a user-centered study to assess the effectiveness [20] Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, and Georg Lausen. 2005. of a visualizations to improve human decision making. The results Improving Recommendation Lists Through Topic Diversification. In WWW’05. 22–32. suggest that users can understand the two visualizations, and that these visualizations are effective for helping users to identify their consumption blind-spots.
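
As an illustration of the exploration analysis in Sections 5.6 and 5.7, the following minimal Python sketch computes an exploration factor per user and correlates it with understanding scores. It rests on assumptions that are not stated in the paper: the weight w is read as the mean number of tracks played per explored genre, Spearman's coefficient is computed as the Pearson correlation of average ranks, and all function names and toy numbers are hypothetical rather than the study's data or code.

```python
from math import sqrt
from typing import Dict, List, Sequence


def exploration_factor(tracks_per_genre: Dict[str, int]) -> float:
    """EF = N * w for one user within one genre category.

    N is the number of genres with at least one track played.
    w is ASSUMED to be the mean number of tracks per explored genre.
    """
    played = [n for n in tracks_per_genre.values() if n > 0]
    if not played:
        return 0.0
    n_genres = len(played)
    mean_tracks = sum(played) / n_genres
    return n_genres * mean_tracks


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical toy data: one understanding score (0 to 1) per user, and the
# tracks each user played per blind-spot genre during exploration.
understanding = [0.50, 0.75, 0.60, 0.90, 1.00]
blind_spot_plays = [
    {"jazz": 1},
    {"jazz": 2, "folk": 1},
    {"metal": 2},
    {"jazz": 1, "folk": 1},
    {"jazz": 3, "folk": 2, "metal": 1},
]

ef_blind_spot = [exploration_factor(plays) for plays in blind_spot_plays]
rho = spearman(understanding, ef_blind_spot)
print(f"Spearman's rho (toy data): {rho:.2f}")
```

The same two functions can be applied to plays in the frequent and bridge categories to mirror the post-hoc comparison. The sketch reports only the coefficient; a significance test would need a statistics library or a permutation test.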