A Survey of Music Recommendation Aids

Pirkka Åman and Lassi A. Liikkanen
Helsinki Institute for Information Technology HIIT
Aalto University and University of Helsinki
Tel. +358 50 384 1514
firstname.lastname@hiit.fi

ABSTRACT
This paper provides a review of explanations, visualizations and interactive elements of user interfaces (UI) in music recommendation systems. We call these UI features "recommendation aids". Explanations are elements of the interface that inform the user why a certain recommendation was made. We highlight six possible goals for explanations, which together result in overall satisfaction towards the system. We found that most of the popular existing music recommenders provide no explanations, or only very limited ones. Since explanations are not independent of other UI elements in the recommendation process, we consider how those other elements can be used to achieve the same goals. To this end, we evaluated several existing music recommenders. We wanted to discover which of the six goals (transparency, scrutability, effectiveness, persuasiveness, efficiency and trust) the different UI elements promote in the existing music recommenders, and how they could be measured in order to create a simple framework for evaluating recommender UIs. By using this framework, designers of recommendation systems could promote users' trust and overall satisfaction towards a recommender system, thereby improving the user experience with the system.

Categories and Subject Descriptors
H5.m. Information interfaces and presentation: Miscellaneous. H.5.5 Sound and Music Computing.

Author Keywords
Recommendation systems, music recommendation, explanations, user experience, UI design.

WOMRAD 2010 Workshop on Music Recommendation and Discovery, colocated with ACM RecSys 2010 (Barcelona, SPAIN). Copyright (c). This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
1. INTRODUCTION
Recommender systems are a specific type of information filtering technique that aims at presenting items (music, news, other users, etc.) that the user might be interested in. To do this, information about the user is compared to reference characteristics, e.g. information on the other users of the system (collaborative filtering) or content features, such as genre in the case of books or music (content-based filtering). In its most common formulation, the recommendation task is reduced to the problem of estimating the relevance of the items that a user has not encountered yet, and then presenting the items that have the highest estimated ratings [6]. The importance of recommender systems lies in their potential to help users identify items of interest more effectively from a potentially overwhelming set of choices [7]. The importance of these mechanisms has become evident as commercial services over the Internet have extended their catalogues to dimensions unexplorable by a single user. However, the overwhelming amount of content creates constant competition and can reduce the usefulness of recommendations unless they can persuade the user to try the suggested content. Explanations and other recommendation aiding UI features are examined in this paper as a way to increase users' satisfaction with recommenders.

The first interactive systems to have explanations were expert systems, including legal and medical databases [4]. Their present successors are commercial recommendation systems, commonly found embedded in various entertainment systems such as iTunes [9] or Last.fm [12]. Explanations can be described as textual information telling e.g. why and how a recommendation was produced for the user. Earlier research shows that even rudimentary explanations build more trust towards the systems than so-called "black box" recommenders [13]. Explanations also provide system developers a graceful way of handling the errors that recommender algorithms sometimes produce [6].

The majority of previous recommendation system research has focused on the statistical accuracy of the algorithms driving the systems, with little emphasis on interface issues and user experience [13]. However, it has been noted lately that when new algorithms are compared to older ones, both tuned to the optimum, they all produce nearly similar results. Researchers have speculated that we may have reached a level where human variability prevents the systems from getting much more accurate [7]. This mirrors the human factor: it has been shown that users provide inconsistent ratings when asked to rate the same item several times [14]. Thereby an algorithm cannot be more accurate than the variance in the user's ratings for the same item.

An important aspect of the assessment of recommendation systems is therefore to evaluate them subjectively, e.g. how well they can communicate their reasoning to users. That is why user interface elements such as explanations, interactive elements and visualizations are increasingly important in improving user experience. In recent years, subjectively perceived aspects of recommendation systems have accordingly gained ground in their evaluation.

In this paper we want to illustrate the possibilities of user evaluation of recommendation supporting features in recommendation systems. We do this by performing a review of several publicly available music recommenders. Music is today one of the most ubiquitous commodities and the availability of digital music is constantly growing. Massive online music libraries with millions of tracks are easily available on the Internet. However, finding new and relevant music from those vast collections becomes correspondingly difficult. One approach to tackling the problem of finding new, relevant music is developing better (reliable and trustworthy) recommendation systems. Music recommenders are also easy to access, and with music the process of determining the quality of a recommendation is reasonably short.
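To make the recommendation task described above concrete, the following sketch illustrates the two filtering strategies mentioned. It is only an illustration under our own assumptions; the rating data, feature sets and function names are hypothetical and are not drawn from any of the systems reviewed below.

    import math

    # Hypothetical data: user ratings (user -> {item: rating}) and item genre tags.
    ratings = {
        "alice": {"song_a": 5, "song_b": 3},
        "bob":   {"song_a": 4, "song_b": 2, "song_c": 5},
    }
    features = {"song_a": {"jazz"}, "song_b": {"rock"}, "song_c": {"jazz", "vocal"}}

    def similarity(u, v):
        # Cosine similarity between two users over the items they have both rated.
        common = set(ratings[u]) & set(ratings[v])
        if not common:
            return 0.0
        dot = sum(ratings[u][i] * ratings[v][i] for i in common)
        norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
        norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
        return dot / (norm_u * norm_v)

    def collaborative_score(user, item):
        # Estimate the relevance of an unseen item from similar users' ratings.
        peers = [v for v in ratings if v != user and item in ratings[v]]
        weights = [similarity(user, v) for v in peers]
        if not peers or sum(weights) == 0:
            return 0.0
        return sum(w * ratings[v][item] for w, v in zip(weights, peers)) / sum(weights)

    def content_score(user, item):
        # Estimate relevance from the overlap between item features and liked items.
        liked = {i for i, r in ratings[user].items() if r >= 4}
        liked_features = set().union(*(features[i] for i in liked)) if liked else set()
        return len(features[item] & liked_features) / max(len(features[item]), 1)

    # Rank the items the user has not encountered yet and present the best ones.
    unseen = [i for i in features if i not in ratings["alice"]]
    print(sorted(unseen, key=lambda i: collaborative_score("alice", i), reverse=True))
    print({i: content_score("alice", i) for i in unseen})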
2. GOALS FOR RECOMMENDATION AIDS
Tintarev and Masthoff [16] present a taxonomy of goals for explanations. These are shown, slightly modified, in Table 1 below. We argue that satisfaction towards a recommendation system is an aggregate of the six other dimensions, more a goal in itself than the other dimensions are. In addition, we noticed that the dimensions are not as straightforward as Tintarev and Masthoff present them. Some of them cannot be evaluated using objective measures, and therefore a framework for evaluating recommendation aids must be drawn from user research. In the following we describe each dimension and give examples of how they could be evaluated and measured.

Table 1. Dimensions for recommendation explanations.

    Goal            Definition
    Transparency    Explain how the system works
    Scrutability    Allow users to tell the system it is wrong
    Effectiveness   Help users make good decisions
    Persuasiveness  Convince users to try or buy
    Efficiency      Help users make decisions faster
    Trust           Increase users' confidence in the system
    Resulting in:
    Satisfaction    Increasing the ease of use or enjoyment towards the system

1. An explanation may tell users how or why a recommendation was made, allowing them to see behind the UI and thus making the recommendation transparent. Transparency is also a standard usability principle, formulated as the heuristic of 'Visibility of System Status' [13]. Transparency can be measured objectively on a binary scale (yes/no): if a UI provides some kind of explanation of how a recommendation was made, transparency gets a vote. Evaluating transparency subjectively may instead involve asking users whether they understand how the recommendation was made, using e.g. a Likert scale.

2. Scrutability means that users are allowed to provide feedback to the system about the recommendations. Scrutability is related to the established usability principle of 'User Control' [13]. Scrutability can be measured objectively by finding out whether there is a way to tell the system it is wrong. To evaluate scrutability subjectively, users may be given a task to find a way to stop receiving e.g. recommendations of Elvis songs. If users feel they can control the recommendations by changing their profile, the UI offers the possibility to scrutinize.

3. Effectiveness of an explanation helps users make better decisions. Effectiveness is highly dependent on the accuracy of the recommendation algorithm. An effective explanation would help the user evaluate the quality of suggested items according to their own preferences [16]. This would increase the likelihood that the user discards irrelevant options while helping them to recognize useful ones. Unlike with travel or film recommenders, in the case of music recommenders the goodness of a recommendation is decided quite quickly.
4. Persuasiveness. Explanations may convince users to try or buy recommended items. However, persuasion may result in an adverse reaction towards the system if users repeatedly end up choosing bad recommendations. Persuasion could be measured by how much the user actually tries or buys items compared to the same user in a system without an explanation facility [16], and by what kind of persuasion techniques are utilized. Persuasion could also be measured by applying the click-through rates used in measuring online ads.

5. Efficient explanations help users decide faster which recommended items are best for their current situation. Efficiency can be improved by allowing the user to understand the relation between recommended options [12]. A simple way to evaluate efficiency is to give users tasks and measure how long it takes to find e.g. an artist that is novel and pleasing to the user.

6. Increasing users' confidence in the system results in trust towards a recommender. Trust is at the core of any kind of recommendation process, and it is perhaps the most important single factor leading to better user satisfaction and user experience with an interactive system. A study of users' trust (defined as perceived confidence in a recommender system's competence) suggests that users intend to return to recommender systems which they find trustworthy [2]. The interface design of a recommender affects its credibility, and earlier research has shown that in user evaluations of web page credibility the largest proportion of users' comments referred to UI design issues [5]. Trust needs to be measured using subjective scales over multiple tasks or questions about the recommendation aiding features of a recommender UI.

Ease of use or enjoyment finally results in more satisfaction towards a system. Descriptions of recommended items have been found to be positively correlated with both the perceived usefulness and the ease of use of the recommender system [6], enhancing users' overall satisfaction. Even though we see satisfaction as an aggregate of the dimensions presented above, satisfaction with the process could be measured e.g. by conducting a user walk-through for a task such as finding a satisfactory item.
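The measurements suggested above can be collected and summarized in a very simple way. The sketch below shows one possible bookkeeping scheme for a single recommender; the field names, scales and example values are our own assumptions and are not part of Tintarev and Masthoff's taxonomy or of any system evaluated later in this paper.

    from statistics import mean

    # Hypothetical raw observations gathered for one recommender during a study.
    observations = {
        "has_explanation": True,             # transparency, objective yes/no
        "can_correct_system": True,          # scrutability, objective yes/no
        "transparency_likert": [4, 5, 3],    # "I understand why this was recommended" (1-5)
        "clicks": 12, "impressions": 80,     # persuasiveness as a click-through rate
        "task_seconds": [41.0, 55.5, 38.2],  # efficiency: time to find a novel, pleasing artist
        "trust_likert": [4, 4, 5],           # trust, subjective scale over several questions
    }

    def summarize(obs):
        # Collapse the raw observations into one value per goal.
        return {
            "transparency_objective": int(obs["has_explanation"]),
            "transparency_subjective": mean(obs["transparency_likert"]),
            "scrutability": int(obs["can_correct_system"]),
            "persuasiveness_ctr": obs["clicks"] / obs["impressions"],
            "efficiency_mean_seconds": mean(obs["task_seconds"]),
            "trust": mean(obs["trust_likert"]),
        }

    print(summarize(observations))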
3. RELATED EMPIRICAL RESEARCH
It is widely agreed that expert systems that act as decision-support systems need to provide explanations and justifications for their advice [13]. However, there is no clear consensus on how explanations should be designed in conjunction with other UI elements, or how they should be evaluated by users. Studies with search engines show the importance of explanations. Koenemann & Belkin [11] found that greater interactivity for feedback on recommendations helped search performance and satisfaction with the system. Johnson & Johnson [10] note that explanations play a crucial role in the interaction between users and interactive systems. According to their research, one purpose of explanations is to illustrate the relationship between cause and effect. In the context of recommender systems, understanding the relationship between the input to the system (ratings and choices made by the user) and its output (recommendations) allows the user to interact efficiently with the system. Sinha and Swearingen [15] studied the role of transparency in recommender systems. Their results show that users like, and feel more confident about, recommendations that they perceive as transparent. Explanations allow users to meaningfully revise their input in order to improve recommendations, rather than making "shots in the dark."

Herlocker and Konstan [6] suggest that recommender systems have not been used in high-risk decision-making because of a lack of transparency. While users might take a chance on an opaque movie recommendation, they might be unwilling e.g. to commit to a vacation spot without understanding the reasoning behind such a recommendation. Building an explanation facility into a recommender system can benefit the user in various ways. It removes the "black box" around the recommender system, providing transparency. Other benefits include justification: if users understand the reasoning behind a recommendation, they may decide how much confidence to place in the suggestion. That results in greater acceptance of, and satisfaction with, the recommender system as a decision aid, since its limits and strengths are more visible and its suggestions are justified.
4. RECOMMENDATION AIDS IN EXISTING MUSIC RECOMMENDERS
We conducted an expert walkthrough of six publicly available music systems with recommendation functionalities in order to find out which of the six goals explanations, visualizations and interactive UI elements promote in existing music recommenders, and how they can be measured in order to create a simple framework for evaluating recommenders. The walkthrough was conducted by the authors listing the UI features capable of promoting the goals mentioned above. The reviewed systems are Pandora, Amazon, Last.fm, Audiobaba, Musicovery and Spotify. We wanted to include the most popular online music services and, on the other hand, a variety of different UIs. Each of the evaluated systems provides recommendations, but not necessarily explanations. Systems without textual explanations were also included in order to find out what kind of goals or functions similar to verbal explanations other recommendation aids provide.

An obvious example of an explanation providing transparency is Amazon's "Customers with Similar Searches Purchased…" list of up to ten albums. Pandora tells a user: "This song was recommended to you because it has jazzy vocals, light rhythm and a horn section." Transparency is very hard to achieve without textual, explicit explanations. Of the reviewed systems, only Musicovery's UI, with its several interactive elements and its graphical visualization of the recommendations and the relations between them, gives users clear clues of why certain pieces of music were recommended without providing explanations.

Last.fm offers users scrutability in many ways, e.g. with its music player (Figure 1). One of the system's more sophisticated scrutinizing tactics is a social one. Last.fm allows users to turn off the registering (called scrobbling) of the music they listen to. The system's users can perform identity work by turning scrobbling off if they do not want to communicate to other users what they have listened to. Amazon provides a "Fix this recommendation" option for telling the system to remove a recommended item from the user's browsing history.

Figure 1: Example of scrutable interactivity: Last.fm player's love, ban, stop and skip buttons give users a tool to control their profiles and thereby affect recommendations.

Users can also be helped towards efficiency and effectiveness, i.e. making better and faster decisions, by offering appropriate controls with interactive elements. For instance, Musicovery's timeline slider is presented in Figure 2. It works in real time with the system's graphical presentation of recommended items.

Figure 2: Musicovery's timeline slider: interactivity promoting efficiency, scrutability and effectiveness, resulting in more trust and satisfaction towards the system.

Table 2. The occurrences of recommendation aids in a selection of music recommenders.

                Trans.  Scrt.  Effect.  Pers.  Effic.  Trust  Total
    Amazon      1       2      2        3      1       3      12
    Last.fm     -       2      2        1      2       2      9
    Audiobaba   1       1      2        1      1       2      8
    Musicovery  2       2      2        2      2       1      11
    Spotify     -       -      1        1      1       1      4
    Pandora     2       2      3        3      2       3      15
    Total       6       9      12       11     9       12

If a recommender has the possibility to promote a goal with explanations, visualizations or interactive elements, it gets a vote in Table 2. For example, persuasiveness promoted through visualizations is potentially possible in all of the interfaces that have visualizations, even rudimentary ones such as an album cover: a single user might be persuaded to try or buy by a subjectively compelling album cover. From Table 2 we can see that Pandora, Amazon and Musicovery have the greatest number of UI elements able to support users in making sense of recommendations. Effectiveness, persuasiveness and trust are the most commonly promoted goals. In each recommender, every UI element has the potential to increase trust towards the system, but for more accurate measurement it remains to be evaluated by empirical user research to what extent each element in a certain recommender interface really promotes trust. This applies to most of the six goals: without empirical data it is almost impossible to decide whether the potential for promoting effectiveness, persuasiveness and efficiency is actually realized. Only transparency and scrutability can be measured using an objective binary yes/no scale, though they too can be evaluated using subjective (Likert style) scales. We argue that by measuring these goals for UI elements, together with a set of usability guidelines, it is possible to evaluate and design better user experiences for recommendation systems.
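Table 2 can be read as a simple tally: each aid type (explanation, visualization, interactive element) that can promote a goal contributes one vote to the corresponding cell. The sketch below reproduces that kind of tally from per-element walkthrough notes; the notes shown here are invented for illustration, so the resulting numbers are not those of Table 2.

    from collections import defaultdict

    GOALS = ["transparency", "scrutability", "effectiveness",
             "persuasiveness", "efficiency", "trust"]

    # Hypothetical walkthrough notes: for each system, the goals that each type
    # of recommendation aid was judged capable of promoting.
    walkthrough = {
        "Pandora": {
            "explanation": {"transparency", "effectiveness", "trust"},
            "interactive": {"scrutability", "efficiency", "persuasiveness"},
        },
        "Musicovery": {
            "visualization": {"transparency", "effectiveness", "persuasiveness"},
            "interactive": {"scrutability", "efficiency"},
        },
    }

    def tally(notes):
        # One vote per aid type that can promote a goal, plus a row total per system.
        table = {}
        for system, aids in notes.items():
            row = defaultdict(int)
            for promoted_goals in aids.values():
                for goal in promoted_goals:
                    row[goal] += 1
            table[system] = {goal: row[goal] for goal in GOALS}
            table[system]["total"] = sum(row.values())
        return table

    for system, row in tally(walkthrough).items():
        print(system, row)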
Some of the dimensions are easy to connect to certain UI elements. For instance, scrutability is usually designed as a combination of explanation and interactivity, whereas other, more general-level dimensions depend strongly on subjective experience and are hard to connect with specific UI elements. For example, satisfaction or trust towards a system is usually a combination of different experienced UI dimensions. Therefore the most common dimensions promoted in the evaluated systems were trust and satisfaction. Those, together with persuasiveness, are experienced very subjectively, which means that empirical user evaluation is needed for more reliable and comparable evaluations of those dimensions.

5. DISCUSSION AND CONCLUSIONS
We reviewed the dimensions of explanations in six music recommendation systems and found that most of the reviewed commercial music recommendation systems are "black boxes", producing recommendations with no, or very limited, explanations. Most of the dimensions are poorly promoted by textual explanations, but can be promoted by other means, namely by visualizations and interactive elements, and further, by user-generated content and social facilities. From the expert walkthrough of the selected music recommendation systems we can draw the tentative conclusion that if UI elements can fulfill functions similar to explanations, there is not necessarily any need for textual descriptions. By using non-verbal recommendation aids as "implicit" explanations in recommendation system design, we can promote better user experience. This is the case especially when the user has enough cultural capital, and therefore the competence, for "joining the dots" between recommended items without explicit explanations. On the other hand, if the recommender is used e.g. for learning about a musical genre, textual explanations may be indispensable.

As an example of the dimensions that UI elements other than verbal explanations can promote, overall satisfaction or trust towards a system can be achieved by conversational interaction, such as in the UI example presented in Figure 3, where users are given the chance to request optional recommendations based on their situational desires and needs.

Figure 3: A recommendation aid with optional inputs.

Last.fm is an example of a recommendation system with no explanations. However, it has an abundance of other elements, such as user-created biographies, genre tags and pictures of artists, not to mention advanced social media features, that together effectively work towards the same goals as the dimensions of explanations. Furthermore, Spotify, a popular European music service with a very simple recommendation facility, does not provide any explanations whatsoever. Its popularity relies on providing users a minimalistic UI with an effective search facility and functional, high-quality audio streaming. Spotify's usability and functionality work effectively towards overall satisfaction with the system, making explanations, visualizations or advanced interactivity redundant. Obviously, Spotify's abilities for helping to find new music are limited because of its very simple recommendation facility, but it can be used as an example of the argument that user trust and satisfaction can be promoted by diverse means, depending on different users' various needs and desires.

The next step of our research is to conduct an empirical user evaluation of the importance and functions of different UI elements in music recommenders. We are looking for feasible scales of measurement, drawn from user evaluation of the goals for UI elements in recommenders. User evaluation could be done with modified music recommender UIs, giving users tasks and comparing e.g. how much taking away a UI feature such as an explanation affects the time in which the task is completed. It would also be interesting to explore how different goals can be promoted by combining various UI elements, and by assigning unconventional roles to UI elements, e.g. creating visualizations that would reveal the logic behind a recommendation and at the same time give the user a tool to scrutinize it.

REFERENCES
[1] Adomavicius, G., Tuzhilin, A. 2005. Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749.
[2] Buchanan, B., Shortliffe, E. 1984. Rule-Based Expert Systems: The Mycin Experiments of the Stanford Heuristic Programming Project. Reading, MA: Addison Wesley Publishing Company.
[3] Chen, L., Pu, P. 2002. Trust building in recommender agents. In Proc. of International Workshop on Web Personalisation, Recommender Systems and Intelligent User Interfaces '02.
[4] Doyle, D., Tsymbal, A., Cunningham, P. 2003. A review of explanation and explanation in case-based reasoning. Technical report, Dept. of Computer Science, Trinity College, Dublin.
[5] Fogg, B. J., Soohoo, C., Danielson, D. R., Marable, L., Stanford, J., Tauber, E. R. 2003. How do users evaluate the credibility of web sites? In Proc. of Designing for User Experiences '03, pages 1-15.
[6] Herlocker, J. L., Konstan, J. A. 2000. Explaining collaborative filtering recommendations. In Proc. of Computer Supported Cooperative Work '00, pages 241-250.
[7] Herlocker, J. L., Konstan, J. A., Terveen, L., Riedl, J. T. 2004. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1), 5-53.
[8] Hill, W., Stead, L., Rosenstein, M., Furnas, G. 1995. Recommending and Evaluating Choices in a Virtual Community of Use. In Proc. of Conference on Human Factors in Computing Systems '95.
[9] iTunes, http://www.apple.com/itunes.
[10] Johnson, J., Johnson, P. 1993. Explanation facilities and interactive systems. In Proc. of Intelligent User Interfaces '93, pages 159-166.
[11] Koenemann, J., Belkin, N. 1996. A case for interaction: A study of interactive information retrieval behavior and effectiveness. In Proc. of Conference on Human Factors in Computing Systems '96, ACM Press, NY.
[12] Last.fm, http://www.last.fm.
[13] Nielsen, J., Molich, R. 1990. Heuristic evaluation of user interfaces. In Proc. of Conference on Human Factors in Computing Systems '90.
[14] Pu, P., Chen, L. 2006. Trust building with explanation interfaces. In Proc. of Intelligent User Interfaces '06, pages 93-100.
[15] Sinha, R., Swearingen, K. 2002. The role of transparency in recommender systems. In Proc. of Conference on Human Factors in Computing Systems '02.
[16] Tintarev, N., Masthoff, J. 2007. Survey of explanations in recommender systems. In Proc. of International Workshop on Web Personalisation, Recommender Systems and Intelligent User Interfaces '07.