Music Recommendation in the Personal Long Tail: Using a Social-based Analysis of a User's Long-Tailed Listening Behavior

Kibeom Lee
Graduate School of Culture Technology, KAIST
Daejeon, Korea
kiblee@kaist.ac.kr

Woon Seung Yeo
Graduate School of Culture Technology, KAIST
Daejeon, Korea
woon@kaist.ac.kr

Kyogu Lee
Department of Digital Contents Convergence, Seoul National University
Seoul, Korea
kglee@snu.ac.kr

ABSTRACT
The online music industry has been growing at a fast pace, especially in recent years. Music sales have moved from physical to digital formats, making millions of digital tracks available to all users. However, this produces information overload: because there are virtually no storage limitations, so many items are available that it becomes difficult for users to find what they are looking for. Many approaches to recommending music have been proposed to tackle information overload. One successful approach is collaborative filtering, which is currently widely used in commercial services. Although collaborative filtering produces very satisfying results, it is prone to popularity bias, recommending items that are correct but quite "obvious". In this paper, a new recommendation algorithm is proposed that is based on collaborative filtering and focuses on producing novel recommendations. The algorithm produces novel, yet relevant, recommendations by analyzing the listening behavior of the individual user and of the entire user population. An online user test shows that the system is able to produce relevant and novel recommendations and has greater potential with some minor adjustments in parameters.

Categories and Subject Descriptors
I.1.2 [Computing Methodologies]: Algorithms – Nonalgebraic algorithms, analysis of algorithms

General Terms
Algorithms

Keywords
Recommender systems, collaborative filtering, music recommendation

WOMRAD 2010 Workshop on Music Recommendation and Discovery, colocated with ACM RecSys 2010 (Barcelona, SPAIN). Copyright (c). This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

1. INTRODUCTION
With advances in the Internet, lower hardware costs, growing peer-to-peer networks, and the popularity of high-capacity portable media players, the online music industry has been growing rapidly, especially during the past few years. Music sales have gradually moved from physical album sales to digital sales from online stores. These services currently offer millions of tracks, and their catalogs have grown rapidly since the services were first launched. For instance, Amazon offered over 2 million songs when its music service launched, but offers over 11.8 million songs as of 2010. Notable online music stores include Amazon MP3 (11,000,000+ songs), the iTunes Store (12,000,000+ songs), and Rhapsody (9,000,000+ songs). Apart from music stores, there are also music streaming services that offer millions of songs, such as Lala (8,000,000 songs), Spotify (8,000,000 songs), and Last.fm (7,000,000 songs).

These large catalogs are a result of the Long Tail business model [1], in contrast to traditional stores, which sold only the products that were in demand. Paradoxically, however, users have ended up listening to less music now that so much is available, simply because it is hard to find new and relevant music. For instance, digital track sales surpassed the 1 billion mark in 2008, yet the Top 200 digital tracks alone accounted for 17% of all track sales (184 million sales) [2].

2. RELATED WORK
2.1 Collaborative Filtering-based Recommender Systems
One of the earliest recommender systems based on collaborative filtering is Tapestry [3]. Stemming from the need to handle increasing numbers of emails, Tapestry used the explicit opinions of people in a relatively small group, such as an office workgroup, to filter incoming email for a given user. A drawback of this system, however, was that users had to be familiar with the preferences and opinions of the other people in their network, which is why Tapestry worked only on small networks such as an office.

A more general collaborative filtering approach, called GroupLens, was developed by Resnick et al. [4]. The basic idea behind GroupLens, which aimed to help users find news articles among the vast number available, was that "people who agreed in the past will probably agree again". Using this heuristic, the GroupLens system was able to predict a given user's ratings of particular news articles. An advantage of this approach was that, unlike Tapestry, collaborative filtering could be scaled, because a user was not required to actually know the other users who had preferences similar to his own. Instead, this matching was done by the system, which gathered the users' ratings, naturally creating another advantage: users remained anonymous within the whole system.

Research related to, and including, the above studies focused on filtering a vast amount of text, in the form of emails, news, and messages, down to the items that were worth reading. Items were presented to the user with their prediction scores, helping the user decide which item to read next. The next wave of studies took a more direct approach to recommending items.

Ringo was a system developed to provide personalized music recommendations [5]. It maintained a profile for each user: a history of ratings on various artists, which were essentially explicit labels of which artists the user does or does not enjoy listening to. The system matched these profiles to calculate which artists had the highest probability of being liked by the user.

While Ringo focused on music, Bellcore's recommender system focused on movies [6]. Like Ringo, it used a database of users' movie ratings and matched rating profiles to provide recommendations, by finding similar users and the movies that they had watched and rated positively. Tests of the system's reliability showed that three out of every four recommendations would be rated highly by the user, and that the system produced far more accurate recommendations than nationally known movie critics.

Although numerous collaborative filtering advances and algorithms have appeared since then, the best-known collaborative filtering system today is probably the one used by Amazon.com, an e-commerce company that sells books, movies, music, and more. Amazon.com recommends items that are similar to the item being purchased, rather than finding similar users and then recommending the items those users have purchased. This method, called item-to-item collaborative filtering, scales to extremely large datasets and generates satisfactory results.

2.2 Collaborative Filtering-based Recommender Systems for Music
Although the collaborative filtering-based approaches above were designed for specific kinds of items, the algorithms can be generalized and applied to music recommendation. Hence, the results of such algorithms applied to music are not much different from those obtained on the original items.

Apart from recommender systems that use data on the ratings and/or purchases of items, there are other collaborative filtering-based recommender systems that take advantage of user-generated metadata associated with music. [7] presents some examples of metadata used in such algorithms, which include reviews, lyrics, blogs, social tags, bios, and playlists. Examples of commercial services that use such approaches are Rate Your Music (reviews), The Hype Machine (blogs), Last.fm (social tags), and playlist.com (playlists).

Social tags, a representative product of online collaboration, have been used heavily in music recommendation systems. Hu and Downie explored the mood metadata associated with songs and its relationships with music genre, artist, and usage metadata [8]. They found that the genre-mood and artist-mood relationships were consistent, indicating potential for use in automated mood classification tasks. Eck et al. proposed a method for generating social tags for music that lacks such tags [9]. Audio features of songs were analyzed and mapped to tags using a set of boosted classifiers, which were then applied to untagged songs, populating them with social tags based on their musical content. This enables unpopular and/or new songs that have no social tags to be used in music recommenders that rely on a social algorithm. It also tackles the cold start problem, a problem found in collaborative filtering-based recommender systems. Symeonidis et al. analyzed social tags in order to tackle the problem of the multimodal use of music [10]. They developed a framework that models users, tags, and items together, which was then used to recommend musical items (artists, songs, and albums) to users by performing latent semantic analysis and dimensionality reduction according to each user's multimodal perception of music. Levy and Sandler examined the seemingly ad hoc and informal language of tagging as a high-volume source of semantic metadata for music. Their results show that tags establish a low-dimensional semantic space that is extremely polished at the track level, especially by artist and genre. Using these results, the authors also introduced an interface that lets users browse by mood through a two-dimensional subspace representing musical emotion.

Celma introduces a system that recommends music together with relevant information associated with the recommended music [11]. The proposed system uses the Friend of a Friend (FOAF) and RSS vocabularies to create recommendations, taking into consideration the user's musical tastes and listening habits. The FOAF project provides protocols and a language to describe homepage-like content and social networks, ultimately providing the system with the user's profile. The RSS vocabulary provides the system with syndicated content, including data such as new album releases, album reviews, podcast sessions, and upcoming gigs. Thus, the proposed system improves on existing recommendation systems by understanding users through psychological factors (personality, demographic preferences, socioeconomics, situation, social relationships) and explicit music preferences.

3. LIMITATIONS OF COLLABORATIVE FILTERING
3.1 Popularity Bias
Collaborative filtering-based recommender systems produce good results and are widely used in commercial services such as Amazon.com and Last.fm. However, collaborative filtering has some common limitations that arise naturally from its roots in the wisdom of crowds. One of the largest problems of collaborative filtering is popularity bias [12, 13].

This happens when a popular item is associated with many other related items. Users who interact with these related items are then recommended the popular item. The system recommends the popular item often, leading to purchases (or any other form of positive input from users), and as the item is purchased more, it is also recommended more. This loop, in which the rich get richer, creates popularity bias.

As a result of this feedback loop, the recommender system naturally tends to bias its recommendations towards popular items. Thus, the recommendations lose their novelty [12, 13], and it becomes extremely difficult to recommend lesser-known artists.

On Amazon.com, where collaborative filtering is heavily used, popularity bias can be seen in the recommendations offered when searching for popular items. For instance, the 98 recommendations that appear when searching for Harry Potter include The Da Vinci Code, To Kill a Mockingbird, and 28 other Harry Potter books and DVDs. In the case of music, searching for The Beatles' Revolver album results in 33 albums by The Beatles out of a total of 97 recommendations, as shown in Figure 1. The other recommended items are by well-known artists that users interested in The Beatles will most likely have heard of already, such as The Rolling Stones, Led Zeppelin, and Neil Young. These recommended artists are correct recommendations but fail to be novel.

Figure 1. Recommendations from Amazon.com, which are all quite "obvious" recommendations, although they are correct recommendations.

Due to this popularity bias, a large portion of the recommended items are obvious recommendations that may be relevant to easy-going, casual listeners, but are not very helpful for enthusiastic music listeners, who are likely to already know the artists being recommended.

[14] verified that collaborative filtering produces a large number of high-quality, or "correct", recommended items. However, the same study also verified the problem of popularity bias: in an experiment comparing collaborative filtering, content-based, and hybrid methods, collaborative filtering gave users the lowest proportion of novel recommendations [14]. Thus, it was confirmed that collaborative filtering yields a lower percentage of novel songs, but of higher quality.

4. ALGORITHM
In this section, we present an algorithm that is based on collaborative filtering, yet overcomes popularity bias, a problem that naturally arises from collaborative filtering. The algorithm also focuses on providing recommendations that are novel to the user while remaining relevant.

To implement this algorithm, we used user data from Last.fm, an Internet service that provides users with streaming music via radio stations. Last.fm was selected because of its readily available developer API, the large variety and amount of available data (user playlists, playcounts for artists and individual songs, artist information, and song information), and, most importantly, the worldwide popularity of the site.

4.1 Concept of Recommendation Algorithm
4.1.1 Changing Perspectives on Novel Recommendations
While the general goal of recommenders is to provide recommendations that are novel and relevant to the user, social-based recommendations, as stated above, are relevant but fail to provide novel recommendations. In contrast, content-based recommender systems are better at providing novel recommendations because they are not affected by popularity or any other social influence [15].

Another method to provide novel recommendations is to use the long tail popularity distribution of artists [7]. This idea can be applied to both content-based and social-based algorithms. Content-based algorithms can use the long tail distribution to recommend items that are similar according to content analysis and that also lie in the tail portion of the distribution. For social-based algorithms, i.e., collaborative filtering, the idea can be applied by first obtaining the full list of recommendations and then removing the recommendations that lie in the head portion of the distribution. The resulting recommendations would be novel to the user, since artists residing in the tail portion of the distribution are unlikely to be known to the user.

However, although strictly recommending artists from the long tail and avoiding obvious ones (those located in the head portion of the distribution) has a high probability of producing novel recommendations, we need to take into consideration that novelty is relative to the user. In other words, it is naive to assume that the user will be aware of certain artists just because they are in the head portion of the long tail distribution. Thus, the possibility that even popular artists can be novel recommendations for certain users must not be overlooked.

4.1.2 User Listening Behavior
As shown in Figure 2, which shows a random Last.fm user's playlist in descending order of playcount, the listening behavior follows a distribution similar to a long-tail distribution. Users tend to listen to an extremely small portion of their playlists often, while the remaining songs seldom get played. Because the available data consists of the top 500 played songs in the user's playlist, every song in the graph has been played at least once.

Figure 2. The listening behavior of a user and his/her entire playlist. Although not exact, the graph shows a long-tailed distribution where the majority of tracks are seldom played.

4.1.3 Defining Experts and Novices
Using this long-tailed distribution of users' listening behavior, users can be divided into two groups: experts and novices. Here, users are considered "experts" on the songs/artists that they listen to often, i.e., songs/artists that lie in the head portion of their long-tailed listening behavior. On the other hand, users are considered "novices" on the songs that reside in the tail portion.

4.1.4 The Mystery of Unpopular "Loved" Songs
Last.fm provides users with an option to mark songs as "loved" (Figure 3). This kind of feedback explicitly shows that a user enjoys a particular song.
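As a minimal illustration of the head/tail split behind the expert and novice definitions, the Python sketch below checks whether a user's "loved" songs fall outside the head of his/her playcount distribution. It is not the authors' code: the `Entry` record, the function names, and the `head_size` cut-off are hypothetical, and the exact boundary between head and tail used in the paper is an assumption here.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One row of a user's playlist (hypothetical record, not a Last.fm API type)."""
    song: str
    playcount: int
    loved: bool


def split_head_tail(playlist: list[Entry], head_size: int) -> tuple[list[Entry], list[Entry]]:
    """Sort by descending playcount (ties keep input order) and cut after head_size entries."""
    ranked = sorted(playlist, key=lambda e: e.playcount, reverse=True)
    return ranked[:head_size], ranked[head_size:]


def loved_in_tail(playlist: list[Entry], head_size: int) -> list[str]:
    """Loved songs that fall in the tail, i.e. songs on which the user is a novice."""
    _, tail = split_head_tail(playlist, head_size)
    return [e.song for e in tail if e.loved]


def love_statistics(playlists: list[list[Entry]], head_size: int) -> tuple[float, float]:
    """Return (share of users who use "love" at all,
    share of those users with at least one loved song in the tail)."""
    lovers = [p for p in playlists if any(e.loved for e in p)]
    if not lovers:
        return 0.0, 0.0
    with_loved_tail = sum(1 for p in lovers if loved_in_tail(p, head_size))
    return len(lovers) / len(playlists), with_loved_tail / len(lovers)


if __name__ == "__main__":
    # Toy playlist: two loved songs with very low playcounts sit in the tail.
    user = [Entry("a", 120, False), Entry("b", 80, True), Entry("c", 3, True),
            Entry("d", 3, True), Entry("e", 1, False)]
    print(loved_in_tail(user, head_size=2))  # ['c', 'd']
```

With a larger `head_size`, fewer loved songs count as "tail" songs, so the second share returned by `love_statistics` can only stay the same or drop.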
One would expect that these "loved" songs would all lie in the head portion of the listening behavior distribution. However, these songs that are marked "loved" can be found scattered throughout the entire distribution. Here, a paradox can be found: Why are some songs marked "loved" lying at the tail end of the playcount distribution? One would assume that a "loved" song would have a high playcount, but a quick inspection shows that this is not the case. Thus, an assumption that is made here, a key assumption in this algorithm, is that songs are marked "loved", yet remain in the tail, because the user is unfamiliar with that song/artist/genre, i.e. is a novice, but happened to stumble upon that particular song and liked it. Figure 4. The overview of the algorithm showing the concept of novices and experts. By using the listening behavior of experts to provide recommendations to novices, the recommended items will be novel to the user, contrasting to other recommendation systems that simply recommended artists/songs from the tail of the popularity distribution of items. In other words, while remaining novel to the specific user, the recommended items may or may not be in the far, unpopular end of the popularity distribution. In fact, even popular items that reside in the head of the popularity Figure 3. The "tail" portion of a random user’s playlist. distribution may be recommended, but the user may not be aware There are two songs marked "loved" by the user, but have of the recommended item since the recommendations were based only been played three times. on the user's tail portion of her listening behavior distribution, in Among the 21,688 users whose data was used for the algorithm, which the user was considered a novice. 78.3%, or 16,973 users, used the "love" function provided in In addition to being novel recommendations, the recommended Last.fm. 
Among the 16,973 users who utilized the "love" function, items will also be relevant to the user since the recommendations 77.8% of the users had "loved" songs in the tail portion of their were found using songs that the user had marked "loved", playlist's song distribution sorted by playcount. explicitly stating the user's view on that particular item, and then Upon closer inspection of the random user in Figure 3, the using collaborative filtering to find experts on those "loved" songs songs/artists in the "head" portion came from various genres such to find relevant recommendations. as electronic, hip-hop, and reggae. What they did have in common, however, was that they were all German artists, 4.2 Data including the user herself. Looking at the songs that were marked User data was collected in order to test the algorithm and evaluate "loved" but were not played often, we can see that they too come the results of the recommendations from early March to late April from different genres, but are both artists from the U.S. in 2010. Data was collected from the Last.fm website using a custom web crawler and the Last.fm API. The user data that was The previously mentioned assumption that fuels this algorithm collected included the songs that the user had listened to overall, was made after observing such occurrences in users' playlists. meaning the songs that the user listened to from the day he/she According to our assumption, we assume that the user, who is registered at Last.fm up until the day the data was collected. It German, is a novice in artists from the U.S. and stumbled across also included the playcount for each song, song title, artist name, several songs that she liked. However, she did not get to venture user ID, rank, and whether it was marked "loved" or not. The data similar songs and/or artists because she was unaware of which that was collected is summarized below in Table 1. artists/songs were similar. Table 1. 
Summary of amount of data collected 4.1.5 The Big Picture Data Count Once the basic assumptions are made and the new definition of Users 21,681 novices and experts are established, the concept of the recommendation algorithm can be explained. As shown in Figure Unique Songs 2,001,324 4, recommendations can be made to novices of certain song sets Songs from All Playlists 9,073,681 using the information that can be obtained by a group of experts that have those song sets in the head portion of their listening behavior distribution. 4.2.1 Last.fm API strength of the match between the songs in the expert's "head" and All the collected information, except the playlist history, was song set S. gathered via the Last.fm API. Although the algorithm could have queried the information in real-time, it was decided that having begin Recommendations REC (aGivenUser U1); local data would facilitate in quicker results. After fetching the do data, we had song titles and corresponding artist names of approximately 2 million songs. Result R1 := retrieveListeningBehaviorDistribution(U1); In addition to the user and song data collected with the Last.fm SongSet S1 := getSongsInLongTail(R1); API, artist popularity was also measured indirectly via the API. S1_loved := filterLovedSongs(S1); Because the Last.fm API did not provide the artist ranking for i := 2 to n (n: number of users) step 1 do directly through the API, we had to collect the number of Listeners and Plays, which were offered through the API. By Result Ri := retrieveListeningBehaviorDistribution(Ui) having the Listeners and Plays of a given artist, we would be able SongSet Si := getSongsInHead(Ri); to determine the overall ranking of popularity of the artists. This if (Si ∩ S1 ≠ ∅) do will be further explained in the next section. 
CandidateSongSet CSi := (Si ∪ S1) – (Si ∩ S1); 4.2.2 User Data Crawler Unfortunately, the Last.fm API query for a given user's listening incrementWeight(CSi); history returns the top 50 songs ordered by playcount. This was REC += CSi; od not adequate enough since the algorithm needed the entire playlist od in order to utilize the long tail of the playcount distribution. od In order to solve this problem, a custom crawler was implemented to collect the users' listening history (referred to as ‘playlist’ in printRecommendations(); this paper) and playcount information. Although this returned a end; 1. Pseudoalgorithm for proposed recommender Listing maximum of 500 results (Last.fm displays only top 500 songs in system. the playlist), the data was adequate to be divided into the short head and long tail and used in the algorithm. These recommendation candidates are accumulated in the global song set REC, and the weight of the candidate are incremented as Data on a total of 21,681 random users were crawled. The they are recommended to REC. Finally, the recommendations are playlists and the according information were also stored for each given to the user in the order of their weights. user, resulting in 21,681 playlists with a total of 9,073,681 songs. Because playlists from different users contain lots of duplicate 4.4 Parameters entries, the number of unique songs that were crawled, as stated The algorithm is quite flexible as it has many parameters that can above, was 2,001,324 unique songs. be changed, which greatly influences the recommended items to the user. Parameters that play a crucial role in the overall quality 4.3 Algorithm of the recommendations include: As shown in Listing 1, the user that will receive the • The size of the “head” of experts recommendations, whom we will call "novice" according to the • The size of the “tail” of novices algorithm's concept, is given as input to the algorithm. 
Then, the • Weights of recommended items listening behavior for the novice is retrieved using data available at Last.fm. As long as the user is not a new user and has been listening to his/her playlist, the playcount distribution of his/her 4.4.1 Expert Parameter playlist is more than likely to show a long-tailed distribution, in The parameter that influences the outcome most is the size of the which a small set of songs have been listened with a heavily "head" portion of the expert's listening behavior distribution. For example, if the value for this parameter is set to "10", a user is biased frequency while the remaining songs listened only occasionally. Since we are interested in the songs/artists that the considered an expert only if the top ten songs that s/he listened to given user is a novice on (i.e. songs marked “loved” in the long contains any number of songs from the set of songs that are tail), we discard the head portion of the distribution and from the marked "loved" in the novice's "tail" portion of his/her listening remaining songs, which are songs in the tail portion, we discard distribution. In other words, this parameter determines the qualification strictness on which users are considered experts. all songs except those that are explicitly labeled "loved" by the novice. These remaining songs, denoted by ‘S_1’, will be the song The lower the value, the harder it is for a given user to be set that will be used to create recommendations. considered an expert. Also, as the value is lower, the resulting Next, using the listening behavior of the other users from our recommendations are more novel, in contrast to when the values database, we find those that listen to the songs in song set S. In are higher, in which the resulting recommendations become those other words, we find the "experts" on song set S by finding users that are well-known. 
As the value is set higher, the recommendations represent those that are from the existing music that have a subset of song set S in the head portion of their listening behavior distribution. If such users exist, we compare the recommendations that are offered using traditional collaborative- filtering methods. songs in the “head” of their playcount distribution with song set S and use the remaining, non-overlapping songs as recommendation 4.4.2 Novice Parameter candidates and assign the weight for those items according to the The parameter that can be varied for the novice users is the size of the "tail" portion of the novice's listening behavior distribution. Opposite of the expert parameter, the novice parameter sets the user would also be one that was pre-calculated was extremely range of songs in the user's playlist that the user is a novice on. low. When the user returns, he/she is presented with two sets of Using loved songs that lay near the "head" portion may result in recommendations. songs that the user is aware of, leading to the recommendations being less novel to the novice. However, this parameter does not Recommendation Set 1 was the results of the algorithm with the Expert Parameter, the parameter that determines the size of the have as much influence as the expert parameter has because once the novice parameter is set, the entire range of songs are not used, "head" portion of the expert, set to 5. A value of 5 for the Expert but only those that are explicitly marked "loved" by the user. Parameter means that the algorithm is being very strict about which users are qualified to be experts. This produces dense novel items. Recommendation Set 2 was the results with the Expert 4.4.3 Weights of Recommended Items Parameter set to 10. 
A formal set of rules for assigning weights to the candidate items can greatly change which songs are presented to the user as recommendations. This matters because it is inappropriate to present the entire collection of songs that results from the algorithm, as its size varies with the two parameters above. Of the final song set, which contains hundreds of candidate songs, only a subset (the top N songs) is presented to the user. Thus, assigning appropriate weights to these candidates ultimately determines which items are recommended. Currently, the algorithm uses a simple approach in which the weight of a song equals the number of times the song is a member of both the head of an expert and the tail of the novice.

5. USER TEST & EVALUATION
There are many ways to evaluate a recommender system, both offline and online. A common offline method is to generate test sets to be evaluated later [16]. Another popular method is cross-validation, in which the data is partitioned and the partitions are used as test sets [17].

5.1 Difficulties in Evaluating Novel Recommendations
However, offline evaluations are not appropriate for recommender systems in which recommending novel items is important. When a truly novel item is recommended to a user, meaning that the user does not already know the item, it is extremely difficult for the user to evaluate it without any additional information [18]. Because of this, measuring the novelty of recommended items is a challenging task, leaving no option but to carry out live user studies in which users explicitly indicate whether the recommendations were novel [19].

Thus, to measure the novelty and relevance of the recommended items, an online user test was carried out using a fully functional website that included a section for explicit user feedback on the recommendations.

5.2 Design
A fully functional website was created to perform an online evaluation of the recommendations for random users. On the website, a user had to sign up and enter his/her Last.fm ID. After receiving a new ID, the server ran the recommendation algorithm on that Last.fm ID; meanwhile, the user was asked to come back shortly, while the recommendations were being processed. The algorithm had to be run online in real time because it depends heavily on each user's information. Also, pre-calculating the recommendations for users in the local database offline and then providing them online was unrealistic as the probability that a new

A value of 10 tends to mix novel and well-known recommendations and is therefore a more general setting, intended to resemble the recommendations from Last.fm. After viewing the recommendations, the user could open a survey page to give explicit feedback on their quality.

Figure 5. Screenshot of the recommended items at the user-test website. Each facet of the recommended items is linked to pages at Last.fm for supplementary information.

Since the goal of the algorithm is to provide novel recommendations, users needed an easy way to evaluate the recommended items: if the items are indeed novel, the user is assumed to have no prior knowledge of them. Thus, each recommended item was hyperlinked to the corresponding page on Last.fm, as shown in Figure 5. Through these links, users could evaluate the items that were novel to them by visiting the linked pages. Last.fm provides related information about specific songs, including music videos, song previews, and even a radio station based on the song's artist. Using these pages, users were able to listen to the songs that were recommended to them.

5.3 Survey
On the survey page, the user was given a set of five questions for each of the two sets of recommendation results produced by the algorithm. The first four questions were answered on a five-point Likert item; the final question was open-ended, asking for any comments or feedback on the recommendations. The questions used in the survey are shown in Table 2.

Table 2. Questions used in the user survey.
Q. 1  How would you rate the relevance of items?
Q. 2  How would you rate the novelty of the recommended items?
Q. 3  How would you rate the serendipity of the recommended items?
Q. 4  How would you rate the recommendations overall?
Q. 5  Provide any comments/feedback about the recommendations that were given to you.

The user survey was carried out online alongside the music recommendation service because of the difficulties in measuring novelty offline. A total of 24 users tested the recommendations offered to them on the website. These users were random Last.fm users who had received private messages advertising the user test through the Last.fm messaging system. The new recommendation system was also advertised in various Last.fm groups focused on finding new music or made up of users dissatisfied with current recommender systems and their rather obvious recommendations. However, because users had to answer two surveys for two different sets, some appeared to have quit abruptly after finishing the first set. As a result, only 11 of the 24 users completed the second survey.

The private messages were sent to random Last.fm users who satisfied two conditions: (1) the user had used the "loved" function with his/her playlist, and (2) the user had last logged in no more than two weeks before the day the private messages were sent. Despite the advertisements and private messages, the response rate was extremely low (< 10%). The results are shown in Figures 6-8.

Figure 8. Comparison of the overall ratings for the two sets.

6. RESULTS & DISCUSSION
The results of the user test on the recommendations produced by the proposed algorithm are generally positive. The mean rating for the relevance of the items was 3.417 (on a 5-point scale) with a confidence interval of 0.390 (alpha = 0.05). The mean ratings for novelty and serendipity were also positive, at 3.667 and 3.625, respectively, with confidence intervals of 0.436 (alpha = 0.05) for novelty and 0.350 (alpha = 0.05) for serendipity. The overall rating of the recommender system had a mean of 3.458 with a confidence interval of 0.263 (alpha = 0.05). In general, the results show that the proposed system receives positive ratings and could be refined to produce better results. The proposed system was rated higher in both novelty and serendipity than the second set of recommendations, which was intended to imitate existing systems such as Last.fm.

For this study, the parameters of the system were set to values that we judged to produce the desired results after several iterations of the algorithm. However, a full study focused on finding the optimal parameter values would be an excellent follow-up and would greatly enhance the recommendations of the system.
Figure 6. Comparison of the relevance ratings for the two sets.
Figure 7. Comparison of the novelty ratings for the two sets.

The novelty score could have been higher, because the algorithm did not check whether the recommended songs already existed in the user's library before offering them. As a result, users saw some artists they already knew. As implied above, it is easy to increase the proportion of novel items in the recommendation list: simply check whether each artist exists in the user's library and, if it does, exclude it from the recommendations. However, this step was deliberately left out of the algorithm to increase users' confidence in the proposed system. The basis for this decision was [20], in which the authors found that users liked to see familiar items among the recommendations, which ultimately increased their confidence in the system. Checking whether the user is already familiar with each recommended item would produce a list more densely populated with novel recommendations.

Regarding the novelty of items, an unforeseen problem was revealed after the user test. One user commented, "I have most of the bands recommended on my computer, I just haven't given them much of a listen to. Grizzly Bear in particular..." The question here is whether, for this user, Grizzly Bear counts as a novel recommendation. The user states that s/he had not listened much to many of the recommended artists, although those artists were in his/her library. Because the algorithm depends on the playcounts of the songs in a user's library, it is blind to tracks that reside in the library but have a playcount of 0. Thus, it recommends songs that it believes to be novel to the user when they may in fact already be in the library. Unsurprisingly, the novelty and serendipity ratings from this user were low (a score of 2 for each), but the rating of the overall system was positive (a score of 4). Clarifying what counts as a novel item would help improve both the algorithm and the user's perception of the system.

7. FUTURE RESEARCH
The most urgent and important future work on this study is to find the parameter settings that produce the desired recommendations. Because of the limited time frame of this study, much of the algorithm analysis, including the parameter settings, was done manually by iterating through different settings and observing the results. Finding optimized values for parameters such as Expert Head Size, User Tail Size, and Item Weights should greatly enhance the novelty and relevance of the recommendations.

The flexibility of the algorithm could also be expanded by introducing additional parameters that change the recommendations. More parameters would allow the algorithm to be tailored to each user's needs, opening the possibility of an ever more personalized set of recommendations.

The overall system could also be developed further to integrate content-based analysis for better results. Although the proposed method is in its infancy, we believe that the only way to improve it further (once it has been fully developed on its own) will be to incorporate a content-based algorithm that addresses its remaining weaknesses as an algorithm based on user profiles.

8. CONCLUSION
In this paper, a novel approach to recommending artists that are unfamiliar to each particular user was proposed to tackle the high density of obvious items in the recommendation lists of today's recommender systems. The key concept in this approach is that novel items do not always have to reside in the long tail of the popularity distribution. Although novel or unfamiliar items more often than not do reside in the long tail, it is important to acknowledge that even well-known artists can be unknown to users who (a) are interested in different genres or (b) belong to different cultures and/or countries.

A system that produces recommendations was implemented and made available online for users to use and rate. The recommendations were produced using data collected from Last.fm. Results of the user survey indicate that the proposed system succeeds in providing novel recommendations to users while keeping those items relevant. This study shows the potential of such an approach for recommending novel items with a purely collaborative filtering algorithm, without support from content-based algorithms.

9. ACKNOWLEDGEMENTS
The authors would like to thank Professor Sangki ‘Steve’ Han at the Graduate School of Culture Technology, KAIST, and Sheayun Lee for their valuable comments and feedback.

10. REFERENCES
[1] Anderson, C. The Long Tail: Why the Future of Business Is Selling Less of More. Hyperion, 2006. ISBN 1401302378.
[2] Nielsen SoundScan. State of the Industry. National Association of Recording Merchandisers, 2008.
[3] Goldberg, D., Nichols, D., Oki, B. M., and Terry, D. Using Collaborative Filtering to Weave an Information Tapestry. Commun. ACM, 35(12):61-70, 1992. ISSN 0001-0782.
[4] Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In CSCW 1994, pages 175-186.
[5] Shardanand, U. and Maes, P. Social Information Filtering: Algorithms for Automating “word of mouth”. In CHI '95, pages 210-217.
[6] Hill, W., Stead, L., Rosenstein, M., and Furnas, G. Recommending and Evaluating Choices in a Virtual Community of Use. In CHI '95, pages 194-201.
[7] Celma, O. and Lamere, P. If you like the Beatles you might like…: a tutorial on music recommendation. In ACM Multimedia, pages 1157-1158, ACM, 2008.
[8] Hu, X. and Downie, J. S. Exploring mood metadata: Relationship with genre, artist, and usage metadata. September 2007.
[9] Eck, D., Lamere, P., Bertin-Mahieux, T., and Green, S. Automatic generation of social tags for music recommendation. In Advances in Neural Information Processing Systems 20. MIT Press, 2008.
[10] Symeonidis, P., Ruxanda, M. M., Nanopoulos, A., and Manolopoulos, Y. Ternary semantic analysis of social tags for personalized music recommendation. In ISMIR, pages 219.
[11] Celma, O. Foafing the music: Bridging the semantic gap in music recommendation. In Proceedings of the 5th International Semantic Web Conference, pages 927-934, Springer, 2006.
[12] Celma, O. and Herrera, P. A new approach to evaluating novel recommendations. In RecSys '08, pages 179-186, New York, 2008.
[13] Celma, O. and Cano, P. From hits to niches?: or how popular artists can bias music recommendation and discovery. In NETFLIX '08: Proceedings of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition, pages 1-8, New York, NY, USA, 2008.
[14] Celma, O. Music Recommendation and Discovery in the Long Tail. PhD thesis.
[15] Pampalk, E. and Goto, M. MusicRainbow: A new user interface to discover artists using audio-based similarity and web-based labeling. In ISMIR, pages 367-370, 2006.
[16] Duda, R. O. and Hart, P. E. Pattern Classification and Scene Analysis. New York, 1973.
[17] Stone, M. Cross-validatory choice and assessment of statistical predictions. Roy. Stat. Soc., 36:111-147, 1974.
[18] Herlocker, J. L., Konstan, J. A., and Riedl, J. T. Evaluating collaborative filtering recommendations. In Computer Supported Cooperative Work, pages 241-250, 2000.
[19] Schafer, J. B., Frankowski, D., Herlocker, J., and Sen, S. Collaborative filtering recommender systems, 2007.
[20] Singha, S., Rashmi, K. S., and Sinha, R. Beyond algorithms: An HCI perspective on recommender systems, 2001.