=Paper= {{Paper |id=None |storemode=property |title=Music Recommendation in the Personal Long Tail: Using a Social-based Analysis of a User’s Long-Tailed Listening Behavior |pdfUrl=https://ceur-ws.org/Vol-633/wom2010_paper9.pdf |volume=Vol-633 }} ==Music Recommendation in the Personal Long Tail: Using a Social-based Analysis of a User’s Long-Tailed Listening Behavior== https://ceur-ws.org/Vol-633/wom2010_paper9.pdf

Music Recommendation in the Personal Long Tail:
Using a Social-based Analysis of a Userʼs Long-Tailed Listening Behavior
Kibeom Lee, Graduate School of Culture Technology, KAIST, Daejeon, Korea, kiblee@kaist.ac.kr
Woon Seung Yeo, Graduate School of Culture Technology, KAIST, Daejeon, Korea, woon@kaist.ac.kr
Kyogu Lee, Department of Digital Contents Convergence, Seoul National University, Seoul, Korea, kglee@snu.ac.kr

WOMRAD 2010 Workshop on Music Recommendation and Discovery, colocated with ACM RecSys 2010 (Barcelona, SPAIN)
Copyright (c). This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

ABSTRACT
The online music industry has been growing at a fast pace, especially in recent years. Even music sales have moved from physical sales to digital sales, paving the way for millions of digital tracks to become available to all users. However, this produces information overload: so many items are available, owing to virtually no storage limitations, that it becomes difficult for users to find what they are looking for. There have been many approaches to recommending music to users to tackle information overload. One successful approach is collaborative filtering, which is currently widely used in commercial services. Although collaborative filtering produces very satisfying results, it is prone to popularity bias, recommending items that are correct recommendations but quite "obvious". In this paper, a new recommendation algorithm is proposed that is based on collaborative filtering and focuses on producing novel recommendations. The algorithm produces novel, yet relevant, recommendations to users based on analyzing the users' and the entire population's listening behaviors. An online user test shows that the system is able to produce relevant and novel recommendations and has greater potential with some minor adjustments in parameters.

Categories and Subject Descriptors
I.1.2 [Computing Methodologies]: Algorithms – Nonalgebraic algorithms, analysis of algorithms

General Terms
Algorithms

Keywords
Recommender systems, collaborative filtering, music recommendation

1. INTRODUCTION
With advances in the Internet, lower hardware costs, increasing peer-to-peer networks, and the popularity of high-storage portable media players, the online music industry has been growing rapidly, especially during the past few years. Gradually, music sales have moved from physical album sales to digital sales from online stores. Currently, these services offer millions of tracks to users, and their catalogs have grown rapidly since the services were first announced. For instance, Amazon offered over 2 million songs to users when its music service launched, but offers over 11.8 million songs as of 2010. Notable online music stores include Amazon MP3 (11,000,000+ songs), iTunes Store (12,000,000+ songs) and Rhapsody (9,000,000+ songs). Apart from music stores, there are also music streaming services that offer millions of songs, such as Lala (8,000,000 songs), Spotify (8,000,000 songs), and Last.fm (7,000,000 songs).

These large numbers of songs available to users are a result of the Long Tail business model [1], in contrast to stores selling only products that were in demand. However, as a result, although paradoxical, users have ended up listening to less music now that so much is available, simply because it is hard to find new and relevant music. For instance, digital track sales surpassed the 1 billion sales mark in 2008. However, the Top 200 digital tracks alone accounted for 17% of the entire track sales (184 million sales) [2].

2. RELATED WORK
2.1 Collaborative Filtering-based Recommender Systems
One of the earliest recommender systems based on collaborative filtering is Tapestry [3]. Stemming from the need to handle increasing numbers of emails, Tapestry used explicit opinions of people in a relatively small group, such as an office workgroup, to filter incoming email for a given user. However, a drawback of this system was that users had to be familiar with the preferences and opinions of other people in their network, which is why Tapestry worked on small networks like the office.

A more general collaborative filtering approach, called GroupLens, was developed by Resnick et al. [4]. The basic idea behind GroupLens, which aimed to help users find news articles amongst the vast available numbers, was that "people who agreed in the past will probably agree again". Using this heuristic, the GroupLens system was able to predict the ratings of certain news articles by a given user. An advantage that this provided was that the collaborative filtering could be scaled, unlike Tapestry, because a user was not required to actually know other users that had similar preferences. This was done by the system, which gathered information on the ratings of users, naturally
creating another advantage of users being anonymous inside the whole system.

Research related to, and including, the above studies focused on filtering a vast amount of text, in the form of emails, news, and messages, down to those items that were worth reading. Items would be given to the user with their prediction scores, aiding the user in choosing which item to read next. The next wave of studies focused on a more direct approach to recommending items.

Ringo was a system developed to provide personalized music recommendations [5]. It maintained a user's profile, a history of ratings on various artists that were essentially explicit labelings of which artists the user does or does not enjoy listening to. These profiles were matched by the system to calculate recommendations on which artists had the highest probabilities of being liked by the user.

While Ringo was focused on music items, Bellcore's recommender system focused on movies [6]. Like Ringo, it used a database of movie ratings by users and matched rating profiles to provide recommendations by finding similar users and the movies that they had watched and rated positively. Tests on the reliability of the recommender system showed that three out of every four recommendations would be rated highly by the user, and also showed that the system produced far more accurate recommendations than nationally-known movie critics.

While there have been numerous advances and algorithms related to collaborative filtering since then, the most well-known collaborative filtering system today is probably the system used in Amazon.com, an electronic commerce company that sells books, movies, music, etc. Amazon.com offers recommendations on items that are similar to the item being purchased, rather than finding similar users and then recommending the items those users have purchased. This method, which is called item-to-item collaborative filtering, scales to extremely large datasets and generates satisfactory results.

2.2 Collaborative Filtering-based Recommender Systems for Music
Although the collaborative filtering-based approaches above were designed for specific items, the algorithms can be generalized and applied to music recommendation. Hence, the results of such algorithms applied to music are not much different from those obtained on the original items.

Apart from recommender systems that use data on the ratings and/or purchases of items, there are other collaborative filtering-based recommender systems that take advantage of metadata produced by users that are found in music.

[7] presents some examples of metadata used in such algorithms, which include reviews, lyrics, blogs, social tags, bios, and playlists. Examples of commercial services that use such approaches are Rate Your Music (reviews), The Hype Machine (blogs), last.fm (social tags), and playlist.com (playlists).

Social tags, a representative product of online collaboration, have been used heavily in music recommendation systems. Hu and Downie explored the mood metadata associated with songs and their relationships with music genre, artist, and usage metadata [8]. They found that the genre-mood relationships and artist-mood relationships showed consistencies, showing the potential of being utilized in automated mood classification tasks. Eck et al. proposed a method for generating social tags for music that lack such tags [9]. Audio features of songs were analyzed and mapped to tags, using a set of boosted classifiers. These were then utilized on untagged songs, populating them with the associated social tags depending on the musical content. This enables unpopular songs and/or new songs that have no social tags to be used in music recommenders that use a social algorithm. It also tackles the cold start problem, a problem found in collaborative filtering-based recommender systems. Symeonidis et al. analyzed social tags in order to tackle the problem of the multimodal use of music [10]. They developed a framework that modeled users, tags, and items altogether. This was then used in recommending musical items (artists, songs, and albums) to users by performing latent semantic analysis and dimensionality reduction according to each user's multimodal perception of music. Levy and Sandler inspect the seemingly ad hoc and informal language of tagging as a high-volume source of semantic metadata for music. Results show that tags establish a low-dimensional semantic space, being extremely polished at the track level, especially by artist and genre. Using these results, the authors also introduce an interface for users to browse by mood, through a two-dimensional subspace that represents musical emotion.

Celma introduces a system that recommends music and the relevant information associated with the recommended music [11]. The proposed system uses the Friend of a Friend (FOAF) and RSS vocabularies for creating recommendations, taking into consideration the user's musical tastes and listening habits. The FOAF project provides protocols and a language to describe homepage-like content and social networks, ultimately providing the proposed system with the user's profile. The RSS vocabulary provides the system with syndicated content, which includes data such as new album releases, album reviews, podcast sessions, upcoming gigs, etc. Thus, the proposed system improves on existing recommendation systems by understanding the users through psychological factors (personality, demographic preferences, socioeconomics, situation, social relationships) and explicit music preferences.

3. LIMITATIONS OF COLLABORATIVE FILTERING
3.1 Popularity Bias
Collaborative filtering-based recommender systems produce good results and are used widely in commercial services such as Amazon.com and Last.fm. However, collaborative filtering has some common limitations that occur naturally due to its roots lying in the wisdom of crowds. One of the largest problems of collaborative filtering is popularity bias [12, 13].

This happens when a popular item is associated with many other related items. Users that interact with these items are then recommended the popular item. The system recommends the popular item often, leading to item purchases (or any other form of positive input from the user), and as this item is purchased more, it is also recommended more. This loop, in which the rich become richer, creates popularity bias.

Naturally, as a result of the above feedback loop, the recommender system tends to bias its recommendations towards popular items. Thus, the recommendations lose their novelty [12, 13], and it becomes extremely difficult to recommend lesser-known artists.
In Amazon.com, in which collaborative filtering is heavily used, the popularity bias can be seen when viewing the recommendations that are offered when searching for popular items. For instance, the 98 recommendations that appear when searching for Harry Potter include The Da Vinci Code, To Kill a Mockingbird and 28 other Harry Potter books and DVDs. In the case of music, searching for The Beatles' Revolver album results in 33 albums from The Beatles, out of a total of 97 recommendations, as shown in Figure 1. The other recommended items are well-known artists that users who are interested in The Beatles will most likely have heard of already, such as The Rolling Stones, Led Zeppelin, and Neil Young. These recommended artists are correct recommendations but fail to be novel recommendations.

Figure 1. Recommendations from Amazon.com, which are all quite "obvious" recommendations, although they are correct recommendations.

Due to this popularity bias, a large portion of the recommended items are obvious recommendations that may be relevant to easy-going, casual listeners, but not so helpful for enthusiastic music listeners, who have a high probability of already being knowledgeable about the artists being recommended.

The number of high quality, or "correct", recommended items that are produced with collaborative filtering is verified by [14]. However, the problem of popularity bias was also verified, as the amount of novel recommendations given to a user was the lowest for collaborative filtering in an experiment comparing collaborative filtering, content-based, and hybrid methods [14]. Thus, it was confirmed that collaborative filtering yields a lower percentage of novel songs, but of higher quality.

4. ALGORITHM
In this section, we provide an algorithm that is based on collaborative filtering, yet overcomes popularity bias, a natural problem that arises from CF. Also, the algorithm focuses on providing recommendations that are novel to the user, while also remaining relevant.

To implement this algorithm, user data from Last.fm, an Internet service that provides users with streaming music via radio stations, was used. The reasons for selecting Last.fm were the readily available developer API, the massive and varied data available (such as user playlists, playcounts for artists and individual songs, artist information, and song information), and, most importantly, the worldwide popularity of the site.

4.1 Concept of Recommendation Algorithm
4.1.1 Changing Perspectives on Novel Recommendations
While the goal of recommenders in general is to provide recommendations that are novel and relevant to the user, as stated above, social-based recommendations, although relevant, fail to provide novel recommendations to users. In contrast, content-based recommender systems work better in providing novel recommendations because they are not affected by popularity or any other social influence [15].

Another method to provide novel recommendations to users is to use the long tail popularity distribution of the artists [7]. This idea can be applied to both content-based and social-based algorithms. Content-based algorithms can use the long tail distribution to recommend items that are similar based on content analysis and are also found in the tail portion of the distribution. For social-based algorithms, or collaborative filtering, the idea can be applied by first obtaining the full list of recommendations and then removing the recommendations that lie in the head portion of the distribution. This would result in recommendations being novel to the user, since it is unlikely that artists residing in the tail portion of the distribution are known to the user.

However, although strictly recommending artists from the long tail and avoiding those that are obvious (those that are located in the head portion of the distribution) has a high probability of producing novel recommendations, we need to take into consideration that novel recommendations are relative to the user. In other words, it is naive to assume that the user will be aware of certain artists just because they are in the head portion of the long tail distribution. Thus, the fact that even popular artists have a possibility of being novel recommendations to certain users must not be overlooked.

4.1.2 User Listening Behavior
As shown in Figure 2, which shows a random Last.fm user's playlist in descending order of playcount, the listening behavior follows a distribution that is similar to a long-tail distribution. Users tend to listen to an extremely small portion of their playlists often while the remaining songs seldom get played. Due to the data available, which is the top 500 played songs in the user's playlist, all of the songs in the graph are played at least once.

Figure 2. The listening behavior of a user and his/her entire playlist. Although not exact, the graph shows a long-tailed distribution where the majority of tracks are seldom played.

4.1.3 Defining Experts and Novices
Using this long-tailed distribution of users' listening behaviors, the users can be divided into two groups: experts and novices. Here, users are considered "experts" regarding the songs/artists that they listen to often, i.e. songs/artists that lie in the head portion of the long-tailed listening behavior. On the other hand, users are considered "novices" regarding the songs that reside in the tail portion.
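As a minimal sketch of this split, assume that each playlist entry carries a playcount and that the head is simply the first head_size entries after sorting by playcount. The names Track, split_head_tail, and head_size are illustrative and do not come from the paper or the Last.fm API, and the sample data is made up.

```python
from typing import NamedTuple


class Track(NamedTuple):
    """One playlist entry; field names are hypothetical, not Last.fm API fields."""
    artist: str
    title: str
    playcount: int


def split_head_tail(playlist, head_size):
    """Rank tracks by descending playcount and cut the ranking at head_size."""
    ranked = sorted(playlist, key=lambda t: t.playcount, reverse=True)
    return ranked[:head_size], ranked[head_size:]


# Made-up playlist for illustration only.
playlist = [
    Track("Artist A", "Song 1", 3),
    Track("Artist B", "Song 2", 240),
    Track("Artist C", "Song 3", 1),
    Track("Artist B", "Song 4", 95),
]
head, tail = split_head_tail(playlist, head_size=2)
# The user counts as an "expert" on the head and a "novice" on the tail.
print([t.title for t in head])  # ['Song 2', 'Song 4']
print([t.title for t in tail])  # ['Song 1', 'Song 3']
```

A fixed cut-off is only one possible choice; the paper treats the size of the head and of the tail as tunable parameters.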
4.1.4 The Mystery of Unpopular “Loved” Songs
Last.fm provides users with an option to mark songs "loved"
(Figure 3). This kind of feedback from users explicitly shows that
a user enjoys a particular song. One would expect that these
"loved" songs would all lie in the head portion of the listening
behavior distribution. However, these songs that are marked
"loved" can be found scattered throughout the entire distribution.
Here, a paradox can be found: Why are some songs marked
"loved" lying at the tail end of the playcount distribution? One
would assume that a "loved" song would have a high playcount,
but a quick inspection shows that this is not the case. Thus, an
assumption that is made here, a key assumption in this algorithm,
is that songs are marked "loved", yet remain in the tail, because
the user is unfamiliar with that song/artist/genre, i.e. is a novice,
but happened to stumble upon that particular song and liked it.
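
The observation above can be checked on a playlist with a short sketch. It assumes that the tail is everything ranked below a fixed cut-off and that each entry has a "loved" flag. The names loved_tail_songs, share_of_users_with_loved_tail, and head_size are hypothetical, and the sample playlists are made up.

```python
def loved_tail_songs(playlist, head_size):
    """Return titles of loved songs ranked below the head cut-off.

    playlist: list of (title, playcount, loved) tuples.
    """
    ranked = sorted(playlist, key=lambda entry: entry[1], reverse=True)
    return [title for title, _, loved in ranked[head_size:] if loved]


def share_of_users_with_loved_tail(playlists, head_size):
    """Fraction of users who have at least one loved song in their tail."""
    if not playlists:
        return 0.0
    hits = sum(1 for p in playlists.values() if loved_tail_songs(p, head_size))
    return hits / len(playlists)


# Made-up data: user_a loves a rarely played song, user_b does not.
playlists = {
    "user_a": [("s1", 200, True), ("s2", 150, False), ("s3", 3, True), ("s4", 2, False)],
    "user_b": [("s5", 90, True), ("s6", 80, False), ("s7", 1, False)],
}
print(loved_tail_songs(playlists["user_a"], head_size=2))      # ['s3']
print(share_of_users_with_loved_tail(playlists, head_size=2))  # 0.5
```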

Figure 3. The "tail" portion of a random user's playlist. There are two songs marked "loved" by the user, but they have only been played three times.

Among the 21,688 users whose data was used for the algorithm, 78.3%, or 16,973 users, used the "love" function provided in Last.fm. Among the 16,973 users who utilized the "love" function, 77.8% of the users had "loved" songs in the tail portion of their playlist's song distribution sorted by playcount.

Upon closer inspection of the random user in Figure 3, the songs/artists in the "head" portion came from various genres such as electronic, hip-hop, and reggae. What they did have in common, however, was that they were all German artists, as is the user herself. Looking at the songs that were marked "loved" but were not played often, we can see that they too come from different genres, but both are by artists from the U.S.

The previously mentioned assumption that fuels this algorithm was made after observing such occurrences in users' playlists. According to our assumption, the user, who is German, is a novice in artists from the U.S. and stumbled across several songs that she liked. However, she did not get to venture into similar songs and/or artists because she was unaware of which artists/songs were similar.

4.1.5 The Big Picture
Once the basic assumptions are made and the new definitions of novices and experts are established, the concept of the recommendation algorithm can be explained. As shown in Figure 4, recommendations can be made to novices of certain song sets using the information that can be obtained from a group of experts that have those song sets in the head portion of their listening behavior distribution.

Figure 4. The overview of the algorithm showing the concept of novices and experts.

By using the listening behavior of experts to provide recommendations to novices, the recommended items will be novel to the user, in contrast to other recommendation systems that simply recommend artists/songs from the tail of the popularity distribution of items. In other words, while remaining novel to the specific user, the recommended items may or may not be in the far, unpopular end of the popularity distribution. In fact, even popular items that reside in the head of the popularity distribution may be recommended, but the user may not be aware of the recommended item since the recommendations were based on the tail portion of the user's listening behavior distribution, in which the user was considered a novice.

In addition to being novel recommendations, the recommended items will also be relevant to the user, since the recommendations were found using songs that the user had marked "loved", explicitly stating the user's view on that particular item, and then using collaborative filtering to find experts on those "loved" songs to find relevant recommendations.

4.2 Data
From early March to late April 2010, user data was collected in order to test the algorithm and evaluate the results of the recommendations. Data was collected from the Last.fm website using a custom web crawler and the Last.fm API. The user data that was collected included the songs that the user had listened to overall, meaning the songs that the user listened to from the day he/she registered at Last.fm up until the day the data was collected. It also included the playcount for each song, song title, artist name, user ID, rank, and whether it was marked "loved" or not. The data that was collected is summarized below in Table 1.

Table 1. Summary of amount of data collected
Data                        Count
Users                       21,681
Unique Songs                2,001,324
Songs from All Playlists    9,073,681
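
The expert-to-novice concept of Section 4.1.5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. Playlists are plain dictionaries mapping a song id to a playcount. A user counts as an expert when the top head_size songs contain at least one of the novice's loved tail songs. Each qualifying expert adds one point to every other song in his or her head. The names recommend, ranked_songs, head_size, tail_start, and top_n are hypothetical, as is the tie-breaking by song id.

```python
from collections import Counter


def ranked_songs(playcounts):
    """Song ids by descending playcount; ties broken by id so runs are repeatable."""
    return sorted(playcounts, key=lambda song: (-playcounts[song], song))


def recommend(novice_playcounts, novice_loved, others, head_size, tail_start, top_n):
    """Rank songs from the heads of experts on the novice's loved tail songs."""
    # Seed set: loved songs that sit in the novice's tail.
    tail = ranked_songs(novice_playcounts)[tail_start:]
    seeds = {song for song in tail if song in novice_loved}

    weights = Counter()
    for playcounts in others.values():
        head = set(ranked_songs(playcounts)[:head_size])
        if head & seeds:                  # this user is an expert on a seed song
            weights.update(head - seeds)  # one point per candidate per expert

    ordered = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [song for song, _ in ordered[:top_n]]


# Made-up data: the novice loved song "x" but rarely played it.
novice_playcounts = {"n1": 90, "n2": 80, "x": 3, "y": 2}
novice_loved = {"x"}
others = {
    "e1": {"x": 50, "p": 40, "q": 1},
    "e2": {"x": 30, "p": 25, "r": 5},
    "e3": {"x": 10, "r": 9},
    "u4": {"z": 70, "w": 60, "x": 1},   # "x" is not in this user's head
}
result = recommend(novice_playcounts, novice_loved, others,
                   head_size=2, tail_start=2, top_n=5)
print(result)  # ['p', 'r']
```

In this toy run, "p" outranks "r" because two experts have it in their head. A smaller head_size makes it harder for a user to qualify as an expert.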
4.2.1 Last.fm API
All the collected information, except the playlist history, was gathered via the Last.fm API. Although the algorithm could have queried the information in real-time, it was decided that having local data would give quicker results. After fetching the data, we had song titles and corresponding artist names of approximately 2 million songs.

In addition to the user and song data collected with the Last.fm API, artist popularity was also measured indirectly via the API. Because the Last.fm API did not provide the artist ranking directly, we had to collect the number of Listeners and Plays, which were offered through the API. By having the Listeners and Plays of a given artist, we would be able to determine the overall ranking of popularity of the artists. This will be further explained in the next section.

4.2.2 User Data Crawler
Unfortunately, the Last.fm API query for a given user's listening history returns the top 50 songs ordered by playcount. This was not adequate since the algorithm needed the entire playlist in order to utilize the long tail of the playcount distribution.

In order to solve this problem, a custom crawler was implemented to collect the users' listening history (referred to as 'playlist' in this paper) and playcount information. Although this returned a maximum of 500 results (Last.fm displays only the top 500 songs in the playlist), the data was adequate to be divided into the short head and long tail and used in the algorithm.

Data on a total of 21,681 random users were crawled. The playlists and the corresponding information were also stored for each user, resulting in 21,681 playlists with a total of 9,073,681 songs. Because playlists from different users contain many duplicate entries, the number of unique songs that were crawled, as stated above, was 2,001,324.

4.3 Algorithm
As shown in Listing 1, the user that will receive the recommendations, whom we will call "novice" according to the algorithm's concept, is given as input to the algorithm. Then, the listening behavior for the novice is retrieved using data available at Last.fm. As long as the user is not a new user and has been listening to his/her playlist, the playcount distribution of his/her playlist is more than likely to show a long-tailed distribution, in which a small set of songs has been listened to with a heavily biased frequency while the remaining songs are listened to only occasionally. Since we are interested in the songs/artists that the given user is a novice on (i.e. songs marked "loved" in the long tail), we discard the head portion of the distribution and, from the remaining songs, which are songs in the tail portion, we discard all songs except those that are explicitly labeled "loved" by the novice. These remaining songs, denoted by 'S_1', will be the song set that will be used to create recommendations.

Next, using the listening behavior of the other users from our database, we find those that listen to the songs in song set S_1. In other words, we find the "experts" on song set S_1 by finding users that have a subset of song set S_1 in the head portion of their listening behavior distribution. If such users exist, we compare the songs in the "head" of their playcount distribution with song set S_1 and use the remaining, non-overlapping songs as recommendation candidates and assign the weight for those items according to the strength of the match between the songs in the expert's "head" and song set S_1.

    begin Recommendations REC (aGivenUser U1);
    do
        Result R1 := retrieveListeningBehaviorDistribution(U1);
        SongSet T1 := getSongsInLongTail(R1);
        SongSet S1 := filterLovedSongs(T1);
        for i := 2 to n (n: number of users) step 1 do
            Result Ri := retrieveListeningBehaviorDistribution(Ui);
            SongSet Si := getSongsInHead(Ri);
            if (Si ∩ S1 ≠ ∅) do
                CandidateSongSet CSi := Si – (Si ∩ S1);
                incrementWeight(CSi);
                REC += CSi;
            od
        od
    od
    printRecommendations();
    end;

Listing 1. Pseudoalgorithm for proposed recommender system.

These recommendation candidates are accumulated in the global song set REC, and the weights of the candidates are incremented as they are added to REC. Finally, the recommendations are given to the user in the order of their weights.

4.4 Parameters
The algorithm is quite flexible as it has many parameters that can be changed, which greatly influence the items recommended to the user. Parameters that play a crucial role in the overall quality of the recommendations include:
• The size of the "head" of experts
• The size of the "tail" of novices
• Weights of recommended items

4.4.1 Expert Parameter
The parameter that influences the outcome most is the size of the "head" portion of the expert's listening behavior distribution. For example, if the value for this parameter is set to "10", a user is considered an expert only if the top ten songs that s/he listened to contain one or more songs from the set of songs that are marked "loved" in the novice's "tail" portion of his/her listening distribution. In other words, this parameter determines how strictly users are qualified as experts. The lower the value, the harder it is for a given user to be considered an expert. Also, the lower the value, the more novel the resulting recommendations, whereas higher values lead to recommendations that are well-known. As the value is set higher, the recommendations come to resemble those offered by traditional collaborative filtering methods.

4.4.2 Novice Parameter
The parameter that can be varied for the novice users is the size of the "tail" portion of the novice's listening behavior distribution.
Opposite of the expert parameter, the novice parameter sets the range of songs in the user's playlist that the user is a novice on. Using loved songs that lie near the "head" portion may result in songs that the user is aware of, leading to the recommendations being less novel to the novice. However, this parameter does not have as much influence as the expert parameter because, once the novice parameter is set, not all of the songs in the range are used, but only those that are explicitly marked "loved" by the user.

4.4.3 Weights of Recommended Items
A formal set of rules and equations to assign weights to the recommended items can greatly change the songs that will be presented to the user as recommendations. This is important because it is inappropriate to present the entire collection of songs that result from the algorithm, as the number may vary depending on the two parameters above. Among the final song set that contains hundreds of candidate songs for recommendations, only a subset, namely the top N songs, is presented to the user. Thus, assigning the appropriate weights for these candidates can ultimately influence the outcome of the recommended items. Currently, the algorithm uses a simple approach in which the weight is equal to the number of times a song is a member of both the head of an expert and tail of the novice.

5. USER TEST & EVALUATION
There are many ways to evaluate a recommender system, both offline and online. A common offline method to evaluate a recommender system is to generate test sets to be evaluated later [16]. Another popular method is to use cross-validation, in which the data is partitioned and used as test sets [17].

5.1 Difficulties in Evaluating Novel Recommendations
However, offline evaluations are not appropriate for recommender systems where the recommendations of novel items are important. This is because when a truly novel item is actually recommended to a user, meaning that the user does not already know about this item, it is extremely difficult for the user to evaluate the unknown item without any additional information being provided [18]. Because of this, measuring novelty in the recommended items is a rather challenging task, leaving no option but to carry out live user studies where the users explicitly indicate whether the provided recommendations were novel or not [19].

Thus, in order to measure the novelty and relevance of the recommended items, an online user test was carried out using a fully functional website, including a section for explicit user feedback regarding the recommendations given to the users.

5.2 Design
A fully functional website was created in order to perform an online evaluation of the recommendations for random users. On the website, a user has to sign up and input his/her Last.fm ID. After receiving a new ID, the server runs the recommendation algorithm on that particular Last.fm ID. Meanwhile, the user is requested to come back shortly afterwards, while the recommendations are being processed. The algorithm had to be run in real-time online because it is heavily dependent on the user information. Also, pre-calculating the recommendations for users in the local database offline and then providing them online was unrealistic, as the probability that a new user would also be one that was pre-calculated was extremely low. When the user returns, he/she is presented with two sets of recommendations.

Recommendation Set 1 was the result of the algorithm with the Expert Parameter, the parameter that determines the size of the "head" portion of the expert, set to 5. A value of 5 for the Expert Parameter means that the algorithm is being very strict about which users are qualified to be experts. This produces dense novel items. Recommendation Set 2 was the result with the Expert Parameter set to 10. A value of 10 tends to mix novel recommendations and well-known recommendations, so it is more of a general setting that aims to resemble recommendations from Last.fm. After the user views the recommendations, a survey page is available to provide explicit feedback on the quality of the recommendations given to them.

Figure 5. Screenshot of the recommended items at the user-test website. Each facet of the recommended items is linked to pages at Last.fm for supplementary information.

Since the goal of the algorithm is to provide novel recommendations, there had to be an easy way for the user to evaluate the recommended items, since it is assumed that if the recommended items are indeed novel, then the user has no knowledge about the item. Thus, each recommended item was hyperlinked to the corresponding page in Last.fm, as shown in Figure 5. Through these links, users were able to evaluate the recommended items that were novel to them by visiting the linked pages. Last.fm provides related information regarding specific songs, which includes music videos, song previews, and even a radio for the song's artist. By utilizing these pages, users were able to listen to the songs that were recommended to them.

5.3 Survey
On the survey page, a set of five questions was given to the user, each regarding one of the two sets of recommendation results that were produced by the algorithm. The questions were answered on a five-point Likert item. The final question was a subjective question, asking for any comments or feedback on the recommendations. The questions used in the survey are shown in Table 2.
Table 2. Questions used in the user survey.
Q. 1 How would you rate the relevance of items?
Q. 2 How would you rate the novelty of the recommended items?
Q. 3 How would you rate the serendipity of the recommended items?
Q. 4 How would you rate the recommendations overall?
Q. 5 Provide any comments/feedback about the recommendations that were given to you.

6. RESULTS & DISCUSSION
A user survey was carried out online alongside the online music recommendation service
because of the difficulties in measuring novelty. A total of 24 users tested the
recommendations offered to them on the website. These users were random Last.fm users who
had received private messages (advertising the user test) through the Last.fm messaging
system. The new recommendation system was also advertised in various Last.fm groups whose
members were interested in finding new music or were unsatisfied with current recommender
systems and their quite obvious recommendations. However, because the users had to answer
two surveys for two different sets, some appeared to have quit abruptly after finishing
the first set. As a result, only 11 users out of 24 completed the second survey.

The private messages were sent to random Last.fm users who satisfied two conditions:
1) the user used the “loved” function with his/her playlist, and 2) the user had last
logged in no more than two weeks before the day the private messages were sent. Despite
the advertisements and private messages, the response rate was extremely low (< 10%).
The results are shown in Figures 6-8.

Figure 8. Comparison of the overall ratings for the two sets.

The results of the user test on the recommendations produced by the proposed algorithm are
generally positive. The mean value for the relevance of the items was 3.417 (on a 5-point
scale) with a confidence interval of 0.390 (alpha = 0.05). The mean values of novelty and
serendipity were also on the positive side with 3.667 and 3.625, respectively. The
confidence intervals were 0.436 (alpha = 0.05) for novelty and 0.350 (alpha = 0.05) for
serendipity. The overall rating of the recommender system had a mean value of 3.458 with a
confidence interval of 0.263 (alpha = 0.05). In general, the results show that the
proposed system has positive ratings and could be refined to produce better results. The
proposed system was rated higher in both novelty and serendipity, compared to the second
set of recommendations, which was intended to imitate existing systems such as Last.fm.

For this study, the parameters of the system were set with values that we thought produced
the desired results after several iterations of the algorithm. However, a full study
focused on finding the optimal values for the parameters would be an excellent follow-up
study and would greatly enhance the recommendations of the system.
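
For readers who want to reproduce summary statistics of the kind reported in the results (a mean rating with a confidence interval at alpha = 0.05), the following is a minimal sketch. It assumes the reported interval is a half-width around the mean and uses a normal approximation; the paper does not state which method was used, and with only 24 respondents a t-distribution would give a slightly wider interval. The function name and the sample ratings are hypothetical and are not the study's data.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_ci(ratings, alpha=0.05):
    """Return (mean, half-width) of a two-sided confidence interval.

    Assumption: normal approximation, i.e. half-width = z * s / sqrt(n),
    where s is the sample standard deviation.
    """
    n = len(ratings)
    m = mean(ratings)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * stdev(ratings) / math.sqrt(n)
    return m, half_width


# Hypothetical five-point Likert responses, for illustration only.
sample_ratings = [3, 4, 2, 5, 4, 3, 4, 3]
m, hw = mean_with_ci(sample_ratings)
print(f"mean = {m:.3f}, CI half-width = {hw:.3f}")
```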
Figure 6. Comparison of the relevance ratings for the two sets

The score for the novelty of recommended items could have been higher, because the
algorithm did not check whether the recommended songs existed in the user's library before
being offered. Thus, the user would see some artists that they were aware of. As implied
above, it is quite easy to increase the percentage of novel items in the entire
recommendation list: simply check whether the artist exists in the user's library and, if
it does, exclude it from the recommendations. However, this step was excluded from the
algorithm deliberately to increase the confidence of the users in the proposed system. The
basis for this was [20], in which the authors found that users liked to see familiar items
in the recommendations, which ultimately led to an increase of user confidence in the
system. Checking to see if the user is familiar with the recommended item would produce
more "dense" novel recommendations.
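
The exclusion step described above, which the authors deliberately left out of the evaluated system, could look roughly like the sketch below. It assumes artists are identified by name and compared case-insensitively; the function name and the placeholder artist names are hypothetical.

```python
def exclude_known_artists(recommendations, library_artists):
    """Drop recommended artists that already appear in the user's library.

    Assumption: artists are matched by case-insensitive name. Membership in
    the library is what counts here, regardless of how often it was played.
    """
    known = {artist.casefold() for artist in library_artists}
    return [artist for artist in recommendations if artist.casefold() not in known]


# Placeholder names, for illustration only.
recommended = ["Artist A", "Artist B", "Artist C"]
library = ["artist b", "Artist D"]
novel_only = exclude_known_artists(recommended, library)
```

A softer variant could keep a small number of familiar artists in the list, in line with the finding in [20] that familiar items raise user confidence.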
Figure 7. Comparison of the novelty ratings for the two sets

Regarding the novelty of items, an unforeseen problem was revealed after the user test.
One user commented, "I have most of the bands recommended on my computer, I just haven't
given them much of a listen to. Grizzly Bear in particular..." The problem here is
whether, in this user's case, Grizzly Bear is a novel recommendation. The user states that
s/he did not listen to many of the recommended artists, although those artists were in
his/her library. Because the algorithm depends on the playcount of the songs in a user's
library, it is totally blind to tracks that reside in the library but have a playcount
of 0. Thus, it recommends songs that it believes to be novel to the user, when they could in fact
exist in the library already. Unsurprisingly, the novelty and serendipity ratings from
this user were low (a score of 2 for each), but the rating on the overall system was
positive (a score of 4). Clarifying what counts as a novel item would help improve the
algorithm and the user's perception of the system.

7. FUTURE RESEARCH
The most urgent and important future work on this particular study would be to find the
ideal parameter settings to produce the desired recommendations. Due to the available time
frame for this study, much of the algorithm analysis, including the settings of the
parameters, was done manually, simply by iterating through different settings and
observing the results. By finding the optimized values of parameters such as Expert Head
Size, User Tail Size, and Item Weights, the quality of the recommendations in novelty and
relevance would be greatly enhanced.

Work on expanding the flexibility of the algorithm can also be done, creating additional
parameters that bring changes to the recommendations. More parameters would mean that the
algorithm could be suited to each user's needs, bringing the possibility of creating an
ever more personalized set of recommendations.

The overall system itself could be further developed to integrate content-based analysis
for better results. Although the proposed method is in its infancy, we believe that the
only way to improve it further (after it has fully developed independently) will be to
incorporate a content-based algorithm to improve on its remaining weaknesses as an
algorithm that is based on user profiles.

8. CONCLUSION
In this paper, a novel approach to recommending unfamiliar artists relative to each user
was proposed in order to tackle the problem of the high density of obvious items in the
list of recommendations found in today's recommender systems. The key concept in this
approach was that novel items did not always have to be items that reside in the long tail
of the popularity distribution. Although novel or unfamiliar items, more often than not,
do indeed reside in the long tail of the popularity distribution, it is important to
acknowledge that even well-known artists could be unknown to users who are (a) interested
in different genres and (b) in different cultures and/or countries.

A system that produced recommendations was implemented and was available online for users
to use and rate. The recommendations were produced using data collected from Last.fm.
Results of the user surveys show that the proposed system succeeds in providing novel
recommendations to users, while keeping those items also relevant. This study shows the
potential of such an approach to recommending novel items, while maintaining a
collaborative filtering algorithm without the support from content-based algorithms.

9. ACKNOWLEDGEMENTS
The authors would like to thank Professor Sangki ‘Steve’ Han at the Graduate School of
Culture Technology, KAIST and Sheayun Lee for their valuable comments and feedback.

10. REFERENCES
[1] Chris Anderson. The Long Tail: Why the Future of Business Is Selling Less of More.
Hyperion, 2006. ISBN 1401302378.
[2] Nielsen Soundscan, State of the Industry. National Association of Recording
Merchandisers, 2008.
[3] Goldberg, D., Nichols, D., Oki, B. M., and Terry, D. Using Collaborative Filtering to
Weave an Information Tapestry. Commun. ACM, 35(12):61-70, 1992. ISSN 0001-0782.
[4] Resnick P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. Grouplens: An Open
Architecture for Collaborative Filtering of Netnews. In CSCW 1994, pages 175-186.
[5] Shardanand, U. and Maes. P. Social Information Filtering: Algorithms for Automating
“word of mouth”. In CHI '95, pages 210-217.
[6] Hill, W., Stead, L., Rosenstein, M., and Furnas, G. Recommending and Evaluating
Choices in Virtual Community of Use. In CHI '95, pages 194-201.
[7] Celma, O. and Lamere, P. If you like the Beatles you might like…: a tutorial on music
recommendation. ACM Multimedia, pages 1157-1158, ACM, 2008.
[8] Hu, X and Downie, J. S. Exploring mood metadata: Relationship with genre, artist, and
usage metadata. September 2007.
[9] Eck, D., Lamere, P., Bertin-Mahieux, T., and Green, S. Automatic generation of social
tags for music recommendation. In Advances in Neural Information Processing Systems 20.
MIT Press, 2008.
[10] Symeonidis, P., Ruxanda, M. M., Nanopoulos, A., and Manolopoulos, Y. Ternary semantic
analysis of social tags for personalized music recommendation. ISMIR, pages 219.
[11] Celma, O. Foafing the music: Bridging the semantic gap in music recommendation. In
Proceedings of the 5th International Semantic Web Conference, pages 927-934, Springer, 2006.
[12] Celma, O. and Herrera, P. A new approach to evaluating novel recommendations. In
RecSys '08: pages 179-186, New York, 2008.
[13] Celma, O. and Cano, P. From hits to niches?: or how popular artists can bias music
recommendation and discovery. In NETFLIX '08: Proceedings of the 2nd KDD Workshop on
Large-Scale Recommender Systems and the Netflix Prize Competition, pages 1-8, New York,
NY, USA, 2008.
[14] Celma, O. Music Recommendation and Discovery in the Long Tail. PhD thesis.
[15] Pampalk, E. and Goto, M. Musicrainbow: A new user interface to discover artists using
audio-based similarity and web-based labeling. ISMIR, pages 367-370, 2006.
[16] Duda, R. O. and Hart, P. E. Pattern classification and scene analysis. New York, 1973.
[17] Stone, M. Cross-validatory choice and assessment of statistical predictions. Roy.
Stat. Soc., 36:111-147, 1974.
[18] Herlocker, J. L., Konstan, J. A., and Riedl, J. T. Evaluating collaborative filtering
recommendations. In Computer Supported Cooperative Work, pages 241-250, 2000.
[19] Schafer, J. B., Frankowski, D., Herlocker, J., and Sen, S. Collaborative filtering
recommender systems, 2007.
[20] Singha, S., Rashmi, K. S., and Sinha, R. Beyond algorithms: An HCI perspective on
recommender systems, 2001.