<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Content-based music recommendation based on user preference examples</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dmitry Bogdanov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martín Haro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ferdinand Fuhrmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emilia Gómez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Perfecto Herrera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Music Technology Group</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universitat Pompeu Fabra</institution>
          ,
          <addr-line>Roc Boronat, 138, 08018 Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recommending relevant and novel music to a user is one of the central applied problems in music information research. In the present work we propose three content-based approaches to this task. Starting from an explicit set of music tracks provided by the user as evidence of his/her music preferences, we infer high-level semantic descriptors covering different musical facets, such as genre, culture, moods, instruments, rhythm, and tempo. On this basis, two of the proposed approaches employ a semantic music similarity measure to generate recommendations. The third approach creates a probabilistic model of the user's preference in the semantic domain. We evaluate these approaches against two recommenders using state-of-the-art timbral features, and two contextual baselines, one exploiting simple genre categories, the other using similarity information obtained from collaborative filtering. We conduct a listening experiment to assess familiarity, liking and further listening intentions for the provided recommendations. According to the obtained results, our semantic approaches outperform the low-level timbral baselines as well as the genre-based recommender. Though the proposed approaches could not reach a performance comparable to the involved collaborative filtering system, they yielded acceptable results in terms of successful novel recommendations. We conclude that the proposed semantic approaches are suitable for music discovery, especially in the long tail.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.3.3 [Information Storage And Retrieval]: Information Search and Retrieval - information filtering, selection process, retrieval models; H.5.5 [Information Interfaces And Presentation]: Sound and Music Computing - modeling, systems</p>
      <p>General Terms: Algorithms, Measurement, Human Factors</p>
      <p>Keywords: recommender systems, user modeling, evaluation, music recommendation, content-based, collaborative filtering</p>
      <p>WOMRAD 2010 Workshop on Music Recommendation and Discovery, colocated with ACM RecSys 2010 (Barcelona, Spain). Copyright ©. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>Rapid growth of digital technologies, the Internet, and the multimedia industry has provoked a huge information overload and created a need for effective information filtering systems, in particular recommendation systems. In the digital music industry, current major Internet stores contain millions of tracks, which complicates search, retrieval, and discovery of music relevant to a user. At present, most industrial systems provide means for manual contextual search based on artist names, album or track titles, and additional semantic properties, which are mostly limited to genres. With this information, music collections become browsable by textual queries and tags.</p>
      <p>
        Moreover, current research within the music information
retrieval (MIR) community has achieved relative success in the
task of measuring music similarity [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], aiming to
facilitate manual search and to automate music
recommendation. To this end, music tracks can be
represented in a feature space filled with
contextual information extracted from available metadata, user
ratings [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], and social tags [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] (i.e. the contextual
approach), or with information extracted from the audio
content itself [
        <xref ref-type="bibr" rid="ref16 ref17 ref21 ref4 ref6">4, 6, 16, 17, 21</xref>
        ] (i.e. the content-based approach).
Thus, it becomes possible to define many similarity
measures (or distances<sup>1</sup>) between tracks in a music collection,
and therefore to browse collections and to recommend music
using queries-by-example. Still, the majority of
content-based distances employ solely rough timbral information,
such as Mel frequency cepstral coefficients (MFCCs), and
sometimes temporal information. Additionally, current
systems provide basic means for personalization: they obtain a
user's profile in the form of consumption statistics, music
ratings, and other types of behavioral information, and
generally process this information in a collaborative
filtering manner [
        <xref ref-type="bibr" rid="ref2 ref8 ref9">2, 8, 9</xref>
        ]. More sophisticated
personalization approaches, which explore the nature of preference
behavior using both contextual information and audio
content information, are necessary, but they are still in their
infancy [13-15, 19, 22] and require more research attention.
      </p>
      <p>
        Generally, we can discern two types of user interaction
with a music retrieval system: (i) music search, when a user
has an initial idea of what he/she wants, and operates with
metadata to query for a specific artist, album, genre, etc., or
provides a query-by-example in the case of similarity-based
retrieval, and (ii) music discovery, when a user does not know
his/her exact needs and prefers to browse an available music
collection in order to discover music that is relevant with
respect to his/her musical preferences. Querying by example
requires a user to explicitly define the "direction of search",
and is not perfectly suited for discovery. On the other hand,
querying by broad semantic categories (such as genres) can
return an excessive amount of potentially relevant data,
containing thousands of tracks. While contextual information can
be used for both types of interaction, contextual approaches have
been found to perform well on popular items
but to fail in the long tail, due to the lack of available user
ratings, social tags, and metadata for unpopular items [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Content-based information extracted from audio can
help to overcome this problem.
      </p>
      <p><sup>1</sup> We will pragmatically use the term distance to refer to any dissimilarity measurement between tracks.</p>
      <p>We focus the present work on content-based music recommendation, concerning both relevance and novelty (i.e. discovery) aspects. We do not consider the issue of balancing both aspects according to a user's current needs. Instead, we present a way to infer user preferences from audio content, and a number of recommendation approaches that are challenged to provide both relevant and novel recommendations to a user. We propose a procedure to generate such recommendations based on an explicit set of music tracks defined by a given user as evidence of his/her musical preferences. To our knowledge, this recommendation approach has never been evaluated before. We ask the user to provide such a preference set (Sec. 2.1), extract low-level audio features, and infer high-level semantic information from the audio of each track (Sec. 2.2). We then consider three different approaches operating in the semantic domain to summarize the retrieved descriptions and generate music recommendations. Two of them have a music similarity measure at their core (Secs. 2.3.1 and 2.3.2), while the third applies a probabilistic model to infer the underlying structure of the user's preferences (Sec. 2.3.3). As baselines for evaluating the generated recommendations, we employ two approaches that apply the same ideas to low-level timbral features (Secs. 2.3.4 and 2.3.5), and two contextual ones: a state-of-the-art collaborative filtering recommendation system (Sec. 2.3.6) and a naive genre-based recommender (Sec. 2.3.7). We evaluate all considered approaches by gathering music data from 12 participants (Sec. 3.1) and carrying out a listening experiment to assess familiarity, liking and further listening intentions for the provided recommendations (Sec. 3.2), and we present the obtained results (Sec. 3.3). Finally, we draw conclusions about the proposed procedure and discuss future research directions (Sec. 4).</p>
    </sec>
    <sec id="sec-3">
      <title>2. METHODOLOGY</title>
    </sec>
    <sec id="sec-4">
      <title>2.1 Preference Examples Collection</title>
      <p>
        As a first step, we ask the user to gather the minimal
set of music tracks sufficient to grasp or convey his/her
music preferences [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] (the user's preference set). We do not
promise or mention giving music recommendations in the
future, since this could bias the selection of representative
music. The user provides a folder with the selected tracks in
audio format (e.g. mp3), and all the information needed to
unambiguously identify and retrieve each track (i.e. artist,
piece title, edition, etc.). For the content-based approaches
we consider, single music pieces are informative
by themselves without any additional context, such as artist
names and track titles. Still, we ask the user to provide this
context so that we can compare against contextual
approaches. We also ask the user for additional information,
including personal data (gender, age, interest in music,
musical background), a description of the strategy followed to
select the music pieces, and the way he/she would describe
his/her musical preferences. This information supports
further analysis.
      </p>
    </sec>
    <sec id="sec-5">
      <title>2.2 Audio Content Analysis</title>
      <p>
        We now describe how we obtain meaningful
low-level and high-level descriptions of each music track in
the user's preference set with our audio content
analysis system. We follow [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to obtain such descriptions.
For each track we calculate a low-level feature
representation using an in-house audio analysis tool<sup>2</sup>. In
total it provides over 60 commonly used low-level audio
features characterizing global properties of the given tracks,
including timbral, temporal, and tonal features, among others.
They include inharmonicity, odd-to-even harmonic energy
ratio, tristimuli, spectral centroid, spread, skewness,
kurtosis, decrease, flatness, crest, and roll-off factors, MFCCs,
spectral energy bands, zero-crossing rate, spectral and tonal
complexities, transposed and untransposed harmonic pitch
class profiles, key strength, tuning, chords, beats per minute,
and onsets.
      </p>
      <p>
        We do not use the described low-level features explicitly in
the approaches we consider, except for MFCCs, which are used to
construct two of the baseline systems. Instead, we use them
to infer semantic descriptors. To this end, we perform a
regression with suitably trained classifiers producing different
semantic dimensions such as genre, culture, moods, and
instrumentation. We use standard multi-class support vector
machines (SVMs) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], employ 14 ground truth music
collections (including full tracks and excerpts), and execute 14
classification tasks corresponding to these data. The regression
results form a high-level descriptor space, which contains
the probability estimates for each class of each SVM
classifier. With the described procedure we obtain 56 high-level
descriptors, including categories of genre, culture, moods,
instruments, rhythm and tempo. For more detailed
information regarding the list of low-level features, the collections
used for regression, and the SVM implementation, see [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and
references therein.
      </p>
    </sec>
    <sec id="sec-6">
      <title>2.3 Recommendation Approaches</title>
      <p>We now consider different approaches to music recommendation, which are based on the retrieved descriptions of the user's preference set. We propose three methods working on semantic descriptors. For comparison, we consider two low-level baseline approaches working on MFCCs, and two contextual ones.</p>
      <p>All approaches retrieve 20 music tracks from a given music collection as recommendations for the user, except one of the contextual approaches (Sec. 2.3.6), which operates on the Last.fm<sup>3</sup> music collection.</p>
      <p><sup>2</sup> http://mtg.upf.edu/technologies/essentia</p>
      <p><sup>3</sup> http://last.fm</p>
      <sec id="sec-6-2">
        <title>2.3.4 Timbral distance from all tracks (MFCC-ALL)</title>
      </sec>
      <sec id="sec-6-1">
        <title>2.3.1 Semantic distance from the mean (SEM-MEAN)</title>
        <p>
          As the simplest approach, we represent
the user as a single point in the semantic descriptor space:
we compute the mean point of the user's
preference set. We employ the semantic distance presented and
validated in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. It has been shown to yield positive
user satisfaction, comparable to well-known low-level
timbral distances based on MFCCs, while operating in a
high-level semantic space. More concretely, the distance
operates directly on the retrieved semantic descriptors, and
is defined as a weighted Pearson correlation distance [
          <xref ref-type="bibr" rid="ref1 ref6">1, 6</xref>
          ].
Given a music collection, we rank the tracks according to
the semantic distance to the user point (i.e. the mean point
of the user's preference set) and return the 20 nearest tracks as
recommendations.
        </p>
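        <p>The SEM-MEAN ranking can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the function names are hypothetical, descriptor vectors are plain lists of floats, and the weights default to uniform because the actual weighting is specified in [6] rather than here.</p>
        <preformat><![CDATA[
```python
import math


def weighted_pearson_distance(x, y, w=None):
    """Return 1 - weighted Pearson correlation between vectors x and y."""
    if w is None:
        w = [1.0] * len(x)  # assumption: uniform weights (real weights come from [6])
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    if var_x == 0.0 or var_y == 0.0:
        return 1.0  # correlation undefined for a constant vector; treat as uncorrelated
    return 1.0 - cov / math.sqrt(var_x * var_y)


def mean_point(vectors):
    """Component-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sem_mean(preference_set, collection, k=20, w=None):
    """Rank collection tracks (dict: track id -> vector) by distance to the user point."""
    user_point = mean_point(preference_set)
    ranked = sorted(
        collection,
        key=lambda tid: weighted_pearson_distance(user_point, collection[tid], w),
    )
    return ranked[:k]
```
]]></preformat>
        <p>With 56-dimensional descriptor vectors and k = 20, sem_mean returns the identifiers of the 20 collection tracks closest to the mean point of the preference set.</p>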
      </sec>
      <sec id="sec-6-3">
        <title>2.3.2 Semantic distance from all tracks (SEM-ALL)</title>
        <p>Alternatively, we do not reduce the user representation to one point but instead consider all tracks from the user's preference set. We use the same semantic distance as for SEM-MEAN. For each track in the user's preference set, we compute the distances to the tracks in a given music collection, and mark the 20 nearest tracks as candidates. We then rank all selected candidates according to the obtained distances, omit possible duplicates, and return the tracks corresponding to the 20 lowest distances as recommendations. In this way, we take into account all areas of preference explicitly specified by the user while searching for the most similar tracks.</p>
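        <p>The candidate selection and merging steps of SEM-ALL can be sketched as follows. The function names are hypothetical, and the Euclidean distance is only a stand-in so that the example is self-contained; in the approach described above, the distance argument would be the semantic distance. Keeping the smallest distance for a duplicated candidate is also an assumption about how duplicates are resolved.</p>
        <preformat><![CDATA[
```python
import math


def euclidean(x, y):
    """Stand-in distance for illustration only (assumption, not the semantic distance)."""
    return math.dist(x, y)


def sem_all(preference_set, collection, distance=euclidean, k=20):
    """Return up to k track ids, pooling the k nearest neighbours of every preference track.

    collection maps track id -> descriptor vector.
    """
    best = {}  # candidate track id -> smallest distance seen so far
    for query in preference_set:
        scored = sorted((distance(query, vec), tid) for tid, vec in collection.items())
        for d, tid in scored[:k]:
            # Duplicates are collapsed by keeping the lowest distance (assumption).
            if tid not in best or d < best[tid]:
                best[tid] = d
    ranked = sorted(best, key=lambda tid: (best[tid], tid))
    return ranked[:k]
```
]]></preformat>
        <p>Because every preference track contributes its own neighbourhood, a preference set with several distinct areas yields recommendations from each of them, provided their distances are competitive.</p>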
      </sec>
      <sec id="sec-6-4">
        <title>2.3.3 Semantic Gaussian mixture model (SEM-GMM)</title>
        <p>
          Finally, we represent the user as a
probability density of his/her preferences in the semantic
space. For that purpose, we use the retrieved semantic
descriptors and employ a Gaussian mixture model (GMM) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ],
which estimates a probability density as a weighted sum of
a given number of simple Gaussian densities (components).
We initialize the GMM by k-means clustering, and train the
model using the expectation-maximization algorithm. The
number of centers for k-means is estimated by the Bayesian
information criterion [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. For computational reasons, we
consider between 1
and 20 components. Once the model is trained, we compute the
probability density for each of the tracks in a given music
collection. We rank the tracks according to the obtained density
values, and return the 20 most probable tracks as
recommendations, under the assumption that the tracks in the
universe are uniformly distributed within the semantic space.
        </p>
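        <p>The ranking step of SEM-GMM, given an already trained mixture, can be sketched as below. Training (k-means initialization followed by expectation-maximization) is omitted. The function names are hypothetical, and diagonal covariances are an assumption made to keep the example short; the text above does not state the covariance type. The bic helper shows the standard criterion that would be minimized over 1 to 20 components.</p>
        <preformat><![CDATA[
```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumption) at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_density(x, weights, means, variances):
    """Log of the weighted sum of component densities, computed with log-sum-exp."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better when comparing models."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def sem_gmm_rank(collection, weights, means, variances, k=20):
    """Return the k track ids with the highest density under the user model."""
    ranked = sorted(
        collection,
        key=lambda tid: gmm_log_density(collection[tid], weights, means, variances),
        reverse=True,
    )
    return ranked[:k]
```
]]></preformat>
        <p>Working in the log domain avoids numerical underflow, which matters in a 56-dimensional descriptor space where raw densities can become extremely small.</p>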
        <p>The advantage of the SEM-GMM approach is that the model takes the relevance of the semantic attributes within the user's preferences into account, accenting areas preferred by the user in the semantic space. Thus, the recommended tracks would generally exhibit the most characteristic semantic properties inferred from the user's preference set. Meanwhile, SEM-ALL is blind to the underlying semantic structure of preferences, and SEM-MEAN provides only a very rough approximation. Still, when the user's tracks are evenly distributed in the semantic space, SEM-GMM may have insufficient expressive power due to the assigned limit on the number of Gaussian components, thereby discriminating against certain preference areas. Nonetheless, we assume Gaussianity of the user's preference patterns.</p>
        <p>
          For comparison purposes, and as our first baseline (MFCC-ALL), we
modify the SEM-ALL approach to use a common low-level
timbral distance [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] instead of the semantic one. To this
end, we use MFCCs and model each music track as a single
Gaussian with a full covariance matrix. A closed-form
symmetric approximation of the Kullback-Leibler divergence is
then used as a distance. Thereby, we can regard the
MFCC-ALL approach as a counterpart of the distance-based
approach to music recommendation proposed by Logan [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], in
which the Earth Mover's Distance between MFCC clusters
is used as a distance measure.
        </p>
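        <p>For reference, the Kullback-Leibler divergence between two d-dimensional Gaussians has a well-known closed form, and summing both directions gives one common symmetric variant. The formulas below are a sketch in standard notation (p and q are the Gaussians of two tracks with means μ and covariances Σ); the exact variant used in [16] may differ, for example by a constant factor.</p>
        <preformat><![CDATA[
```latex
% KL divergence between p = N(\mu_p, \Sigma_p) and q = N(\mu_q, \Sigma_q) in d dimensions
D_{KL}(p \,\|\, q) = \tfrac{1}{2}\Big[ \operatorname{tr}\!\big(\Sigma_q^{-1}\Sigma_p\big)
    + (\mu_q - \mu_p)^{\top} \Sigma_q^{-1} (\mu_q - \mu_p)
    - d + \ln\frac{\det\Sigma_q}{\det\Sigma_p} \Big]

% Symmetrized form: the log-determinant terms cancel
d(p, q) = D_{KL}(p \,\|\, q) + D_{KL}(q \,\|\, p)
        = \tfrac{1}{2}\operatorname{tr}\!\big(\Sigma_q^{-1}\Sigma_p + \Sigma_p^{-1}\Sigma_q\big)
        + \tfrac{1}{2}(\mu_p - \mu_q)^{\top}\big(\Sigma_p^{-1} + \Sigma_q^{-1}\big)(\mu_p - \mu_q) - d
```
]]></preformat>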
      </sec>
      <sec id="sec-6-5">
        <title>2.3.5 Timbral Gaussian mixture model (MFCC-GMM)</title>
        <p>Alternatively, as in the SEM-GMM approach, we construct a probabilistic model using a GMM. Instead of the semantic descriptors, we use a population of mean MFCC vectors (one vector per track) to train the model.</p>
      </sec>
      <sec id="sec-6-6">
        <title>2.3.6 Collaborative filtering with Last.fm (LASTFM)</title>
        <p>
          In addition to the described content-based approaches, we
consider a contextual baseline approach based on music
similarity inferred from collaborative filtering information. We
did not have any data of this kind of our own,
and therefore opted for black-box
recommendations provided by Last.fm. It is an established
music recommender with an extensive number of users and a
large music collection, providing means for both monitoring
listening statistics and social tagging [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>We manually generate a list of recommendations by browsing Last.fm. The procedure partially emulates human user behavior while discovering new music. During the retrieval procedure we did not open any Last.fm account; we therefore consider these recommendations free of the personalization that can be provided for registered accounts. We randomly preselect 20 music tracks from the user's preference set, and query the Last.fm website for each of the preselected tracks. For each query track, we search for the corresponding Last.fm track page<sup>4</sup>. If the track page is found, we proceed to the "Similar Music" page<sup>5</sup>, which provides a ranked list of tracks similar to the query track. From this list we select the first track that is available for pre-listening online and is by a different artist than the query track. Otherwise, if the corresponding track page is not found, or the "Similar Music" page is not available for the query track due to insufficient collaborative filtering data (e.g., when the query track is an unpopular long-tail track with a low number of listeners), we search for the corresponding artist page<sup>6</sup> and proceed to the "Similar Artists" page<sup>7</sup>, which provides a ranked list of artists similar to the artist of the query track. We apply an artist filter to the list, as the query artist name can have variations. Thereafter we select the top-ranked artist from the list, go to the corresponding artist page, and select the first track available for pre-listening online from the "Top Tracks" section. This section provides two lists of the most popular tracks by the artist, based on listening statistics over either a short-term period (the last week) or a long-term period (the last 6 months). We opted for the last 6 months. If no pre-listens are found, we proceed iteratively to the next similar artist's top tracks until we find one. If no similar artist has previewable tracks, we skip the query track.</p>
        <p><sup>4</sup> For example, see http://www.last.fm/music/Mastodon/_/The+Czar</p>
        <p><sup>5</sup> http://www.last.fm/music/Mastodon/_/The+Czar/+similar</p>
        <p><sup>6</sup> http://www.last.fm/music/Baby+Ford/</p>
        <p><sup>7</sup> http://www.last.fm/music/Baby+Ford/+similar</p>
      </sec>
      <sec id="sec-6-7">
        <title>2.3.7 Random tracks by the same genre (GENRE)</title>
        <p>Finally, as a simple and low-cost contextual baseline, we provide random recommendations that rely on the genre categories of the user's tracks. As in the LASTFM approach, we preselect 20 music tracks from the user's preference set. For each of the tracks we obtain a genre category from the Last.fm track page or artist page: we select the first genre tag we encounter that is present in the given music collection (we assume that all tracks in the collection are tagged with a genre category). We then return a random track with this genre tag from the collection.</p>
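        <p>The GENRE baseline can be sketched as follows. The function and parameter names are hypothetical, the tag lists are assumed to have been collected beforehand (no Last.fm access is modeled), and skipping a query track with no matching tag is an assumption about a case the description leaves open.</p>
        <preformat><![CDATA[
```python
import random


def genre_recommendations(preference_tags, collection_genres, n_queries=20, seed=None):
    """Return one random same-genre collection track per preselected preference track.

    preference_tags: dict, preference track id -> ordered list of tags (hypothetical input)
    collection_genres: dict, collection track id -> genre label
    """
    rng = random.Random(seed)
    by_genre = {}
    for tid, genre in collection_genres.items():
        by_genre.setdefault(genre, []).append(tid)

    track_ids = sorted(preference_tags)
    queries = rng.sample(track_ids, min(n_queries, len(track_ids)))

    recommendations = []
    for query in queries:
        # First tag that is also a genre label in the collection.
        genre = next((tag for tag in preference_tags[query] if tag in by_genre), None)
        if genre is None:
            continue  # assumption: skip tracks with no usable genre tag
        recommendations.append(rng.choice(by_genre[genre]))
    return recommendations
```
]]></preformat>
        <p>This sketch does not prevent the same collection track from being drawn twice; a production version would need an additional check, for example to enforce one track per artist in a playlist.</p>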
      </sec>
    </sec>
    <sec id="sec-7">
      <title>3. EXPERIMENTS AND RESULTS</title>
    </sec>
    <sec id="sec-8">
      <title>3.1 User Data Analysis</title>
      <p>We worked with a group of 12 users (8 males and 4 females). They were aged between 25 and 45 years (mean μ = 32.75, standard deviation σ = 5.17) and showed a very high interest in music (rating μ = 9.58, σ = 0.67, where 0 means no interest in music and 10 means passionate about music). Ten of the twelve users play at least one musical instrument, including violin, piano, guitar, singing, synthesizers and ukulele.</p>
      <p>The number of tracks selected by the users to convey their musical preferences varied widely, ranging from 19 to 178 music pieces (μ = 73.25, σ = 46.07). The time spent on this task also differed considerably, ranging from half an hour to 180 hours (μ = 30.41, σ = 54.19).</p>
      <p>It is interesting to analyze the verbal descriptions of the strategies followed to select the music tracks. Some users selected one song per artist, while others did not apply this restriction. They also covered various uses of music such as listening, playing, singing or dancing. Other users mentioned musical genre, mood, expressivity, musical parameters, lyrics and chronological order as driving parameters for selecting the tracks. Furthermore, some users implemented an iterative strategy, gathering a very large number of music pieces from their music collection and then refining them to obtain the final selection.</p>
      <p>Finally, each user provided a set of labels to define their musical preferences. Most of the labels were related to genre, mood and instrumentation, some to rhythm, and few to melody, harmony or expressivity. Other labels referred to lyrics, year and duration of the piece. The users' preferences covered a wide range of musical styles (from classical to country, jazz, rock, pop, electronic, folk) and musical properties (e.g. acoustic vs. synthetic, calm vs. danceable, tonal and dissonant).</p>
    </sec>
    <sec id="sec-9">
      <title>3.2 Recommendation Evaluation</title>
      <p>In order to evaluate the considered approaches, we performed subjective listening tests with our 12 subjects. The entire process used an in-house collection of 100K music excerpts (30 sec.) by 47K artists (approximately 2 tracks per artist) covering a wide range of musical dimensions (different genres, styles, arrangements, geographic locations, and epochs). For each user we generated 7 recommendation playlists, using each of the three proposed approaches, the two low-level baselines, and the two contextual baselines. Each playlist consisted of 20 music tracks returned by the respective approach (Sec. 2.3). No playlist contained more than one song by the same artist. All playlists were merged into a single list of 140 tracks, with the tracks randomly ordered to avoid any response bias due to presentation order or recommendation approach. The file names were anonymized, and all metadata was deleted from the files as well, to make contextual identification of the tracks impossible. In addition, the participants were not aware of the number of recommendation approaches, their names, or their rationales.</p>
      <p>A questionnaire was given to the subjects to express different subjective impressions related to the recommended music. A "familiarity" rating ranged from the identification of artist and title (4) to absolute unfamiliarity (0), with intermediate steps for knowing the title (3), the artist (2), or just feeling familiar with the music (1). A "liking" rating measured the enjoyment of the presented music, with 0 and 1 covering negative liking, 2 being a kind of neutral position, and 3 and 4 representing increasing liking for the musical excerpt. A rating of "listening intentions" measured preference, but in a more direct and behavioral way than the "liking" scale, as an intention is closer to action than the abstraction of liking. Again this scale contained 2 positive and 2 negative steps plus a neutral one. Finally, an even more direct rating named "gimmemore" was included, allowing just 1 or 0 to indicate, respectively, a request for or a rejection of more music like the one presented. The users were also asked to provide title and artist for those tracks rated high on the familiarity scale. We manually corrected this scale when the given artist/title was wrong (hence a familiarity rating of "3" or, more frequently, "4" was sometimes lowered to 1). These corrections represented just 3% of the total familiarity judgments.</p>
    </sec>
    <sec id="sec-10">
      <title>3.3 Results</title>
      <p>
        Considering the subjective scales used, a good
recommendation system should obtain high liking, listening
intentions, and requests for more for the greater part of the retrieved tracks, and
in particular for low-familiarity tracks. Therefore, we
recoded the users' ratings into 3 main categories, referring to
the type of the recommendation: hits, fails and trusts. Hits
were those tracks having a low familiarity rating (&lt; 2) and
a high (&gt; 2) liking rating. Fails were those tracks having a
low (&lt; 3) liking rating. Trusts were those tracks that got a
high familiarity (&gt; 1) and a high (&gt; 2) liking rating. Trusts,
provided their overall amount is low, can be useful for a user
to feel that the recommender understands his/her
preferences [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (i.e., a user could be satisfied by getting a trust
track from time to time, but annoyed if every other track is
a trust). Using the liking, the intentions and the
"gimmemore" Boolean rating, we computed three different
recommendation outcome measures, respectively. We then combined the
three into a final recommendation outcome that required
absolute agreement among them in order to be considered a hit,
a fail or a trust. In total, 18.3% of all the recommendations were
then considered "unclear" (e.g., a case that was a hit according to the
liking but a fail according to the other two indexes)
and were excluded from further analysis. An
interesting additional result is that many of the unclear outcomes
correspond to high liking ratings that turned into 0 on the
gimmemore scale. This pattern was more frequent for the
recommendations generated using MFCC-GMM (6.6%)
than for any other approach, with GENRE being the least
affected (2.9%). In contrast, the opposite change
(low liking becoming a positive "gimmemore") was nearly absent
in the ratings.
      </p>
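      <p>The recoding described above can be sketched as a small function. The names are hypothetical. The thresholds for familiarity and liking are taken from the text; applying the same threshold (greater than 2) to the intentions scale, and treating a "gimmemore" value of 1 as the positive case, are assumptions.</p>
      <preformat><![CDATA[
```python
def classify(familiarity, positive):
    """Outcome under one preference measure: hit, fail or trust."""
    if not positive:
        return "fail"
    # Positive response: novel tracks are hits, familiar ones are trusts.
    return "hit" if familiarity < 2 else "trust"


def final_outcome(familiarity, liking, intentions, gimmemore):
    """Combine the three measures; any disagreement yields 'unclear'."""
    outcomes = {
        classify(familiarity, liking > 2),
        classify(familiarity, intentions > 2),  # assumption: same threshold as liking
        classify(familiarity, gimmemore == 1),  # assumption: 1 means "more like this"
    }
    return outcomes.pop() if len(outcomes) == 1 else "unclear"
```
]]></preformat>
      <p>For example, an unfamiliar track with liking 4 and intentions 4 but a "gimmemore" value of 0 is a hit under two measures and a fail under the third, so it is labeled unclear, which is the pattern reported as most frequent among the unclear cases.</p>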
      <p>The percentage of each category per recommendation approach is presented in Table 1. It reveals that the approach yielding the most hits (41.2%) and trusts (25.4%) is LASTFM (not surprisingly, the trusts found with other approaches were scarce, below 4%). The three approaches based on semantic descriptors (SEM-ALL, SEM-MEAN and SEM-GMM) yielded more than 30% of hits, and the remaining ones could not supply more than 25%. An association between recommendation approach and recommendation outcome is supported by the result of the Pearson chi-square test (χ<sup>2</sup>(18) = 351.7, p &lt; 0.001).</p>
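      <p>The Pearson chi-square statistic for a contingency table of approaches by outcomes can be computed as sketched below. The function name is hypothetical and no counts from the study are reproduced. The reported 18 degrees of freedom are consistent with a table of 7 approaches by 4 outcome categories, since (7 - 1) x (4 - 1) = 18; reading the fourth category as "unclear" is an assumption.</p>
      <preformat><![CDATA[
```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts.

    table is a list of rows; all row and column totals are assumed to be positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```
]]></preformat>
      <p>The statistic is then compared against the chi-square distribution with the returned degrees of freedom to obtain the p-value.</p>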
      <p>Additionally, three separate between-subjects ANOVAs were performed in order to test the effects of the recommendation approaches on the three subjective ratings. The effect was confirmed in all of them (F(6, 1365) = 55.385, p &lt; 0.001 for the liking rating, F(6, 1365) = 48.89, p &lt; 0.001 for the intentions rating, and F(6, 1365) = 43.501, p &lt; 0.001 for the "gimmemore" rating). Pairwise comparisons using Tukey's test revealed the same pattern of differences between the recommendation approaches, irrespective of the 3 tested indexes. This pattern highlights the LASTFM approach as the one getting the highest overall ratings; it also groups together the MFCC-GMM and MFCC-ALL approaches (those getting the lowest ratings), with the remaining approaches clustered in between.</p>
      <p>Finally, a measure of the quality of the hits was computed as (liking − familiarity) × intentions. Selecting only the hits, an ANOVA on the effect of recommendation method on this quality measure revealed no significant differences attributable to the method. Therefore, once a hit is selected, no recommendation method grants better or worse recommendations than any other. The same pattern was revealed when using liking alone as the measure of hit quality.</p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSIONS</title>
      <p>In this work we presented three content-based approaches
to music recommendation, which are based on an explicit set
of music tracks provided by a user as evidence of his/her
musical preferences (the user's preference set). Our approaches
work on semantic descriptors (inferred from low-level audio
features in diverse classification tasks) covering musical
dimensions such as genre and culture, moods and instruments,
and rhythm and tempo. More concretely, we proposed two
approaches which apply a high-level semantic distance to
retrieve tracks from a given collection. These approaches
compute the distance either from the mean point of the
preference set, or from all tracks in the preference set.
Alternatively, we proposed a model-based approach, which
creates a probabilistic model to infer the underlying structure
of the user's preferences. For that purpose, we employed a
GMM to model the preferences within the semantic domain.
We evaluated the proposed approaches against a number
of baselines in a subjective evaluation with 12 users.
Two of these baselines operate on
low-level timbral features (MFCCs) instead of the proposed
semantic descriptors: the first employs a
state-of-the-art timbral distance, while the second creates a
GMM within the timbral domain. Moreover, in contrast
to the content-based methods, we included two contextual
recommenders in our evaluation. One of them naively
retrieves random tracks from a given music collection that
match a genre criterion. The other employs Last.fm as a source of
collaborative filtering information about music similarity.</p>
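      <p>The two distance-based strategies summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: Euclidean distance stands in for the semantic distance, the distance to "all tracks" is taken as the minimum over the preference set, and the function names and toy vectors are hypothetical.</p>
      <preformat>
```python
import math


def mean_point(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_by_mean(preference_set, collection):
    """Rank collection track ids by distance to the preference-set mean."""
    centre = mean_point(preference_set)
    return sorted(collection, key=lambda tid: math.dist(collection[tid], centre))


def rank_by_all(preference_set, collection):
    """Rank collection track ids by distance to the nearest preference track.

    Taking the minimum over the preference set is an assumption made for
    this sketch.
    """
    def nearest(tid):
        return min(math.dist(collection[tid], p) for p in preference_set)
    return sorted(collection, key=nearest)


# Toy 2-D descriptor vectors (hypothetical values).
preferences = [[0.0, 0.0], [4.0, 0.0]]
collection = {"a": [2.0, 0.0], "b": [0.0, 1.0], "c": [5.0, 5.0]}

print(rank_by_mean(preferences, collection))  # prints: ['a', 'b', 'c']
print(rank_by_all(preferences, collection))   # prints: ['b', 'a', 'c']
```
      </preformat>
      <p>In this toy example the two strategies disagree on the top track: the mean point favors a track lying between the two preference examples, whereas the all-tracks variant favors a track close to one of them.</p>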
      <p>
        The evaluation results revealed the users' preference for
the proposed semantic approaches over the low-level timbral
baselines. This holds for both the distance-based
approaches and the probabilistic models. Regarding
the semantic distance employed in our approaches, this finding
supports and complements the outcomes of previous
research on semantic music similarity measures [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], in which
a number of similarity measures were evaluated in a
subjective experiment but on a set of random tracks not necessarily
preferred by participants. In that experiment, the semantic
and low-level timbral distances performed comparably,
while the semantic distance surpassed
the other methods in objective evaluations. Considering
these previous results and the present outcomes, we may
conclude that the high-level semantic description
outperforms the low-level timbral description in the task of music
recommendation.
      </p>
      <p>In contrast, the proposed approaches were found to be
inferior to the considered collaborative filtering recommender in
terms of both the number of successful novel
recommendations (hits) and the number of trusted recommendations. This result
can be partly explained by the fact that the
recommendations generated by the latter approach were drawn from the Last.fm
music collection, which could entail an evaluation bias.
Considering this fact, we can hypothesize a lower performance of
the collaborative filtering approach on our in-house
collection. Still, the collaborative filtering approach yielded only
7% more hits than our best proposed semantic method. In
particular, we expect the proposed approaches to be suitable
for music discovery in the long tail, which suffers from a lack of
contextual information and from incorrect or incomplete metadata.</p>
      <p>Interestingly, the naive genre-based recommender, while
being worse than our proposed approaches, still outperformed
the timbre-based baselines. This could be partially explained
by the fact that genre was one of the driving criteria for
selecting users' preference sets, and that genre entails more
information and diversity than timbral information extracted
from MFCCs. We also did not find any benefit in using our
semantic GMM-based approach compared to the semantic
distance-based approaches, probably due to the insufficient
size of the training data (only one mean MFCC vector per track
was computed in our experiments).</p>
      <p>In general, we conclude that although the considered
content-based approaches to music recommendation do not reach
the satisfaction and novelty degree of the collaborative
filtering approach, the difference in performance diminishes to
a great extent when semantic descriptors are used. We may
hypothesize a better performance, comparable with that of the
collaborative filtering approach, once the amount and quality
of semantic descriptors are increased. Consequently, future
research will be devoted to extending the
semantic descriptor space used by the proposed approaches,
as well as improving the underlying classifiers and
the distance measure. Furthermore, we plan to assess the
potential benefit of user profiling by explicitly given
preference examples in the form of music tracks over broader
contextual categories (favorite artists, albums, genres, and
even activities) and implicit information such as listening
behavior statistics.</p>
    </sec>
    <sec id="sec-12">
      <title>ACKNOWLEDGMENTS</title>
      <p>The authors would like to thank all participants involved
in the evaluation. This research has been partially funded
by the FI Grant of Generalitat de Catalunya (AGAUR).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Abdullah</surname>
          </string-name>
          .
          <article-title>On a robust correlation coefficient</article-title>
          .
          <source>Journal of the Royal Statistical Society. Series D (The Statistician)</source>
          ,
          <volume>39</volume>
          (
          <issue>4</issue>
          ):
          <volume>455</volume>
          –
          <fpage>460</fpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Baltrunas</surname>
          </string-name>
          and
          <string-name>
            <given-names>X.</given-names>
            <surname>Amatriain</surname>
          </string-name>
          .
          <article-title>Towards time-dependant recommendation based on implicit feedback</article-title>
          .
          <source>In Workshop on Context-aware Recommender Systems (CARS'09)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Barrington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Oda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lanckriet</surname>
          </string-name>
          .
          <article-title>Smarter than Genius? Human evaluation of music recommender systems</article-title>
          .
          <source>In 10th International Society for Music Information Retrieval Conference (ISMIR'09)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Barrington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Turnbull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Torres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lanckriet</surname>
          </string-name>
          .
          <article-title>Semantic similarity for music retrieval</article-title>
          .
          <source>In International Symposium on Music Information Retrieval (ISMIR'07)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          .
          <article-title>Pattern recognition and machine learning</article-title>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bogdanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Serra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wack</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Herrera</surname>
          </string-name>
          .
          <article-title>From low-level to high-level: Comparative study of music similarity measures</article-title>
          .
          <source>In International Workshop on Advances in Music Information Research (AdMIRe'09)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Veltkamp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rhodes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Slaney</surname>
          </string-name>
          .
          <article-title>Content-based music information retrieval: Current directions and future challenges</article-title>
          .
          <source>Proceedings of the IEEE</source>
          ,
          <volume>96</volume>
          (
          <issue>4</issue>
          ):
          <volume>668</volume>
          –
          <fpage>696</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>O.</given-names>
            <surname>Celma</surname>
          </string-name>
          .
          <article-title>Music recommendation and discovery in the long tail</article-title>
          .
          <source>PhD thesis</source>
          , UPF, Barcelona, Spain,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Firan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Paiu</surname>
          </string-name>
          .
          <article-title>The benefit of using tag-based profiles</article-title>
          .
          <source>In Latin American Web Conference</source>
          , pages
          <volume>32</volume>
          –
          <fpage>41</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Haro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Xambo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fuhrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bogdanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Herrera</surname>
          </string-name>
          .
          <article-title>The musical avatar - a visualization of musical preferences by means of audio content description</article-title>
          .
          <source>In Audio Mostly (AM '10)</source>
          , Pitea, Sweden,
          <year>2010</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jones</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pu</surname>
          </string-name>
          .
          <article-title>User technology adoption issues in recommender systems</article-title>
          .
          <source>In Networking and Electronic Commerce Research Conference</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Levy</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sandler</surname>
          </string-name>
          .
          <article-title>Music information retrieval using social tags and audio</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          ,
          <volume>11</volume>
          (
          <issue>3</issue>
          ):
          <volume>383</volume>
          –
          <fpage>395</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Myaeng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>A probabilistic music recommender considering user opinions and audio features</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>43</volume>
          (
          <issue>2</issue>
          ):
          <volume>473</volume>
          –
          <fpage>487</fpage>
          ,
          Mar.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B.</given-names>
            <surname>Logan</surname>
          </string-name>
          .
          <article-title>Music recommendation from song sets</article-title>
          .
          <source>In Proc. ISMIR</source>
          , pages
          <volume>425</volume>
          –
          <fpage>428</fpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Lu</surname>
          </string-name>
          and
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Tseng</surname>
          </string-name>
          .
          <article-title>A novel method for personalized music recommendation</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>36</volume>
          (
          <issue>6</issue>
          ):
          <volume>10035</volume>
          –
          <fpage>10044</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E.</given-names>
            <surname>Pampalk</surname>
          </string-name>
          .
          <article-title>Computational models of music similarity and their application in music information retrieval</article-title>
          .
          <source>PhD thesis</source>
          , Vienna University of Technology, Mar.
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Pohle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schnitzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Knees</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Widmer</surname>
          </string-name>
          .
          <article-title>On rhythm and general music similarity</article-title>
          .
          <source>In 10th International Society for Music Information Retrieval Conference (ISMIR'09)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Slaney</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>White</surname>
          </string-name>
          .
          <article-title>Similarity based on rating data</article-title>
          .
          <source>In International Symposium on Music Information Retrieval (ISMIR'07)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Tseng</surname>
          </string-name>
          .
          <article-title>Music recommendation using content and context information mining</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ):
          <volume>16</volume>
          –
          <fpage>26</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <source>The Nature of Statistical Learning Theory (Information Science and Statistics)</source>
          . Springer, 2nd edition, Nov.
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>K.</given-names>
            <surname>West</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamere</surname>
          </string-name>
          .
          <article-title>A model-based approach to constructing music similarity functions</article-title>
          .
          <source>EURASIP Journal on Advances in Signal Processing</source>
          ,
          <year>2007</year>
          :
          <volume>149</volume>
          –
          <fpage>149</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yoshii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Komatani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ogata</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. G.</given-names>
            <surname>Okuno</surname>
          </string-name>
          .
          <article-title>Hybrid collaborative and content-based music recommendation using probabilistic model with latent user preferences</article-title>
          .
          <source>In International Conference on Music Information Retrieval (ISMIR'06)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>