<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ACM RecSys</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Analyzing the Characteristics of Shared Playlists for Music Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dietmar Jannach</string-name>
          <email>dietmar.jannach@tu-dortmund.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iman Kamehkhosh</string-name>
          <email>iman.kamehkhosh@tu-dortmund.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Geoffray Bonnin</string-name>
          <email>geoffray.bonnin@tu-dortmund.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TU Dortmund</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>10</volume>
      <abstract>
        <p>The automated generation of music playlists, as supported by modern music services like last.fm or Spotify, represents a special form of music recommendation. When designing a "playlisting" algorithm, the question arises which quality criteria the generated playlists should fulfill and whether there are certain characteristics like homogeneity, diversity or freshness that make the playlists generally more enjoyable for the listeners. In our work, we aim to obtain a better understanding of such desired playlist characteristics in order to be able to design better algorithms in the future. The research approach chosen in this work is to analyze, based on musical and meta-data features, several thousand playlists that were created and shared by users on music platforms. Our first results, for example, reveal that factors like popularity, freshness and diversity play a certain role for users when they create playlists manually. Comparing such user-generated playlists with automatically created ones moreover shows that today's online playlisting services sometimes generate playlists which are quite different from user-created ones. Finally, we compare the user-created playlists with playlists generated with a nearest-neighbor technique from the research literature and observe even stronger differences. This last observation can be seen as another indication that the accuracy-based quality measures from the literature are probably not sufficient to assess the effectiveness of playlisting algorithms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.3.3 [Information Storage and Retrieval]: Information
Search and Retrieval; H.5.5 [Information Interfaces and
Presentation]: Sound and Music Computing</p>
      <sec id="sec-1-1">
        <title>Playlist generation, Music recommendation</title>
        <p>Music, playlist, analysis, algorithm, evaluation</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>The automated creation of playlists or personalized radio
stations is a typical feature of today's online music
platforms and music streaming services. In principle, standard
recommendation algorithms based on collaborative filtering
or content-based techniques can be applied to generate a
ranked list of musical tracks given some user preferences
or past listening history. For several reasons, however, the
generation of playlists represents a very specific music
recommendation problem. Personal playlists are, for
example, often created with a certain goal or usage context (e.g.,
sports, relaxation, driving) in mind. Furthermore, in
contrast to relevance-ranked recommendation lists used in other
domains, playlists typically obey some homogeneity and
coherence criteria, i.e., there are quality characteristics that
are related to the transitions between the tracks or to the
playlist as a whole.</p>
      <p>
        In the research literature, a number of approaches for the
automation of the playlist generation process have been
proposed, see, e.g., [
        <xref ref-type="bibr" rid="ref10 ref11 ref2 ref6 ref8">2, 6, 8, 10, 11</xref>
        ] or the recent survey in
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Some of them, for example, take a seed song or artist
as an input and look for similar tracks; others try to find
track co-occurrence patterns in existing playlists. In some
approaches, playlist generation is considered as an
optimization problem. Independent of the chosen technique, a
common problem when designing new playlisting algorithms is
to assess whether or not the generated playlists will be
positively perceived by the listeners. User studies and online
experiments are unfortunately particularly costly in the music
domain. Researchers therefore often use offline experimental
designs and, for example, use existing playlists shared by
users on music platforms as a basis for their evaluations. The
assumption is that these "hand-crafted" playlists are of good
quality; typical measures used in the literature include the
Recall [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or the Average Log-Likelihood (ALL) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
Unfortunately, both measures have their limitations, see also
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The Recall measure for example tells us how good an
algorithm is at predicting the tracks selected by the users,
but does not explicitly capture specific aspects such as the
homogeneity or the smoothness of track transitions.
      </p>
      <p>
        To design better and more comprehensive quality
measures, we first have to answer the question of what
users consider to be desirable characteristics of playlists or
what the driving principles are when users create playlists.
In the literature, a few works have studied this aspect using
different approaches, e.g., user studies [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ] or analyzing
forum posts [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The work presented in this paper continues
these lines of research. Our research approach, however,
differs from previous works in that we aim to identify patterns
in a larger set of manually created playlists that were shared
by users of three different online music platforms. To be able
to take a variety of potential driving factors into account in
our analysis, we have furthermore collected various types of
meta-data and musical features of the playlist tracks from
public music databases.
      </p>
      <p>Overall, with our analyses we hope to obtain insights into
the principles that an automated playlist generation
system should observe to end up with better-received or more
"natural" playlists. To test whether current music services and
a nearest-neighbor algorithm from the literature generate
playlists that observe the identified patterns and make
choices similar to those of real users, we conducted an experiment in
which we analyzed commonalities and differences between
automatically generated and user-provided playlists.</p>
      <p>Before reporting the details of our first analyses, we
discuss previous works in the next section.</p>
    </sec>
    <sec id="sec-3">
      <title>2. PREVIOUS WORKS</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], Slaney and White addressed the question of whether users
have a tendency to create very homogeneous or rather
diverse playlists. As a basis for determining the diversity they
relied on an objective measure based on genre information
about the tracks. Each track was considered as a point in
the genre space and the diversity was then determined by
calculating the volume of an ellipsoid enclosing the tracks of
the playlist. An analysis of 887 user-created playlists
indicated that diversity can be considered to be a driving factor
as users typically create playlists covering several genres.
      </p>
      <p>
        Sarro and Casey more recently [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] focused on track
transitions in album playlists and made an analysis to determine
if there are certain musical characteristics that are
particularly important. One of the results of their investigation was
that fade durations and the mean timbre of the beginnings
and endings of consecutive tracks seem to have a strong
influence on the ordering of the tracks.
      </p>
      <p>
        Generally, our work is similar to [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] in that we
rely on user-created ("hand-crafted") playlists and look at
meta-data and musical features of the tracks to identify
potentially important patterns. The aspects we cover in this
paper were however not covered in their work and our
analysis is based on larger datasets.
      </p>
      <p>
        Cunningham et al., [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], in contrast, relied on another form
of track-related information and looked at the user posts in
the forum of the Art of the Mix web site. According to their
analysis, the typical principles for setting up the playlists
mentioned by the creators were related to the artist, genre,
style, event or activity but also the intended purpose,
context or mood. Some users also talked about the smoothness
of track transitions and how many tracks of one single artist
should be included in playlists. Placing the most
"important" track at the end of a playlist was another strategy
mentioned by some of the playlist creators.
      </p>
      <p>
        A different form of identifying playlist creation principles
is to conduct laboratory studies with users. The study
reported in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for example involved 52 subjects and indicated
that the first and the last tracks can play an important role
for the quality of a playlist. In another study, Andric and
Haus [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] concluded that the ordering of tracks is not
important when the playlist mainly contains tracks which the
users like in general.
      </p>
      <p>
        Reynolds et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] conducted an online survey that revealed
that the context and environment, such as the location, activity
or the weather, can have an influence both on the listeners'
mood and on the track selection behavior of playlist
creators. Finally, the study presented in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] again confirmed
the importance of artists, genres and mood in the playlist
creation process.
      </p>
      <p>
        In this discussion, we have focused on previous attempts
to understand how users create playlists and what their
characteristics are. Playlist generation algorithms however do
not necessarily have to rely on such knowledge. Instead,
one can follow a statistical approach and only look at
co-occurrences and transitions of tracks in existing playlists and
use these patterns when creating new playlists, see, e.g., [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
or [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This way, the quality factors respected by human
playlist creators are implicitly taken into account. Such
approaches, however, cannot be directly applied to many
types of playlist generation settings, e.g., for creating
"thematic" playlists (e.g., Christmas Songs) or for creating
playlists that only contain tracks that have certain musical
features. Pure statistical methods are not aware of these
characteristics and the danger exists that tracks are included
that do not match the purpose of the list and thus lead to
a limited overall quality.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. CHARACTERISTICS OF PLAYLISTS</title>
      <p>The ultimate goal of our research is to analyze the
structure and characteristics of playlists in order to better
understand the principles used by the users to create them. This
section is a first step toward this goal.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1 Data sources</title>
      <p>As a basis for the first analyses that we report in this
paper, we used two types of playlist data.</p>
      <sec id="sec-5-1">
        <title>3.1.1 Hand-crafted playlists</title>
        <p>We used samples of hand-crafted playlists from three
different sources. One set of playlists was retrieved via the
public API of last.fm<sup>1</sup>, one was taken from the Art of the
Mix (AotM) website<sup>2</sup>, and a third one was provided to us by
8tracks<sup>3</sup>. To enhance the data quality, we corrected artist
misspellings using the API of last.fm.</p>
        <p>Overall, we analyzed over 10,000 playlists containing about
108,000 different tracks of about 40,000 different artists. As
a first attempt toward our goal, we retrieved the features
listed in Table 1 using the public APIs of last.fm and The
Echo Nest (tEN), and the MusicBrainz database.</p>
        <p>Some dataset characteristics are shown in Table 2. The
"usage count" statistics express how often tracks and artists
appeared overall in the playlists. When selecting the playlists,
we made sure that they do not simply contain album
listings. The datasets are partially quite different, e.g., with
respect to the average playlist lengths. The 8tracks dataset
furthermore has the particularity that users are not allowed
to include more than two tracks of one artist if they
want to share their playlist with others.</p>
        <p>Figure 1 shows the distributions of playlist lengths. As
can be seen, the distributions are quite different across the
datasets. On 8tracks, a playlist generally has to comprise
at least 8 tracks. The lengths of the last.fm playlists seem
to follow a normal distribution with a maximum frequency
value at around 20 tracks. Finally, the sizes of the AotM
playlists are much more equally distributed.</p>
        <p><sup>1</sup> http://www.last.fm</p>
        <p><sup>2</sup> http://www.artofthemix.org</p>
        <p><sup>3</sup> http://8tracks.com</p>
      </sec>
      <sec id="sec-5-2">
        <title>3.1.2 Generated playlists</title>
        <p>To assess if the playlists generated by today's online
services are similar to those created by users, we used the public
API of The Echo Nest. We chose this service because it uses
a very large database and allows the generation of playlists
from several seed tracks, as opposed to, for instance, iTunes
Genius or last.fm radios. We split the existing hand-crafted
playlists in half, provided the first half of the list as seed
tracks to the music service and then analyzed the
characteristics of the playlist returned by The Echo Nest and
compared them to the patterns that we found in hand-crafted
playlists. Instead of observing whether a playlister
generates playlists that are generally similar to playlists created
by hand, our goal here is to break down their different
characteristics and observe on what specific dimensions they
differ. Notice that using the second half as seed would not be
appropriate as the order of the tracks may be important.</p>
        <p>
          We also turn our attention to the ability of the algorithms
from the literature to reproduce the characteristics of
hand-crafted playlists. According to some recent research, one of
the most competitive approaches in terms of recall is the
simple k-nearest-neighbors (kNN) method [
          <xref ref-type="bibr" rid="ref2 ref8">2, 8</xref>
          ]. More
precisely, given some seed tracks, the algorithm extracts the k
most similar playlists based on the number of shared items
and recommends the tracks of these playlists. This
algorithm does not require a training step and scans the entire
set of available playlists for each recommendation.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.2 Detailed observations</title>
      <p>In the following sections, we will look at general
distributions of different track characteristics.</p>
      <sec id="sec-6-1">
        <title>3.2.1 Popularity of tracks</title>
        <p>The goal of the first analysis here is to determine whether users
tend to position tracks in playlists depending on their
popularity. In our analysis, we measure the popularity in terms
of play counts. Play counts were taken from last.fm,
because this is one of the most popular services and the
corresponding values can be considered indicative for a larger
user group.</p>
        <p>For the measurement, we split the playlists into two parts
of equal size and then determined the average play counts on
last.fm for the tracks of each half. To measure to what
extent the user community favors certain tracks in the playlists,
we calculated the Gini index, a standard measure of
inequality<sup>4</sup>. Table 3 shows the results. In the last column, we
report the statistics for the tracks returned by The Echo Nest
(tEN) and kNN playlisters<sup>5</sup>. We provided the first half of
the hand-crafted playlists as seed tracks and the playlisters
had to select the same number of tracks as the number of
remaining tracks.</p>
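        <p>The two quantities used in this measurement can be sketched as follows. This is an illustrative Python example under stated assumptions, not the code behind Table 3: the helper names gini and half_means are hypothetical, the play counts are made up, playlists are assumed to contain at least two tracks, and the binning of the play counts that preceded the Gini computation is not reproduced.</p>
        <preformat preformat-type="code">
```python
def gini(values):
    """Gini coefficient of non-negative values (0 = equal, close to 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on the sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n

def half_means(play_counts):
    """Average play count of the first and the second half of one playlist."""
    mid = len(play_counts) // 2
    first, second = play_counts[:mid], play_counts[mid:]
    return sum(first) / len(first), sum(second) / len(second)

# Made-up play counts of a four-track playlist, in playlist order.
print(half_means([900, 700, 300, 100]))  # prints (800.0, 200.0)
print(gini([1, 1, 1, 1]))                # prints 0.0
print(gini([0, 0, 0, 10]))               # prints 0.75
```
        </preformat>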
        <p>The results show that users actually tend to place more
popular items in the first part of the list in all datasets,
when play counts are considered. The Echo Nest playlister
does not seem to take that form of popularity into account
and recommends on average less popular tracks. These
differences are statistically significant according to a Student's
t-test (p &lt; 10<sup>-5</sup> for The Echo Nest playlister and p &lt; 10<sup>-7</sup>
for the kNN playlister). This behavior also indicates that
The Echo Nest is successfully replicating the fact that the
second halves of playlists are supposed to be less popular
than the first halves.</p>
        <p><sup>4</sup> We organized the average play counts in 100 bins.</p>
        <p><sup>5</sup> We determined 10 as the best neighborhood size for our
data sets based on the recall value, see Section 4.</p>
        <p>The Gini index reveals that there is a slightly stronger
concentration on some tracks in the first half for two of three
datasets and that the diversity slightly increases in the second
part. The absolute numbers cannot be directly compared
across datasets, but for the AotM dataset the
concentration is generally much higher, which is also indicated by the
higher "track reuse" in Table 2. Interestingly, The Echo Nest
playlister quite nicely reproduces the behavior of real users
with respect to the diversity of popularity.</p>
        <p>In the lower part of Table 3, we show the results for
the kNN method. Note that these statistics are based on
a different sample of the playlists than the previous
measurement. The reason is that neither The Echo Nest nor the
kNN playlister can produce playlists for all of the first
halves provided as seed tracks. We therefore considered only
playlists for which the corresponding algorithm could
produce a playlist.</p>
        <p>Unlike the playlister of The Echo Nest, the kNN method
has a strong tendency to recommend mostly very popular items.
This can be caused by the fact that the kNN method by
design recommends tracks that are often found in similar
playlists. Moreover, based on the lower half of Table 3, the
popularity correlates strongly with the seed track popularity.
As a result, the kNN method shows a potentially undesirable trend
to reinforce already popular items for everyone. At the same
time, it concentrates the track selection on a comparably
small number of tracks, as indicated by the very high value
for the Gini coefficient.</p>
        <sec id="sec-6-1-1">
          <title>Dataset characteristics</title>
          <table-wrap>
            <label>Table 2</label>
            <table>
              <thead>
                <tr><th/><th>last.fm</th></tr>
              </thead>
              <tbody>
                <tr><td>Playlists</td><td>1,172</td></tr>
                <tr><td>Tracks</td><td>24,754</td></tr>
                <tr><td>Artists</td><td>9,925</td></tr>
                <tr><td>Avg. tracks/playlist</td><td>26.0</td></tr>
                <tr><td>Avg. artists/playlist</td><td>16.8</td></tr>
                <tr><td>Avg. genres/playlist</td><td>2.7</td></tr>
                <tr><td>Avg. tags/playlist</td><td>473.4</td></tr>
                <tr><td>Avg. track usage count</td><td>1.2</td></tr>
                <tr><td>Avg. artist usage count</td><td>3.0</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>3.2.2 The role of freshness</title>
        <p>Next, we analyzed whether users tend to create
playlists that mainly contain recently released tracks. As a
measure, we compared the creation year of each playlist with
the average release year of its tracks. We limit our analysis
to the last.fm and 8tracks datasets because we could only
acquire creation dates for these two.</p>
        <p><sup>6</sup> On 8tracks, artist repetitions are limited due to license
constraints.</p>
        <p>Figure 2 shows the statistics for both datasets. We
organized the data points in bins (x-axis), where each bin
represents an average-freshness level, and then counted how many
playlists fall into these levels. The relative frequencies are
shown on the y-axis. The results are very similar for both
datasets, with a slight tendency to include older tracks for
last.fm. On both datasets, more than half of the playlists
contain tracks that were released on average in the last 5
years, the most frequent average age being between 4 and
5 years for last.fm and between 3 and 4 years for 8tracks.
Similarly, on both datasets, more than 75% of the playlists
contain tracks that were released on average in the last 8
years.</p>
        <p>We also analyzed the standard deviation of the resulting
freshness values and observed that more than half of the
playlists have a standard deviation of less than 4 (years),
while more than 75% have a standard deviation of less than 7
(years) on both datasets. Overall, this suggests that playlists
made by users are often homogeneous with regard to the
release date.</p>
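        <p>The freshness measure of this section compares the creation year of a playlist with the release years of its tracks. The following Python sketch shows one way to compute the average track age and its standard deviation for a single playlist; the function name freshness and the example years are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which variant was used.</p>
        <preformat preformat-type="code">
```python
from statistics import mean, pstdev

def freshness(creation_year, release_years):
    """Return (average track age in years, standard deviation of the track ages)."""
    ages = [creation_year - year for year in release_years]
    # Assumption: population standard deviation over the tracks of the playlist.
    return mean(ages), pstdev(ages)

# Hypothetical playlist created in 2012 with four tracks.
avg_age, spread = freshness(2012, [2010, 2008, 2011, 2007])
print(avg_age)           # prints 3
print(round(spread, 2))  # prints 1.58
```
        </preformat>
        <p>Binning the first value over all playlists gives a distribution of the kind shown in Figure 2, while the second value corresponds to the homogeneity with regard to the release date discussed above.</p>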
        <p>Computing the freshness for the generated playlists would
require configuring the playlisters in such a way that they
select only tracks that were not released after the playlists'
creation years. Unfortunately, The Echo Nest does not allow
such a configuration. Moreover, for the kNN approach, the
playlists that are more recent would have to be ignored,
which would lead to too small a sample size and to results
that are no longer reliable.</p>
      </sec>
      <sec id="sec-6-3">
        <title>3.2.3 Homogeneity and diversity</title>
        <p>Homogeneity and diversity can be determined in a variety
of ways. In the following, we will use simple measures based
on artist and genre counts. The genres correspond to the
genres of the artists of the tracks retrieved from The Echo
Nest. Basic figures for artist and genre diversity are already
given in Table 2. On AotM, for example, having several
tracks of an artist in a playlist is not very common<sup>6</sup>. On
last.fm, in contrast, we very often see two or more tracks of
one artist in a playlist. A similar, very rough estimate can
be made for the genre diversity. If we ordered the tracks of
a playlist by genre, we would encounter a different genre on
last.fm only after having listened to about 10 tracks. On
AotM and 8tracks, in contrast, playlists on average cover
more genres.</p>
        <p>Table 4 shows the diversities of the first and second halves
of the hand-crafted playlists, and for the automatic
selections using the first halves as seeds. As a measure of
diversity, we simply counted the number of artists and genres
and divided by the corresponding number of tracks. The
values in Table 4 correspond to the averages of these diversity
measures.</p>
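        <p>The diversity measure described above amounts to a ratio of distinct labels to tracks. A minimal Python sketch follows; the function name diversity and the example artists are hypothetical, and it assumes exactly one artist (or genre) label per track.</p>
        <preformat preformat-type="code">
```python
def diversity(labels):
    """Number of distinct labels (artists or genres) divided by the number of tracks."""
    if not labels:
        return 0.0
    return len(set(labels)) / len(labels)

# One artist label per track of a hypothetical four-track playlist half.
print(diversity(["Muse", "Muse", "Adele", "Moby"]))  # prints 0.75
```
        </preformat>
        <p>A value of 1 means that every track has a different artist, while values close to 0 indicate that a few artists dominate the list.</p>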
        <p>Regarding the diversity of the hand-crafted playlists, the
tables show that users tend to keep the same level of artist and
genre diversity throughout the playlists. We can also notice
that the playlists of last.fm are much more homogeneous.
The diversity values of the automatic selections reveal
several things. First, The Echo Nest playlister tends to always
maximize the artist diversity independently of the diversity
of the seeds; the kNN playlister, in contrast, lowered the
initial artist diversities, except on the last.fm dataset, where
it increased them, though less than The Echo Nest playlister.
Regarding the genre diversity, we can observe opposite
tendencies for the two playlisters: The Echo Nest playlister tends
to reduce the genre diversity while the kNN playlister tends
to increase it. Again, these differences are statistically
significant (p &lt; 0.03 for The Echo Nest playlister and p &lt; 0.006
for the kNN playlister). Overall, the resulting diversities of
both approaches tend to be rather dissimilar to those of
the hand-crafted playlists.</p>
      </sec>
      <sec id="sec-6-4">
        <title>3.2.4 Musical features (The Echo Nest)</title>
        <p>To understand whether people tend to place tracks with specific
feature values into their playlists, we then computed the
distribution of the average feature values of each playlist.
Figure 4 shows the results of this measurement for the
energy and "hotttnesss" features. For all the other features
(danceability, loudness and tempo), the distributions were
similar to those of Figure 3, which could mean that they are
generally not particularly important for the users.</p>
        <p>When looking at the energy feature, we see that users tend
to include tracks from a comparably narrow energy spectrum
with a low average energy level, even though there exist
more high-energy tracks in general as shown in Figure 3. A
similar phenomenon of concentration on a certain range of
values can be observed for the "hotttnesss" feature. As a
side aspect, we can observe that the tracks shared on AotM
are on average slightly less "hottt" than those of both other
platforms<sup>7</sup>.</p>
        <p>We finally turn our attention to the feature distributions
of the generated playlists. Figure 5 as an example shows
the distributions of the energy and "hotttnesss" factors for
the first halves and second halves of the playlists of all three
datasets, together with the distributions of the tracks
selected by The Echo Nest and kNN playlisters.</p>
        <p><sup>7</sup> The results for the "hotttnesss" we report here correspond
to the values at the time when we retrieved the data using
the API of The Echo Nest, and not to those at the time when
the playlists were created. This is not important as we do
not look at the distributions independently, but compare
them to the distributions in Figure 3.</p>
        <p>[Figure 5: y-axis: relative frequency; series: 1st half, 2nd half, tEN, kNN10.]</p>
        <p>The figure shows that The Echo Nest playlister tends to
produce a distribution that is quite similar to the
distribution of the seed tracks. The kNN playlister, in contrast,
tends to concentrate the distributions toward the maximum
values of the distributions of the seeds. We could observe
this phenomenon of concentration for all the features on
all three datasets, except for the danceability on the AotM
dataset.</p>
      </sec>
      <sec id="sec-6-5">
        <title>3.2.5 Transitions and Coherence</title>
        <p>We now focus on the importance of transitions between
the tracks, and define the coherence of a playlist as the
average similarity between its consecutive tracks. Such
similarities can be computed according to various criteria. We
used the binary cosine similarity of the genres and artists<sup>8</sup>,
and the Euclidean linear similarity for the numerical track
features of The Echo Nest. Table 5 shows the corresponding
results for the first and second halves of the hand-crafted
playlists, and for the automatic selections using the first
halves as seeds.</p>
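        <p>The coherence definition given above can be illustrated with a short Python sketch for the set-valued criteria (artists and genres). The function names binary_cosine and coherence and the example data are hypothetical; the similarity used for the numerical Echo Nest features is not covered here, because its exact form is not spelled out in the text.</p>
        <preformat preformat-type="code">
```python
from math import sqrt

def binary_cosine(a, b):
    """Cosine similarity of two label sets interpreted as binary vectors."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a.intersection(b)) / sqrt(len(a) * len(b))

def coherence(track_labels):
    """Average similarity between consecutive tracks of one playlist."""
    pairs = list(zip(track_labels, track_labels[1:]))
    if not pairs:
        return 0.0
    return sum(binary_cosine(x, y) for x, y in pairs) / len(pairs)

# One artist set per track: two of the three transitions keep the artist.
artists = [{"Muse"}, {"Muse"}, {"Adele"}, {"Adele"}]
print(coherence(artists))  # two thirds
```
        </preformat>
        <p>With a single artist per track, the similarity is 1 for identical artists and 0 otherwise, so the coherence equals the share of transitions that stay with the same artist.</p>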
        <p>We can first see that for all datasets and for all criteria, the
second halves of the playlists have a lower coherence than
the first halves. If we assume that the coherence is
representative of the effort of the users to create good playlists,
then the tracks of the second halves seem to be slightly less
carefully selected than those of the first halves.</p>
        <p><sup>8</sup> In the case of artists, this means that the similarity equals
1 if both tracks have the same artist, and 0 otherwise. The
metric thus measures the proportion of cases in which the users
consecutively selected tracks from the same artist.</p>
        <p>Another interesting phenomenon is the high artist
coherence values on the last.fm dataset. These values indicate
that last.fm users have a surprisingly strong tendency to
group tracks from the same artist together, which was not
successfully reproduced by the two playlisters. Both
playlisters actually seem to have a tendency to produce always
the same coherence values, independently of the coherence
values of the seed. A last interesting result is the high
coherence of artist genres on the AotM and 8tracks datasets;
the high genre coherence values on last.fm can be explained
by the high artist coherence values.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>4. STANDARD ACCURACY METRICS</title>
      <p>Our analysis so far has revealed some particular
characteristics of user-created playlists. Furthermore, we observed
that the nearest-neighbor playlisting scheme can produce
playlists that are quite different from those generated by the
commercial Echo Nest service, e.g., in terms of average track
popularity (Table 3).</p>
      <p>
        In the research literature, "hit rates" (recall) and the
average log-likelihood (ALL) are often used to compare the
quality of playlists generated by different algorithms [
        <xref ref-type="bibr" rid="ref11 ref2 ref8">2, 8,
11</xref>
        ]. The goal of our next experiment was to find out how
The Echo Nest playlister performs on these measures. As
it is not possible to acquire probability values for the tracks
selected by The Echo Nest playlister, the ALL cannot be
used<sup>9</sup>. In the following, we thus only focus on precision
and recall.
      </p>
      <p>[Figure 6: results for The Echo Nest and kNN10 playlisters on the last.fm, AotM and 8tracks datasets; series: track recall, artist recall, genre recall, tag recall.]</p>
      <p>The upper part of Figure 6 shows the recall values at
list length 100 for the different datasets<sup>10</sup>. Again, we split
the playlists and used the first half as seed tracks. Recall
was then computed by comparing the computed playlists
with the "hidden" tracks of the original playlist. We
measured recall for tracks, artists, genres and tags. The results
show that the kNN method quite clearly outperforms the
playlister of The Echo Nest on the recall measures across all
datasets except for the artist recall for the last.fm dataset.
The differences are statistically significant (p &lt; 10<sup>-6</sup>
according to a Student's t-test) for all the experiments
except for the track and artist recall on last.fm. As expected,
the kNN method leads to higher absolute values for larger
datasets as more neighbors can be found.</p>
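      <p>The evaluation protocol described above can be sketched in a few lines of Python: the playlist is split, the first half serves as seed, and the recommendation list is compared with the hidden second half. The function name precision_recall and the toy data are hypothetical, and the recommended list is a stand-in for the output of an actual playlister.</p>
      <preformat preformat-type="code">
```python
def precision_recall(recommended, hidden):
    """Precision and recall of a recommendation list against the hidden tracks."""
    rec, hid = set(recommended), set(hidden)
    hits = len(rec.intersection(hid))
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(hid) if hid else 0.0
    return precision, recall

playlist = ["a", "b", "c", "d", "e", "f"]
seed, hidden = playlist[:3], playlist[3:]
recommended = ["d", "x", "f", "y"]  # stand-in for a playlister's output given the seed
print(precision_recall(recommended, hidden))  # precision 0.5, recall two thirds
```
      </preformat>
      <p>The same function can be applied to artists, genres or tags by first mapping each track to the corresponding labels.</p>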
      <p>The lower part of Figure 6 presents the precision results.
The precision values for tracks are, as expected, very low and
close to zero, which is caused by the huge set of possible
tracks and the list length of 100. We can however observe a
higher precision for the kNN method on the AotM dataset
(p &lt; 10<sup>-11</sup>), which is the largest dataset. Regarding artist,
genre and tag prediction, The Echo Nest playlister leads to
a higher precision (p &lt; 10<sup>-3</sup>) than the kNN playlister on all
datasets.</p>
      <p><sup>9</sup> Another possible measure is the Mean Reciprocal Rank
(MRR). Applied to playlist generation, one limitation of
this metric is that it corresponds to the assumption that
the rank of the test track or artist to predict should be as
high as possible in the recommendation list, although many
other tracks or artists may be more relevant and should be
ranked before it.</p>
      <p><sup>10</sup> We could not measure longer list lengths as 100 is the
maximum playlist length returned by The Echo Nest.</p>
      <p>With respect to the evaluation protocol, note that we only
measured precision and recall when the playlister was able to
return a playlist continuation given the seed tracks. This was,
however, not always the case for either technique. In Table 6,
we therefore report the detailed coverage figures, which show
that the kNN method was more often able to produce a
playlist. If recall is measured for all seed playlists, the
differences between the algorithms are even larger. When
measuring precision for all playlists, the differences between the
playlisters become very small.</p>
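      <p>The effect of coverage on the averaged measures can be illustrated with a short sketch. It assumes, which the text does not state explicitly, that a seed playlist for which no continuation was returned contributes a score of 0 when averaging over all seed playlists. All names are hypothetical, and <monospace>None</monospace> marks a seed playlist without a returned continuation.</p>
      <preformat>
```python
from typing import Optional, Sequence

Scores = Sequence[Optional[float]]  # None = no continuation was returned


def coverage(scores: Scores) -> float:
    """Share of seed playlists for which the playlister returned a result."""
    if not scores:
        return 0.0
    answered = [s for s in scores if s is not None]
    return len(answered) / len(scores)


def mean_over_answered(scores: Scores) -> float:
    """Average only over seed playlists that received a continuation."""
    answered = [s for s in scores if s is not None]
    return sum(answered) / len(answered) if answered else 0.0


def mean_over_all(scores: Scores) -> float:
    """Average over all seed playlists; failures count as 0 (assumption)."""
    if not scores:
        return 0.0
    return sum(s or 0.0 for s in scores) / len(scores)


# Made-up recall values for four seed playlists, two of them unanswered.
example = [0.5, None, 0.25, None]
print(coverage(example), mean_over_answered(example), mean_over_all(example))
```
      </preformat>
      <p>Under this assumption, a playlister with lower coverage loses more when the average is taken over all seed playlists, so the gap between two playlisters can widen or shrink depending on which of them had the higher score on the answered playlists.</p>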
      <sec id="sec-7-1">
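        <title>Table 6: Coverage figures</title>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Dataset</th><th>tEN</th></tr>
            </thead>
            <tbody>
              <tr><td>last.fm</td><td>28.33</td></tr>
              <tr><td>AotM</td><td>42.75</td></tr>
              <tr><td>8tracks</td><td>35.3</td></tr>
            </tbody>
          </table>
        </table-wrap>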
        <p>Overall, measuring precision and recall when comparing
generated playlists with those provided by users in our view
represents only one particular form of assessing the quality
of a playlist generator and should be complemented with
additional measures. Precision and recall as measured in our
experiments, for example, do not consider track transitions.
There is also no "punishment" if a generated playlist contains
individual non-fitting tracks that would hurt the listener's
overall enjoyment.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>5. PUBLIC AND PRIVATE PLAYLISTS</title>
      <p>Some music platforms, and in particular 8tracks, let their
users create "private" playlists, which are not visible to
others, and public ones, which for example are shared and used
for social interaction like parties, motivation for team sports
or romantic evenings. The question arises whether public playlists
have different characteristics than those that were created
for personal use only, e.g., because sharing playlists to some
extent can also serve the purpose of creating a public image
of oneself.</p>
      <p>We performed an initial analysis on the 8tracks dataset.
Table 7 shows the average popularity of the tracks in the 8tracks
playlists depending on whether they were in "public" or
"private" playlists (the first category contains 2679 playlists and
the second 451). As can be seen, the tracks of the private
playlists are much more popular on average than the tracks
in the public playlists. Moreover, as indicated by the
corresponding Gini coefficients, the popular tracks are almost
equally distributed across the playlists. Furthermore,
Figure 7 shows the corresponding freshness values. We can see
that the private playlists generally contained more recent
tracks than public playlists.</p>
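      <p>For readers unfamiliar with the Gini coefficient, the following sketch shows one common way to compute it for a list of non-negative values. Which values enter the computation is not specified in this section. The example assumes one popularity value per playlist, and the function name and sample data are hypothetical.</p>
      <preformat>
```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values.

    0 means all values are equal; values close to 1 mean that the total
    is concentrated in very few entries.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Made-up per-playlist popularity values.
print(gini([5, 5, 5, 5]))    # perfectly equal distribution
print(gini([0, 0, 0, 10]))   # everything concentrated in one playlist
```
      </preformat>
      <p>A low value, such as those reported in Table 7, corresponds to the first case: popularity is spread almost evenly across the playlists.</p>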
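      <sec id="sec-8-3">
        <title>Table 7: Average popularity of the tracks in public and private 8tracks playlists</title>
        <table-wrap>
          <table>
            <thead>
              <tr><th></th><th>Public playlists</th><th>Private playlists</th></tr>
            </thead>
            <tbody>
              <tr><td>Play counts</td><td>870k</td><td>935k</td></tr>
              <tr><td>Gini index</td><td>0.20</td><td>0.06</td></tr>
            </tbody>
          </table>
        </table-wrap>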
        <p>These results can be interpreted in at least two different
ways. First, users might create some playlists for their
personal use to be able to repeatedly listen to the latest popular
tracks. They probably do not share these playlists because
sharing a list of current top hits might be of limited value
for other platform members, who might generally be more
interested in discovering less popular artists and tracks.
Second, users might deliberately share playlists with less
popular or lesser-known artists and tracks to create a social image
on the platform.</p>
        <p>[Figure 7: Average freshness of 8tracks playlists (years) for public and private playlists.]</p>
        <p>Given these first observations, we believe that our
approach has some potential to help us better understand some
elements of user behavior on social platforms in general,
i.e., that people might not necessarily only share tracks that
match their actual taste.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>6. SUMMARY AND OUTLOOK</title>
      <p>The goal of our work is to gain a better understanding
of how users create playlists in order to be able to design
future playlisting algorithms that take these "natural"
characteristics into account. The first results reported in this
paper indicate, for example, that features like track
freshness, popularity aspects, or homogeneity of the tracks are
relevant for users, but are not yet fully taken into account by
current algorithms that are considered in the literature to create
high-quality playlists. Overall, the observations also
indicate that additional metrics might be required to assess
the quality of computer-generated playlists in experimental
settings that are based on historical data such as existing
playlists or listening logs.</p>
      <p>Given the richness of the available data, many more
analyses are possible. Currently, we are exploring "semantic"
characteristics to automatically identify the underlying theme
or topic of the playlists. Another aspect not considered so
far in our research is the popularity of the playlists. For
some music platforms, listening counts and "like" statements
for playlists are available. This additional information can
be used to further differentiate between "good" and "bad"
playlists and help us obtain more fine-grained differences
with respect to the corresponding playlist characteristics.
Last, we plan to extend our experiments and analysis by
considering other music services, in particular last.fm radios,
and other playlisting algorithms, in particular algorithms
that exploit content information.</p>
    </sec>
  </body>
  <back>
  </back>
</article>