<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Large Scale Discovery of Seasonal Music From User Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cameron Summers</string-name>
          <email>csummers@gracenote.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Phillip Popp</string-name>
          <email>ppopp@gracenote.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Gracenote</institution>
          ,
          <addr-line>Emeryville, CA</addr-line>
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The consumption history of online media content such as music and video offers a rich source of data from which to mine information. Trends in this data are of particular interest because they reflect user preferences as well as associated temporal contexts that can be exploited in systems such as recommendation or search. This paper classifies songs associated with a holiday temporal context using a large, real-world dataset of user listening data. Results show strong performance of classification of Christmas music with Gaussian Mixture Models.</p>
      </abstract>
      <kwd-group>
        <kwd>music</kwd>
        <kwd>seasonality</kwd>
        <kwd>machine learning</kwd>
        <kwd>time series</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Consumption of media content such as music and video often exhibits patterns
when associated with a temporal context. Identifying and understanding these
contexts can improve the quality of recommendations as shown by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and provide
useful explanations for the recommendations that are made, improving the user
experience [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Contexts such as holidays often influence domains beyond music
listening, linking music recommendation with other recommendation systems.
The importance of holiday contexts in music can be readily observed in industry,
where flags such as Christmas are often used [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, the task of manually
labeling specific content as connected to a holiday is challenging because these
connections have a distributed nature - varying by geographic region, language,
and time - and expert curation is time intensive and costly. We investigate the
feasibility of labeling these connections by classification with user listening data.
      </p>
      <p>
        Previous research has studied the dynamics and classification of time series
signals. In the web search domain, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] showed that queries could be classified by
their change in popularity over time using features in the signal. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] classified
seasonal web search queries using Holt-Winters decomposition on a small data
set to improve time-sensitivity in search results. In music listening signals, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] show how analysis of the temporal dynamics of music listening is useful for
recommendation systems and look specifically at seasonality. However, to our
knowledge there is no published work that attempts to exploit the temporal
analysis of music listening data for automated labeling of holiday music content.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods and Materials</title>
      <p>Approach</p>
      <p>Listen counts of a track will exhibit a differing and detectable pattern around a
period of time if the track has an association with that period, such as a Christmas track
around December 25th. This pattern can be exploited by training a classifier
using features of this signal. The features in this study are the listening rates of a
track i for day j in a window of time localized around the target context:</p>
      <disp-formula id="eq-1">
        <label>(1)</label>
        <tex-math><![CDATA[R_{ij} = \frac{\sum_{k=1}^{U} c_{ijk}}{\sum_{l=1}^{W} \sum_{k=1}^{U} c_{ilk}}]]></tex-math>
      </disp-formula>
      <p>where <inline-formula><tex-math><![CDATA[c_{ijk}]]></tex-math></inline-formula>, the listen count of track i in time period j by user k, is an element of C, and <inline-formula><tex-math><![CDATA[C \in \mathbb{R}^{T \times W \times U}]]></tex-math></inline-formula>, where T is the number of
tracks, W is the number of time periods, and U is the number of users. To control
for the significant differences in the overall popularity of tracks in a large data
set, we normalize the listen counts of each track across the selected periods.</p>
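      <p>The normalization in Equation (1) can be illustrated with a minimal sketch. It assumes records arrive as (user, day, track) tuples and that days are plain date strings; the function name <monospace>listening_rates</monospace> and the toy records are hypothetical and are not taken from the dataset used in this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def listening_rates(records, window_days):
    """Compute R[i][j] as in Eq. (1): listens of track i on day j, summed over
    users, divided by the track's total listens across the whole window."""
    # counts[track][day] accumulates the sum over users k of c_ijk.
    counts = defaultdict(lambda: defaultdict(int))
    for user_id, day, track_id in records:
        if day in window_days:
            counts[track_id][day] += 1

    rates = {}
    for track_id, per_day in counts.items():
        # Denominator of Eq. (1): sum over days l and users k of c_ilk.
        total = sum(per_day.values())
        rates[track_id] = [per_day.get(day, 0) / total for day in window_days]
    return rates


# Toy records (user, day, track); these values are invented for illustration.
records = [
    ("u1", "2012-12-24", "t1"),
    ("u2", "2012-12-25", "t1"),
    ("u3", "2012-12-25", "t1"),
    ("u1", "2012-12-26", "t1"),
    ("u1", "2012-12-24", "t2"),
    ("u2", "2012-12-30", "t2"),  # outside the window, so it is ignored
]
window = ["2012-12-24", "2012-12-25", "2012-12-26"]
rates = listening_rates(records, window)
print(rates["t1"])  # [0.25, 0.5, 0.25]
print(rates["t2"])  # [1.0, 0.0, 0.0]
```
]]></preformat>
      <p>Each track's rates sum to one over the window, so a track with four listens and a track with four million listens are compared on the shape of their listening pattern rather than on raw popularity.</p>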
      <p>For classification, we chose the Gaussian Mixture Model (GMM) with full
covariance matrix because it is fast to train and the listening rates resemble a
normal distribution. A GMM is trained using tracks from the target holiday in
a training portion of the data set, and classification is performed on the test set
using the likelihood of the data given the model.
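</p>
      <p>2.2 Dataset</p>
      <table-wrap id="tab-2-2">
        <label>Table 2.2</label>
        <table>
          <tbody>
            <tr>
              <td>Number of Records</td>
              <td>4,819,992,847</td>
            </tr>
            <tr>
              <td>Number of Users</td>
              <td>1,648,796</td>
            </tr>
            <tr>
              <td>Number of Tracks</td>
              <td>13,227,376</td>
            </tr>
            <tr>
              <td>Date Range</td>
              <td>January 2012 - February 2013</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>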
      <p>This study uses an internal Gracenote dataset of online radio listening records
in North America; basic statistics of the dataset are shown in Table 2.2.
Each record of the dataset represents one listen of a track by one user and
provides User ID, Date, Time, and Track ID. From the Track ID, associated
metadata such as track name and album name are used for keyword search and
post-experiment analysis. A large dataset is necessary to get good
classification results, as shown in section 2.3. Comparable public datasets,
such as the "Last.fm Dataset - 1K users" available at
http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html, are too small.
</p>
      <p>2.3 Experiment - Christmas</p>
      <p>
We chose Christmas as the target for seasonal music identification because of its
popularity and large volume of associated music. We hypothesize that a classifier
trained with the features in section 2.1 can identify Christmas tracks. We generated
an initial set of Christmas tracks by searching for the "Christmas" keyword in the
track name and album name - totaling 87,554 Christmas tracks or 0.7% of the
entire track population - and maintained a second list of tracks without the
keyword. This is not a comprehensive list of Christmas tracks, but it is generally
free of non-Christmas tracks. Expert curation of a comprehensive set is infeasible
with such a large dataset, and using tags from external sources or a more complex
text search is error prone.</p>
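      <p>The keyword search that seeds the two track lists might look like the following sketch. It assumes a case-insensitive substring match on track name and album name, which this paper does not specify; the helper <monospace>has_keyword</monospace> and the metadata rows are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def has_keyword(track_name, album_name, keyword="christmas"):
    """True if the keyword occurs in the track or album name, ignoring case."""
    needle = keyword.lower()
    return needle in track_name.lower() or needle in album_name.lower()


# Hypothetical metadata rows: (track_id, track_name, album_name).
tracks = [
    ("t1", "White Christmas", "Album A"),
    ("t2", "Silent Night", "A Christmas Album"),
    ("t3", "The First Noel", "Album B"),
    ("t4", "Track D", "Album C"),
]
seed = [t[0] for t in tracks if has_keyword(t[1], t[2])]
rest = [t[0] for t in tracks if not has_keyword(t[1], t[2])]
print(seed)  # ['t1', 't2']
print(rest)  # ['t3', 't4']
```
]]></preformat>
      <p>In this toy example "The First Noel" lands in the list without the keyword, which mirrors the point that the seed list is clean but not comprehensive.</p>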
      <p>We chose a consecutive 15-day span centered on December 25th, Christmas Day,
as the window of listening rate inputs to the classifier. Training and classification (60%
train, 40% test) using Gaussian Mixture Models were performed on subsets of
the dataset consisting of tracks with more than some minimum number of total listens in the
whole dataset. To validate the performance of the Christmas model, ROC curves and AUC
scores were computed on the test set and are shown in Figure 1.</p>
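      <p>For reference, the likelihood used for classification can be written in the standard GMM form. This is a sketch under stated assumptions: the number of mixture components M and the decision threshold <inline-formula><tex-math><![CDATA[\tau]]></tex-math></inline-formula> are not given in this paper and are introduced here only for illustration, and <inline-formula><tex-math><![CDATA[\mathbf{x}_i = (R_{i1}, \ldots, R_{iW})]]></tex-math></inline-formula> denotes the vector of listening rates of track i over the W = 15 days of the window.</p>
      <disp-formula>
        <tex-math><![CDATA[p(\mathbf{x}_i \mid \theta) = \sum_{m=1}^{M} \pi_m \, \mathcal{N}\!\left(\mathbf{x}_i \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m\right), \qquad \sum_{m=1}^{M} \pi_m = 1, \qquad \hat{y}_i = \mathbb{1}\!\left[\log p(\mathbf{x}_i \mid \theta) \ge \tau\right]]]></tex-math>
      </disp-formula>
      <p>Here the mixture weights, means, and full covariance matrices are fitted on the Christmas tracks in the training portion, and sweeping the assumed threshold over the test-set log-likelihoods traces out a ROC curve.</p>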
      <fig id="fig-1">
        <label>Figure 1</label>
        <caption>
          <title>Christmas Model ROC</title>
          <p>True Positive Rate versus False Positive Rate for subsets of tracks by minimum total listens: &gt;1500 listens (AUC = 0.986), &gt;500 listens (AUC = 0.973), &gt;200 listens (AUC = 0.957), &gt;100 listens (AUC = 0.938), &gt;10 listens (AUC = 0.819), &gt;1 listens (AUC = 0.753).</p>
        </caption>
      </fig>
      <p>The model performed well, reaching an AUC of 0.986 on tracks with more than 1500 listens, even though the experiment used an incomplete
list of Christmas tracks. At the highest threshold, an inspection of tracks without the "Christmas"
keyword that receive high probability under the Christmas model shows that many are other Christmas songs well known in North America,
such as "The First Noel" and "Santa Claus Is Coming To Town." This suggests
that the model is not just identifying tracks with the "Christmas" keyword, but
would likely classify a more complete list of Christmas tracks accurately.</p>
      <p>One notable observation is the drop in AUC as the threshold for the
minimum total listens of a track is lowered. Classification suffers when unpopular
tracks are included. This is likely due to the natural variance in the listen counts of tracks
with fewer listens: normalizing smaller listen counts has a disproportionate effect
on the computed listen rates.</p>
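      <p>The AUC comparison across minimum-listen thresholds can be sketched as follows. The sketch computes AUC as the fraction of (positive, negative) pairs ranked correctly, which is a standard equivalent of the area under the ROC curve; the function names and the scores are made up for illustration and do not reproduce the results reported here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive is scored above a randomly chosen negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_by_min_listens(rows, thresholds):
    """rows: (total_listens, score, label). Returns {threshold: AUC} computed
    on the tracks with more than `threshold` total listens."""
    result = {}
    for t in thresholds:
        kept = [(s, y) for listens, s, y in rows if listens > t]
        result[t] = auc([s for s, _ in kept], [y for _, y in kept])
    return result


# Made-up scores: popular tracks separate cleanly, rarely played ones do not.
rows = [
    (2000, 0.9, 1), (1800, 0.2, 0), (900, 0.8, 1), (700, 0.1, 0),
    (5, 0.3, 1), (3, 0.6, 0),
]
result = auc_by_min_listens(rows, [500, 1])
print(result)  # AUC is 1.0 above 500 listens and 8/9 above 1 listen
```
]]></preformat>
      <p>The pairwise loop is quadratic in the number of tracks, so at the scale of this dataset a rank-based computation would be the more practical choice.</p>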
      <p>The dataset contains only a single year of data, which is a limitation for
analyzing seasonal temporal contexts. Multiple years of data could provide better
information for classification and show changing listening preferences over time.
This is a topic for future work.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>This study demonstrated on a large, real-world dataset that user listening data
can be used to detect seasonal music content for Christmas. Classification
with a Gaussian Mixture Model showed that the listen rates are sensitive to
variance in unpopular tracks and that quality results require detection to be performed
on a large database of listening records.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Shin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>Context-aware recommendation by aggregating user context</article-title>
          .
          <source>IEEE Conference on Commerce and Enterprise Computing (CEC'09)</source>
          . IEEE, (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Benbasat</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Recommendation agents for electronic commerce: Effects of explanation facilities on trusting beliefs</article-title>
          .
          <source>Journal of Management Information Systems 23.4</source>
          (
          <year>2007</year>
          ):
          <fpage>217</fpage>
          -
          <lpage>246</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. The Echo Nest Blog, http://blog.echonest.com/post/35845347430/christmas-comes-early-to-the-echo-nest
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kulkarni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teevan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svore</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          :
          <article-title>Understanding temporal query dynamics</article-title>
          .
          <source>In Proc. WSDM</source>
          , pages
          <fpage>167</fpage>
          -
          <lpage>176</lpage>
          , Hong Kong, China, (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Shokouhi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Detecting seasonal queries by time-series analysis</article-title>
          .
          <source>In Proc. SIGIR</source>
          , pages
          <fpage>1171</fpage>
          -
          <lpage>1172</lpage>
          , Beijing, China, (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kahng</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Temporal dynamics in music listening behavior: A case study of online music service</article-title>
          .
          <source>Computer and Information Science (ICIS)</source>
          ,
          <source>2010 IEEE/ACIS 9th International Conference on. IEEE</source>
          , (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Carneiro</surname>
            , Teixeira,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Towards the discovery of temporal patterns in music listening using Last.fm profiles</article-title>
          .
          <source>Dissertation</source>
          , (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hidasi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tikk</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Context-aware recommendations from implicit data via scalable tensor factorization</article-title>
          .
          <source>arXiv preprint arXiv:1309.7611</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>