<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Can you judge a music album by its cover?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petar Petrovski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Lisa Gentile</string-name>
          <email>annalisag@informatik.uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Web Science Group, University of Mannheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work we explore the potential role of music album cover art for the task of predicting the overall rating of music albums, and we investigate whether one can judge a music album by its cover alone. We present the results of our participation in the Linked Data Mining Challenge at the Know@LOD 2016 Workshop, which suggest that the album cover alone might not be sufficient for the rating prediction task.</p>
      </abstract>
      <kwd-group>
        <kwd>Classification</kwd>
        <kwd>Image Embeddings</kwd>
        <kwd>DBpedia</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The Linked Data Mining Challenge labels each album as "good" when the critics' score for it is greater than 80 and as "bad" when it is lower than 60.</p>
      <p>To answer our question, we learn an SVM model that classifies music albums
as either "good" or "bad", and we train the model using only the cover art of each
album. The feature extraction from the cover art is based on image embeddings.</p>
    </sec>
    <sec id="sec-2">
      <title>Classification of music albums</title>
      <p>Our proposed approach to album classification consists of three main steps.
Given a collection of music albums, we first obtain the image of their cover art.
Then, using off-the-shelf tools, we obtain a feature vector representation of the
images. We then learn a classifier to label each album as "good" or "bad",
exploiting only the feature space obtained from its cover art. Figure 1 depicts the
proposed pipeline.
In this work we use the Know@LOD 2016 challenge dataset. It consists of 1,600
music albums. Each item provides:
– album name
– artist name
– album release date
– DBpedia (http://dbpedia.org) URI for the album
– the classification of the album as "good" or "bad".</p>
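      <p>The three steps above can be wired together as in the following minimal Python sketch. All three functions are hypothetical stand-ins for the actual components (RapidMiner LOD extraction, Caffe image embeddings, and a trained SVM), not the code used in this work.</p>

```python
# Minimal sketch of the three-step pipeline. Every function here is a
# hypothetical placeholder: the paper uses the RapidMiner LOD extension,
# Caffe image embeddings, and a trained SVM instead.

def fetch_cover(dbpedia_uri: str) -> str:
    # Step 1: resolve the album's DBpedia URI to its cover-art image (stub).
    return dbpedia_uri.rsplit("/", 1)[-1] + ".jpg"

def embed(image_path: str) -> list[float]:
    # Step 2: map the image to a fixed-length feature vector (stub; the
    # real pipeline uses a CNN's penultimate-layer activations).
    return [float(b) for b in image_path.encode()[:8]]

def classify(features: list[float]) -> str:
    # Step 3: binary "good"/"bad" decision (stub; a linear SVM in the paper).
    return "good" if sum(features) / len(features) > 100.0 else "bad"

label = classify(embed(fetch_cover("http://dbpedia.org/resource/Abbey_Road")))
```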
      <p>The organisers split the dataset into 80% (1,280 instances) for training and
20% (320 instances) for testing.</p>
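      <p>As a quick check of the split sizes (an index-based partition is assumed here purely for illustration; the organisers' actual partitioning method is not specified):</p>

```python
# 80/20 split over the 1,600 challenge instances; the index-based split
# below is illustrative only, not the organisers' actual partition.
n_albums = 1600
n_train = int(n_albums * 0.8)              # 1,280 training instances
train_idx = list(range(n_train))
test_idx = list(range(n_train, n_albums))  # 320 test instances
```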
      <p>
        In our experiment we deliberately exploit only the cover art of the
albums; therefore we use only the DBpedia URIs of the albums to obtain
their cover images. First, by using the RapidMiner LOD extension [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we retrieve
the dbp:cover property (the prefix dbp: stands for the namespace
http://dbpedia.org/property/) for each album. The property contains the path to the
image of the cover art. Then, by using the MediaWiki API
(https://www.mediawiki.org/wiki/API:Main_page), we download all the
images.
      </p>
      <p>The resulting image set consists of 1,558 images, with 2 images missing (the
path obtained from dbp:cover did not correspond to any Wikipedia resource).</p>
      <p>The dataset is available at https://github.com/petrovskip/know-lod2016,
together with the extracted feature vectors and the process used (explained in
Section 3).</p>
      <sec id="sec-2-1">
        <title>Classification approach</title>
        <p>
          We learn an SVM model that classifies music albums as either "good" or "bad".
Starting from our image set, we use image embeddings to obtain a feature space.
Feature set. We use the Caffe deep learning framework [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] to obtain image
embeddings. Together with deep learning algorithms, Caffe also provides a
collection of reference models which are ready to use for certain labelling tasks.
Specifically, we used the bvlc_reference_caffenet model (from
caffe.berkeleyvision.org) by Krizhevsky et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. It consists of a
neural network with five convolutional layers and three fully-connected layers. The
model has been trained on the ImageNet collection, a dataset of 1.2 million
labelled images from the ILSVRC2012 challenge
(http://image-net.org/challenges/LSVRC/2012/). Each layer of the model provides
a representation of the image features, with associated weights. The last layer
outputs the labels for each image: 1,000 different classes, according to the
ImageNet training collection. Since we are not interested in these 1,000 labels,
but only want to classify images as "good" or "bad", we run our image set through
this model but use the layer immediately before the output layer (the second
fully-connected layer), from which we obtain the image features and their weights.
The resulting feature vector has a length of 4,096. These features represent
visual components of the images, e.g. colours, shapes, etc.
        </p>
        <p>Learning process. We use all obtained features (without any fine-tuning) to train
a C-SVM classifier (a wrapper implementation of the libSVM classifier in
RapidMiner) with a linear kernel and the default parameters.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiment</title>
      <p>We evaluated our approach using 10-fold cross validation on the training set and
obtained an accuracy of 58.03%. The resulting confusion matrix is presented
in Table 1. The accuracy on the test set, as reported by the challenge system, is
60.3125%.</p>
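      <p>As a sanity check, the reported figures can be recomputed from the confusion matrix in Table 1:</p>

```python
# Recomputing the evaluation metrics from the Table 1 confusion matrix.
tp, fp = 385, 282   # pred. good: true good / true bad
fn, tn = 254, 356   # pred. bad:  true good / true bad

accuracy = (tp + tn) / (tp + fp + fn + tn)   # 741 / 1277
precision_good = tp / (tp + fp)              # 385 / 667
recall_good = tp / (tp + fn)                 # 385 / 639

print(round(accuracy * 100, 2))  # 58.03, matching the reported accuracy
```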
      <p>The low accuracy of our approach seems to suggest that the cover image alone is not
a good predictor of the overall rating of a music album. Nevertheless, it would
be interesting to investigate whether fine-tuning of the features or the combination
with other factors could lead to better results.</p>
      <sec id="sec-3-1">
        <title>Results</title>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Confusion matrix of the 10-fold cross validation on the training set.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th />
                <th>true good</th>
                <th>true bad</th>
                <th>class precision</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>pred. good</td>
                <td>385</td>
                <td>282</td>
                <td>57.72%</td>
              </tr>
              <tr>
                <td>pred. bad</td>
                <td>254</td>
                <td>356</td>
                <td>58.36%</td>
              </tr>
              <tr>
                <td>class recall</td>
                <td>60.25%</td>
                <td>55.80%</td>
                <td />
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This paper presents our submission to the Linked Data Mining Challenge at
the Know@LOD 2016 workshop. We proposed an approach that classifies music
albums as "good" or "bad" based solely on their cover art. We trained an SVM
classifier on feature vectors computed from each album's cover art to solve
the prediction problem of music album classification. While our approach yields
some interesting results, our experiment hints that using only album covers as
features is not the best fit for the task of the challenge.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bolshakov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gelbukh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>Natural Language Processing and Information Systems</source>
          <volume>1959</volume>
          ,
          <fpage>103</fpage>
          –
          <fpage>114</fpage>
          (
          <year>2001</year>
          ), http://www.springerlink.com/index/10.1007/3-540-45399-7
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Horsburgh</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craw</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Massie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Learning pseudo-tags to augment sparse tagging in hybrid music recommender systems</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>219</volume>
          ,
          <fpage>25</fpage>
          –
          <fpage>39</fpage>
          (
          <year>2015</year>
          ), http://dx.doi.org/10.1016/j.artint.2014.11.004
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shelhamer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donahue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karayev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guadarrama</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
          </string-name>
          , T.:
          <article-title>Caffe: Convolutional architecture for fast feature embedding</article-title>
          .
          <source>In: Proceedings of the ACM International Conference on Multimedia</source>
          . pp.
          <fpage>675</fpage>
          –
          <fpage>678</fpage>
          . ACM (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
          <article-title>ImageNet classification with deep convolutional neural networks</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>1097</fpage>
          –
          <fpage>1105</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lībeks</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turnbull</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>You can judge an artist by an album cover: Using images for music annotation</article-title>
          .
          <source>Multimedia, IEEE (99)</source>
          ,
          <fpage>1</fpage>
          –
          <fpage>1</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pichl</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zangerle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
          </string-name>
          , G.:
          <article-title>Combining Spotify and Twitter data for generating a recent and public dataset for music recommendation</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <volume>1313</volume>
          ,
          <fpage>35</fpage>
          –
          <fpage>40</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ristoski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>Mining the web of linked data with RapidMiner</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>35</volume>
          ,
          <fpage>142</fpage>
          –
          <fpage>151</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>