<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hierarchical Multilabel Classification and Voting for Genre Classification</article-title>
      </title-group>
      <contrib-group>
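        <contrib contrib-type="author">
          <string-name>Benjamin Murauer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilian Mayerl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Tschuggnall</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eva Zangerle</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Pichl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Günther Specht</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Innsbruck</institution>
          ,
          <country country="AT">Austria</country>
        </aff>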
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper summarizes our contribution (team DBIS) to the AcousticBrainz Genre Task: Content-based music genre recognition from multiple sources as part of MediaEval 2017. We utilize a hierarchical set of multilabel classifiers to predict genres and subgenres and rely on a voting scheme to predict labels across datasets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        In the MediaEval AcousticBrainz Genre Task, the goal is to classify
tracks into main and subgenres, using content-based features
computed with Essentia [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and collected by AcousticBrainz [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Four
separate training and test sets of tracks were provided, stemming
from four different sources (AllMusic, Discogs, Lastfm, and
Tagtraum). The task features two subtasks, which differ in the amount
of data that can be used for solving them: In subtask 1, only training
data from the same source as the current test data may be used for
the classification; in subtask 2, all provided datasets can be utilized
for training. However, the evaluation is performed on a per-dataset
basis. Further details can be found in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>CLASSIFICATION AND CHALLENGES</title>
      <p>Multiple factors make the posed task difficult to
solve, in particular the large amount of data to handle and the
multilabel nature of the classification problem.
Subtask 2 is further complicated by the fact that genre
and subgenre labels are hardly consistent across the four provided
training sets, resulting in a heterogeneous set of labels.</p>
      <p>In the following, we first sketch our approach to mitigating these
difficulties. Next, we detail the classification approaches we used
for subtasks 1 and 2 and, lastly, present the obtained results.
We make our implementation available for reproducibility and to
promote research in this direction<sup>1</sup>.</p>
      <p>Reducing the amount of data. To reduce the amount of data and
make the task computationally feasible within the limited time
frame, we initially omitted the detailed features describing the low-level
energy bands of the energy spectrum and verified in preliminary
experiments that the respective central moments are sufficient in terms
of classification accuracy. This allowed us to reduce the number
of features used for training the genre classifiers to 395 (from over
3,000 features originally provided). The full list of features can be
found in our GitHub repository<fn id="fn1"><label>1</label><p>https://github.com/dbis-uibk/MusicGenreClassification</p></fn>.</p>
      <p>Multilabel classification. The fact that any track may feature
multiple genres and subgenres complicates the classification problem,
since not all classification algorithms inherently support multilabel
classification. We solved this problem by applying the
one-vs.-the-rest strategy, effectively training a separate binary classifier for
every label.</p>
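      <p>The one-vs.-the-rest strategy can be illustrated with a minimal scikit-learn sketch. The feature values and genre labels below are invented for illustration, and the use of <monospace>LinearSVC</monospace> is an assumption rather than a statement about the exact estimator class in our implementation.</p>
      <preformat>
```python
# Minimal sketch of one-vs.-the-rest multilabel classification.
# The toy features and genre labels are made up for illustration.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy feature matrix: 6 tracks with 2 (hypothetical) features each.
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
])
# Every track may carry several genre labels.
y = [
    {"rock"}, {"rock"}, {"rock", "pop"},
    {"jazz"}, {"jazz"}, {"jazz", "pop"},
]

# Turn the label sets into a binary indicator matrix (one column per label).
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(y)

# One independent binary classifier is trained per label column.
clf = OneVsRestClassifier(LinearSVC(C=1.0))
clf.fit(X, Y)

# Predictions are an indicator matrix; map them back to tuples of labels.
predicted = binarizer.inverse_transform(clf.predict(X[:1]))
```
      </preformat>
      <p>After fitting, <monospace>clf.estimators_</monospace> holds one binary classifier per label, which is what makes an all-negative prediction (a track without any label) possible.</p>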
      <p>
        Different genre labels across datasets. As subtask 2 allows
combining all datasets for training, the (vastly) differing genre labels
used in the four available training sets posed a challenge. We
tackled this problem by computing a direct mapping between the main
class labels of all training sets, aiming to find equivalent genre labels
across all datasets. To this end, we applied the Levenshtein string
distance measure [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (as previously used for e.g., entity matching [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ])
to find all labels with a distance of at most 1. This slightly fuzzy
matching approach allows us to neglect minor syntactic differences
in the labels (e.g., hip hop vs. hiphop). Preliminary experiments and
manual inspection showed that this increases the number
of matching labels while still avoiding false positives. We did not
match subgenres, as our experiments showed that those diverged
to a far greater extent.
      </p>
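      <p>The label matching can be sketched as follows. This is a minimal illustration, not our actual implementation: the function names and the example label lists are hypothetical, and choosing the closest target label is one plausible way to resolve several candidates.</p>
      <preformat>
```python
# Sketch: map main genre labels between two datasets when their
# Levenshtein distance is at most 1. Label lists are illustrative only.

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def map_labels(source_labels, target_labels, max_distance=1):
    """Return {source_label: closest target_label} within max_distance."""
    mapping = {}
    for source in source_labels:
        best = min(target_labels, key=lambda t: levenshtein(source, t))
        if max_distance >= levenshtein(source, best):
            mapping[source] = best
    return mapping


# Hypothetical main genre labels from two of the datasets.
mapping = map_labels(
    ["hip hop", "rock", "electronic"],
    ["hiphop", "rock", "jazz"],
)
```
      </preformat>
      <p>Here "hip hop" maps to "hiphop" (distance 1) and "rock" to "rock" (distance 0), while "electronic" has no counterpart and stays unmapped.</p>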
      <p>
        Classification Algorithms. We implemented our solution with
two different classification methods<sup>2</sup>: (1) a linear C-support vector
machine [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and (2) an extra-trees classifier [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In addition,
multilayer neural networks, which are known to work well for this task
(cf. [
        <xref ref-type="bibr" rid="ref4 ref5 ref8">4, 5, 8</xref>
        ]), and extreme gradient boosting [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] showed promising
preliminary results, but were deemed infeasible due to the
computational resources required to train full-scale models.
      </p>
      <p><bold>Subtask 1</bold></p>
      <p>[Figure 1: Workflow for subtask 1. Train classifier for main genres; train separate subgenre classifier for each main genre; predict main genres for each track; predict subgenres for each main genre predicted for each track; fallback: assign most popular main genre to tracks with no predicted label.]</p>
      <p>
        The workflow underlying our approach for subtask 1 is outlined
in Figure 1. First, we train one classifier for main genres and a
separate classifier for each main genre’s subgenres. After that, we
utilize the main genre classifier to predict the main genres of every
track in the test set. Following that, for every track in the test set
and every main genre predicted for that track, the corresponding
subgenre classifier is used to predict the subgenre labels for the
track. Lastly, as it is possible in multilabel classification that no
label is assigned to a track (i.e., if every binary classifier predicts
‘no’ for its respective label), we apply a ‘most popular genre’
fallback approach and assign the most common main genre label
for the respective dataset to ensure that each track is assigned a
main genre.
        <fn id="fn2">
          <label>2</label>
          <p>We relied on the Python library scikit-learn [<xref ref-type="bibr" rid="ref10">10</xref>] for implementing the machine learning parts of the tasks.</p>
        </fn>
      </p>
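      <p>The prediction part of this workflow can be sketched in plain Python. The predictor callables stand in for trained multilabel classifiers; their names, the toy decision rules, and the feature names <monospace>tempo</monospace> and <monospace>energy</monospace> are hypothetical and serve only to make the sketch runnable.</p>
      <preformat>
```python
# Sketch of hierarchical prediction with a most-popular-genre fallback.
# Predictors are placeholders for trained classifiers (hypothetical rules).
from collections import Counter


def most_popular_genre(training_labels):
    """Most frequent main genre over all training tracks."""
    counts = Counter(g for genres in training_labels for g in genres)
    return counts.most_common(1)[0][0]


def predict_track(features, predict_main, subgenre_predictors, fallback_genre):
    """Predict main genres, then subgenres per predicted main genre."""
    main_genres = set(predict_main(features))
    subgenres = set()
    for genre in main_genres:
        predictor = subgenre_predictors.get(genre)
        if predictor is not None:
            subgenres |= set(predictor(features))
    if not main_genres:
        # No binary classifier fired: assign the most common main genre.
        main_genres = {fallback_genre}
    return main_genres, subgenres


def toy_main(features):
    return {"rock"} if features["tempo"] > 120 else set()


def toy_rock_sub(features):
    return {"punk"} if features["energy"] > 0.8 else set()


fallback = most_popular_genre([{"rock"}, {"rock", "pop"}, {"jazz"}])
result_a = predict_track({"tempo": 160, "energy": 0.9},
                         toy_main, {"rock": toy_rock_sub}, fallback)
result_b = predict_track({"tempo": 90, "energy": 0.2},
                         toy_main, {"rock": toy_rock_sub}, fallback)
```
      </preformat>
      <p>In this toy setting, the first track receives the main genre "rock" with subgenre "punk", whereas the second track receives no prediction from the main genre classifier and is therefore assigned the fallback genre only.</p>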
      <p>To make use of the five permitted submission runs, this
basic approach was implemented with the following configurations
of classification algorithms for main genres and subgenres, which are also
listed in Table 1:</p>
      <list list-type="bullet">
        <list-item>
          <p>Run #1 uses an SVM with C = 1.0 and no class weight balancing for the main genre classifier, and an extra-trees classifier with 50 trees, <inline-formula><tex-math>\sqrt{|\mathit{features}|}</tex-math></inline-formula> features considered when searching for the best split, and balanced class weights for the subgenre classifiers.</p>
        </list-item>
        <list-item>
          <p>Run #2 uses an SVM with C = 1.0 and balanced class weights for the main genre classifier, and an extra-trees classifier with 50 trees, <inline-formula><tex-math>\sqrt{|\mathit{features}|}</tex-math></inline-formula> features considered when searching for the best split, and balanced class weights for the subgenre classifiers.</p>
        </list-item>
        <list-item>
          <p>Run #3 uses an SVM with C = 10.0 and balanced class weights for both the main genre and the subgenre classifiers.</p>
        </list-item>
      </list>
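      <p>Expressed in scikit-learn terms, the three configurations above might look as follows. This is a sketch under assumptions: <monospace>LinearSVC</monospace> is used as a stand-in for the linear SVM, the dictionary name <monospace>RUNS</monospace> is hypothetical, and all parameters not mentioned in the text are left at the library defaults.</p>
      <preformat>
```python
# Sketch of the three classifier configurations (assumed estimator classes).
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC


def extra_trees():
    # 50 trees, sqrt(number of features) candidates per split,
    # balanced class weights.
    return ExtraTreesClassifier(
        n_estimators=50, max_features="sqrt", class_weight="balanced"
    )


# run number: (main genre classifier, subgenre classifier)
RUNS = {
    1: (LinearSVC(C=1.0, class_weight=None), extra_trees()),
    2: (LinearSVC(C=1.0, class_weight="balanced"), extra_trees()),
    3: (LinearSVC(C=10.0, class_weight="balanced"),
        LinearSVC(C=10.0, class_weight="balanced")),
}

# Wrap an estimator for one-vs.-the-rest multilabel training.
main_clf_run2 = OneVsRestClassifier(RUNS[2][0])
```
      </preformat>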
      <p>The C value for the SVMs was selected via a grid search on
a smaller test set of 10,000 randomly sampled tracks. The chosen
number of features and trees for the extra-trees classifier was a
trade-off between classification runtime and accuracy, as considering more features
might have yielded higher accuracy. For runs #4 and #5 in subtask 1,
the results of run #3 were used.</p>
    </sec>
    <sec id="sec-3">
      <title>Subtask 2</title>
      <p>For subtask 2, all provided datasets could be utilized to
classify each of the four test sets. We chose to implement this using
a voting mechanism. First, SVMs were trained as main genre classifiers
as in subtask 1, independently for every training set. These
classifiers were then used to predict the main genres of a given
track as follows:</p>
      <list list-type="order">
        <list-item>
          <p>Predict the main genres of the track with all four classifiers.</p>
        </list-item>
        <list-item>
          <p>Utilize the genre mapping described above to map the predicted genres to the genre labels of the current test set (otherwise, the predicted labels would not be compatible and hence create false positives). Classification results in which no class label was contained in (or could be mapped to) the test set were discarded.</p>
        </list-item>
        <list-item>
          <p>For every genre predicted by any of the four classifiers, count the number of classifiers that predicted this genre (using two different weighting schemes) and normalize this count by the number of classifiers that produced a usable result.</p>
        </list-item>
      </list>
      <p>To arrive at the final set of main genres for every track, we
applied two different variants, which are listed in Table 1 as runs
#4 and #5: (1) weight every prediction equally and retain genres
predicted by at least 50% of the usable classifiers (run #4); for example, if
three of the four classifiers predict the label rock/pop, that label
was predicted by 75% of the classifiers and is retained;
(2) double the weight of the prediction of the classifier trained
specifically on the training set corresponding to the current test set
and retain genres predicted by at least 60% of the usable classifiers
(run #5); for example, if we predict labels for the Lastfm test set and the Lastfm
and Discogs classifiers predict rock/pop, that label is
assigned three votes out of five (i.e., 60%) and is retained.
The second variant puts more emphasis on the predictions of the
classifier trained on data stemming from the same source as the
current test set, which is naturally the best-suited training data.</p>
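      <p>The two voting variants can be reproduced with a short sketch. The function name <monospace>vote</monospace>, its parameters, and the example predictions are hypothetical; the predictions are assumed to be already mapped to the label set of the current test set, with <monospace>None</monospace> marking an unusable result.</p>
      <preformat>
```python
# Sketch of weighted voting over main genre predictions (hypothetical API).

def vote(predictions, test_source, own_weight=1, threshold=0.5):
    """Return the genres whose weighted vote share reaches the threshold.

    predictions maps a training source to a set of mapped genres, or to
    None when the result was unusable for the current test set.
    """
    weights = {
        source: (own_weight if source == test_source else 1)
        for source, genres in predictions.items()
        if genres is not None
    }
    total = sum(weights.values())
    if total == 0:
        return set()
    scores = {}
    for source, weight in weights.items():
        for genre in predictions[source]:
            scores[genre] = scores.get(genre, 0) + weight
    return {g for g, score in scores.items() if score / total >= threshold}


# Run #4 style: equal weights, 50% threshold; 3 of 4 classifiers agree.
run4 = vote(
    {"allmusic": {"rock/pop"}, "discogs": {"rock/pop"},
     "lastfm": {"rock/pop"}, "tagtraum": {"jazz"}},
    test_source="lastfm", own_weight=1, threshold=0.5,
)

# Run #5 style: the Lastfm classifier counts twice, 60% threshold.
run5 = vote(
    {"allmusic": {"jazz"}, "discogs": {"rock/pop"},
     "lastfm": {"rock/pop"}, "tagtraum": {"blues"}},
    test_source="lastfm", own_weight=2, threshold=0.6,
)
```
      </preformat>
      <p>In the first call, rock/pop reaches 3 of 4 votes (75%) and is retained; in the second, it reaches 3 of 5 weighted votes (60%) and is retained as well, while all other labels fall below the threshold.</p>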
      <p>The prediction of subgenres and the handling of tracks with no predicted
labels were performed in the same way as in subtask 1. For this subtask,
support vector machines were used as classifiers, with C = 10.0 and
balanced class weights, as determined in preliminary experiments.</p>
    </sec>
    <sec id="sec-4">
      <title>RESULTS AND OUTLOOK</title>
      <p>The results of the evaluation of our approach for subtasks 1 and
2 can be found in Tables 2 and 3, respectively. Table 2 shows the
results of run #3, which provided the best overall performance
in both subtasks. Table 3 contains the results of run #5, which
performed better in some measures (in bold font) compared to run
#3 in subtask 2. Due to space limitations, the results of the other
runs are omitted.</p>
      <p>Possible improvements of the presented approaches include
different classification methods, such as deep neural networks, and a
more detailed feature selection process. These steps were not
feasible due to time constraints and technical limitations of the
available hardware.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Dmitry</given-names>
            <surname>Bogdanov</surname>
          </string-name>
          , Alastair Porter,
          <string-name>
            <given-names>Julián</given-names>
            <surname>Urbano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hendrik</given-names>
            <surname>Schreiber</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recognition from Multiple Sources</article-title>
          .
          <source>In Proc. of the MediaEval 2017 Workshop</source>
          , Dublin, Ireland, Sept.
          <fpage>13</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Dmitry</given-names>
            <surname>Bogdanov</surname>
          </string-name>
          , Nicolas Wack, Emilia Gómez, Sankalp Gulati,
          <string-name>
            <given-names>Perfecto</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Mayor</surname>
          </string-name>
          , Gerard Roma, Justin Salamon,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Zapata</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Serra</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>ESSENTIA: an Audio Analysis Library for Music Information Retrieval</article-title>
          .
          <source>In International Society for Music Information Retrieval Conference (ISMIR'13)</source>
          . Curitiba, Brazil,
          <fpage>493</fpage>
          -
          <lpage>498</lpage>
          . http://hdl.handle.net/10230/32252
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Tianqi</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Guestrin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Xgboost: A scalable tree boosting system</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . ACM,
          <fpage>785</fpage>
          -
          <lpage>794</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Sander</given-names>
            <surname>Dieleman</surname>
          </string-name>
          , Philemon Brakel, and
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Schrauwen</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Audio-based Music Classification with a Pretrained Convolutional Network</article-title>
          .
          <source>In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011)</source>
          .
          <fpage>669</fpage>
          -
          <lpage>674</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Shyamala</given-names>
            <surname>Doraisamy</surname>
          </string-name>
          , Shahram Golzari,
          <string-name>
            <given-names>Noris Mohd.</given-names>
            <surname>Norowi</surname>
          </string-name>
          , Md Nasir Sulaiman, and Nur Izura Udzir.
          <year>2008</year>
          .
          <article-title>A Study on Feature Selection and Classification Techniques for Automatic Genre Classification of Traditional Malay Music</article-title>
          .
          <source>In Proceedings of the 9th International Society for Music Information Retrieval Conference (ISMIR 2008)</source>
          .
          <fpage>331</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Elmagarmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Ipeirotis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Verykios</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Duplicate Record Detection: A Survey</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>19</volume>
          ,
          <issue>1</issue>
          (Jan
          <year>2007</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . https://doi.org/10.1109/TKDE.2007.250581
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Geurts</surname>
          </string-name>
          , Damien Ernst, and
          <string-name>
            <given-names>Louis</given-names>
            <surname>Wehenkel</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Extremely randomized trees</article-title>
          .
          <source>Machine learning 63, 1</source>
          (
          <year>2006</year>
          ),
          <fpage>3</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Arijit</given-names>
            <surname>Ghosal</surname>
          </string-name>
          , Rudrasis Chakraborty, Bibhas Chandra Dhara, and Sanjoy Kumar Saha.
          <year>2015</year>
          .
          <article-title>Perceptual feature-based song genre classification using RANSAC</article-title>
          .
          <source>International Journal of Computational Intelligence Studies</source>
          <volume>4</volume>
          ,
          <issue>1</issue>
          (
          <year>2015</year>
          ),
          <fpage>31</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Vladimir I.</given-names>
            <surname>Levenshtein</surname>
          </string-name>
          .
          <year>1966</year>
          .
          <article-title>Binary codes capable of correcting deletions, insertions, and reversals</article-title>
          .
          <source>Soviet Physics Doklady</source>
          , Vol.
          <volume>10</volume>
          .
          <fpage>707</fpage>
          -
          <lpage>710</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Fabian</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          , Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          , Ron Weiss, Vincent Dubourg, and others.
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          Oct
          (
          <year>2011</year>
          ),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Alastair</given-names>
            <surname>Porter</surname>
          </string-name>
          , Dmitry Bogdanov, Robert Kaye, Roman Tsukanov, and
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Serra</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>AcousticBrainz: a community platform for gathering music information obtained from audio</article-title>
          .
          <source>In 16th International Society for Music Information Retrieval Conference (ISMIR 2015)</source>
          . Malaga, Spain,
          <fpage>786</fpage>
          -
          <lpage>792</lpage>
          . http://dblp.org/rec/html/conf/ismir/PorterBKTS15
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Ting-Fan</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ruby C.</given-names>
            <surname>Weng</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Probability estimates for multi-class classification by pairwise coupling</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>5</volume>
          ,
          Aug
          (
          <year>2004</year>
          ),
          <fpage>975</fpage>
          -
          <lpage>1005</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>