<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ICSI in MediaEval 2017 Multi-Genre Music Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kijung Kim</string-name>
          <email>kijung@berkeley.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaeyoung Choi</string-name>
          <email>jaeyoung@icsi.berkeley.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>International Computer Science Institute</institution>
          ,
          <addr-line>Berkeley, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of California</institution>
          ,
          <addr-line>Berkeley, CA</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>We present our approach and results for the MediaEval 2017 AcousticBrainz content-based music genre recognition task. Experimental results show that the best performance comes from a random forest classifier with partial feature selection.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>The 2017 Content-based music genre recognition from multiple
sources task [1] consists of two subtasks: single-source
classification and multiple-source classification. We focused on the first
subtask, whose goal was to predict genres using a single source
of ground truth with broad genre categories as class labels. In the
following sections, we describe our feature formulation, models,
and experiments in detail.</p>
    </sec>
    <sec id="sec-2">
      <title>TECHNICAL APPROACH</title>
      <p>The proposed framework can be divided into three phases: (1)
feature formulation, (2) standardization, and (3) model selection
and prediction.</p>
      <p>(1) Feature Formulation. The dataset provides each song
with three groups of pre-extracted features: tonal, rhythm, and
low-level. A feature vector for each song was formed as a
concatenation of all the individual features from each group. Features
with specific statistic labels such as mean, max, and min were simply
concatenated together. For the sake of simplicity, categorical
features were not considered; the excluded features are "key_key",
"key_scale", "chords_key", and "chords_scale". The "beats_position"
feature was also excluded, as its length varies from song to song, and
we assumed that the "bpm" and "beats_count" features were
sufficient. This resulted in a 2647-dimensional feature vector per
song.</p>
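      <p>The concatenation step can be sketched as follows. This is a minimal illustration, not the run code: the nested dict mimics the AcousticBrainz JSON layout, and the key names other than the excluded ones are made up for the example.</p>

```python
# Flatten a nested feature dict into a fixed-order vector, skipping the
# excluded categorical / variable-length features described above.

EXCLUDED = {"key_key", "key_scale", "chords_key", "chords_scale",
            "beats_position"}

def flatten(features, prefix=""):
    """Recursively flatten a nested feature dict into (name, value) pairs."""
    out = []
    for key in sorted(features):          # fixed order across songs
        if key in EXCLUDED:
            continue
        value = features[key]
        name = prefix + key
        if isinstance(value, dict):       # e.g. {"mean": ..., "max": ...}
            out.extend(flatten(value, name + "."))
        elif isinstance(value, list):     # fixed-length vectors
            out.extend((name + "[%d]" % i, float(v))
                       for i, v in enumerate(value))
        else:
            out.append((name, float(value)))
    return out

song = {
    "rhythm": {"bpm": 120.0, "key_key": "A"},  # "key_key" is skipped
    "lowlevel": {"spectral_centroid": {"mean": 950.2, "max": 4100.0}},
}
vector = [v for _, v in flatten(song)]    # [4100.0, 950.2, 120.0]
```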
      <p>(2) Standardization. We randomly sampled a subset of 100,000
songs from each dataset, formulated the feature vectors, and computed
the mean and standard deviation for every index of the feature
vector. Then, at test time, each feature was standardized using
the pre-computed mean and standard deviation.</p>
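      <p>A minimal sketch of this two-step standardization, assuming plain Python lists as feature vectors (the runs operated on much larger arrays):</p>

```python
# Estimate per-index mean and standard deviation on a random sample of
# songs, then reuse those statistics to standardize any later vector.
import math
import random

def fit_standardizer(sample_vectors):
    """Per-index mean and std over a sample of feature vectors."""
    n = len(sample_vectors)
    dim = len(sample_vectors[0])
    means = [sum(v[i] for v in sample_vectors) / n for i in range(dim)]
    stds = [math.sqrt(sum((v[i] - means[i]) ** 2 for v in sample_vectors) / n)
            or 1.0                        # guard against constant features
            for i in range(dim)]
    return means, stds

def standardize(vector, means, stds):
    return [(x - m) / s for x, m, s in zip(vector, means, stds)]

random.seed(0)
sample = [[random.gauss(5.0, 2.0), random.uniform(0.0, 1.0)]
          for _ in range(1000)]           # stand-in for the 100,000 songs
means, stds = fit_standardizer(sample)
z = standardize(sample[0], means, stds)   # test-time usage
```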
      <p>(3) Model Selection and Predictions. From scikit-learn [6],
the two classifiers used in our approach were the Stochastic Gradient
Descent (SGD) classifier with hinge loss and the Random Forest
classifier with 16 estimators [2]. A binary classifier was trained for
each genre/subgenre; the results were then aggregated, and the
prediction for each genre/subgenre was made independently.</p>
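      <p>The per-label setup can be sketched with the scikit-learn classifiers named above. This is an illustrative toy, not the run code: the data is synthetic and the label names are invented.</p>

```python
# One independent binary classifier per genre/subgenre; predictions are
# aggregated by querying every classifier for each song.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # 200 songs, 8 features
labels = ["rock", "rock---indie", "jazz"]     # hypothetical label set
Y = {g: (X[:, i] > 0).astype(int) for i, g in enumerate(labels)}

models = {}
for genre, y in Y.items():
    clf = RandomForestClassifier(n_estimators=16, random_state=0)
    # The SGD variant would instead use SGDClassifier(loss="hinge").
    models[genre] = clf.fit(X, y)

# Each label is predicted independently for a given song.
song = X[:1]
predicted = [g for g, clf in models.items() if clf.predict(song)[0] == 1]
```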
      <p>The first two Runs consisted of concatenating all the provided
features (except the ones mentioned above) and using the SGD
classifier.</p>
    </sec>
    <sec id="sec-2-1">
      <title>Stochastic Gradient Descent Classifier: Run 1 and 2</title>
      <p>In Run 1, each song was represented by a concatenated feature
vector of all features minus the ones mentioned above, fed to the
SGDClassifier. To accommodate the large dataset, batch training with
batches of 80,000 songs was used.</p>
      <p>Run 2 used the same feature formulation and model as Run 1; the
difference lies in the prediction process. Subgenre predictions were
ignored if their "parent" main genre was not predicted. For example,
given main genre A with subgenres B and C, and main genre D with
subgenres E and F: if the classifiers labeled a song as main genre A
with subgenres C and F but did not predict main genre D, subgenre F
was dropped, yielding a final prediction of genre A with subgenre C.
This hierarchy weighs main-genre predictions above subgenre
predictions and was intended to decrease the chance of false positives
for subgenres.</p>
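      <p>Run 2's rule, dropping subgenre predictions whose parent main genre was not itself predicted, can be sketched as follows. The parent-lookup dict is an assumed representation for illustration.</p>

```python
# Keep a predicted subgenre only if its parent main genre was also
# predicted; main genres (no parent) are always kept.

def filter_predictions(predicted, parent_of):
    """Drop subgenres whose parent main genre is absent from `predicted`."""
    kept = set()
    for label in predicted:
        parent = parent_of.get(label)     # None => label is a main genre
        if parent is None or parent in predicted:
            kept.add(label)
    return kept

# Main genre A has subgenres B, C; main genre D has subgenres E, F.
parent_of = {"B": "A", "C": "A", "E": "D", "F": "D"}

# Predicted: main genre A plus subgenres C and F, but not main genre D,
# so F is dropped.
final = filter_predictions({"A", "C", "F"}, parent_of)
```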
    </sec>
    <sec id="sec-3">
      <title>RandomForest with Partial Feature Selection: Run 3, 4 and 5</title>
      <p>For the next three Runs, we used a random forest classifier (RFC)
with partial feature selection, based on the feature importance [4]
of the trained random forest classifier. We first took a subset of
the training data (around 100,000 songs), formulated a concatenated
feature vector for each song, and fit the features to an RFC for each
genre and subgenre. We then used the ranked feature importance
list from each classifier to select the x% best features, which resulted
in a different set of best features for each genre and subgenre [5]. From
there, we trained one RFC per genre and subgenre using its own top x%
features, on a subset of the training data (around 150,000 songs).
Finally, predictions of the genres were made by aggregating the
outputs of all the RFCs.</p>
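      <p>The selection step can be sketched as follows. The importance scores here are illustrative; in the actual runs they come from a trained RandomForestClassifier's feature_importances_ attribute.</p>

```python
# Rank features by importance and keep the top fraction, best first.

def top_fraction(importances, fraction):
    """Indices of the `fraction` most important features."""
    k = int(len(importances) * fraction)
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return ranked[:k]

# Toy importance scores: feature 1 dominates.
selected = top_fraction([0.1, 0.5, 0.2, 0.2], 0.5)   # [1, 2]

# With the 2647-dimensional vectors, the 25%/50%/75% cuts yield the
# 661-, 1323-, and 1985-dimensional vectors used in Runs 3, 4, and 5.
dims = [int(2647 * f) for f in (0.25, 0.50, 0.75)]
```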
      <p>Runs 3, 4, and 5 used the top 25%, 50%, and 75% of the features
from the ranked feature importance list of the trained RFC, which
resulted in 661-, 1323-, and 1985-dimensional feature vectors
per song, respectively.</p>
    </sec>
    <sec id="sec-5">
      <title>RESULTS AND ANALYSIS</title>
      <p>In this section, we report accumulated results on the subtask
for our two different approaches; the official results are available
at https://multimediaeval.github.io/2017-AcousticBrainz-Genre-Task/results/.
Our results are reported in Figure 1. The test set is composed of
three different databases (Discogs, Lastfm, Tagtraum), and we took the
average of precision, recall, and F-score over them to obtain a single
number.</p>
      <p>[Figure 1: precision (P), recall (R), and F-score (F), per track
and per label, overall and broken down by genre and subgenre, for each
run.]</p>
      <p>We observe that the approaches based on the Random Forest
Classifier (Runs 3, 4, and 5) outperform the SGD Classifier approaches
(Runs 1 and 2). In particular, we note that the recall in Runs 1 and 2
is especially high while the precision is especially low, meaning
the classifiers assigned almost every label to each song. For Runs
3, 4, and 5, we observe a significantly lower recall with a better
precision.</p>
      <p>For Runs 3, 4, and 5, we observe a trend that adding more
features improves recall at the cost of precision. However, Run 5
deviates from this trend: it yields better recall only in the
per-label results, while showing worse results on all per-track
metrics.</p>
      <p>Runs 1 and 2 clearly suffered from oversampling, which led
the classifiers for most genres to predict positive, resulting in
high recall and low precision. Runs 3, 4, and 5 did not suffer in the
same way, but upon examining precision, recall, and F-scores for
each genre, the classifiers did far worse on less popular genres and
subgenres, which led to lower overall precision and recall.</p>
      <p>For Runs 1 and 2, the shortcomings came from sampling errors,
which were technical in nature, originating from bugs in our code. For
Runs 3, 4, and 5, the shortcomings came from the lack of a system
for combining results from the different classifiers. For one,
we could have exploited the probabilities generated by the model
for each prediction to determine a threshold for each genre and
subgenre. This would have helped especially for sparse subgenres.</p>
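      <p>The thresholding idea can be sketched as below: per genre, pick the probability cutoff that maximizes F-score on held-out data. The toy probabilities and labels are invented for illustration.</p>

```python
# Select a per-genre decision threshold by maximizing F-score on
# validation data, instead of a fixed 0.5 cutoff.

def f_score(truth, preds):
    tp = sum(t and p for t, p in zip(truth, preds))
    fp = sum((not t) and p for t, p in zip(truth, preds))
    fn = sum(t and (not p) for t, p in zip(truth, preds))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(probs, truth, candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Threshold maximizing F-score for one genre on validation data."""
    return max(candidates,
               key=lambda t: f_score(truth, [p >= t for p in probs]))

probs = [0.95, 0.80, 0.40, 0.30, 0.10]   # model confidence per song
truth = [True, True, True, False, False]
threshold = best_threshold(probs, truth)  # 0.3 here: catches all positives
```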
      <p>As future work for Runs 3, 4, and 5, it would be interesting
to see whether taking a smaller top percentage of the features from
the feature importance list for each genre and subgenre would improve
precision. It may also be worth trying majority voting over several
different random forest classifiers trained using the feature
importance list. Lastly, it would be interesting to try the
imbalanced-learn package [3], which is compatible with scikit-learn
and may fix the class imbalance for Runs 1 and 2.</p>
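      <p>As a pure-Python illustration of the kind of rebalancing that imbalanced-learn automates, one can randomly undersample the majority class before training each binary classifier; the data below is made up.</p>

```python
# Keep all minority-class samples and an equal-sized random subset of
# the majority class, so both classes are equally represented.
import random

def undersample(samples, labels, seed=0):
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority = min(pos, neg, key=len)
    majority = max(neg, pos, key=len)
    rng = random.Random(seed)
    chosen = minority + rng.sample(majority, len(minority))
    return [samples[i] for i in chosen], [labels[i] for i in chosen]

X = [[float(i)] for i in range(10)]
y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]        # 2 positives, 8 negatives
X_bal, y_bal = undersample(X, y)          # 2 positives, 2 negatives
```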
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported in part by AWS Research Grants.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>Dmitry Bogdanov, Alastair Porter, Julián Urbano, and Hendrik Schreiber. 2017. The MediaEval 2017 AcousticBrainz Genre Task: Content-based music genre recognition from multiple sources. In Proc. of the MediaEval 2017 Workshop, Dublin, Ireland, Sept. 13-15, 2017.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>Ben Hoyle, Markus Michael Rau, Roman Zitlau, Stella Seitz, and Jochen Weller. 2015. Feature importance for machine learning redshifts applied to SDSS galaxies. Monthly Notices of the Royal Astronomical Society (2015).</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>Guillaume Lemaître, Fernando Nogueira, and Christos K. Aridas. 2017. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research 18, 17 (2017), 1-5.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>Gilles Louppe. 2014. Understanding random forests: From theory to practice. arXiv preprint arXiv:1407.7502 (2014).</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>Gilles Louppe, Louis Wehenkel, Antonio Sutera, and Pierre Geurts. 2013. Understanding variable importances in forests of randomized trees. In Advances in Neural Information Processing Systems. 431-439.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825-2830.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>