        The MediaEval 2017 AcousticBrainz Genre Task:
 Content-based Music Genre Recognition from Multiple Sources
                           Dmitry Bogdanov1 , Alastair Porter1 , Julián Urbano2 , Hendrik Schreiber3
                                             1 Music Technology Group, Universitat Pompeu Fabra, Spain
                                  2 Multimedia Computing Group, Delft University of Technology, Netherlands
                                                         3 tagtraum industries incorporated

                   dmitry.bogdanov@upf.edu,alastair.porter@upf.edu,urbano.julian@gmail.com,hs@tagtraum.com

ABSTRACT
This paper provides an overview of the AcousticBrainz Genre Task, organized as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation. The task focuses on content-based music genre recognition using genre annotations from multiple sources and the large-scale music feature data available in the AcousticBrainz database. Its goal is to explore how the same music pieces can be annotated differently by different communities following different genre taxonomies, and how content-based genre recognition systems should address this. We present the task challenges, the employed ground-truth information and datasets, and the evaluation methodology.

Copyright held by the owner/author(s).
MediaEval’17, 13-15 September 2017, Dublin, Ireland

1 INTRODUCTION
Content-based music genre recognition is a popular task in Music Information Retrieval (MIR) research [11]. The goal is to build systems able to predict the genre and subgenre of unknown music recordings (tracks or songs) using music features automatically computed from audio. Such research can be supported by our recent developments in the context of the AcousticBrainz [1] project, which facilitates access to large datasets of music features [8] and metadata [9]. AcousticBrainz is a community database containing music features extracted from audio files. Users who contribute to the project run software on their computers to process their personal audio collections and submit the resulting features to the AcousticBrainz database. Based on these features, additional metadata including genres can then be mined for recordings in the database.
   We propose a new genre recognition task for MediaEval 2017 using datasets based on AcousticBrainz. This task differs from a typical genre recognition task in the following ways:
• It allows us to explore how the same music can be annotated differently by different communities who follow different genre taxonomies, and how this can be addressed when developing and evaluating genre recognition systems.
• Genre recognition is often treated as a single-category classification problem. Our data is intrinsically multi-label, so we propose to treat genre recognition as a multi-label classification problem.
• Previous research typically used a small number of broad genre categories. In contrast, we consider more specific genres and subgenres; our data contains hundreds of subgenres.
• We provide information about the hierarchy of genres and subgenres within each annotation source. Systems can take advantage of this knowledge.
• MIR research is often performed on small music collections. We provide a very large dataset with two million recordings annotated with genres and subgenres. However, we only provide precomputed features, not audio.

2 TASK DESCRIPTION
The task invites participants to predict the genre and subgenre of unknown music recordings given automatically computed music features of those recordings. We provide four datasets of such music features taken from the AcousticBrainz database [8], together with four different ground truths created using four different music metadata websites as sources. Their genre taxonomies vary in class spaces, specificity, and breadth. Each source has its own definition for its genre labels, i.e., the same label may carry a different meaning when used by another source. Most importantly, annotations in each source are multi-label: there may be multiple genre and subgenre annotations for the same music recording. Each recording is guaranteed to have at least one genre label, while subgenres are not always present.
   Participants must train model(s) using the provided development sets and then predict genre and subgenre labels for the test sets. The task includes two subtasks:
• Subtask 1: Single-source Classification. This subtask explores conventional systems, each one trained on a single dataset. Participants submit predictions for the test set of each dataset separately, using its respective class space (genres and subgenres). These predictions are produced by a separate system for each dataset, trained without any information from the other sources. This subtask serves as a baseline for Subtask 2.
• Subtask 2: Multi-source Classification. This subtask explores the combination of several ground-truth sources to create a single classification system. We use the same four test sets. Participants submit predictions for each test set separately, again following the corresponding genre class space. These predictions may be produced by a single system for all datasets or by one system per dataset. Participants are free to decide how to combine the training data from all sources.
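To make the multi-label, per-dataset framing concrete, the following is a minimal sketch of a Subtask 1 style system, assuming scikit-learn and stand-in feature and label arrays (all names here are hypothetical); it illustrates the problem setup, not a reference implementation provided by the task.

# Minimal multi-label sketch for Subtask 1 (illustration only).
# Assumes scikit-learn; X and `labels` are stand-ins, not task data.
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X = np.random.rand(100, 20)                        # stand-in feature matrix
labels = [["rock", "rock---punk"], ["jazz"]] * 50  # stand-in annotations

# One binary indicator column per genre/subgenre label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# One independent binary classifier per label (binary relevance).
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

# Predictions come back as indicator vectors; map them to label sets.
print(mlb.inverse_transform(clf.predict(X[:2])))

Binary relevance is only one possible decomposition; participants may of course exploit label correlations or the genre hierarchy instead.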
3 DATA
3.1 Genre Annotations
We provide four datasets containing genre and subgenre annotations extracted from four different online metadata sources:

• AllMusic [2] and Discogs [5] are based on editorial metadata databases maintained by music experts and enthusiasts. These sources contain explicit genre/subgenre annotations of music releases (albums) following a predefined genre taxonomy. To build the datasets we assumed that release-level annotations correspond to all recordings in AcousticBrainz for that release.
• Lastfm [7] is based on a collaborative music tagging platform with large amounts of genre labels provided by its users for music recordings. Tagtraum [12] is similarly based on genre labels collected from users of the music tagging application beaTunes [3]. For both, we automatically inferred a genre/subgenre taxonomy and annotations from these labels following the algorithm proposed in [10], combined with manual post-processing.

We provide information about the genre/subgenre tree hierarchies for every ground truth.1
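As a small, hypothetical illustration of exploiting this hierarchy, the sketch below propagates a predicted subgenre up to its parent genre. It assumes a "genre---subgenre" label encoding; treat that convention as an assumption and adapt it to the published tree files.

# Hypothetical illustration: enforce consistency between predicted
# subgenres and their parent genres. Assumes labels are encoded as
# "genre" or "genre---subgenre" (an assumption; check the tree files).
def expand_with_parents(predicted):
    expanded = set(predicted)
    for label in predicted:
        if "---" in label:
            expanded.add(label.split("---")[0])  # a subgenre implies its genre
    return sorted(expanded)

print(expand_with_parents(["rock---punk", "electronic"]))
# ['electronic', 'rock', 'rock---punk']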
3.2 Music Features
We provide music features precomputed from audio for every music recording. All features are taken from the AcousticBrainz database and were extracted from audio using Essentia, an open-source library for music audio analysis [4]. The provided features are explained online.2 Only a statistical characterization of time frames is provided (a bag of features); no frame-level data is available.
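To illustrate what working with these data can look like, here is a rough sketch that flattens a few descriptors from one AcousticBrainz low-level JSON document into a feature vector. The descriptor paths follow the usual AcousticBrainz layout but are assumptions here; verify them against the documentation linked above.

# Rough sketch: flatten a few descriptors from an AcousticBrainz
# low-level JSON file into one vector. Descriptor paths are assumed
# from the usual layout; verify against the Essentia documentation.
import json

def feature_vector(path):
    with open(path) as f:
        doc = json.load(f)
    vec = []
    vec.extend(doc["lowlevel"]["mfcc"]["mean"])            # MFCC means
    vec.append(doc["lowlevel"]["spectral_centroid"]["mean"])
    vec.append(doc["rhythm"]["bpm"])                       # global tempo estimate
    return vec

# vec = feature_vector("recording.json")  # hypothetical file named by MBID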
3.3 Development and Test Datasets
In total we provide four development (training+validation) and four test datasets associated with the four genre ground truths. They were created by a random split of the full data ensuring that:
• no recordings in the test sets are present in any of the development sets;
• no recordings in the test sets are from the same release groups (e.g., albums, singles, EPs) present in the development sets;
• the same genre and subgenre labels are present in both development and test sets for each ground truth;
• each genre and subgenre label is represented by at least 40 recordings from 6 release groups in the development sets, and by at least 20 recordings from 3 release groups in the test sets.
   The approximate split ratios are 70% for training and 15% for testing (another 15% was reserved for future evaluation purposes). Table 1 provides an overview of the resulting development sets. Details on the genre/subgenre taxonomies and their distribution in the development sets, in terms of the number of recordings and release groups, are reported online.3

Table 1: Overview of the development datasets.

Dataset            AllMusic    Discogs    Lastfm    Tagtraum
Type               Explicit    Explicit   Tags      Tags
Annotation level   Release     Release    Track     Track
Recordings         1,353,213   904,944    566,710   486,740
Release groups     163,654     118,475    115,161   69,025
Genres             21          15         30        31
Subgenres          745         300        297       265
Genres/track       1.33        1.37       1.14      1.13
Subgenres/track    3.15        1.69       1.28      1.72

   The recordings partially intersect across the datasets (some are annotated by all four ground truths) in both the development and test sets. The full intersection of all development sets contains 247,716 recordings, while the intersection of the two largest sets, AllMusic and Discogs, contains 831,744 recordings.
   All data are published in JSON and TSV formats; details about the formats are available online.4 Each recording in the development sets is identified by a MusicBrainz ID (MBID),5 which participants can use to gather related data. Importantly, our split avoids the “album effect” [6], which leads to a potential overestimation of a system’s performance when a test set contains recordings from the same albums as the development set. The development sets additionally include the release group of each recording, which participants may use to avoid this effect when developing their systems. Partitioning scripts were provided to create training-validation splits ensuring these characteristics in the data; a sketch of such a group-aware split follows below.
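The following hedged sketch mirrors the idea behind the provided partitioning scripts by grouping recordings by release group before splitting. It assumes scikit-learn and a hypothetical TSV layout; the file name and column name are assumptions, so adapt them to the published format.

# Hedged sketch of a release-group-aware train/validation split.
# File and column names are assumptions; adapt to the published TSV.
import csv
from sklearn.model_selection import GroupShuffleSplit

with open("groundtruth.tsv", newline="") as f:   # hypothetical file name
    rows = list(csv.DictReader(f, delimiter="\t"))
groups = [r["releasegroupmbid"] for r in rows]   # assumed column name

gss = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=0)
train_idx, val_idx = next(gss.split(rows, groups=groups))
# No release group (and hence no album) spans both partitions.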

4 SUBMISSIONS AND EVALUATION
Participants are expected to submit predictions for both subtasks. We allow a maximum of five evaluation runs, each including both subtasks; every submission must report whether it used the whole development dataset or only parts of it.
   The evaluation is carried out for each dataset separately. We do not use hierarchical measures because the hierarchies in the Lastfm and Tagtraum datasets are not explicit. Instead, we compute precision, recall and F-score at different levels:
• Per recording, all labels.
• Per recording, only genre labels.
• Per recording, only subgenre labels.
• Per label, all recordings.
• Per genre label, all recordings.
• Per subgenre label, all recordings.
   The ground truth does not necessarily contain subgenre annotations for some recordings, so we only consider recordings with subgenre annotations in the evaluation at the subgenre level. An example can be found online in the summaries of the random baselines.6 We also provided evaluation scripts for development purposes.
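For development purposes, the per-recording measures can be approximated as in the sketch below, which follows the standard multi-label definitions; the evaluation scripts provided with the task remain the reference implementation.

# Sketch of per-recording precision/recall/F-score for one recording,
# following standard multi-label definitions; the official task
# evaluation scripts remain the reference.
def prf(predicted, truth):
    predicted, truth = set(predicted), set(truth)
    if not predicted or not truth:
        return 0.0, 0.0, 0.0
    p = len(predicted & truth) / len(predicted)
    r = len(predicted & truth) / len(truth)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

print(prf(["rock", "rock---punk"], ["rock", "rock---indie rock"]))
# (0.5, 0.5, 0.5)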
5 CONCLUSIONS
By bringing the AcousticBrainz Genre Task to MediaEval, we hope to benefit from the contributions and expertise of the broader machine learning and multimedia retrieval communities. We refer to the MediaEval 2017 proceedings for further details on the methods and results of the teams participating in the task.

1 The resulting genre metadata is licensed under the CC BY-NC-SA 4.0 license, except for data extracted from the AllMusic database, which is released for non-commercial scientific research purposes only. Any publication of results based on the data extracts of the AllMusic database must cite AllMusic as the source of the data.
2 http://essentia.upf.edu/documentation/streaming_extractor_music.html
3 https://multimediaeval.github.io/2017-AcousticBrainz-Genre-Task/data_stats/
4 https://multimediaeval.github.io/2017-AcousticBrainz-Genre-Task/data/
5 https://musicbrainz.org/doc/MusicBrainz_Identifier
6 https://multimediaeval.github.io/2017-AcousticBrainz-Genre-Task/baseline/


ACKNOWLEDGMENTS
We thank all contributors to AcousticBrainz. This research has re-
ceived funding from the European Union’s Horizon 2020 research
and innovation programme under grant agreement No 688382 (Au-
dioCommons). We also thank tagtraum industries for providing the
Tagtraum genre annotations.

REFERENCES
 [1] AcousticBrainz. 2017. https://acousticbrainz.org
 [2] AllMusic. 2017. https://allmusic.com
 [3] beaTunes. 2017. https://www.beatunes.com
 [4] D. Bogdanov, N. Wack, E. Gómez, S. Gulati, P. Herrera, O. Mayor, G.
     Roma, J. Salamon, J.R. Zapata, and X. Serra. 2013. Essentia: An Audio
     Analysis Library for Music Information Retrieval. In International
     Society for Music Information Retrieval (ISMIR’13) Conference. Curitiba,
     Brazil, 493–498.
 [5] Discogs. 2017. https://discogs.com
 [6] A. Flexer and D. Schnitzer. 2009. Album and Artist Effects for Audio
     Similarity at the Scale of the Web. In Sound and Music Computing
     Conference (SMC’09).
 [7] Lastfm. 2017. https://last.fm
 [8] A. Porter, D. Bogdanov, R. Kaye, R. Tsukanov, and X. Serra. 2015. Acoustic-
     Brainz: a community platform for gathering music information ob-
     tained from audio. In International Society for Music Information Re-
     trieval (ISMIR’15) Conference. Málaga, Spain, 786–792.
 [9] A. Porter, D. Bogdanov, and X. Serra. 2016. Mining metadata from the
     web for AcousticBrainz. In International workshop on Digital Libraries
     for Musicology (DLfM’16). ACM, 53–56.
[10] H. Schreiber. 2015. Improving genre annotations for the Million Song
     Dataset. In International Society for Music Information Retrieval (IS-
     MIR’15) Conference. Málaga, Spain, 241–247.
[11] B. L. Sturm. 2014. The State of the Art Ten Years After a State of the
     Art: Future Research in Music Information Retrieval. Journal of New
     Music Research 43, 2 (2014), 147–172.
[12] Tagtraum. 2017. http://www.tagtraum.com