=Paper=
{{Paper
|id=Vol-1984/Mediaeval_2017_paper_6
|storemode=property
|title=The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recognition from Multiple Sources
|pdfUrl=https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_6.pdf
|volume=Vol-1984
|authors=Dmitry Bogdanov,Alastair Porter,Julián Urbano,Hendrik Schreiber
|dblpUrl=https://dblp.org/rec/conf/mediaeval/BogdanovPUS17
}}
==The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recognition from Multiple Sources==
Dmitry Bogdanov (Music Technology Group, Universitat Pompeu Fabra, Spain), Alastair Porter (Music Technology Group, Universitat Pompeu Fabra, Spain), Julián Urbano (Multimedia Computing Group, Delft University of Technology, Netherlands), Hendrik Schreiber (tagtraum industries incorporated)

dmitry.bogdanov@upf.edu, alastair.porter@upf.edu, urbano.julian@gmail.com, hs@tagtraum.com

Copyright held by the owner/author(s). MediaEval'17, 13-15 September 2017, Dublin, Ireland.

===Abstract===
This paper provides an overview of the AcousticBrainz Genre Task organized as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation. The task is focused on content-based music genre recognition using genre annotations from multiple sources and large-scale music feature data available in the AcousticBrainz database. The goal of our task is to explore how the same music pieces can be annotated differently by different communities following different genre taxonomies, and how this should be addressed by content-based genre recognition systems. We present the task challenges, the employed ground-truth information and datasets, and the evaluation methodology.

===1 Introduction===
Content-based music genre recognition is a popular task in Music Information Retrieval (MIR) research [11]. The goal is to build systems able to predict the genre and subgenre of unknown music recordings (tracks or songs) using music features of those recordings automatically computed from audio. Such research can be supported by our recent developments in the context of the AcousticBrainz [1] project, which facilitates access to large datasets of music features [8] and metadata [9]. AcousticBrainz is a community database containing music features extracted from audio files. Users who contribute to the project run software on their computers to process their personal audio collections and submit music features to the AcousticBrainz database. Based on these features, additional metadata including genres can then be mined for recordings in the database.

We propose a new genre recognition task using datasets based on AcousticBrainz for MediaEval 2017. This task differs from a typical genre recognition task in the following ways:
* It allows us to explore how the same music can be annotated differently by different communities who follow different genre taxonomies, and how this can be addressed when developing and evaluating genre recognition systems.
* Genre recognition is often treated as a single-category classification problem. Our data is intrinsically multi-label, so we propose to treat genre recognition as a multi-label classification problem (see the sketch after this list).
* Previous research typically used a small number of broad genre categories. In contrast, we consider more specific genres and subgenres; our data contains hundreds of subgenres.
* We provide information about the hierarchy of genres and subgenres within each annotation source. Systems can take advantage of this knowledge.
* MIR research is often performed on small music collections. We provide a very large dataset with two million recordings annotated with genres and subgenres. However, we only provide precomputed features, not audio.
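To illustrate the multi-label framing, the following is a minimal sketch (not part of the task materials) of a one-vs-rest multi-label classifier over precomputed feature vectors. The feature matrices and label lists are random placeholders standing in for data prepared from the task datasets, and the label strings are arbitrary examples.

<pre>
# Minimal multi-label genre classification sketch using scikit-learn.
# X_train/X_test and the per-recording label lists are placeholders; they
# stand in for features and annotations prepared from the task data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

X_train = np.random.rand(100, 40)                      # placeholder feature vectors
y_train = [["rock"], ["rock", "punk"], ["jazz"]] * 33 + [["electronic"]]

mlb = MultiLabelBinarizer()                            # binarize genre/subgenre label sets
Y_train = mlb.fit_transform(y_train)                   # shape: (n_recordings, n_labels)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, Y_train)

X_test = np.random.rand(5, 40)                         # placeholder test features
Y_pred = clf.predict(X_test)                           # binary indicator matrix
predicted_labels = mlb.inverse_transform(Y_pred)       # back to label sets per recording
</pre>

Any multi-label approach (binary relevance as above, classifier chains, neural networks with sigmoid outputs, etc.) fits the task; the key point is that each recording may receive several genre and subgenre labels at once.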
===2 Task Description===
The task invites participants to predict the genre and subgenre of unknown music recordings given automatically computed music features of those recordings. We provide four datasets of such music features taken from the AcousticBrainz database [8] together with four different ground truths created using four different music metadata websites as sources. Their genre taxonomies vary in class spaces, specificity and breadth. Each source has its own definition for its genre labels, i.e., the same label may carry a different meaning when used by another source. Most importantly, annotations in each source are multi-label: there may be multiple genre and subgenre annotations for the same music recording. It is guaranteed that each recording has at least one genre label, while subgenres are not always present.

Participants must train model(s) using the provided development sets and then predict genre and subgenre labels for the test sets. The task includes two subtasks:
* Subtask 1: Single-source Classification. This subtask explores conventional systems, each one trained on a single dataset. Participants submit predictions for the test set of each dataset separately, using the respective class spaces (genres and subgenres). These predictions are produced by a separate system for each dataset, trained without any information from the other sources. This subtask serves as a baseline for Subtask 2.
* Subtask 2: Multi-source Classification. This subtask explores the combination of several ground-truth sources to create a single classification system. We use the same four test sets. Participants submit predictions for each test set separately, again following each corresponding genre class space. These predictions may be produced by a single system for all datasets or by one system per dataset. Participants are free to decide how to combine the training data from all sources.

===3 Data===
====3.1 Genre Annotations====
We provide four datasets containing genre and subgenre annotations extracted from four different online metadata sources:
* AllMusic [2] and Discogs [5] are based on editorial metadata databases maintained by music experts and enthusiasts. These sources contain explicit genre/subgenre annotations of music releases (albums) following a predefined genre taxonomy. To build the datasets we assumed that release-level annotations correspond to all recordings in AcousticBrainz for that release.
* Lastfm [7] is based on a collaborative music tagging platform with large amounts of genre labels provided by its users for music recordings. Tagtraum [12] is similarly based on genre labels collected from users of the music tagging application beaTunes [3]. We automatically inferred a genre/subgenre taxonomy and annotations from these labels following the algorithm proposed in [10], followed by manual post-processing.

We provide information about genre/subgenre tree hierarchies for every ground truth. The resulting genre metadata is licensed under the CC BY-NC-SA 4.0 license, except for data extracted from the AllMusic database, which is released for non-commercial scientific research purposes only; any publication of results based on the data extracts of the AllMusic database must cite AllMusic as the source of the data.

====3.2 Music Features====
We provide music features precomputed from audio for every music recording. All features are taken from the AcousticBrainz database and were extracted from audio using Essentia, an open-source library for music audio analysis [4]. The provided features are documented online (http://essentia.upf.edu/documentation/streaming_extractor_music.html). Only a statistical characterization of time frames is provided (a bag of features); no frame-level data is available.
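Since the features are distributed as nested JSON documents of per-recording statistics, a common first step is to flatten each document into a fixed set of named numeric values. The following is a minimal sketch under the assumption that the files are nested dictionaries whose leaves are numbers or lists of numbers; the file name is hypothetical and the actual descriptor names are defined by the Essentia music extractor documentation linked above.

<pre>
# Sketch: flatten one per-recording feature JSON file into a flat
# name -> value mapping of numeric descriptors. Only assumes nested dicts
# whose leaves are numbers or lists of numbers; other leaves are skipped.
import json

def flatten_features(node, prefix=""):
    """Recursively collect numeric leaves (scalars and lists) from nested dicts."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten_features(value, f"{prefix}{key}."))
    elif isinstance(node, (int, float)):
        flat[prefix.rstrip(".")] = float(node)
    elif isinstance(node, list) and all(isinstance(v, (int, float)) for v in node):
        for i, v in enumerate(node):
            flat[f"{prefix.rstrip('.')}[{i}]"] = float(v)
    # non-numeric leaves (e.g., metadata strings) are simply skipped
    return flat

with open("recording.json") as f:      # hypothetical file name
    features = flatten_features(json.load(f))
</pre>

Keeping the descriptor names consistent across recordings makes it straightforward to assemble the per-recording mappings into a feature matrix for the classifiers of Section 2.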
====3.3 Development and Test Datasets====
In total we provide four development (training+validation) and four test datasets associated with the four genre ground truths. They were created by a random split of the full data ensuring that:
* no recordings in the test sets are present in any of the development sets;
* no recordings in the test sets are from the same release groups (e.g., albums, singles, EPs) present in the development sets;
* the same genre and subgenre labels are present in both development and test sets for each ground truth;
* genre and subgenre labels are represented by at least 40 and 20 recordings from 6 and 3 release groups in the development and test sets, respectively.

The approximate split ratios are 70% for training and 15% for testing (another 15% was reserved for future evaluation purposes). Table 1 provides an overview of the resulting development sets. Details on the genre/subgenre taxonomies and their distribution in the development sets, in terms of numbers of recordings and release groups, are reported online (https://multimediaeval.github.io/2017-AcousticBrainz-Genre-Task/data_stats/). Recordings partially intersect across the development and test sets, i.e., some recordings are annotated by all four ground truths. The full intersection of all development sets contains 247,716 recordings, while the intersection of the two largest sets, AllMusic and Discogs, contains 831,744 recordings.

Table 1: Overview of the development datasets.
{| class="wikitable"
! Dataset !! AllMusic !! Discogs !! Lastfm !! Tagtraum
|-
| Type || Explicit || Explicit || Tags || Tags
|-
| Annotation level || Release || Release || Track || Track
|-
| Recordings || 1,353,213 || 904,944 || 566,710 || 486,740
|-
| Release groups || 163,654 || 118,475 || 115,161 || 69,025
|-
| Genres || 21 || 15 || 30 || 31
|-
| Subgenres || 745 || 300 || 297 || 265
|-
| Genres/track || 1.33 || 1.37 || 1.14 || 1.13
|-
| Subgenres/track || 3.15 || 1.69 || 1.28 || 1.72
|}

All data are published in JSON and TSV formats; details about the format are available online (https://multimediaeval.github.io/2017-AcousticBrainz-Genre-Task/data/). Each recording in the development sets is identified by a MusicBrainz ID (MBID, https://musicbrainz.org/doc/MusicBrainz_Identifier), which participants can use to gather related data. Importantly, our split avoids the "album effect" [6], which leads to a potential overestimation of the performance of a system when a test set contains recordings from the same albums as the development set. The development sets additionally include release-group information for each recording, which participants may use to avoid this effect when developing their own systems. Partitioning scripts were provided to create training-validation splits ensuring these characteristics in the data.

===4 Submissions and Evaluation===
Participants are expected to submit predictions for both subtasks. We allow a maximum of five evaluation runs, each including both subtasks; for every submission, participants must report whether they used the whole development dataset or only parts of it.

The evaluation is carried out for each dataset separately. We do not use hierarchical measures because the hierarchies in the Lastfm and Tagtraum datasets are not explicit. Instead, we compute precision, recall and F-score at different levels:
* per recording, all labels;
* per recording, only genre labels;
* per recording, only subgenre labels;
* per label, all recordings;
* per genre label, all recordings;
* per subgenre label, all recordings.

The ground truth does not necessarily contain subgenre annotations for every recording, so only recordings with subgenre annotations are considered for the evaluation at the subgenre level. An example can be found online in the summaries of the random baselines (https://multimediaeval.github.io/2017-AcousticBrainz-Genre-Task/baseline/). We also provided evaluation scripts for development purposes.
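As a complement to the release-group constraints described in Section 3.3, the following is a minimal sketch, not the task's official partitioning script, of how a participant could build a training-validation split that keeps release groups disjoint and thus avoids the album effect. The file name and the column names ("recording_mbid", "releasegroup_mbid") are assumptions about the TSV layout, not the official format.

<pre>
# Sketch: release-group-aware train/validation split to avoid the "album effect".
# Assumes a TSV with one row per recording and a release-group identifier column;
# the file and column names used here are assumptions, not the official layout.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

dev = pd.read_csv("groundtruth-dev.tsv", sep="\t")     # hypothetical file name

splitter = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=42)
train_idx, val_idx = next(splitter.split(dev, groups=dev["releasegroup_mbid"]))

train_set = dev.iloc[train_idx]
val_set = dev.iloc[val_idx]

# No release group appears in both subsets, so recordings from the same
# album cannot leak from training into validation.
assert set(train_set["releasegroup_mbid"]).isdisjoint(val_set["releasegroup_mbid"])
</pre>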
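To make the per-recording and per-label measures of Section 4 concrete, here is a minimal sketch using scikit-learn averaging over binary label-indicator matrices. It illustrates the measure definitions only and is not the official evaluation script; the tiny matrices are placeholder data.

<pre>
# Sketch of the per-recording and per-label precision/recall/F-score from Section 4,
# computed on binary label-indicator matrices (rows: recordings, columns: labels).
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

Y_true = np.array([[1, 0, 1, 0],      # placeholder ground-truth annotations
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
Y_pred = np.array([[1, 0, 0, 0],      # placeholder system predictions
                   [0, 1, 0, 1],
                   [1, 0, 0, 1]])

# "Per recording": average the scores computed for each recording (row).
p_rec, r_rec, f_rec, _ = precision_recall_fscore_support(
    Y_true, Y_pred, average="samples", zero_division=0)

# "Per label": average the scores computed for each label (column).
p_lab, r_lab, f_lab, _ = precision_recall_fscore_support(
    Y_true, Y_pred, average="macro", zero_division=0)

# Restricting the columns to genre-only or subgenre-only labels before the call
# gives the genre-level and subgenre-level variants of the same measures.
</pre>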
===5 Conclusions===
By bringing the AcousticBrainz Genre Task to MediaEval we hope to benefit from the contributions and expertise of the broader machine learning and multimedia retrieval community. We refer to the MediaEval 2017 proceedings for further details on the methods and results of the teams participating in the task.

===Acknowledgments===
We thank all contributors to AcousticBrainz. This research has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 688382 (AudioCommons). We also thank tagtraum industries for providing the Tagtraum genre annotations.

===References===
[1] AcousticBrainz. 2017. https://acousticbrainz.org
[2] AllMusic. 2017. https://allmusic.com
[3] beaTunes. 2017. https://www.beatunes.com
[4] D. Bogdanov, N. Wack, E. Gómez, S. Gulati, P. Herrera, O. Mayor, G. Roma, J. Salamon, J.R. Zapata, and X. Serra. 2013. Essentia: An Audio Analysis Library for Music Information Retrieval. In International Society for Music Information Retrieval (ISMIR'13) Conference. Curitiba, Brazil, 493–498.
[5] Discogs. 2017. https://discogs.com
[6] A. Flexer and D. Schnitzer. 2009. Album and Artist Effects for Audio Similarity at the Scale of the Web. In Sound and Music Computing Conference (SMC'09).
[7] Lastfm. 2017. https://last.fm
[8] A. Porter, D. Bogdanov, R. Kaye, R. Tsukanov, and X. Serra. 2015. AcousticBrainz: A Community Platform for Gathering Music Information Obtained from Audio. In International Society for Music Information Retrieval (ISMIR'15) Conference. Málaga, Spain, 786–792.
[9] A. Porter, D. Bogdanov, and X. Serra. 2016. Mining Metadata from the Web for AcousticBrainz. In International Workshop on Digital Libraries for Musicology (DLfM'16). ACM, 53–56.
[10] H. Schreiber. 2015. Improving Genre Annotations for the Million Song Dataset. In International Society for Music Information Retrieval (ISMIR'15) Conference. Málaga, Spain, 241–247.
[11] B. L. Sturm. 2014. The State of the Art Ten Years After a State of the Art: Future Research in Music Information Retrieval. Journal of New Music Research 43, 2 (2014), 147–172.
[12] Tagtraum. 2017. http://www.tagtraum.com