DNN in the AcousticBrainz Genre Task 2017

Nicolas Dauban
IRIT, Université de Toulouse, CNRS, Toulouse, France
nicolas.dauban@irit.fr

ABSTRACT
This paper presents a method of genre classification using deep neural networks for the AcousticBrainz genre classification task of MediaEval 2017.

1 INTRODUCTION
The AcousticBrainz Genre Task 2017 is a music genre recognition (MGR) task organised by MediaEval [1], in which participants had to predict genres and subgenres from audio features extracted with Essentia [2]. The target labels are provided by four different sources: Discogs, Allmusic, Tagtraum and Lastfm.

2 RELATED WORK
I took part in this challenge during my internship on content-based music recommendation, during which I worked on music genre recognition from music features with deep neural networks [3]. This approach was tested on the GTZAN [4] corpus and then on the MagnaTagATune [5] corpus, with accuracies of about 90% on GTZAN and about 80% on MagnaTagATune. As these results were satisfying, we decided to take part in the challenge to try the neural network approach.

3 APPROACH
Since the time evolution of the music features was not supplied, the convolutional approach, which relies on correlations between successive frames of the input image, was quickly ruled out. We therefore used a classic deep neural network instead of a convolutional one. The network was implemented with Lasagne¹, a neural network library built on the Theano framework. We experimented with different features and different network architectures.

3.1 Features
During the development phase, the following features were tried: Mel bands, spectral rolloff, zero crossing rate, spectral entropy, Harmonic Pitch Class Profile (HPCP), tempo, danceability, key strength, dissonance and tuning diatonic strength. These features were chosen on the basis of their definitions, keeping those that seemed the most relevant for characterising the different music genres. In our preliminary experiments, Mel bands yielded the best results, so only Mel bands were used as input for the final submission.

3.2 Architecture of the network
Different architectures were tried to establish the final topology of the network. The final architecture comprises 8 fully connected layers of 2000 neurons with ReLU activation functions (Figure 1). The input vector has the size of all the data used (for the 9 statistics on the 40 Mel bands, the input is a vector of size 360), and the output layer has one sigmoid unit per genre and subgenre (e.g. 315 for the Discogs subset). With this sigmoid output, the network often predicted no genre at all for some tracks, because no output neuron had a value above 0.5. To guarantee at least one genre prediction per track, the genre with the maximum output value is chosen whenever no genre passes the threshold. For the submission, the network was trained for 40 epochs.

Figure 1: DNN architecture

4 RESULTS AND ANALYSIS
Only one submission was made for each dataset, for task 1 only. Table 1 and Table 2 contain the per-track results for all labels and for genre labels, respectively.

Table 1: Results per track, for all labels

          Precision  Recall  F-score
Allmusic  0.3751     0.1455  0.1813
Discogs   0.3043     0.1243  0.1664
LastFM    0.1845     0.0813  0.1073
Tagtraum  0.2892     0.1214  0.1649

The results for subgenres and per label are not shown here because they were all between 0% and 10%. All the results are provided on the AcousticBrainz Genre Task results page².

1. https://lasagne.readthedocs.io/en/latest/
2. https://multimediaeval.github.io/2017-AcousticBrainz-Genre-Task/results/

Copyright held by the owner/author(s).
MediaEval'17, 13-15 September 2017, Dublin, Ireland
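The topology of Section 3.2 and its decision rule can be summarised in a short sketch. The following is a minimal numpy mock-up, not the actual Lasagne implementation: the weights are randomly initialised and all variable names are illustrative; only the layer sizes (360 inputs, 8 fully connected ReLU layers of 2000 units, 315 sigmoid outputs for Discogs) and the 0.5-threshold rule with argmax fallback come from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from Section 3.2: 9 statistics x 40 Mel bands = 360 inputs,
# 8 fully connected ReLU layers of 2000 units, 315 sigmoid outputs (Discogs).
sizes = [360] + [2000] * 8 + [315]
weights = [rng.normal(0.0, 0.01, (m, n)).astype(np.float32)
           for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x):
    """Forward pass: ReLU hidden layers, then a sigmoid output layer."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)                 # ReLU
    return 1.0 / (1.0 + np.exp(-(x @ weights[-1])))  # sigmoid

def predict_genres(outputs, threshold=0.5):
    """Keep every label whose sigmoid output exceeds the threshold;
    if none does, fall back to the single highest-scoring label."""
    above = np.flatnonzero(outputs > threshold)
    return above if above.size else np.array([np.argmax(outputs)])

y = forward(rng.normal(size=360).astype(np.float32))
print(y.shape)                                     # (315,)
print(predict_genres(np.array([0.1, 0.7, 0.6])))   # labels 1 and 2 pass 0.5
print(predict_genres(np.array([0.1, 0.3, 0.45])))  # none passes: argmax = 2
```

The fallback matters because, as noted in Section 3.2, tracks where no sigmoid exceeds 0.5 would otherwise receive no prediction at all.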
Table 2: Results per track, for genre labels

          Precision  Recall  F-score
Allmusic  0.3962     0.3334  0.3513
Discogs   0.3221     0.2674  0.2812
LastFM    0.1937     0.1764  0.1813
Tagtraum  0.3231     0.3055  0.3110

During the development phase, the network obtained a 60% F-score on genre labels only on Discogs. For this phase, the data was split with the Python script provided by the organisers, which ensures album filtering: 80% of the data to train the network and 20% to test it. However, the best F-score obtained on genre labels by our final submission was 35% per track, maybe due to a lack of generalization power of our network.

5 DISCUSSION
We plan to explore our initial idea of using combinations of different acoustic feature types, as we previously showed that these were as successful as using Mel bands. Furthermore, some modifications of the network could be tried: adding batch normalization after each dense layer, adding residual blocks, trying different initialization and learning rate decay strategies, and lowering the threshold on the output sigmoids in order to give more predictions.

REFERENCES
[1] Dmitry Bogdanov, Alastair Porter, Julián Urbano, and Hendrik Schreiber. The MediaEval 2017 AcousticBrainz Genre Task: Content-based music genre recognition from multiple sources. In Working Notes Proceedings of the MediaEval Workshop, Dublin, Ireland, September 13-15, 2017.
[2] Dmitry Bogdanov, Nicolas Wack, Emilia Gómez, Sankalp Gulati, Perfecto Herrera, Oscar Mayor, Gerard Roma, Justin Salamon, José R. Zapata, Xavier Serra, et al. Essentia: An audio analysis library for music information retrieval. In ISMIR, pages 493–498, 2013.
[3] Christine Senac, Thomas Pellegrini, Florian Mouret, and Julien Pinquier. Music feature maps with convolutional neural networks for music genre classification. In Proceedings of the ACM Workshop on Content-Based Multimedia Indexing (CBMI). ACM, June 19-21, 2017.
[4] George Tzanetakis and Perry Cook. GTZAN genre collection. Music Analysis, Retrieval and Synthesis for Audio Signals, 2002.
[5] Edith Law, Kris West, Michael I. Mandel, Mert Bay, and J. Stephen Downie. Evaluation of algorithms using games: The case of music tagging. In ISMIR, pages 387–392, 2009.
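The last suggestion of Section 5, lowering the output-sigmoid threshold, directly trades precision for recall: more labels pass the test and fewer tracks need the argmax fallback. A small illustration with made-up sigmoid outputs (the 0.5 value is the one used in the submission; the outputs and variable names are hypothetical):

```python
import numpy as np

def predict_genres(outputs, threshold):
    """Decision rule of Section 3.2 with a configurable threshold:
    keep labels above the threshold, else fall back to the argmax."""
    above = np.flatnonzero(outputs > threshold)
    return above if above.size else np.array([np.argmax(outputs)])

# Made-up sigmoid outputs for one track over six labels.
outputs = np.array([0.10, 0.48, 0.35, 0.62, 0.05, 0.41])

for t in (0.5, 0.4, 0.3):
    print(t, predict_genres(outputs, t).tolist())
# 0.5 -> [3]    0.4 -> [1, 3, 5]    0.3 -> [1, 2, 3, 5]
```

Lowering the threshold can only add labels, so per-track recall never decreases, while precision typically drops as weaker predictions are kept.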