
BirdCLEF 2015 submission: Unsupervised feature learning from audio

Dan Stowell

Centre for Digital Music, Queen Mary University of London
dan.stowell@qmul.ac.uk

We describe our results submitted to BirdCLEF 2015 for classifying among 999 tropical bird species. Our submission attained a MAP score of over 30% in the official results. This note is not a self-contained paper, since our system was largely the same as that used in BirdCLEF 2014 and described in detail elsewhere. The method uses raw audio without segmentation and without any auxiliary metadata, and successfully classifies among 999 bird categories.

Our unsupervised feature learning scales well with increasing data size: linearly, as described in the main paper. However, given the compute resources available in the time leading up to the competition deadline, we were unable to submit more than one run or to apply model averaging.
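To make the linear scaling concrete, the following is a minimal sketch of PCA whitening followed by spherical k-means, the two feature-learning stages named in our pipeline figure. It is a simplified batch variant written for illustration, not the implementation used for the submission: the function names, the `eps` regulariser, the iteration count and the random initialisation are all assumptions. Each pass costs one matrix product over the data, so the work grows linearly with the number of frames.

```python
import numpy as np

def pca_whiten(X, eps=1e-5):
    """Centre X (n_frames, n_dims) and decorrelate it to unit variance.
    eps is an assumed regulariser that guards against tiny eigenvalues."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / len(Xc)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs / np.sqrt(eigvals + eps)  # scale each eigenvector column
    return Xc @ W, mean, W

def spherical_kmeans(X, k, n_iter=10, seed=0):
    """Simplified batch spherical k-means: unit-norm bases, assignment by
    largest dot product, update by the normalised sum of assigned frames."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((k, X.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    for _ in range(n_iter):
        nearest = (X @ D.T).argmax(axis=1)   # one pass: linear in n_frames
        new_D = np.zeros_like(D)
        np.add.at(new_D, nearest, X)         # sum the frames per basis
        norms = np.linalg.norm(new_D, axis=1, keepdims=True)
        empty = norms[:, 0] < 1e-12
        new_D[~empty] /= norms[~empty]
        new_D[empty] = D[empty]              # keep old basis if unused
        D = new_D
    return D

# Toy usage on synthetic data standing in for spectrogram frames.
rng = np.random.default_rng(1)
frames = rng.standard_normal((500, 8)) * np.arange(1, 9)
whitened, mean, W = pca_whiten(frames)
bases = spherical_kmeans(whitened, k=5)
```

Constraining the bases to unit norm means that only the direction of each frame matters for assignment, which is why whitening and normalisation precede this step.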

Our own tests using a two-fold split of the training data confirmed an observation that we made in [3]: adding more layers gives a benefit up to a certain limit, which appears to be related to the size of the available data set. In our tests (Figure 2) the available data appeared insufficient to support a three-layer variant, hence we submitted a two-layer run.

[Figure: processing pipeline. Feature learning: spectrograms, high-pass filtering & RMS normalisation, spectral median noise reduction, PCA whitening, spherical k-means, learnt bases. Classification: spectrograms, high-pass filtering & RMS normalisation, spectral median noise reduction, feature transformation, temporal summarisation, train/test (Random Forest) with training labels, decisions.]
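The "feature transformation" and "temporal summarisation" stages of the classification branch can be sketched as follows. This is an illustrative sketch under stated assumptions: frames are taken to be already preprocessed and whitened, the bases are unit-norm rows, and the summary statistics are assumed to be the per-basis mean and standard deviation over time. The function name `transform_and_summarise` is hypothetical.

```python
import numpy as np

def transform_and_summarise(frames, bases):
    """Project frames (n_frames, n_dims) onto learnt bases (k, n_dims),
    then collapse the time axis into one fixed-length vector of size 2k."""
    activations = frames @ bases.T  # (n_frames, k)
    return np.concatenate([activations.mean(axis=0),
                           activations.std(axis=0)])

# Toy usage: two frames, two axis-aligned bases.
toy_frames = np.array([[1.0, 0.0], [3.0, 0.0]])
toy_bases = np.array([[1.0, 0.0], [0.0, 1.0]])
summary = transform_and_summarise(toy_frames, toy_bases)
```

Summarising over time gives every recording a feature vector of the same length regardless of its duration, which is what allows a classifier such as a random forest to be trained without segmenting the audio.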

For this 2015 challenge (across 999 bird species with 33,203 audio files) our final MAP score was 30.2% (considering only foreground species), and 26.2% (including background species). These results are a few percentage points lower than the results for the similar systems submitted to the 2014 challenge, as one might expect given that the number of species to identify had been increased from 501 to 999.
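For readers unfamiliar with the metric, mean average precision (MAP) can be sketched as below. This follows the standard textbook definition, averaging over recordings the precision at the rank of each true species; the official evaluation script may differ in details such as tie handling, and the function names are hypothetical.

```python
import numpy as np

def average_precision(scores, relevant):
    """Average precision for one recording: rank species by descending
    score and average the precision at the rank of each true species."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    rel = np.asarray(relevant, dtype=bool)[order]
    if not rel.any():
        return 0.0
    hits = np.cumsum(rel)
    ranks = np.arange(1, len(rel) + 1)
    return float((hits[rel] / ranks[rel]).mean())

def mean_average_precision(score_matrix, label_matrix):
    """Mean of the per-recording average precision values."""
    return float(np.mean([average_precision(s, l)
                          for s, l in zip(score_matrix, label_matrix)]))

# Toy usage: two recordings, three candidate species each.
toy_scores = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.1]]
toy_labels = [[0, 0, 1], [0, 1, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)  # 0.75
```

In the toy example the true species is ranked second in the first recording (average precision 0.5) and first in the second (average precision 1.0), giving a MAP of 0.75.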

Acknowledgments

We would like to thank the people and projects which made available the data used for this research (the Xeno Canto website and its many volunteer contributors), as well as the SABIOD research project for instigating the contest, and the CLEF contest hosts.

This work was supported by EPSRC Early Career Fellowship EP/L020505/1.

[Figure 2: results of our tests on the lifeclef2015 data. Classifier: binary relevance.]

1. Cappellato, L., Ferro, N., Jones, G., San Juan, E. (eds.): CLEF 2015 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings (CEUR-WS.org) (2015), http://ceur-ws.org/Vol-1391/

2. Lakshminarayanan, B., Roy, D.M., Teh, Y.W.: Mondrian forests: Efficient online random forests. arXiv preprint arXiv:1406.2673 (2014)

3. Stowell, D., Plumbley, M.D.: Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning. PeerJ 2, e488 (2014)