Linked Data Collection and Analysis Platform of Audio Features

Yuri Uehara, Takahiro Kawamura, Shusaku Egami, Yuichi Sei, Yasuyuki Tahara, and Akihiko Ohsuga
Graduate School of Information Systems, University of Electro-Communications, Tokyo, Japan
{uehara.yuri,kawamura,egami.shusaku}@ohsuga.is.uec.ac.jp
{seiuny,tahara,ohsuga}@uec.ac.jp

Abstract. Audio features extracted from music are commonly used in music information retrieval (MIR), but there is no open platform for the collection and analysis of such features. Therefore, we have built a platform for data collection and analysis for MIR research, on which the music data are represented as Linked Data. In this paper, we first investigate the frequency of the audio features used in previous MIR studies in order to design the Linked Data schema. We then describe the platform, which automatically extracts the audio features and music metadata from YouTube URIs designated by users and adds them to our Linked Data DB. Finally, sample queries for music analysis and the current number of songs registered in the DB are presented.

Keywords: Linked Data, audio features, music information retrieval

1 Introduction

Recently, there have been a large number of studies on music. Music Information Retrieval (MIR) deals with music on computers and has been studied in various ways [1]. In these studies, audio features extracted from music are frequently used; however, there is no open platform for collecting data, including the audio features, for music analysis. Therefore, we propose such a platform for MIR research in this paper. On the platform, we use the Linked Data format, since it is suitable for complex searches over audio features and song-related metadata. Note that this platform is designed for music-related researchers and developers who intend to analyze music information and create their own applications, e.g., recommendation mechanisms. Use by ordinary listeners is beyond the scope of this paper.

2 Schema Design of Music Information

In this section, the design of the Linked Data schema, including audio features and music metadata, is described.

2.1 Selection of audio features

Audio features refer to characteristics of the music, such as Tempo, which represents the speed of a track, and the features used in MIR studies vary. For example, Osmalskyj et al. used Tempo and Loudness to identify cover songs [3]. Luo et al. used audio features such as Pitch and Zero crossing rate to detect common mistakes in novice violin playing [4]. We therefore investigated the frequency of the audio features used in previous MIR studies. We collected 114 papers published at the International Society for Music Information Retrieval (ISMIR, http://www.ismir.net/) conference in 2015, which is the top conference in the field of MIR. Then, we selected some of the audio features according to policies such as: features that appeared only once in the publications can be ignored.

2.2 Design of the schema

We defined original properties for the selected audio features, excluding Key and Mode, since there were either no existing properties for them or the existing properties were not appropriate for our purpose. Then, we classified the properties of the audio features into several classes to make them easy to use. Table 1 shows the classes and properties corresponding to the audio features.
Table 1. Class and property of audio features

Class     Property    Audio feature        Explanation                                                           Count
Tempo     tempo       Tempo                speed of the track                                                    28
Key       key         Key, Mode            tonality; difference of major and minor chord                         5
Timbre    zerocross   Zero crossing rate   the rate at which the signal changes from positive to negative or back   3
Timbre    rolloff     Roll off             ratio of the bass which accounts for 85 percent of the total          3
Timbre    brightness  Brightness           ratio of the high range (more than 1500 Hz)                           2
Dynamics  rmsenergy   RMS energy           the average of the volume (root mean square)                          2
Dynamics  lowenergy   Low energy           ratio of sound low in volume                                          2

We designed the music schema around the video id (URI) of YouTube. In Fig. 1, the id dvgZkm1xWPE indicates the song "Viva La Vida" by Coldplay. In the graph, the id node links to the classes of audio features, which in turn link to each audio feature. We also added degrees for categorizing the numerical values of the features: lowenergy, rmsenergy, and brightness are classified in steps of 0.1, zerocross in steps of 100, and rolloff in steps of 1000. The tempo has a tmark based on tempo values (http://www.sii.co.jp/music/try/metronome/01.html), which is a measure of the speed: Slow means 39 bpm or less, Largo means 40–49 bpm, Lento means 50–55 bpm, Adagio means 56–62 bpm, Andante means 63–75 bpm, Moderato means 76–95 bpm, Allegretto means 96–119 bpm, Allegro means 120–151 bpm, Vivace means 152–175 bpm, Presto means 176–191 bpm, Prestissimo means 192–208 bpm, and Fast means 209 bpm or more.

In addition, we extended the schema with music metadata, which includes not only the song title, artist name, etc. found in the Music Ontology, but also the lyricist name, CD name, etc., to enable complex searches of music information. In the graph of metadata, the video id node links to the class of metadata, which then links to the detailed values, in the same way as the graph of the audio features. Also, some nodes, such as the artist name, are linked to external DBs like DBpedia. Since both graphs exist for "Viva La Vida" by Coldplay, the audio features graph and the metadata graph can be linked through the YouTube video id.

Fig. 1. Part of the audio features in the Linked Data of music information (the graph for dvgZkm1xWPE, using the prefixes rdf, rdfs, mo <http://purl.org/ontology/mo/>, and mus-voc <http://www.ohsuga.is.uec.ac.jp/music/vocabulary#>; it shows, among others, a tempo of 138.055 with tmark Allegro and the key mo:AFlatMajor)
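As a concrete illustration of this schema, the following is a minimal sketch of how the graph fragment in Fig. 1 could be written into the store as a SPARQL update. The resource namespace and the intermediate node IRIs (ex:dvgZkm1xWPE_features, etc.) are hypothetical, and the pairing of the brightness value with its node is only illustrative; the property names, the tempo value 138.055 with tmark Allegro, and the key mo:AFlatMajor are taken from Fig. 1.

PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
PREFIX mo:      <http://purl.org/ontology/mo/>
PREFIX mus-voc: <http://www.ohsuga.is.uec.ac.jp/music/vocabulary#>
PREFIX ex:      <http://www.ohsuga.is.uec.ac.jp/music/resource/>   # hypothetical resource namespace

INSERT DATA {
  # Song node identified by the YouTube video id
  ex:dvgZkm1xWPE            mus-voc:features        ex:dvgZkm1xWPE_features .

  # Tempo: 138.055 bpm falls into the Allegro range (120-151 bpm)
  ex:dvgZkm1xWPE_features   mus-voc:tempo           ex:dvgZkm1xWPE_tempo ;
                            mo:key                  mo:AFlatMajor ;            # attachment point of the key assumed
                            mus-voc:timbre          ex:dvgZkm1xWPE_timbre .
  ex:dvgZkm1xWPE_tempo      rdf:value               "138.055"^^xsd:double ;
                            mus-voc:tmark           "Allegro" .                # literal form of the tmark assumed

  # Timbre: a brightness value with its 0.1-step classification degree
  ex:dvgZkm1xWPE_timbre     mus-voc:brightness      ex:dvgZkm1xWPE_brightness .
  ex:dvgZkm1xWPE_brightness rdf:value               "0.55"^^xsd:double ;
                            mus-voc:classification  "0.5"^^xsd:double .
}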
3 Music Information Extracting

The system architecture for our music information extraction is shown in Fig. 2, and its workflow is indicated by the numbers 1 to 11 as follows.

1. Download the video data from the YouTube video URI designated by a user in a web browser.
2. Call the MATLAB process that analyzes the audio features in the video file.
3. Store the obtained audio features in an RDB, MySQL.
4. Call the RDF creation program.
5. Obtain the music information for the video from the YouTube website.
6. Search the music metadata using the Last.fm API.
7. Query the audio features of the video from MySQL.
8. Convert the metadata and audio features to RDF graphs, and store them in an RDF store, Virtuoso.
9. Notify the user of the completion.
10. Submit a simple SPARQL query for confirmation.
11. Return the evidence of the inclusion of the new subgraphs corresponding to the video.

Fig. 2. Structure of the system (the client-side browser submits YouTube video URLs and SPARQL queries; on the server, Servlet 1 caches the video data and starts the MATLAB program (MIRtoolbox) for the analysis of the audio features, and Servlet 2 acquires the metadata and audio features, creates the RDF, and adds it to Virtuoso)

Our system obtains the videos from which the audio features are analyzed from YouTube, and so public users can easily extend the music information on the platform. However, we discard the video files after extracting the audio features, and thus we believe this process does not cause any legal or moral problems.

The workflow is divided into several phases. The first phase analyzes the audio features of the YouTube video, and the second phase acquires the metadata of the YouTube video. The third phase then converts the metadata and audio features to RDF graphs, which are stored in the Virtuoso database. Finally, the last phase confirms the newly added graphs.
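For the confirmation in steps 10 and 11, a minimal sketch of such a query is given below, assuming the song resource IRI is formed from the video id as in the earlier example (the resource namespace is again hypothetical); it simply asks whether the features and metadata subgraphs for the video are now present in Virtuoso.

PREFIX mus-voc: <http://www.ohsuga.is.uec.ac.jp/music/vocabulary#>
PREFIX ex:      <http://www.ohsuga.is.uec.ac.jp/music/resource/>   # hypothetical resource namespace

# Have the audio features and metadata subgraphs for the new video been added?
ASK {
  ex:dvgZkm1xWPE  mus-voc:features  ?features ;
                  mus-voc:meta      ?metadata .
}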
4 Example of Music Analysis

The current number of songs registered in the platform is 1073, and the number of triples automatically extracted to represent the audio features and the metadata of those songs is 20,858. The platform is publicly available at http://www.ohsuga.is.uec.ac.jp/music/.

In this section, we show the results of some example queries on the platform and how the music Linked Data can be used for MIR. In the SPARQL query below, we specify the audio feature Brightness of the song "Hello, Goodbye" by The Beatles and search for other songs in which the value of Brightness is similar to that of the specified song. As the result, we obtain the 5 songs whose Brightness values are the most similar, shown in Table 2.

[SPARQL Query]
PREFIX mus-voc: <http://www.ohsuga.is.uec.ac.jp/music/vocabulary#>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?artist_x ?title_x ?brightness_x
WHERE {
  ?metadata rdfs:label ?title .
  ?resource mus-voc:meta ?metadata .
  ?resource mus-voc:features ?features .
  ?features mus-voc:timbre ?timbre .
  ?timbre mus-voc:brightness ?brightnessc .
  ?brightnessc rdf:value ?brightness .
  ?brightnessc_x rdf:value ?brightness_x .
  ?timbre_x mus-voc:brightness ?brightnessc_x .
  ?features_x mus-voc:timbre ?timbre_x .
  ?resource_x mus-voc:features ?features_x .
  ?resource_x mus-voc:meta ?metadata_x .
  ?metadata_x rdfs:label ?title_x .
  ?metadata_x mo:MusicArtist ?MusicArtist_x .
  ?MusicArtist_x rdfs:label ?artist_x .
  FILTER regex(?title, "Hello Goodby") .
}
ORDER BY ( IF( ?brightness < ?brightness_x,
               ?brightness_x - ?brightness,
               ?brightness - ?brightness_x ) )
LIMIT 5

Table 2. Result of submitting the SPARQL query

artist_x               title_x             brightness_x
The Beatles            Can't Buy Me Love   0.553889
Whitney Houston        Never Give Up       0.559786
Coldplay ft. Rihanna   Princess Of China   0.560039
Lady Gaga              Judas               0.550279
The Beatles            Penny Lane          0.550221
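Similar queries can combine the metadata with the other audio feature classes. As a further sketch, the query below retrieves songs whose tempo mark is Allegro together with their bpm values; the path from the features node to the tempo node and the literal form of the tmark are our assumptions based on Fig. 1, not part of the published example.

PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mus-voc: <http://www.ohsuga.is.uec.ac.jp/music/vocabulary#>

# Songs whose tempo falls into the Allegro range (120-151 bpm), with their exact bpm
SELECT ?title ?bpm
WHERE {
  ?resource  mus-voc:meta      ?metadata ;
             mus-voc:features  ?features .
  ?metadata  rdfs:label        ?title .
  ?features  mus-voc:tempo     ?tempo .        # path assumed from Fig. 1
  ?tempo     mus-voc:tmark     "Allegro" ;     # tmark assumed to be a plain literal
             rdf:value         ?bpm .
}
ORDER BY ?bpm
LIMIT 10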
5 Conclusion and Future Work

In this paper, we proposed a platform for providing audio features and music metadata for MIR research. In the future, we plan to provide more sophisticated examples and applications of music information analysis, which will encourage the expansion of the music Linked Data by music researchers and developers.

Acknowledgments. This work was supported by JSPS KAKENHI Grant Numbers 16K12411, 16K00419, and 16K12533.

References

1. Kitahara, T., Nagano, H.: Advancing Information Sciences through Research on Music: 0. Foreword. IPSJ Magazine "Joho Shori" 57(6), 504–505 (2016)
2. Wang, M., Kawamura, T., Sei, Y., Nakagawa, H., Tahara, Y., Ohsuga, A.: Context-aware Music Recommendation with Serendipity Using Semantic Relations. In: Proceedings of the 3rd Joint International Semantic Technology Conference, pp. 17–32 (2013)
3. Osmalskyj, J., Foster, P., Dixon, S., Embrechts, J.J.: Combining Features for Cover Song Identification. In: Proceedings of the 16th International Society for Music Information Retrieval Conference, pp. 462–468 (2015)
4. Luo, Y.-J., Su, L., Yang, Y.-H., Chi, T.-S.: Real-time Music Tracking using Multiple Performances as a Reference. In: Proceedings of the 16th International Society for Music Information Retrieval Conference, pp. 357–363 (2015)