=Paper=
{{Paper
|id=Vol-2786/Paper34
|storemode=property
|title=Indian Classical Raga Identification using Machine Learning
|pdfUrl=https://ceur-ws.org/Vol-2786/Paper34.pdf
|volume=Vol-2786
|authors=Dipti Joshi,Jyoti Pareek,Pushkar Ambatkar
|dblpUrl=https://dblp.org/rec/conf/isic2/JoshiPA21
}}
==Indian Classical Raga Identification using Machine Learning==
<pdf width="1500px">https://ceur-ws.org/Vol-2786/Paper34.pdf</pdf>
<pre>
Indian Classical Raga Identification using Machine Learning
Dipti Joshi, Dr. Jyoti Pareek, Pushkar Ambatkar

Department of Computer Science, Gujarat University, Ahmedabad, Gujarat, India

          Abstract
          Ragas are the pride of Indian classical music. The raga is the basic melodic form of Indian
          classical music. It consists of a set of swaras (notes) with various characteristics that
          together form a melodic conception, performed by instrumentalists and singers. Based on the
          features of the raga, Indian classical music is divided into two traditions: Hindustani (North
          Indian) and Carnatic (South Indian) classical music. Our experiment concentrates on Hindustani
          classical music. K Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers are
          applied to a dataset of the ragas Yaman and Bhairavi to classify and identify the raga. Both
          classifiers gave accurate results: up to 98% with KNN and 95% with SVM.

          Keywords
          Raga identification, Feature extraction, Machine Learning, KNN, SVM


ISIC'21: International Semantic Intelligence Conference, February 25-27, 2021, New Delhi, India
Email: joshidipti1408@gmail.com (D. Joshi)
ORCID: 0000-0001-9166-4555 (D. Joshi)
©2021 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Indian classical music is the music of the Indian subcontinent. The Raga, or Raag, holds a
prominent position in it. A Raag is a collection of musical notes that is pleasing to the ear
when sung or performed on a musical instrument. Raga recognition comprises methods that identify
the notes in a piece of music and classify the piece into the appropriate raga. In Hindustani
classical music, the raga is a central concept that expresses the mood and sentiment of a
performance. Recognizing ragas by ear is an intellectual process that comes only after sufficient
exposure. For automated recognition, the attributes of a raga have to be translated into
suitable features.

The characteristics of Ragas are based on Indian Classical Music techniques, which blend notes
with the following features to qualify as a Raga.

Notes (swaras)
A Raag must contain at least five, and up to seven, notes (swaras). The primary seven notes are
S (Sa), R (Re or Ri), G (Ga), M (Ma), P (Pa), D (Dha), N (Ni).

Aaroh and Avroh
Each Raga or Raag is composed of an "Aaroh", the ascending sequence of swaras, and an "Avroh",
the descending sequence of swaras.

Vadi and Samvadi
Each Raag has a "Vadi", its main note, and a "Samvadi", its supporting note.

Gamakas
A gamaka has a constant frequency rate. Notes in a raga are often rendered as a continuous
variation (a rhythmic back-and-forth movement); notes of this sort are known as Gamakas.

Pakad
A set of swaras that distinctively identifies a raga. Each raga has its own Pakad.

Tala
Tala refers to a rhythmic form, which is constructed from a variety of beats.


Thaat
Thaat is used in raga classification. There are ten distinct Thaats: Kalyan, Bilawal, Bhairav,
Khamaj, Poorvi, Marwa, Kafi, Asawari, Bhairavi, and Todi.

A piece of music has to be converted to swaras before it can be classified. Several factors make
this conversion difficult [1]:

1. During a performance, the music is usually produced by several instruments at once.
2. The notes in Indian classical music are on a relative scale.
3. A raga has no fixed starting swara.
4. In Indian music, the notes do not have predetermined frequencies.
5. The sequence of swaras in a given raga is not fixed, because classical music allows
   improvisation.

The key purpose of raga recognition is to provide a good starting point for Hindustani music
information retrieval. It also makes it possible to assess a raga performance and its accuracy.
Beyond music analysis, it enables playlists organized by raga.

2. Related work

In this section, we review work done by other authors and analyze it for future scope. We try to
give an analysis of various classifiers, their relevance, and their performance for raga
identification.

In [2], Hiteshwari Sharma and Rasmeet S. Bali performed raga identification on four ragas (Des,
Bhupali, Yaman, and Todi) using a dataset of live vocal and instrumental performances, with
classifiers based on the pitch class profile and the n-gram histogram. They obtained 83.39%
accuracy with the pitch class profile and 97.3% with the n-gram histogram.

In [3], Ekta Patel and Savita Chauhan used a MATLAB toolbox to extract audio features and the
machine learning tool WEKA, which works on the .arff file format. They applied Bayesian Net,
Naive Bayes, Support Vector Machine (SVM), J48, Decision Table, and Random Forest classifiers to
a dataset of the ragas Bhairav, Yaman, Shankara, and Saarang. The main challenges were complex
variables such as pitch and mood in the music, skipped higher tones, and the transformation of
the various dataset parameters and ragas. Results were compared before and after discretization.
For this raga identification task the probability-based classifiers were the more accurate, and
Bayesian Net performed best.

In [4], Hiteshwari Sharma and R. S. Bali identified key variables for raga classification and
applied a soft-computing technique based on fuzzy sets to recognize ragas. They used a dataset
of five ragas (Des, Bhupali, Yaman, Todi, and Pahadi) with three parameters (time, dirgaswaras,
and vadi) and achieved reasonable accuracy.

In [5], G. Pandey, C. Mishra, and Paul Ipe introduced the Hidden Markov Model (HMM) and pakad
matching on a dataset of two ragas, Bhupali and Yaman Kalyan. They achieved 77% accuracy with
the basic HMM and 87% accuracy with HMM and pakad matching combined.

In [6], Muhammad Asim Ali and Zain Ahmed Siddiqui studied automatic music genre classification
using machine learning. They used K Nearest Neighbor (KNN) and Support Vector Machine (SVM) to
predict the genre of songs. Their data came from the GTZAN dataset of 1000 songs, which covers
ten genres: blues, hip-hop, jazz, classical, metal, reggae, country, pop, disco, and rock. Their
comparison showed SVM to be a more efficient classifier than KNN.

In [7], Snigdha Chillara, Kavitha A, Shwetha A Neginhal, Shreya Haldia, and Vidyullatha K
addressed the genre classification problem and drew a comparison among
some other models using the Free Music Archive small (fma_small) dataset. Two kinds of input
were given to the models: the CNN models used spectrogram images, while the Logistic Regression
and ANN models used audio features stored in a .csv file. They obtained 88.5% accuracy with the
CNN on the spectrogram-based model, which compares well with the algorithms used by other
authors.

3. Work Done

In this section, we briefly discuss different characteristics of audio and the machine learning
algorithms K Nearest Neighbor and Support Vector Machine.

   Feature Extraction
Each audio signal comprises several features, but only those suited to the problem at hand need
to be extracted. The process of extracting the characteristics to be used in the study is
referred to as feature extraction. The characteristics we use are outlined below.

Power Spectrogram
A spectrogram is a graphical representation of the spectrum of frequencies of a signal as it
varies with time. When applied to an audio signal, spectrograms are sometimes referred to as
sonographs, voiceprints, or voicegrams. To determine the raga, we use the mean of the
spectrogram to find which tone or pitch is used most.

MFCC - Mel-Frequency Cepstral Coefficients
This is one of the most important techniques for extracting attributes from an audio signal and
is widely used when working with audio. The MFCCs of a signal are a small set of features
(approximately 10 to 20) that concisely describe the overall shape of the spectral envelope.

Spectral Centroid
The spectral centroid indicates the "center of mass" of a sound, computed as the weighted mean
of the frequencies present in it. If the frequencies in a piece are evenly distributed over a
given time span, the spectral centroid lies near the center; if there are more high frequencies
toward the end of the sound, the centroid tends to lie closer to that end.

Zero-Crossing Rate
The zero-crossing rate is the rate at which the sign of the signal changes, that is, the rate at
which the signal goes from positive to negative and vice versa. It is commonly used in speech
recognition and music information retrieval. It takes high values for loud, noisy sounds such
as those in metal and rock.

Roll-Off Frequency
Roll-off refers to the action of a particular kind of filter, one designed to roll off
frequencies above or below a certain point. It is called roll-off because the attenuation is
progressive.

Spectral Bandwidth
Spectral bandwidth is the interval over which a radiated spectral quantity is not less than half
its maximum value. It determines the extent of the spectrum, that is, the difference between the
lower and the higher frequency.

   Machine Learning Algorithms
During our analysis, we found that a supervised machine learning approach is a good fit for our
problem. We tried various classification algorithms and found K Nearest Neighbor and Support
Vector Machine to be the most appropriate for our experiment.

K Nearest Neighbor (KNN)
K Nearest Neighbor (KNN) is a supervised learning method. It is a simple yet robust algorithm
that can be applied to both regression and classification problems. To make a prediction, KNN
uses the whole training set: a new data point is assigned to a category according to the
training points nearest to it.

Support Vector Machine (SVM)
Support Vector Machine (SVM) is also a supervised learning method, usually used for
classification. The algorithm finds a hyperplane that clearly divides the sample points with
different labels. It separates the sample
points of the different classes onto different sides of the hyperplane.

4. Dataset and Strategy

In this experiment, we used a dataset of audio files. The dataset was created by extracting
60-second audio clips from the internet. For music and audio analysis, the Python package
Librosa is used. It provides the building blocks required for creating music information
retrieval systems. Scikit-learn is an open-source machine learning library that supports both
supervised and unsupervised learning. It also provides a variety of tools for model selection,
data pre-processing, model fitting, and evaluation.

We chose the ragas Yaman and Bhairavi. We split the recordings into 60-second frames, which lets
the computer work only on specific parts of a piece, such as the Pakad, Aaroh, and Avroh, and we
removed noise and empty segments from the audio. Using the Librosa library, we created a .CSV
file that stores all the features extracted from each audio file: MFCC, Spectrogram, Bandwidth,
Centroid, Zero-crossing, and Roll-off. The flow of the process is shown below.

   Audio Dataset  ->  Feature Extraction  ->  Identification and Classification

Figure 1: Process of Raga Identification [8]

This is the visual representation of the classification procedure, in which features are
extracted from the audio frame and compared with the weight of the nearest mean. Using the
Scikit-learn library, we implemented the KNN and SVM algorithms on the .CSV data file. We found
that KNN and SVM both fit the raga classification task better than Logistic Regression.

5. Result and Discussion

In our experiment, we chose the ragas Yaman and Bhairavi. We used a vocal-instrumental dataset
consisting of 341 audio clips, of which 194 are of Yaman and 147 are of Bhairavi.

The tables below show the accuracy of the KNN and SVM algorithms.

Table 1: Results of Raga Identification accuracy using KNN

   KNN (Neighbor value)   Train/Test Ratio   Classification Accuracy
   1                      80/20              98%
   1                      60/40              94%
   1                      40/60              93%
   2                      80/20              97%
   2                      60/40              93%
   2                      40/60              92%
   3                      80/20              97%
   3                      60/40              94%
   3                      40/60              93%
   4                      80/20              95%
   4                      60/40              93%
   4                      40/60              93%

Table 2: Results of Raga Identification accuracy using SVM

   Classifier   Train/Test Ratio   Classification Accuracy
   SVM          80/20              95%
   SVM          60/40              95%
   SVM          40/60              94%

The two tables above report the results of the two classifiers, KNN and SVM. Using KNN, we can
observe that the accuracy of raga identification varies with the Neighbor value. We divided the
Train/Test ratio into three categories: 80/20, 60/40, and 40/60. For every KNN Neighbor value,
the highest accuracy was obtained at the 80/20 ratio: for Neighbor values 1 to 4 we obtained
98%, 97%, 97%, and 95% accuracy respectively. Using SVM, we obtained 95% accuracy at the 80/20
Train/Test ratio.

From this comparison, we conclude that at the 80/20 Train/Test ratio KNN matches or exceeds SVM
for every Neighbor value (95% to 98% against 95%), and that the highest accuracy overall, 98%,
is obtained with KNN at Neighbor value 1. At the 60/40 and 40/60 ratios, SVM (95% and 94%) is
slightly ahead of KNN (92% to 94%).

6. Conclusion and Future work

This paper gave a short introduction to the raga and its attributes. Prior approaches to raga
classification and recognition were reviewed with respect to their datasets, implementation
tools, accuracy, and challenges. We then discussed the classification of the ragas Yaman and
Bhairavi using the K Nearest Neighbor (KNN) and Support Vector Machine (SVM) machine learning
algorithms. We obtained good results with both KNN and SVM; in our experiment, KNN appears to
perform slightly better at the 80/20 Train/Test ratio.

In the future, we will expand our dataset with many other ragas to obtain more accurate results,
and implement other classifiers for detecting ragas in a well-defined manner.

References
[1] V. Kumar, H. Pandya, and C. V. Jawahar, “Identifying Ragas in Indian Music,” 22nd
    International Conference on Pattern Recognition, 2014.
[2] H. Sharma and R. S. Bali, “Comparison of ML classifiers for Raga recognition,” Int. J. Sci.
    Res. Publ., vol. 5, no. 10, pp. 1–5, 2015.
[3] E. Patel and S. Chauhan, “Raag detection in music using supervised machine learning
    approach,” Int. J. Adv. Technol. Eng. Explor., vol. 4, no. 29, pp. 58–67, 2017.
[4] H. Sharma and R. S. Bali, “Raga identification of Hindustani music using soft computing
    techniques,” 2014 Recent Adv. Eng. Comput. Sci. RAECS 2014, pp. 6–8, 2014.
[5] G. Pandey, C. Mishra, and P. Ipe, “Tansen: A system for automatic raga identification,”
    Indian Int. Conf. Artif. Intell., pp. 1350–1363, 2003.
[6] M. A. Ali and Z. A. Siddiqui, “Automatic Music Genres Classification using Machine
    Learning,” International Journal of Advanced Computer Science and Applications, vol. 8,
    no. 8, 2017.
[7] S. Chillara, A. S. Kavitha, S. A. Neginhal, S. Haldia, and K. S. Vidyullatha, “Music Genre
    Classification using Machine Learning Algorithms: A comparison,” no. May, pp. 851–858, 2019.
[8] K. C. Waghmare and B. A. Sonkamble, “Raga Identification Techniques for Classifying Indian
    Classical Music: A Survey,” International Journal of Signal Processing Systems, vol. 5,
    no. 4, December 2017.
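
The evaluation reported in Section 5 (KNN with Neighbor values 1 to 4 at Train/Test ratios of
80/20, 60/40, and 40/60) can be sketched roughly as follows. This is a plain-Python stand-in,
not the Scikit-learn code used in the experiment: the two-dimensional toy feature rows, the
helper names, the Euclidean distance, and the shuffling seed are all assumptions made for
illustration, so the printed accuracies say nothing about the reported results.

```python
import math
import random
from collections import Counter


def split(rows, labels, train_fraction, seed=0):
    """Shuffle the indices and cut them into a training part and a test part."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # assumption: fixed seed for repeatability
    cut = round(train_fraction * len(order))
    train, test = order[:cut], order[cut:]
    return (
        [rows[i] for i in train], [labels[i] for i in train],
        [rows[i] for i in test], [labels[i] for i in test],
    )


def knn_predict(train_rows, train_labels, query, k):
    """Majority label among the k training rows closest to the query."""
    nearest = sorted(
        range(len(train_rows)), key=lambda i: math.dist(train_rows[i], query)
    )[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_rows, train_labels, test_rows, test_labels, k):
    """Share of test rows whose predicted label equals the true label."""
    hits = sum(
        knn_predict(train_rows, train_labels, row, k) == label
        for row, label in zip(test_rows, test_labels)
    )
    return hits / len(test_rows)


# Toy data (hypothetical): two well-separated clusters standing in for the
# per-clip feature rows; the raga names are used only as illustrative labels.
rows = [(0.1 * i, 0.0) for i in range(20)] + [(10.0 + 0.1 * i, 10.0) for i in range(20)]
labels = ["Yaman"] * 20 + ["Bhairavi"] * 20

results = {}
for train_fraction in (0.8, 0.6, 0.4):
    parts = split(rows, labels, train_fraction)
    for k in (1, 2, 3, 4):
        results[(train_fraction, k)] = accuracy(*parts, k)
        print(train_fraction, k, results[(train_fraction, k)])
```

With real data, each row would instead be the feature vector of one clip read from the .CSV
file, and each label the raga of that clip.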

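
Three of the features described in Section 3 (zero-crossing rate, spectral centroid, and
spectral bandwidth) can be illustrated with a small plain-Python sketch. It is not the Librosa
code used in the experiment: the function names and toy inputs are hypothetical, treating a
zero-valued sample as non-negative is an assumption, and the bandwidth function follows the
half-maximum definition given in the text rather than any particular library's definition.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs where the signal changes sign."""
    if len(samples) < 2:
        return 0.0
    # Assumption: a sample equal to zero is counted as non-negative.
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(samples) - 1)


def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency: the spectrum's "center of mass"."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


def half_max_bandwidth(freqs_hz, magnitudes):
    """Span between the lowest and highest frequency whose magnitude is at
    least half of the maximum magnitude."""
    half = max(magnitudes) / 2
    kept = [f for f, m in zip(freqs_hz, magnitudes) if m >= half]
    return max(kept) - min(kept)


# Toy inputs (hypothetical values, not taken from the raga dataset).
signal = [0.5, -0.5, 0.5, -0.5]
freqs = [100.0, 200.0, 300.0, 400.0]
mags = [1.0, 4.0, 3.0, 1.0]

print(zero_crossing_rate(signal))
print(spectral_centroid(freqs, mags))
print(half_max_bandwidth(freqs, mags))
```

A signal that flips sign at every sample gives the maximum rate, which is consistent with the
remark that noisy sounds have high zero-crossing rates.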
</pre>