<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>New York City, USA, July</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multimodal Sentiment Analysis of Telugu Songs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harika Abburi</string-name>
          <email>harika.abburi@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eswar Sai Akhil Akkireddy</string-name>
          <email>eswarsai.akhil@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suryakanth V Gangashetty</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Radhika Mamidi</string-name>
          <email>radhika.mamidi@iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Language Technology Research Center IIIT Hyderabad</institution>
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>10</volume>
      <issue>2016</issue>
      <fpage>48</fpage>
      <lpage>52</lpage>
      <abstract>
<p>In this paper, an approach to detect the sentiment of a song from its two modalities, text (lyrics) and audio, is presented. Textual lyric features are extracted from a bag-of-words representation, and Doc2Vec generates a single vector for each song from these features. Support Vector Machine (SVM), Naive Bayes (NB) and a combination of the two classifiers are developed to classify the sentiment using the textual lyric features. Audio features, used as an add-on to the lyrical ones, include prosody, temporal, spectral, tempo and chroma features. Gaussian Mixture Models (GMM), SVM and a combination of the two are developed to classify the sentiment using the audio features. GMMs are known for capturing the distribution of the features and SVMs for discriminating between them; hence these models are combined to improve the performance of sentiment analysis. Performance is improved further by combining the text and audio feature domains. The text and audio features are extracted from the beginning of the song, from its ending, and from the whole song. Our experimental results show that the first 30 seconds (30s) of a song detect its sentiment better than the last 30s or the whole song.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Sentiment analysis is the task of finding the opinion expressed about specific entities; in our case, it is the task of finding the sentiment of a song. With the growing amount of music and the growing demand for music information retrieval, music sentiment analysis is emerging as an important and essential task for various systems and applications. To extract sentiment, thousands of text, audio and video documents are processed in a few seconds. Sentiment analysis mainly follows two approaches, text based and audio based [Tyagi and Chandra, 2015]. In either approach, sentiment can be extracted using sentiment classification techniques such as machine learning, lexicon-based and hybrid approaches [Medhat et al., 2014].</p>
<p>In lyric-based song sentiment classification, a sentiment-vector space model has been used [Xia et al., 2008]. Experiments were done with two approaches: knowledge-based and machine learning. In the knowledge-based approach, HowNet [Dong et al., 2010] is used to detect the sentiment words and to locate the sentiment units within the song lyric. In the machine learning approach, an SVM is built on the Vector Space Model (VSM) and the sentiment-Vector Space Model (s-VSM), respectively; experiments show that the s-VSM gives better results than both the VSM and the knowledge-based approach. Previous work also includes sentiment analysis for mining topics from songs based on their moods [Shanmugapriya and Dr.B.Srinivasan, 2015]: the input lyrics files are measured with a WordNet graph representation, and the sentiment of each song is mined using a Hidden Markov Model (HMM). Based on single adjective words available for the audio dataset USPOP, a new dataset was derived from last.fm tags [Hu et al., 2007]; using this dataset, K-means clustering was applied to create a meaningful cluster-based set of high-level mood categories for music mood classification. This set was not adopted by others because its mood categories were seen as a domain oversimplification. The authors of [Hu et al., 2009] presented the usefulness of text features in music mood classification on 18 mood categories derived from user tags, and showed that text features outperform audio features in categories where samples are sparser. An unsupervised method to classify music by mood is proposed in [Patra et al., 2013], where a fuzzy c-means classifier performs the automatic mood classification.</p>
<p>In audio-based song sentiment classification, a method has been presented for audio sentiment detection based on KeyWord Spotting (KWS) rather than Automatic Speech Recognition (ASR) [Kaushik et al., 2015]; experiments show that it outperforms the traditional ASR approach with a 12 percent increase in classification accuracy. Another method detects sentiment from natural audio streams [Kaushik et al., 2013]: ASR is used to obtain transcripts from the video, and a sentiment detection system based on Maximum Entropy modeling and Part-of-Speech tagging measures the sentiment of the transcript. The approach shows that it is possible to automatically detect sentiment in natural spontaneous audio with good accuracy. Instead of using KWS or ASR, we can directly extract features such as prosody and spectral features to detect the sentiment of a song from its audio. For music audio classification, combining Mel Frequency Cepstral Coefficients (MFCC) and chroma features gives better performance than using either separately, because chroma features are less informative for classes such as artist but contain information that is independent of the spectral features [Ellis, 2007]. For this reason, our experiments combine both features along with some other features.</p>
<p>Instead of using only lyrics or only audio, research has also been done on combining both domains. In [Hu and Downie, 2010], mood classification in music digital libraries is performed by combining lyric and audio features, and the authors discovered that complementing audio with lyrics can reduce the number of training samples required to achieve the same or better performance than single-source systems. Music sentiment classification using both lyrics and audio is presented in [Zhong et al., 2012]: for the lyric task, a CHI approach and an improved difference-based CHI approach were developed to extract discriminative affective words from the lyrics text, with the difference-based variant giving better results; for the audio task, features such as chroma and spectral features are used to build an SVM classifier. Experiments show that fusing the two data sources helps improve music sentiment classification. In [Jamdar et al., 2015] and [Wang et al., 2009], music is retrieved based on both lyrics and melody information: keyword spotting is used for the lyrics, and MFCC and pitch features are extracted for the melody. Experiments show that combining both modalities increases performance.</p>
<p>In this work, a method combining lyric and audio features is explored for sentiment analysis of songs. So far, little research has been done on multimodal classification of songs in Indian languages; our proposed system is implemented on a Telugu database. For lyrics, Doc2Vec is used to extract a fixed-dimension feature vector for each song, and SVM and Naive Bayes classifiers are built to detect the sentiment, owing to their strength in text classification. For audio, several features are extracted, such as prosody, temporal, spectral, chroma, harmonics and tempo features, and SVM, GMM and a combination of both are built as classifiers. In the literature, much work detects sentiment from the whole song, but the whole song may not yield good accuracy, because it may or may not carry a single attribute such as happy (positive) or sad (negative) throughout. The beginning and ending parts of a song carry its main attribute; hence, experiments are done on different parts of the song to extract the sentiment.</p>
<p>The rest of the paper is organized as follows: the database and classifiers used in this work are discussed in section 2, and sentiment analysis using lyric features is discussed in section 3. Sentiment analysis using audio features is discussed in section 4. Multimodal sentiment analysis and experimental results of the proposed method for detecting the sentiment of a song are discussed in section 5. Finally, section 6 concludes the paper with a mention of the future scope of the present work.</p>
    </sec>
    <sec id="sec-2">
      <title>Database and Classifiers used in this study</title>
<p>The database used in this paper is collected from YouTube, a publicly available source. A total of 300 Telugu movie songs, with the lyrics corresponding to each song, are taken. The two basic sentiments present in the database are happy and sad: joyful, thrilled, powerful, etc. are treated as the happy sentiment, and ignored, depressed, worried, etc. as the sad sentiment. As our native language is Telugu, the work is implemented on Telugu songs, which have no special features compared with songs in other languages. Telugu songs are one of the popular categories of Indian songs and appear in Tollywood movies; most people in the southern part of India listen to these songs. The songs include a variety of instruments along with the vocals, and the main challenging issue is this diversity of instruments and vocals. The average length of a song is three minutes thirty seconds, and the average number of words in the lyrics of a song is around 300. The database is annotated for the sentiments happy and sad by three people. Annotators are provided with both modalities, text and audio, to correctly figure out the sentiment of a song. Inter-annotator agreement is a measure of how well two or more annotators can make the same annotation decision for a certain category. Based on inter-annotator agreement, 50 happy songs and 50 sad songs are selected, because some songs seem happy or sad to one annotator and neutral to another; so only 100 songs are selected out of 300. Among them, 40% of the songs are used for training and 60% for testing.</p>
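<p>Inter-annotator agreement of this kind is often quantified with Cohen's kappa; the following is a minimal pure-Python sketch on hypothetical labels (the paper does not state which agreement statistic was used):</p>
<preformat>
```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n
    # Expected chance agreement, from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    labels = set(a) | set(b)
    p_e = sum(ca[l] * cb[l] for l in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical sentiment labels from two annotators over six songs.
ann1 = ["happy", "happy", "sad", "sad", "happy", "sad"]
ann2 = ["happy", "sad", "sad", "sad", "happy", "sad"]
kappa = cohens_kappa(ann1, ann2)
```
</preformat>
<p>Songs on which kappa-style agreement is low (e.g. happy for one annotator, neutral for another) are the ones dropped from the 300-song pool.</p>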
    </sec>
    <sec id="sec-3">
      <title>Naive Bayes</title>
<p>The Naive Bayes classifier is a probabilistic classifier of words based on Bayes' theorem, with the independence assumption that words are conditionally independent of each other. This assumption hardly affects accuracy in text classification but makes the classification algorithm really fast. Despite the assumptions this technique makes, Naive Bayes performs well on many complex real-world problems. Multinomial Naive Bayes is used in our system, where multiple occurrences of words matter a lot in the classification problem.</p>
<p>The main theoretical drawback of the Naive Bayes method is that it assumes conditional independence among the linguistic features. If the main features are the tokens extracted from texts, it is evident that they cannot be considered independent, since words co-occurring in a text are somehow linked by different types of syntactic and semantic dependencies. Despite its simplicity and its conditional independence assumption, Naive Bayes still tends to perform surprisingly well [Rish, 2001]. On the other hand, more sophisticated algorithms, such as the SVM, might yield better results.</p>
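<p>As a concrete illustration of the multinomial model, here is a minimal word-count Naive Bayes with Laplace smoothing on toy lyrics (a sketch in pure Python, not the paper's actual implementation):</p>
<preformat>
```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs):
    """docs: list of (tokens, label). Returns log-priors, smoothed log-likelihoods, vocab."""
    class_tokens = defaultdict(list)
    for tokens, label in docs:
        class_tokens[label].extend(tokens)
    vocab = set(t for tokens, _ in docs for t in tokens)
    n = len(docs)
    log_prior, log_lik = {}, {}
    for label, tokens in class_tokens.items():
        count_label = sum(1 for _, l in docs if l == label)
        log_prior[label] = math.log(count_label / n)
        counts = Counter(tokens)
        total = len(tokens) + len(vocab)  # Laplace smoothing denominator
        log_lik[label] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_lik, vocab

def predict(tokens, log_prior, log_lik, vocab):
    scores = {}
    for label in log_prior:
        s = log_prior[label]
        for t in tokens:
            if t in vocab:  # unseen words are simply skipped in this sketch
                s += log_lik[label][t]
        scores[label] = s
    return max(scores, key=scores.get)

# Hypothetical token lists standing in for song lyrics.
docs = [(["dance", "joy", "smile"], "happy"),
        (["tears", "alone", "rain"], "sad"),
        (["joy", "dance"], "happy")]
model = train_multinomial_nb(docs)
label = predict(["joy", "smile"], *model)
```
</preformat>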
    </sec>
    <sec id="sec-4">
      <title>Support Vector Machines</title>
<p>The support vector machine classifier is intended to solve two-class classification problems. The basic principle of a support vector machine is that input vectors which are not linearly separable are transformed into a higher-dimensional space, where an optimal linear hyperplane is designed to separate the two classes. An SVM [Campbel et al., 2006] is a two-class classifier constructed from sums of kernel functions.</p>
    </sec>
    <sec id="sec-4-gmm">
      <title>Gaussian Mixture Models</title>
      <p>GMMs are well known to capture the distribution of data in the feature space. A Gaussian mixture density is a sum of M weighted component densities [Reynolds and Rose, 1995], given by the equation:</p>
<p>p(x_k | λ) = Σ_{r=1}^{M} w_r K_r(x_k)    (1)</p>
      <p>where x_k is an N-dimensional input vector, K_r(x_k), r = 1, ..., M, are the component densities, and w_r, r = 1, ..., M, are the weights of the mixtures.</p>
<p>The product of a component Gaussian with its mixture weight, i.e., K_r(x_k) w_r, is termed a weighted component density, and the Gaussian mixture density is the sum of these component densities. The accuracy in capturing the true distribution of the data depends on various parameters, such as the dimension of the feature vectors, the number of feature vectors and the number of mixture components. In this work, the expectation maximization (EM) algorithm is used to train the GMM models on the audio features.</p>
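<p>Equation (1) can be evaluated directly; the following is a pure-Python sketch with one-dimensional Gaussian components and illustrative parameters (the paper's actual mixtures are 40-dimensional with 64 components):</p>
<preformat>
```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian component K_r."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_density(x, weights, means, variances):
    """p(x | lambda) = sum_r w_r K_r(x), the weighted sum of equation (1)."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Two-component mixture; the weights must sum to one.
weights = [0.4, 0.6]
means = [0.0, 3.0]
variances = [1.0, 2.0]
p = gmm_density(1.0, weights, means, variances)
```
</preformat>
<p>During EM training, these weights, means and variances are the parameters that get re-estimated at each iteration.</p>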
    </sec>
    <sec id="sec-5">
      <title>Sentiment Analysis using Lyric Features</title>
<p>This section describes the process of extracting the textual lyric features of a song, which are then used to build a classifier of the positive or negative sentiment of the song. In the preprocessing step, stanza headings such as "pallavi" and "charanam" are removed from the lyrics: since the lyrics are collected from the Internet, these headings are common to every song and therefore do not act as features for detecting its sentiment. If a line is to be repeated, it is marked "x2" in the original lyrics, so the "x2" is removed and the corresponding line is counted twice. For each song in the database, one feature vector of dimension 300 is generated for better results. As we have 100 files, 100 feature vectors are generated, one per song. For checking accuracy, each song is manually annotated with a tag, happy or sad.</p>
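<p>The preprocessing just described can be sketched as a small function; the lyric line format here is hypothetical, with stanza headings and "x2" markers as described above:</p>
<preformat>
```python
def preprocess_lyrics(lines):
    """Drop stanza headings and expand lines marked 'x2' into two copies."""
    stanza_headings = {"pallavi", "charanam"}
    out = []
    for line in lines:
        line = line.strip()
        if line.lower() in stanza_headings:
            continue  # headings are common to every song, so they carry no sentiment
        if line.endswith("x2"):
            repeated = line[:-2].strip()
            out.extend([repeated, repeated])  # the marked line is counted twice
        else:
            out.append(line)
    return out

# Hypothetical lyric lines illustrating both rules.
lyrics = ["pallavi", "la la happy line x2", "charanam", "another line"]
clean = preprocess_lyrics(lyrics)
```
</preformat>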
<p>Here, a Doc2Vec model is used to associate documents with labels. Doc2Vec extends the word2vec algorithm to unsupervised learning of continuous representations for larger blocks of text, such as sentences, paragraphs or whole documents; that is, Doc2Vec learns to correlate labels with words rather than words with other words. In the word2vec architecture, the two algorithms are continuous bag of words and skip-gram; the corresponding algorithms in the Doc2Vec architecture are distributed memory and distributed bag of words. All songs are given as input to Doc2Vec, which generates a single vector representing the meaning of each document; this vector can then be used as input to a supervised machine learning algorithm to associate documents with labels. Song sentiment analysis based on lyrics can be viewed as a text classification task, which can be handled by SVM and Naive Bayes (NB) algorithms thanks to their good classification performance. Both classifiers are trained on the vectors generated by Doc2Vec. After the probabilities from the two classifiers are calculated, their average is computed, and the test song is hypothesized to come from the class with the highest average probability; in this way the two classifiers are also compared. Combining the two classifiers improves the rate of detecting the sentiment of a song. Given a test song, the trained models classify it as either happy or sad. Three experiments are done on each song: the beginning 30 seconds, the last 30 seconds and the whole song.</p>
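<p>The averaging scheme just described (average the two classifiers' per-class probabilities, then pick the class with the highest average) can be sketched as follows, with hypothetical posterior values:</p>
<preformat>
```python
def fuse_by_average(prob_a, prob_b):
    """Average per-class probabilities from two classifiers; return best class and averages."""
    avg = {c: (prob_a[c] + prob_b[c]) / 2 for c in prob_a}
    return max(avg, key=avg.get), avg

# Hypothetical posteriors for one test song from the SVM and Naive Bayes models.
svm_probs = {"happy": 0.62, "sad": 0.38}
nb_probs = {"happy": 0.55, "sad": 0.45}
label, avg = fuse_by_average(svm_probs, nb_probs)
```
</preformat>
<p>This simple late fusion is what allows a weak decision from one classifier to be overruled when the other classifier is more confident.</p>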
<p>From Table 1 it is observed that the combination of both classifiers gives a higher accuracy for the beginning of the song than for the ending or the whole song; the whole song gives the lowest accuracy in detecting the sentiment. Keeping the training data set constant, several experiments are done on the test data. The average performance of sentiment analysis for the beginning, ending and whole song is 75.7%, 72.4% and 70.2% respectively.</p>
    </sec>
    <sec id="sec-6">
      <title>Sentiment Analysis using Audio Features</title>
<p>This section describes the process of extracting the audio features of a song, which are then used to build a classifier of the positive or negative sentiment of the song. Each song underwent a preprocessing step converting the mp3 file into a wave file (.wav format) with 16-bit samples, a 16000 Hz sampling frequency and a mono channel. To extract a set of audio features such as MFCC, chroma, prosody, temporal, spectral, harmonic and tempo features from a wave file, the openEAR/openSMILE toolkit [Eyben et al., 2010] is used. Brief details about the audio features are given below:
• Prosody features include intensity, loudness and pitch, which describe the speech signal.
• Temporal features, also called time-domain features, are simple to extract; examples are the energy of the signal and the zero crossing rate.
• Spectral features, also called frequency-domain features, are extracted by converting the signal from the time domain to the frequency domain using the Fourier transform. They include the fundamental frequency, spectral centroid, spectral flux, spectral roll-off, spectral kurtosis and spectral skewness, and can be used to identify notes, pitch, rhythm and melody.
• Mel-Frequency Cepstral Coefficients (MFCC) (a 13-dimension feature vector) use frequency bands equally spaced on the mel scale, which approximates the response of the human auditory system more closely.
• Chroma features (a 12-dimension feature vector) are among the most popular features in music and are extensively used for chord and key recognition and for segmentation.
• Harmonic tempo is the rate at which the chords change in a musical composition in relation to the rate of notes.</p>
<p>Although this toolkit was designed for emotion recognition, research on sentiment analysis using the same toolkit has been carried out successfully [Mairesse et al., 2012]: as prosody had been used before for emotion recognition in speech, those authors also experimented with it for sentiment analysis. Three experiments are performed here: the beginning 30 seconds, the last 30 seconds and the whole song. The extracted features are used to train SVM, GMM and a combination of both classifiers. GMMs are known for capturing the distribution of the features and SVMs for discriminating between them; hence these models are combined to improve the performance of detecting the sentiment of a song from audio features. A GMM needs more feature vectors for training than Naive Bayes or SVM, but on the textual side we have few features (only one feature vector per song from Doc2Vec), whereas for audio there are many feature vectors, because features are extracted at the frame level. So GMM and SVM are used as acoustic models, whereas Naive Bayes and SVM are used for the linguistic features. A 40-dimension feature vector is extracted for each frame, with a frame size of 25 ms and a frame shift of 10 ms. In this work, the number of mixtures for the GMM models (64) and the Gaussian kernel parameters for the SVM models are determined empirically.</p>
<p>From Table 2 it is observed that the whole song gives the worst performance in detecting the sentiment of a song, because the whole song carries a mix of attributes (happy and sad) and is therefore less clear; using part of the song increases performance. Hence experiments are also done on the beginning and ending of the song. The combination of both classifiers gives a higher accuracy for the beginning of the song than for the ending; SVM performs best on the ending of the song, while GMM performs best on the beginning. Keeping the training data set constant, several experiments are done on the test data. The average performance of sentiment analysis for the beginning, ending and whole song is 88.3%, 82.3% and 69.7% respectively.</p>
    </sec>
    <sec id="sec-7">
      <title>Multimodal Sentiment Analysis</title>
<p>In textual data, the only source we have is information about the words and their dependencies, which may sometimes be insufficient to convey the exact sentiment of a song. Audio data, in contrast, contains multiple streams, both acoustic and linguistic. From our experiments it is observed that textual data gives a lower accuracy than audio, so the simultaneous use of the two modalities helps to create a better sentiment analysis model for deciding whether a song is happy or sad.</p>
<p>The sequence of steps in the proposed approach is presented in Figure 1, and Table 3 presents the accuracy of sentiment detection when lyric and audio features are combined. The whole song may not convey a single sentiment, so there is a lot of similarity between sad and happy features; hence features extracted from different parts of a song are used to identify its sentiment. To handle this similarity between sentiment classes, the decisions of classification models trained on the different modalities are combined, which improves performance by 3 to 5%. Lyric features generated with Doc2Vec and the most effective audio features, such as spectral and chroma features, are used to build the classifiers. Sentiment analysis systems are built using the whole song, the beginning of the song and the ending of the song. With the whole song the performance is much lower, because the full song contains more information (features), which is confusing; experiments on the beginning and the ending of songs give better results. Features extracted from the beginning of the song are observed to perform better than the whole song and the ending of the song, because the instruments and vocals that convey the sentiment in the beginning part of the song may or may not be sustained throughout it. Several experiments are done keeping the training data constant. The proposed method is evaluated on 100 songs, and from the experimental results the recognition rate is observed to be between 85% and 91.2%. This work can be extended by including more attributes, such as angry and fear, and by extracting more features, such as rhythm and tonality. The accuracy of lyric sentiment analysis can be improved by using rule-based and linguistic approaches.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Campbel et al.,
          <year>2006</year>
          ]
          <string-name>
            <given-names>M William</given-names>
            <surname>Campbel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P Joseph</given-names>
<surname>Campbell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A Douglas</given-names>
            <surname>Reynolds</surname>
          </string-name>
          , Elliot Singer, and
          <string-name>
            <given-names>A Pedro</given-names>
            <surname>Torres-Carrasquillo</surname>
          </string-name>
          .
          <article-title>Support vector machines for speaker and language recognition</article-title>
          .
          <source>Computer Speech &amp; Language</source>
          ,
          <volume>20</volume>
          (
          <issue>2</issue>
          ):
          <fpage>210</fpage>
          -
          <lpage>229</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Dong et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Zhendong</given-names>
            <surname>Dong</surname>
          </string-name>
          , Qiang Dong, and
          <string-name>
            <given-names>Changling</given-names>
            <surname>Hao</surname>
          </string-name>
          .
          <article-title>Hownet and its computation of meaning</article-title>
          .
          <source>In Proc. 23rd international conference on computational linguistics: demonstrations, association for computational linguistic</source>
          , pages
          <fpage>53</fpage>
          -
          <lpage>56</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
[Ellis,
          <year>2007</year>
          ]
          <string-name>
            <given-names>D. P. W.</given-names>
            <surname>Ellis</surname>
          </string-name>
          .
<article-title>Classifying music audio with timbral and chroma features</article-title>
          .
          <source>In Proc. 8th Int. Conf. Music Inf. Retrieval (ISMIR)</source>
          , pages
          <fpage>339</fpage>
          -
          <lpage>340</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Eyben et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>F.</given-names>
            <surname>Eyben</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wollmer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
<surname>Schuller</surname>
          </string-name>
          .
<article-title>openSMILE: the Munich versatile and fast open-source audio feature extractor</article-title>
          .
          <source>In Proc. ACM Multimedia (MM)</source>
          , pages
          <fpage>1459</fpage>
          -
          <lpage>1462</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
[Hu and Downie,
          <year>2010</year>
          ]
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Downie</surname>
          </string-name>
          .
          <article-title>Improving mood classification in music digital libraries by combining lyrics and audio</article-title>
          .
          <source>In Proc. Joint Conference on Digital Libraries</source>
          ,
          <source>(JCDL)</source>
          , pages
          <fpage>159</fpage>
          -
          <lpage>168</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Hu et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Downie</surname>
          </string-name>
          .
          <article-title>Creating a simplified music mood classification ground-truth set</article-title>
          .
          <source>In Proc. 8th International Conference on Music Information Retrieval</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Hu et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Xiao</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Stephen</given-names>
            <surname>Downie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andreas F.</given-names>
            <surname>Ehmann</surname>
          </string-name>
          .
          <article-title>Lyric text mining in music mood classification</article-title>
          .
          <source>In Proc. 10th International Conference on Music Information Retrieval (ISMIR)</source>
          , pages
          <fpage>411</fpage>
          -
          <lpage>416</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Jamdar et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Adit</given-names>
            <surname>Jamdar</surname>
          </string-name>
          , Jessica Abraham, Karishma Khanna, and
          <string-name>
            <given-names>Rahul</given-names>
            <surname>Dubey</surname>
          </string-name>
          .
          <article-title>Emotion analysis of songs based on lyrical and audio features</article-title>
          .
          <source>International Journal of Artificial Intelligence and Applications(IJAIA)</source>
          ,
          <volume>6</volume>
          (
          <issue>3</issue>
          ):
          <fpage>35</fpage>
          -
          <lpage>50</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Kaushik et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Lakshmish</given-names>
            <surname>Kaushik</surname>
          </string-name>
          , Abhijeet Sangwan, and
<string-name>
            <given-names>John H. L.</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <article-title>Sentiment extraction from natural audio streams</article-title>
          .
          <source>In proc. ICASSP</source>
          , pages
          <fpage>8485</fpage>
          -
          <lpage>8489</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Kaushik et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Lakshmish</given-names>
            <surname>Kaushik</surname>
          </string-name>
          , Abhijeet Sangwan, and
          <string-name>
            <given-names>John H. L.</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <article-title>Automatic audio sentiment extraction using keyword spotting</article-title>
          .
          <source>In Proc. INTERSPEECH</source>
          , pages
          <fpage>2709</fpage>
          -
          <lpage>2713</lpage>
          ,
          <year>September 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Mairesse et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mairesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Polifroni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. Di</given-names>
            <surname>Fabbrizio</surname>
          </string-name>
          .
          <article-title>Can prosody inform sentiment analysis? Experiments on short spoken reviews</article-title>
          .
          <source>In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing</source>
          , pages
          <fpage>5093</fpage>
          -
          <lpage>5096</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Medhat et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Walaa</given-names>
            <surname>Medhat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ahmed</given-names>
            <surname>Hassan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hoda</given-names>
            <surname>Korashy</surname>
          </string-name>
          .
          <article-title>Sentiment analysis algorithms and applications: A survey</article-title>
          .
          <source>Ain Shams Engineering Journal</source>
          , pages
          <fpage>1093</fpage>
          -
          <lpage>1113</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Patra et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Patra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          .
          <article-title>Unsupervised approach to Hindi music mood classification</article-title>
          .
          <source>In Mining Intelligence and Knowledge Exploration (MIKE 2013)</source>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Prasath</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kathirvalavakumar</surname>
          </string-name>
          (Eds.):
          <source>LNAI 8284</source>
          , pages
          <fpage>62</fpage>
          -
          <lpage>69</lpage>
          ,
          <year>2013</year>
          . Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Reynolds and Rose,
          <year>1995</year>
          ]
          <string-name>
            <given-names>Douglas A.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          and
          <string-name>
            <given-names>Richard C.</given-names>
            <surname>Rose</surname>
          </string-name>
          .
          <article-title>Robust text-independent speaker identification using Gaussian mixture speaker models</article-title>
          .
          <source>IEEE Transactions on Speech and Audio Processing</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>72</fpage>
          -
          <lpage>83</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Rish,
          <year>2001</year>
          ]
          <string-name>
            <given-names>Irina</given-names>
            <surname>Rish</surname>
          </string-name>
          .
          <article-title>An empirical study of the naive Bayes classifier</article-title>
          .
          <source>In Proc. IJCAI-01 Workshop on Empirical Methods in Artificial Intelligence</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Shanmugapriya and Srinivasan,
          <year>2015</year>
          ]
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Shanmugapriya</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          .
          <article-title>An efficient method for determining sentiment from song lyrics based on WordNet representation using HMM</article-title>
          .
          <source>International Journal of Innovative Research in Computer and Communication Engineering</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1139</fpage>
          -
          <lpage>1145</lpage>
          ,
          <year>February 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Tyagi and Chandra,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Atul</given-names>
            <surname>Tyagi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nidhi</given-names>
            <surname>Chandra</surname>
          </string-name>
          .
          <article-title>An introduction to the world of sentiment analysis</article-title>
          .
          <source>In Proc. 28th IRF International Conference</source>
          ,
          <year>June 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Wang et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Tao</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>DongJu</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>KwangSeok</given-names>
            <surname>Hong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>JehSeon</given-names>
            <surname>Youn</surname>
          </string-name>
          .
          <article-title>Music information retrieval system using lyrics and melody information</article-title>
          .
          <source>In Proc. Asia-Pacific Conference on Information Processing</source>
          , pages
          <fpage>601</fpage>
          -
          <lpage>604</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Xia et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Yunqing</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Linlin</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kam-Fai</given-names>
            <surname>Wong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mingxing</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <article-title>Sentiment vector space model for lyric-based song sentiment classification</article-title>
          .
          <source>In Proc. ACL-08: HLT, Short Papers</source>
          , pages
          <fpage>133</fpage>
          -
          <lpage>136</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Zhong et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Jiang</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yifeng</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Siyuan</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Luosheng</given-names>
            <surname>Wen</surname>
          </string-name>
          .
          <article-title>Music sentiment classification integrating audio with lyrics</article-title>
          .
          <source>Information and Computational Science</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>35</fpage>
          -
          <lpage>54</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>