<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Sentiment Analysis on Multilingual Code-Mixed Kannada Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Satyam Dutta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Himanshi Agrawal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pradeep Kumar Roy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Information Technology</institution>
          ,
          <addr-line>Surat, Gujarat</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <fpage>3</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>The extended use of the internet and social networking platforms has ushered in a new avenue for people to share their ideas. Sentiment analysis is the process of categorizing people's feelings as expressed in their thoughts and remarks. It is one of the most debated and researched subjects in Natural Language Processing (NLP) at the moment. Many studies have been offered for sentiment analysis of texts comprising only one language, such as English, Spanish, or Arabic. However, few studies have focused on code-mixed language analysis, which is critical in countries like India, where people speak and express themselves in multiple languages. In this research, we present a model that aids in sentiment analysis of Dravidian code-mixed Kannada comments; it achieved a promising weighted F1-score of 0.66 using the BERT model on the validation dataset, whereas the weighted F1-score on the test dataset was 0.619.</p>
      </abstract>
      <kwd-group>
        <kwd>Sentiment Analysis</kwd>
        <kwd>Code-Mixed</kwd>
        <kwd>Kannada</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Transformer</kwd>
        <kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Sentiment analysis, a field of natural language processing that analyses text, uses computational
linguistics and biometrics to systematically identify, extract, quantify, and study affective states and
subjective information [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Sentiment analysis is a fascinating yet popular field of study.
Many in the research community have worked on varying topics and applied analytical tools
to solve many problems [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Reviews are a vital element of many businesses that
deal with products, as they convey consumers’ requirements while also helping many
firms improve based on the offered feedback [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This enables businesses to adapt and respond more
effectively to customer demands. However, most of the research conducted on social
media posts and YouTube video comments uses English as the primary language [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In terms of culture and languages, India is a land of contrasts. In India, about 447 languages
are spoken, with Dravidian languages accounting for 19.64 percent of the population. Anyone
may learn and converse in any language. As a result, the internet is brimming with information
in a variety of languages and even a combination of languages. However, comparatively little
research has been conducted on applying sentiment analysis to data in many of the
Indian languages, particularly languages from the Dravidian family, namely Telugu, Tamil,
Malayalam, and Kannada [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
      <p>
        In multilingual communities like India, language mixing, also known as code-mixing, is
rather prevalent. People who are multilingual (particularly non-native English speakers) have
a tendency to code-mix in their primary language by employing English-based phonetic typing
and inserting anglicisms [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. It is relatively typical to see code-mixing behavior at the word level,
in addition to mixing languages at the sentence level. These phenomena present a considerable
barrier to traditional NLP systems, which rely on monolingual resources to process the inputs.
NLP tasks such as language identification and translation [9], part-of-speech tagging, parsing, semantic
analysis and processing, and offensive language detection [10, 11] are all affected by code-mixing.
Traditional NLP systems rely extensively on one language, which limits their ability to handle
challenges like English-based phonetic typing, word-level code-mixing, and other issues while
dealing with code-mixed text.
      </p>
      <p>
        The focus of this study is to bring the challenge of sentiment analysis in code-mixed textual
data to the attention of the research community. The data for the “Dravidian-CodeMix-FIRE
20211" challenge was scraped from YouTube comments, where each instance of data is
labeled with one of the sentiment polarities: "positive, negative, neutral, mixed feelings or not
in the intended languages" [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We have described our numerous Machine Learning and Deep
Learning methodologies in this work, as well as our final proposed model, which is based on
the Transformer architecture, Bidirectional Encoder Representations from Transformers (BERT).
      </p>
      <p>The rest of the paper is organized as follows: Section 2 discusses the related works. Section 3
discusses the proposed methodology. Section 4 discusses the experimental outcomes of various
machine learning and deep learning models. Finally, Section 5 concludes the work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>Sentiment analysis plays a major role in decision-making systems and is widely used in
various fields including recommendation systems, e-commerce, the hotel business, and many others.
The importance of sentiment analysis in different domains has attracted researchers to build
automated systems, and hence many models have been reported in recent years [12, 13, 14].
Ouyang et al. [14] proposed a deep learning-based framework for sentiment analysis. They
used the word2Vec technique for text to a vector representation. Then three layers of CNN
followed by a pooling layer were used for the sentiment analysis task. Li et al. [15] developed a
model using parallel CNN and LSTM network for sentiment analysis on English movie reviews
and Chinese tourism reviews dataset. The sentiment padding technique was developed by the
authors and claimed the technique was better than zero padding. Further, lexicon integrated
CNN and Bi-LSTM model was developed. Ombabi et al. [16] used one layer of CNN and a
two-layered LSTM model for Arabic sentiment analysis purposes. The features extracted by
CNN and LSTM model were given to the SVM model to predict the sentiment. Authors used
FastText embedding for text representation and achieved 90.75% accuracy on multi-domain
corpus for the best case.</p>
      <p>A major issue with existing research is the language dependency of the
models: mainly English, Chinese, Arabic, or another single-language dataset was used
for model development, so the resulting models cannot process multilingual data. The existing models</p>
      <sec id="sec-2-1">
        <title>1https://dravidian-codemix.github.io/2021/index.html</title>
        <p>
          are not able to handle a message, review, or tweet that consists of more than one
language, such as English-Hindi, English-Chinese, English-Tamil, and similar. However,
a few recent works have reported deep learning models to address these issues [
          <xref ref-type="bibr" rid="ref2">2, 17</xref>
          ].
They developed Dravidian code-mixed datasets and built multiple machine and deep
learning frameworks to handle code-mixed input data. The shortcomings of the existing
techniques and the limited research on Dravidian code-mixed data motivated this work.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>All the applied Machine Learning, Deep Learning, and Transformer models are described in
depth in this section; the code developed for this work can be found at the given link2. On the
Kannada dataset [18], we first applied several basic Machine Learning models such as Random Forest,
Support Vector Machine, and Logistic Regression with default values of hyper-parameters.
The performances of these models were evaluated in terms of precision, recall, and F1-score
[19]. Table 1 shows the data statistics that were used in the analysis. The working steps of
the proposed methodology are shown in Figure 1. The dataset contained a large amount of
invalid content such as emojis, URLs, and others. Hence, data cleaning and preprocessing tasks
were performed prior to developing the model, such as eliminating all the emojis, emoticons, and
special symbols. We also removed all of the numbers and transformed the data to lowercase.
For text encoding, we employed the n-gram Term Frequency-Inverse Document Frequency
(TF-IDF) vectorizer.</p>
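      <p>The cleaning steps above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the exact regular expressions and the helper name clean_comment are assumptions. After cleaning, the strings would be fed to an n-gram TF-IDF vectorizer such as scikit-learn's TfidfVectorizer(ngram_range=(1, 3)).</p>

```python
import re

def clean_comment(text: str) -> str:
    """Illustrative cleaning pipeline: drop URLs, emojis and other
    non-ASCII symbols, digits, and special characters, then lowercase."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = text.encode("ascii", "ignore").decode()       # drop emojis / non-ASCII
    text = re.sub(r"[0-9]+", " ", text)                  # remove numbers
    text = re.sub(r"[^a-zA-Z\s]", " ", text)             # remove special symbols
    return re.sub(r"\s+", " ", text).strip().lower()     # normalize whitespace, lowercase
```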
      <p>We created a hybrid model combining Convolutional Neural Networks (CNN) [20, 21] and
Bidirectional Long Short-Term Memory (Bi-LSTM) networks after the machine learning models
failed to produce promising results. The hybrid model’s details, as well as its hyperparameters,
are listed in Table 2. This hybrid model outperformed the machine learning models by a small
margin. However, to attain even better results, we built a Deep Learning model by utilizing the
Transformer architecture, known as Bidirectional Encoder Representations from Transformers
(BERT) [22]. We used two different methods to implement this model:
1. BERT model implementation from scratch using TensorFlow3.
2. BERT model implementation using a wrapper known as ktrain4.</p>
      <p>The basic difference between these methods is that, in the first, we must manually reformat
the data into the form the BERT model accepts, whereas in the ktrain [23] method we only need to
specify the hyperparameter values, and ktrain will take care of the rest. Both strategies use
a similar implementation and both indicate that transfer learning is the preferred method
for downstream tasks. Nonetheless, both strategies are discussed here because we began with
the conventional BERT model, where we built our model using BERT from the ground up to
better understand how it works. After learning the internal workings of BERT, we adopted a
strategy to create the same model architecture with less effort and in less time, avoiding most of the
pre-processing steps. The following subsections explain the two strategies
mentioned above.</p>
      <sec id="sec-3-1">
        <title>2https://github.com/RogNet11/Sentiment-Analysis-of-Code-Mixed-Dravidian-Text 3https://www.tensorflow.org/text/tutorials/classify_text_with_bert 4https://github.com/amaiya/ktrain</title>
        <p>[Figure 1: Working steps of the proposed methodology. The code-mixed dataset is preprocessed and passed through an embedding layer (50x256); it then feeds either the machine learning models (Multinomial NB, Logistic Regression, Random Forest, XGBoost, Decision Tree), the hybrid model (Bi-LSTM (100), Conv1D (64), max-pooling, flatten, dense (64), dropout, dense (5)), or the transformer-based models (BERT with TensorFlow, BERT with ktrain).]</p>
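        <p>As a rough sanity check on the hybrid Bi-LSTM + CNN architecture, the tensor shapes can be traced layer by layer. The kernel size, pooling width, and the hybrid_model_shapes helper itself are assumptions for illustration; the paper lists only the layer widths.</p>

```python
def hybrid_model_shapes(seq_len=50, emb_dim=256, lstm_units=100,
                        conv_filters=64, kernel=3, pool=2,
                        dense_units=64, n_classes=5):
    """Trace tensor shapes through the hybrid model (layer widths from
    Figure 1 / Table 2; kernel and pool sizes are assumed, not stated)."""
    shapes = [("input", (seq_len,)),
              ("embedding", (seq_len, emb_dim))]
    # Bi-LSTM doubles the feature dimension (forward + backward states)
    shapes.append(("bi_lstm", (seq_len, 2 * lstm_units)))
    # a 'valid' Conv1D shortens the sequence by kernel-1 steps
    conv_len = seq_len - kernel + 1
    shapes.append(("conv1d", (conv_len, conv_filters)))
    pool_len = conv_len // pool
    shapes.append(("max_pool", (pool_len, conv_filters)))
    shapes.append(("flatten", (pool_len * conv_filters,)))
    shapes.append(("dense", (dense_units,)))
    shapes.append(("output", (n_classes,)))  # five sentiment classes
    return shapes
```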
        <sec id="sec-3-1-1">
          <title>3.1. BERT from scratch using TensorFlow</title>
          <p>BERT is a pre-trained transformer-based model. In addition to the standard text input, BERT
employs a feature known as ‘attention masks,’ which aids in better capturing sentence context.
The research article [24] provides a full explanation of it.</p>
          <p>A transformer consists of an encoder that reads the text input and a decoder that generates a
prediction for the input. However, only the encoder mechanism is required here because
BERT aims to create a language model. Compared to directional models that read the input
text sequentially, the Transformer encoder’s most crucial characteristic is that it reads the
complete sequence of words at once. This property enables the model to deduce the context of
a word from its surroundings. It is also faster because it does not follow the recurrent structure
of RNNs and LSTMs [25], but instead relies on parallelization.</p>
          <p>We employed the ‘bert-base-uncased’5 model for this study, which produced better outcomes
than the RoBERTa and XLNet models. We also tried a few other multilingual models, but their
accuracy was lower than that of the model we used: although multilingual, the other models
are trained on monolingual phonetics, whereas the dataset largely contains code-mixed inputs.
Each BERT model has its own tokenizer, which we used to encode the input data into vectors
and attention masks. After that, we loaded the data arrays into a TensorFlow dataset object and
turned them into TensorFlow tuples. Then, we constructed the model with the hyperparameters
listed in Table 3.</p>
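          <p>The encoding step above, producing token-id vectors and matching attention masks, can be illustrated with a toy padder. This is a stand-in, not the actual BERT tokenizer (which would also insert [CLS]/[SEP] ids); the encode_batch helper is hypothetical.</p>

```python
def encode_batch(token_ids_batch, max_len=64, pad_id=0):
    """Pad each token-id sequence to max_len and build the matching
    attention mask (1 = real token, 0 = padding), mirroring what a BERT
    tokenizer's padding options produce."""
    input_ids, attention_masks = [], []
    for ids in token_ids_batch:
        ids = ids[:max_len]                       # truncate long comments
        mask = [1] * len(ids)
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * pad)    # pad ids with pad_id
        attention_masks.append(mask + [0] * pad)  # mask out padding positions
    return input_ids, attention_masks
```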
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2. BERT with ktrain</title>
          <p>ktrain is a lightweight wrapper for TensorFlow, Keras and other libraries, making it much easier
to design, train, and deploy neural networks and other machine learning models [23]. It comes
with all of the methods and pre-processing procedures pre-programmed, and it is simple to
specify and fine-tune the parameters to generate a good model. A list of the parameters supplied
to ktrain is presented in Table 4.</p>
          <p>When comparing the hyperparameters of the two implementations, the only variation is the
number of epochs, as both approaches are nearly identical apart from implementation. However,
because ’early stopping’ is utilized, the number of epochs at run time will be the
same. Different learning rates were also tried, and the one that produced the best results was
chosen. During the testing of various language models, some of the models caused problems
when implemented from scratch. As a result, ktrain was used to conduct the experiments
and obtain the final results, though both methods yielded similar results. With ktrain also, the
same model was used: ’bert-base-uncased’.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>In the task of Dravidian-CodeMix-FIRE2021, we have classified code-mixed Kannada social
media comments into five different sentiment classes: (i) Mixed feelings, (ii) Positive, (iii)
Negative, (iv) Not related to that language (Not-Kannada), and (v) Unknown state. Table 2
shows the hyper-parameter details of the proposed hybrid model for the Kannada dataset. Table</p>
      <sec id="sec-4-1">
        <title>5https://github.com/google-research/bert</title>
        <p>3 shows the hyper-parameter details for the BERT model from scratch with TensorFlow for the
Kannada dataset, and Table 4 shows the hyperparameter details for the BERT model with ktrain
for the Kannada dataset. Table 5 shows results for code-mixed Kannada sentiment analysis using
various Machine Learning models. The Support Vector Machine achieved the highest accuracy of
0.56 and weighted F1-score of 0.50 compared to the other machine learning models. Table 6 shows
code-mixed Kannada sentiment analysis results using the hybrid Deep Learning model (Bi-LSTM +
CNN), which improved on the machine learning models and provided a promising weighted
precision of 0.66, recall of 0.52, and F1-score of 0.55.</p>
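        <p>The weighted F1-score reported throughout is the per-class F1 averaged by class support. A minimal pure-Python version for reference (the weighted_f1 helper is illustrative; it matches what scikit-learn's f1_score with average='weighted' computes):</p>

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted F1: per-class F1 averaged with weights
    proportional to each class's true-label count."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += n * f1                     # weight per-class F1 by support
    return total / len(y_true)
```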
        <p>Table 7 shows results for code-mixed Kannada sentiment analysis using “BERT with
TensorFlow” and “BERT with ktrain”. The “BERT with TensorFlow” model achieved a weighted precision
of 0.64, recall of 0.66, and F1-score of 0.64, whereas the “BERT with ktrain" model achieved
0.66, 0.66, and 0.66 as weighted precision, recall, and F1-score, respectively. The above-mentioned
results were obtained on the validation dataset that was provided to us by the organizers. Our best
model, i.e., “BERT with ktrain”, achieved weighted precision, recall, and F1-scores of 0.672,
0.654, and 0.619, respectively, on the test dataset. To further analyze the misclassified instances
from the various classes, the confusion matrix obtained using the best-performing model is
plotted in Figure 2. Among all instances, 51% of negative, 36% of positive, 31% of mixed-feelings,
and 40% of unknown-state comments are misclassified into the not-Kannada category, which may be the reason
behind the lower overall weighted F1-score. Better data preprocessing and more samples of the data in
the other categories might be needed to improve the overall prediction accuracy.</p>
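        <p>Rates such as “51% of negative comments misclassified as not-Kannada” are read off rows of the confusion matrix. A small sketch (the misclassified_to helper and the counts in the test are illustrative, not the actual Figure 2 values):</p>

```python
def misclassified_to(conf, labels, target):
    """Share of each true class predicted as `target` (e.g. 'not-Kannada').
    conf[i][j] is the count of true class i predicted as class j."""
    j = labels.index(target)
    rates = {}
    for i, lab in enumerate(labels):
        if lab == target:
            continue                      # skip the target class itself
        row_total = sum(conf[i])          # all instances of true class i
        rates[lab] = conf[i][j] / row_total if row_total else 0.0
    return rates
```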
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Sentiment analysis has been a subject of discussion because it is significant in
light of the growing amount of data available on the internet. To classify our code-mixed data
into distinct sentiment categories, we used a variety of machine learning, deep learning, and
transformer-based models, with the transformer-based BERT model outperforming the other
variants. The BERT model was implemented in two ways: one from the ground up and another
that is higher-level and simpler. For the code-mixed Kannada dataset, the latter obtained a promising
weighted F1-score of 0.66 on the validation dataset, while on the test dataset the weighted
F1-score achieved was 0.619.
</p>
      <p>[9] B. R. Chakravarthi, R. Priyadharshini, S. Banerjee, R. Saldanha, J. P. McCrae, A. K. M,
P. Krishnamurthy, M. Johnson, Findings of the shared task on machine translation in
Dravidian languages, in: Proceedings of the First Workshop on Speech and Language
Technologies for Dravidian Languages, Association for Computational Linguistics, Kyiv,
2021, pp. 119–125. URL: https://aclanthology.org/2021.dravidianlangtech-1.15.
[10] A. Hande, K. Puranik, K. Yasaswini, R. Priyadharshini, S. Thavareesan, A. Sampath,
K. Shanmugavadivel, D. Thenmozhi, B. R. Chakravarthi, Offensive language
identification in low-resourced code-mixed Dravidian languages using pseudo-labeling, 2021.
arXiv:2108.12177.
[11] B. R. Chakravarthi, R. Priyadharshini, N. Jose, A. Kumar M, T. Mandl, P. K. Kumaresan,
R. Ponnusamy, H. R L, J. P. McCrae, E. Sherly, Findings of the shared task on offensive
language identification in Tamil, Malayalam, and Kannada, in: Proceedings of the First
Workshop on Speech and Language Technologies for Dravidian Languages, Association
for Computational Linguistics, Kyiv, 2021, pp. 133–145. URL: https://aclanthology.org/2021.dravidianlangtech-1.17.
[12] B. Liu, et al., Sentiment analysis and subjectivity, Handbook of Natural Language Processing
2 (2010) 627–666.
[13] R. Feldman, Techniques and applications for sentiment analysis, Communications of the
ACM 56 (2013) 82–89.
[14] X. Ouyang, P. Zhou, C. H. Li, L. Liu, Sentiment analysis using convolutional neural
network, in: 2015 IEEE international conference on computer and information technology;
ubiquitous computing and communications; dependable, autonomic and secure computing;
pervasive intelligence and computing, IEEE, 2015, pp. 2359–2364.
[15] W. Li, L. Zhu, Y. Shi, K. Guo, E. Cambria, User reviews: Sentiment analysis using lexicon
integrated two-channel cnn–lstm family models, Applied Soft Computing 94 (2020) 106435.
[16] A. H. Ombabi, W. Ouarda, A. M. Alimi, Deep learning cnn–lstm framework for arabic
sentiment analysis using textual information shared in social networks, Social Network
Analysis and Mining 10 (2020) 1–13.
[17] B. R. Chakravarthi, V. Muralidaran, R. Priyadharshini, J. P. McCrae, Corpus
creation for sentiment analysis in code-mixed Tamil-English text, in: Proceedings of the
1st Joint Workshop on Spoken Language Technologies for Under-resourced languages
(SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL),
European Language Resources association, Marseille, France, 2020, pp. 202–210. URL:
https://aclanthology.org/2020.sltu-1.28.
[18] A. Hande, R. Priyadharshini, B. R. Chakravarthi, KanCMD: Kannada CodeMixed dataset
for sentiment analysis and offensive language detection, in: Proceedings of the Third
Workshop on Computational Modeling of People’s Opinions, Personality, and Emotion’s
in Social Media, Association for Computational Linguistics, Barcelona, Spain (Online),
2020, pp. 54–63. URL: https://aclanthology.org/2020.peoples-1.6.
[19] D. Tripathi, D. R. Edla, R. Cheruku, V. Kuppili, A novel hybrid credit scoring model based
on ensemble feature selection and multilayer ensemble classification, Computational
Intelligence 35 (2019) 371–394.
[20] P. K. Roy, A. K. Tripathy, T. K. Das, X.-Z. Gao, A framework for hate speech detection
using deep convolutional neural network, IEEE Access 8 (2020) 204951–204962.
[21] P. K. Roy, Multilayer convolutional neural network to filter low quality content from
quora, Neural Processing Letters 52 (2020) 805–821.
[22] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional
transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[23] A. S. Maiya, ktrain: A low-code library for augmented machine learning, arXiv preprint
arXiv:2004.10703 (2020).
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I.
Polosukhin, Attention is all you need, in: Advances in neural information processing systems,
2017, pp. 5998–6008.
[25] P. K. Roy, Deep neural network to predict answer votes on community question answering
sites, Neural Processing Letters 53 (2021) 1633–1646.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saumya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Nitp-ai-nlp@ Dravidian-codemix-fire2020: A hybrid cnn and bi-lstm network for sentiment analysis of Dravidian code-mixed social media posts</article-title>
          .,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>582</fpage>
          -
          <lpage>590</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jose</surname>
          </string-name>
          , E. Sherly,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Overview of the track on sentiment analysis for Dravidian languages in code-mixed text</article-title>
          ,
          <source>in: Forum for Information Retrieval Evaluation</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chinnappa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vasantharajan</surname>
          </string-name>
          ,
          <source>Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text</source>
          <year>2021</year>
          , in: Working Notes of FIRE 2021 -
          <article-title>Forum for Information Retrieval Evaluation</article-title>
          ,
          <string-name>
            <surname>CEUR</surname>
          </string-name>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          , N. Jose,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>A sentiment analysis dataset for code-mixed Malayalam-English, in: Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), European Language Resources association</article-title>
          , Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>184</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .sltu-
          <volume>1</volume>
          .
          <fpage>25</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Saumya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Detection of spam reviews: a sentiment analysis approach</article-title>
          ,
          <source>Csi Transactions on ICT 6</source>
          (
          <year>2018</year>
          )
          <fpage>137</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Saumya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <article-title>Iiit_dwd@ lt-edi-eacl2021: Hope speech detection in youtube multilingual comments</article-title>
          ,
          <source>in: Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sakuntharaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Madasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          , P. B,
          <string-name>
            <given-names>S. Chinnaudayar</given-names>
            <surname>Navaneethakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam</article-title>
          , in: Working Notes of FIRE 2021 -
          <article-title>Forum for Information Retrieval Evaluation</article-title>
          ,
          <string-name>
            <surname>CEUR</surname>
          </string-name>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chinnappa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thenmozhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <article-title>Overview of the DravidianCodeMix 2021 Shared Task on Sentiment Detection in Tamil, Malayalam, and Kannada, in: Forum for Information Retrieval Evaluation</article-title>
          ,
          <string-name>
            <surname>FIRE</surname>
          </string-name>
          <year>2021</year>
          ,
          <article-title>Association for Computing Machinery</article-title>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>