<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Is Meta Embedding better than pre-trained word embedding to perform Sentiment Analysis for Dravidian Languages in Code-Mixed Text?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Supriya Chanda</string-name>
          <email>supriyachanda.rs.cse18@itbhu.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rajat Pratap Singh</string-name>
          <email>rajatp.singh.cd.che19@itbhu.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sukomal Pal</string-name>
          <email>spal.cse@itbhu.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Technology (BHU)</institution>
          ,
          <addr-line>Varanasi</addr-line>
          ,
          <country country="IN">INDIA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the IRlab@IITBHU system for the Dravidian-CodeMix - FIRE 2021 shared task: Sentiment Analysis for Dravidian Languages in code-mixed text, covering the language pairs Tamil-English (TA-EN), Kannada-English (KN-EN), and Malayalam-English (ML-EN). We report the output of three models, although we submitted only one model for sentiment analysis of all the code-mixed datasets. Run-1 used FastText embeddings with multi-head attention, Run-2 used a meta-embedding technique, and Run-3 used the multilingual BERT (mBERT) model. Run-2 outperformed Run-1 and Run-3 for all the language pairs.</p>
      </abstract>
      <kwd-group>
        <kwd>Code Mixed</kwd>
        <kwd>Kannada</kwd>
        <kwd>Malayalam</kwd>
        <kwd>Tamil</kwd>
        <kwd>BERT</kwd>
        <kwd>fastText</kwd>
        <kwd>Sentiment Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Code-switched text now appears in many NLP settings, including QA, NLI and Machine Translation. Understanding code-switched communication helps
large corporations better target their advertising, and understanding genuine user feedback on
product features aids the development of future versions. Ignoring one language in favor
of another, or completely ignoring code-switched languages, can lead to incorrect conclusions
about user sentiment.</p>
      <p>Understanding how people feel about things is an essential research topic in natural
language processing. With the rise of social media, code-mixed text has become increasingly
frequent in media communication. Sentiment analysis is the act of determining, from a text,
sentence, or paragraph, the sentiments it expresses, such as emotions and attitudes toward
others.</p>
      <p>
        GLUECoS [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] - an evaluation benchmark for code-mixed text - and LinCE [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] - a centralised
benchmark of 10 corpora covering four different code-switched language pairs and four
tasks - have been developed in this direction. In the past, code-switching workshops held in
conjunction with major NLP conferences have included shared tasks. The first and second
workshops on Computational Approaches to Code Switching included a shared task on Language
Identification1 for several language pairs (Nepali-English, Spanish-English,
Mandarin-English and Modern Standard Arabic-Arabic dialects). Another goal, addressed in a shared
task at the third workshop, was to identify named entities2 in the English-Spanish and Modern
Standard Arabic-Egyptian Arabic language pairs. A further shared task, Machine Translation3
for several language combinations, took place at the fourth
workshop.
      </p>
      <p>
        A number of code-switching tasks have been carried out by the Forum for Information
Retrieval Evaluation (FIRE). Code-mixed entity extraction, POS tagging for code-mixed Indian
social media (ICON 2016), sentiment analysis for code-mixed Indian languages [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (ICON 2017),
and the Code-Mixed Question Answering Challenge are just a few examples of the tasks. There
was a competition for Sentiment Analysis in Code-Switched Data (Task 9: Sentiment Analysis
for Code-Mixed Social Media Text [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), which covered tweets in both Spanish-English and
Hindi-English pairs.
      </p>
      <p>
        The shared task [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] here aims to identify the sentiment polarity of code-mixed
YouTube comments in the Dravidian language pairs Malayalam-English, Tamil-English, and
Kannada-English, collected from social media. A new dataset has been provided in this, the
second consecutive year of the shared task. As last year, the text has to be categorized
into five different categories: Positive, Negative, Mixed_feelings, unknown_state and
not&lt;language&gt;4. To solve this task, we clean the comments, construct representations of the
comments with different word embedding methods, and then build the classification model.
Results on the test data for all of our models are included in this report.
      </p>
      <p>The rest of the paper is organized as follows. Section 2 describes the dataset and the
pre-processing and processing techniques. The model architecture is described in Section 3.
In Section 4, we report our results and analysis. Finally, we conclude in Section 5.</p>
      <p>1http://emnlp2014.org/workshops/CodeSwitch/call.html
2https://code-switching.github.io/2018/#shared-task-id
3https://code-switching.github.io/2021
4The language might be Tamil, Kannada or Malayalam</p>
    </sec>
    <sec id="sec-2">
      <title>2. System Description</title>
      <sec id="sec-2-1">
        <title>2.1. Datasets</title>
        <p>
          The Dravidian-CodeMix shared task5 organizers provided training,
development and test datasets. The training dataset consists of 35,656 Tamil-English [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], 6,212
Kannada-English [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and 15,880 Malayalam-English [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] YouTube video comments. The statistics of the
training, development, and test data and their class distributions are shown in Table
1. The details of the dataset and benchmark results are given in the overview [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and findings [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
of the Sentiment Analysis of Dravidian Languages track. The dataset suffers from the general
problems of social media data, particularly code-mixed data: the sentences are short, lack
well-defined grammatical structure, and contain many spelling mistakes.
        </p>
        <p>5https://dravidian-codemix.github.io/2021/index.html</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data Pre-processing</title>
        <p>
          The YouTube comment dataset used in this work is already labelled with five categories:
Positive, Negative, Mixed_feelings, unknown_state and not-&lt;language&gt;6. Our pre-processing of
the comments includes the following steps:
• In the previous shared task report [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], we observed that removing contiguous
repeating characters does not give any significant performance change, so this year
we did not remove adjacent repeating characters.
• Removal of exclamation marks and other punctuation.
• Removal of non-ASCII characters: all emoticons, symbols, numbers and special
characters.
        </p>
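<p>The cleaning steps above can be sketched in a few lines of Python (the function name and the exact regular expressions are our own illustration, not the authors' code; note that, per the first bullet, repeated characters are deliberately left untouched):</p>

```python
import re

def clean_comment(text: str) -> str:
    """Illustrative pre-processing: drop non-ASCII characters (emoji,
    native-script glyphs), then punctuation, digits and other special
    characters, and collapse the remaining whitespace."""
    text = re.sub(r"[^\x00-\x7F]", " ", text)  # non-ASCII: emoticons, symbols
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # punctuation, numbers, specials
    return re.sub(r"\s+", " ", text).strip()   # normalise whitespace

print(clean_comment("Superb!!! padam 👌 100% hit aagum..."))  # → Superb padam hit aagum
```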
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Word Embedding</title>
        <p>
          Word embedding is arguably the most widely known technique in the recent history of NLP.
It captures the semantic properties of a word. We use the bert-base-multilingual-cased
pretrained model7, FastText [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and TF-IDF [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] to obtain a vector embedding of each
sentence that we can use for classification.
        </p>
        <p>
          • fastText: fastText, developed by Facebook, combines several concepts introduced by
the NLP and ML communities: it represents sentences with a bag of words and n-grams
using subword information, shared across classes through a hidden
representation. fastText [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] can learn vector representations of out-of-vocabulary words, which
is useful for our dataset, which contains Malayalam and Tamil words in Roman script.
• mBERT: A transformer is an encoder-decoder network that uses self-attention
on the encoder side and attention on the decoder side. Such models are pre-trained on
large text corpora such as Wikipedia and, with suitable fine-tuning, produce
state-of-the-art results on several downstream tasks. The contextual language representation model
BERT (Bidirectional Encoder Representations from Transformers) has been used for the
downstream task of code-mixed language identification. Multilingual BERT or mBERT
(bert-base-multilingual-cased8) is pre-trained on cased text in the top 104
languages with the largest Wikipedias and has 179M parameters in total, with 12 transformer
blocks, a hidden size of 768 and 12 attention heads. The model takes a special [CLS] token
as the first input, followed by the sequence of word tokens; [CLS] stands for Classification.
Each layer applies self-attention and passes the result through a feed-forward network to the
next encoder layer.
6The language might be Tamil, Kannada or Malayalam
7https://huggingface.co/transformers/pretrained_models.html
8https://github.com/google-research/bert/blob/master/multilingual.md
        </p>
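<p>The TF-IDF weight of a term t in a document d is tf(t, d) · log(N / df(t)). A small self-contained sketch (the toy documents are ours, not the shared-task data; libraries such as scikit-learn use slightly different smoothing conventions):</p>

```python
import math
from collections import Counter

# Toy corpus of tokenized comments (illustrative only).
docs = [["movie", "super", "super"], ["movie", "waste"], ["super", "hit"]]
N = len(docs)
df = Counter(t for d in docs for t in set(d))  # document frequency per term

def tfidf(term, doc):
    tf = doc.count(term) / len(doc)   # normalised term frequency
    idf = math.log(N / df[term])      # inverse document frequency
    return tf * idf

print(round(tfidf("super", docs[0]), 4))  # frequent in this doc, common overall
```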
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Model Architecture</title>
      <p>In this section, we summarise the modules that make up our model. The text input is first
tokenized using a language-independent subword tokenizer, SentencePiece. It performs
subword segmentation supporting the byte-pair-encoding (BPE) algorithm and the unigram
language model, then converts the text into an id sequence, guaranteeing perfect reproducibility
of the normalization and subword segmentation. The proposed model uses fastText
embeddings to represent the tokenized text as input vectors. The key idea of fastText
embeddings is to take the internal structure of words into account instead of learning a single
vector per word, so word representations can be learned even for morphologically rich
languages. Instead of learning vectors for words directly, fastText represents
each word by its character n-grams. This ensures that the words love, loved, and beloved
all have similar vector representations, even if they appear in different contexts, which
improves learning on heavily inflected languages. Once a word has been represented using
character n-grams, a skip-gram model is trained to learn the embeddings.</p>
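<p>The character n-gram idea can be illustrated in a few lines (the boundary markers &lt; and &gt; follow the fastText paper; the helper function itself is our sketch):</p>

```python
def char_ngrams(word, n=3):
    """fastText-style character n-grams with word boundary markers."""
    w = f"<{word}>"
    return {w[i:i + n] for i in range(len(w) - n + 1)}

love, loved = char_ngrams("love"), char_ngrams("loved")
print(sorted(love & loved))  # shared subwords give the two words related vectors
```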
      <p>
        Attention-based models have been used on various topics, including sentiment analysis [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], the authors devised an architecture that performs well beyond the
baseline using the multi-head attention mechanism. Moving in the same direction, we use a
multi-head attention-based transformer encoder to get attention-aware context vectors for
the sentences. We add positional encoding to the word embedding vectors before the first
self-attention layer to retain the notion of word order. Self-attention lets us find correlations
between different input words, indicating the syntactic and contextual structure of the sentence.
The resulting word-level encoded vectors are then passed through a bi-LSTM layer.
A classifier layer predicts the sentiment label of the input from the output hidden
representations of the bi-LSTM layer.
      </p>
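<p>A minimal single-head self-attention sketch in NumPy (the dimensions and random projections are illustrative stand-ins; the actual model uses multi-head attention inside a transformer encoder):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                   # 5 tokens, embedding size 8
X = rng.normal(size=(seq_len, d_model))   # word embeddings (+ positional encoding in the model)

Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d_model)       # scaled dot-product
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each token attends over all tokens
context = weights @ V                     # attention-aware context vectors

print(context.shape)                      # one context vector per token
```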
      <p>In our revised approach, we form a meta-embedding by concatenating the TF-IDF vectors of
the tokenized texts with the hidden representations of the bi-LSTM layer. TF-IDF associates
each word in a document with a number that represents how relevant that word is to the
document, so documents with similar, relevant words have similar vectors.
The meta-embeddings form a more semantically and syntactically effective representation of
the input text, improving the score significantly. For the hyper-parameters, we used
5 training epochs, a batch size of 16 and a learning rate of 5e-4, along with a dropout value of 0.1.
The 300-dimensional fastText embeddings are trained over 15 iterations, and we use one
bi-LSTM layer with a hidden dimension of 256.</p>
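<p>The meta-embedding step is a plain concatenation; a shape-level sketch (the bi-LSTM hidden size matches the hyper-parameters in the text, but the vocabulary size and the vectors themselves are random stand-ins):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
tfidf_vec = rng.random(1000)         # TF-IDF vector of a comment (vocab size illustrative)
bilstm_hidden = rng.random(2 * 256)  # final bi-LSTM hidden state (256 per direction)

meta_embedding = np.concatenate([tfidf_vec, bilstm_hidden])
print(meta_embedding.shape)          # fed to the classifier layer
```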
      <p>The hyper-parameters for mBERT were set after evaluating the model on the validation
data. We used the following: batch size = 32, learning rate = 2e-5,
optimizer = AdamW, epochs = 4.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Analysis</title>
      <p>Our submitted runs used a significantly larger number of epochs (15) than the
updated version (5), which caused the model to over-fit the task. We had also
ignored PAD_IDX in the cross-entropy loss, which kept the model from
converging and from predicting certain labels at all. In the updated approach, we also used
only one bi-LSTM layer to avoid over-fitting. As the results show, the updated model and the
proposed architecture perform significantly better, producing competitive scores for the task
of sentiment analysis of code-mixed data.</p>
      <p>The multilingual BERT experiments were performed on Google's Colab9. The PyTorch deep
learning library was used to implement the models, and we used HuggingFace's
transformers to fine-tune the pre-trained mBERT model. The macro F1 score was used to evaluate every
system; the macro F1 score of a system is the average of the F1 scores of the individual
classes. Tables 2, 3 and 4 show our official and unofficial performance, as shared
by the organizers, vis-a-vis the best performing team for the Tamil, Kannada and Malayalam
language pairs respectively. Tables 5, 6 and 7 report the class-wise precision,
recall and F1 scores on the Tamil-English, Kannada-English and Malayalam-English corpora
respectively.</p>
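<p>Macro F1 averages the per-class F1 scores with equal weight, so minority classes count as much as the dominant Positive class. A minimal sketch (toy labels, not our actual predictions; `sklearn.metrics.f1_score` with `average="macro"` computes the same quantity):</p>

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["Positive", "Positive", "Negative", "unknown_state"]
y_pred = ["Positive", "Positive", "Positive", "unknown_state"]
print(round(macro_f1(y_true, y_pred), 3))  # → 0.6
```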
      <p>We used confusion matrices for additional analysis (see Fig. 2). Confusion matrix tables
are often used to describe the performance of a classification model on test data for which
the true values are known.</p>
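<p>A confusion matrix simply counts (true label, predicted label) pairs; a minimal version (toy labels, not our actual predictions):</p>

```python
from collections import Counter

y_true = ["Positive", "Negative", "Positive", "unknown_state"]
y_pred = ["Positive", "Positive", "Positive", "Negative"]

# Each key is a (true, predicted) cell of the confusion matrix.
cm = Counter(zip(y_true, y_pred))
print(cm[("Positive", "Positive")], cm[("Negative", "Positive")])  # → 2 1
```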
      <p>We could not verify the accuracy of the labelling because we do not understand Tamil,
Kannada or Malayalam. All models performed well on the Positive class, followed by the
not-&lt;language&gt; class; the reason is the imbalanced class distribution in the corpus. For the
first run that we submitted for evaluation, the model could not classify the unknown_state
class on either the Tamil-English or Malayalam-English dataset and missed the Negative class
on the Kannada-English dataset.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This study reports the performance of our system for the shared task on Sentiment Analysis
for Dravidian Languages in Code-Mixed Text at Dravidian-CodeMix - FIRE 2021. We conducted
experiments with three models (FastText, meta-embedding and mBERT) for each language pair.
[Figure: confusion matrices of the (a, d, g) FastText, (b, e, h) Meta Embedding and (c, f, i) mBERT models per language pair; tables report precision, recall, F1-scores and support on the test data, including Kannada-English.]
There is scope for improvement on the labels classified as not-&lt;language&gt;; we suggest adding
language-specific word embeddings to the text vectors, and several other methods can be
considered in future work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanuja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dandapat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sitaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Choudhury</surname>
          </string-name>
          ,
          <article-title>GLUECoS: An evaluation benchmark for code-switched NLP</article-title>
          ,
          <year>2020</year>
          . arXiv:2004.12376.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Aguilar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kar</surname>
          </string-name>
          , T. Solorio,
          <article-title>LinCE: A Centralized Benchmark for Linguistic Codeswitching Evaluation</article-title>
          ,
          <source>in: Proceedings of The 12th Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>1803</fpage>
          -
          <lpage>1813</lpage>
          . URL: https://www.aclweb.org/anthology/2020.lrec-
          <volume>1</volume>
          .
          <fpage>223</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Patra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <source>Sentiment Analysis of Code-Mixed Indian Languages: An Overview of SAIL Code-Mixed Shared Task @ICON-2017</source>
          ,
          <year>2018</year>
          . arXiv:1803.06745.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Patwa</surname>
          </string-name>
          , G. Aguilar,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. PYKL</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gambäck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Das</surname>
          </string-name>
          , Semeval
          <article-title>-2020 task 9: Overview of sentiment analysis of code-mixed tweets</article-title>
          ,
          <year>2020</year>
          . arXiv:2008.04277.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          , N. Jose,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Overview of the track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text</article-title>
          ,
          <source>in: Proceedings of the 12th Forum for Information Retrieval Evaluation</source>
          ,
          <source>FIRE '20</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Corpus creation for sentiment analysis in code-mixed Tamil-English text</article-title>
          ,
          <source>in: Proceedings of the 1st Joint Workshop on Spoken Language Technologies</source>
          for
          <article-title>Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), European Language Resources association</article-title>
          , Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>202</fpage>
          -
          <lpage>210</lpage>
          . URL: https://www. aclweb.org/anthology/2020.sltu-
          <volume>1</volume>
          .
          <fpage>28</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>KanCMD: Kannada CodeMixed dataset for sentiment analysis and ofensive language detection</article-title>
          ,
          <source>in: Proceedings of the Third Workshop on Computational Modeling of People's Opinions</source>
          , Personality, and
          <article-title>Emotion's in Social Media, Association for Computational Linguistics</article-title>
          , Barcelona,
          <source>Spain (Online)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          . URL: https://www.aclweb.org/anthology/2020.peoples-
          <volume>1</volume>
          .6.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          , N. Jose,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>A sentiment analysis dataset for code-mixed Malayalam-English, in: Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), European Language Resources association</article-title>
          , Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>184</lpage>
          . URL: https://www.aclweb. org/anthology/2020.sltu-
          <volume>1</volume>
          .
          <fpage>25</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chinnappa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Durairaj</surname>
          </string-name>
          , E. Sherly,
          <article-title>Overview of the dravidiancodemix 2021 shared task on sentiment detection in tamil, malayalam, and kannada, in: Forum for Information Retrieval Evaluation</article-title>
          ,
          <source>FIRE 2021</source>
          , Association for Computing Machinery,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chinnappa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thenmozhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vasantharajan</surname>
          </string-name>
          ,
          <article-title>Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text</article-title>
          , in: Working Notes of FIRE 2021 -
          <article-title>Forum for Information Retrieval Evaluation</article-title>
          ,
          <source>CEUR</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Irlab@ iitbhu@ dravidian-codemix-fire2020: Sentiment analysis for dravidian languages in code-mixed text (</article-title>
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , T. Mikolov,
          <article-title>Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics 5 (</article-title>
          <year>2017</year>
          )
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          . URL: https://aclanthology.org/Q17-1010. doi:
          <volume>10</volume>
          .1162/tacl_a_
          <fpage>00051</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Aizawa</surname>
          </string-name>
          ,
          <article-title>An information-theoretic perspective of tf-idf measures</article-title>
          ,
          <source>Information Processing Management</source>
          <volume>39</volume>
          (
          <year>2003</year>
          )
          <fpage>45</fpage>
          -
          <lpage>65</lpage>
          . URL: https://www.sciencedirect.com/science/article/ pii/S0306457302000213. doi:https://doi.org/10.1016/S0306-
          <volume>4573</volume>
          (
          <issue>02</issue>
          )
          <fpage>00021</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , E. Grave,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Puhrsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <article-title>Advances in Pre-Training Distributed Word Representations</article-title>
          ,
          <source>in: Proceedings of the International Conference on Language Resources and Evaluation (LREC</source>
          <year>2018</year>
          ),
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Attention-based LSTM for aspect-level sentiment classification</article-title>
          ,
          <source>in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Austin, Texas,
          <year>2016</year>
          , pp.
          <fpage>606</fpage>
          -
          <lpage>615</lpage>
          . URL: https://aclanthology.org/D16-1058. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D16</fpage>
          -1058.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Multi-head attention model for aspect level sentiment analysis</article-title>
          ,
          <year>2020</year>
          . doi:
          <volume>10</volume>
          .3233/JIFS-179383.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>