<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Karthik Puranik</string-name>
          <email>karthikp18c@iiitt.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bharathi B</string-name>
          <email>bharathib@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Senthil Kumar B</string-name>
          <email>senthil@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Transformers, Transliteration, Machine Translation, Sentiment analysis</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science and Engineering, SSN College of Engineering</institution>
          ,
          <addr-line>Chennai</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Indian Institute of Information Technology Tiruchirappalli</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <fpage>3</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>Sentiment analysis of social media posts and comments for marketing and for gauging emotion is gaining recognition. With the increasing presence of code-mixed content in various native languages, dedicated research is needed to produce promising results. This paper makes a small contribution in the form of sentiment analysis of code-mixed social media comments in the popular Dravidian languages Kannada, Tamil and Malayalam. It describes our work for the Dravidian-CodeMix shared task at FIRE 2021, which employs pre-trained models such as ULMFiT and multilingual BERT fine-tuned on the code-mixed dataset, its transliteration (TRAI), English translations (TRAA) of the TRAI data, and the combination of all three. The results are reported in this paper; the best models ranked 4th, 5th and 10th in the Tamil, Kannada and</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Sentiment analysis is a popular technique for analysing and evaluating textual content to learn
the attitude and thoughts expressed in it [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The term “sentiment analysis” was first introduced
by Nasukawa and Yi [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This method is widely used in the marketing sector to gauge
customers’ opinion of a product without reading every piece of feedback. Natural language
processing (NLP) automates wearisome tasks such as analysing feedback. Several other
tasks like sentiment classification, sentiment extraction, opinion summary, and subjectivity
detection can also be performed [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for various applications like spam email detection [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], fake
news detection [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], hate and hope speech detection [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], finding inappropriate texts in social
media [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] and many others [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. This paper concentrates on the sentiment analysis of
code-mixed social media comments for Dravidian languages.
      </p>
      <p>
        Social media is familiar as a virtual space for sharing opinions and communicating.
It is also the largest hub for marketing. Platforms provide spaces for brands to
advertise products and target interested customers, which is the prime source of income for
these platforms [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. To market products that appeal to their users, social
media platforms monitor user activity and comments [
        <xref ref-type="bibr" rid="ref12">12, 13</xref>
        ]. This enables them to learn a
user’s sentiment towards a product or company [14]. Another crucial application of sentiment
analysis is to automatically spot comments or posts that are offensive, abusive or spread
hatred on social media platforms [15]. Social media is a free space and no restrictions can be
imposed on the comments or posts being circulated. However, such comments can certainly be
detected and monitored to protect underage users and others who are vulnerable to being offended
[16, 17].
      </p>
      <p>Social media features multilingual speakers from all over the world, and people tend to use a
lot of variations while expressing their thoughts [18]. Native speakers writing in Roman script
is the most common case, owing to the easy accessibility and customary use of Roman-script
keyboards on mobile phones and desktops [19]. However, some users write
in the native script too. Finally, there is code-mixing, where two or more languages
are merged with respect to the script or the usage [20]. Sentiment analysis becomes difficult for
such texts. In this paper, the text is transliterated. Transliteration is the
process of converting a text from one script to another while maintaining the pronunciation
[21, 22]. This brings uniformity to the text and helps the model learn better. However,
because the code-mixed dataset also contains English text, a small effort was also made
to translate [23] the text transliterated into the native language to English and to train the model
on it.</p>
      <p>This paper describes our work for the shared task Dravidian-CodeMix<sup>1</sup> at FIRE 2021
[24, 25]. The task was to detect the sentiment of sentences in three of the major Dravidian
languages [26]: Kannada, Tamil, and Malayalam. Our systems ranked 5th, 4th and 10th
respectively in the shared task. The code for the models and the transliterated and translated
datasets are available online<sup>2</sup>.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <p>
        The dataset provided by the organizers of the shared task has been used to train the models
[
        <xref ref-type="bibr" rid="ref12">12, 13, 27, 28</xref>
        ]. It contains annotated sentences obtained by cleaning YouTube comments<sup>3</sup>.
The sentences are highly code-mixed and contain the inter-sentential, intra-sentential and
tag switching that are prevalent in code-mixed data. Each sentence is classified into one of five classes:
positive, negative, unknown state, mixed feelings and not the intended language. The train,
development and test distribution can be viewed in Table 1.
      </p>
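      <p>[Table 1: dataset distribution; rows: Split, Training, Development, Test, Total]</p>
      <p><sup>1</sup> https://dravidian-codemix.github.io/2021/index.html</p>
      <p><sup>2</sup> https://github.com/karthikpuranik11/FIRE2021</p>
      <p><sup>3</sup> https://www.youtube.com/</p>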
      <p>Further, transliterations (TRAI) of the code-mixed training (TRA) dataset into the respective
Dravidian languages were used. Minor preprocessing steps were applied: the language tags
and brackets were removed, and all sentences labelled “not-language” were dropped. The
English translations (TRAA) of these transliterations were also used as a part of this research, since
English was evidently the most widely used language after the Dravidian language. A
few comments from the TRA, TRAI and TRAA datasets belonging to the five classes
are tabulated in Table 2.</p>
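      <p>The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors’ script: the tag format (a bracketed code such as [ta]), the label string not-language and the sample rows are assumptions made for the example.</p>
      <preformat preformat-type="code">
```python
import re

# Hypothetical label name for the "not the intended language" class.
NOT_LANGUAGE = "not-language"

def clean_comment(text):
    """Remove bracketed language tags such as "[ta]" and leftover brackets."""
    text = re.sub(r"\[[^\]]*\]", " ", text)   # drop "[...]" tags (assumed format)
    text = re.sub(r"[\[\](){}]", " ", text)   # drop stray brackets
    return " ".join(text.split())             # normalise whitespace

def preprocess(rows):
    """rows: iterable of (text, label) pairs; returns the cleaned pairs."""
    return [(clean_comment(t), y) for t, y in rows if y != NOT_LANGUAGE]

# Made-up sample rows, for illustration only.
sample = [
    ("[ta] Semma trailer (super)", "Positive"),
    ("Bahut accha hai", NOT_LANGUAGE),
]
cleaned = preprocess(sample)
```
      </preformat>
      <p>Here cleaned keeps only the first comment, with the tag and brackets stripped; the second row is dropped because of its label.</p>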
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Based on previous research, two of the most promising pre-trained models, ULMFiT [29]
and BERT [30] with bidirectional LSTM layers [31], were used to determine the sentiment
of the sentences. These models were fine-tuned separately on the training data provided by
the organizers, the transliterated data combined with the training data, the translated data
combined with the training data, and the combination of all three datasets.</p>
      <p>3.1. BERT</p>
      <p>Bidirectional Encoder Representations from Transformers (BERT) is one of the most popular
transformer-based models. The multilingual variant is pre-trained on Wikipedia text in 104 languages
using a shared WordPiece vocabulary of about 110K tokens [32]. Its pre-training objectives,
Next Sentence Prediction (NSP) and Masked Language Modelling (MLM), capture
a deeper context of the languages. For this task, bert-base-multilingual-cased [33]
from HuggingFace<sup>4</sup> [34] was used. It comprises 12 layers, 12 attention heads and about
110M parameters.</p>
      <p>This model was followed by bidirectional LSTM layers, which are known to
enrich the information being fed forward. The bidirectional layers read the embeddings in both
directions, which adds context and improves the F1 scores. Training was
done with the Adam optimizer [35], a learning rate of 2e-5 and the cross-entropy loss function
[36, 37] for a total of 5 epochs. The parameters employed in the BERT+BiLSTM model
can be viewed in Table 3.</p>
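      <p>As a worked illustration of the cross-entropy loss mentioned above, the following sketch computes the loss for a single five-class prediction in plain Python. The logits are made-up values, and the snippet only shows the formula; it is not the training code used for the model.</p>
      <preformat preformat-type="code">
```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])

# Five classes, as in the shared task; the logits are illustrative only.
logits = [2.0, 0.5, 0.1, -1.0, -0.5]
loss_correct = cross_entropy(logits, 0)   # true class has the highest score
loss_wrong = cross_entropy(logits, 3)     # true class has the lowest score
```
      </preformat>
      <p>The loss is small when the model puts most of its probability on the true class and grows as that probability shrinks, which is the signal the optimizer minimises during fine-tuning.</p>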
      <sec id="sec-3-1">
        <title>3.2. ULMFiT</title>
        <p>Universal Language Model Fine-tuning (ULMFiT) was one of the first transfer learning
methods to produce state-of-the-art results for NLP tasks. Its language model is pre-trained on a large corpus,
WikiText-103<sup>5</sup>, which contains around 103M tokens. It employs three novel techniques for
fine-tuning the language model for various NLP tasks: discriminative fine-tuning, slanted
triangular learning rates (STLR) and gradual unfreezing. The AWD-LSTM language model [38, 39],
a standard LSTM with 3 layers, 1150 hidden activations per layer and an embedding
size of 400, which uses no attention and relies on well-tuned dropout, is generally used. The Adam
optimizer was used with a starting learning rate of 1e-8, an end learning rate of 1e-2 and a dropout of
0.5.</p>
        <p><sup>4</sup> https://huggingface.co/</p>
        <p><sup>5</sup> https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/</p>
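        <p>The slanted triangular learning rate can be sketched directly from the formula given in the ULMFiT paper [29]. The defaults below (a peak rate of 0.01, cut_frac of 0.1 and a ratio of 32) are the values suggested in that paper and are assumptions here, not necessarily the settings used in this work.</p>
        <preformat preformat-type="code">
```python
import math

def stlr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate at training step t (0-based).

    The rate rises linearly for the first cut_frac of the steps and then
    decays linearly for the remainder. Assumes total_steps * cut_frac >= 1.
    """
    cut = math.floor(total_steps * cut_frac)
    if t >= cut:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))   # decay phase
    else:
        p = t / cut                                      # warm-up phase
    return lr_max * (1 + p * (ratio - 1)) / ratio

# Learning rate for every step of a hypothetical 1000-step run.
schedule = [stlr(t, total_steps=1000) for t in range(1000)]
```
        </preformat>
        <p>With these values the rate starts at lr_max / ratio, peaks at lr_max after 100 steps and then falls back gradually, giving the short warm-up and long decay that the technique is named for.</p>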
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Transliteration</title>
        <p>The IndianNLP-Transliteration<sup>6</sup> tool from AI4Bharat was used to transliterate the
training dataset. This deep transliteration tool can transliterate from Roman script into low-resourced
Indian languages. Its architecture mainly consists of Recurrent Neural Networks
(RNN) [40] arranged as an encoder and a decoder [41]. The decoder produces the top ’k’ predictions, which are
then re-ranked to select the most probable word [42]. Most of the Dravidian-language
text in the code-mixed dataset is written in Roman
script. The multilingual pre-trained models, largely trained on these Dravidian languages in
their original scripts, might find such sentences hard to comprehend. Transliterating them
back to the original script could possibly improve the accuracy [43].</p>
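        <p>The idea of keeping the top k decoder predictions and re-ranking them can be illustrated with a small sketch. It does not reproduce the IndianNLP-Transliteration tool: the rerank function, the word-frequency lexicon, the weighting scheme and the scores are all hypothetical and chosen only to show the mechanism.</p>
        <preformat preformat-type="code">
```python
def rerank(candidates, lexicon_counts, k=3, weight=0.5):
    """Keep the k best-scoring candidates, then re-rank them.

    candidates: list of (word, model_log_prob) pairs from a decoder.
    lexicon_counts: hypothetical word-frequency table for the target script.
    """
    top_k = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    total = sum(lexicon_counts.values()) or 1

    def combined(item):
        word, log_prob = item
        prior = lexicon_counts.get(word, 0) / total   # relative frequency
        return log_prob + weight * prior

    return max(top_k, key=combined)[0]

# Made-up decoder output for the romanised Tamil word "nalla".
candidates = [("நல்ல", -0.9), ("நள்ள", -0.8), ("னல்ல", -2.5)]
lexicon = {"நல்ல": 120}
best = rerank(candidates, lexicon)
```
        </preformat>
        <p>In this toy case the decoder slightly prefers a spelling that is absent from the lexicon, and the re-ranking step promotes the attested word instead.</p>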
      </sec>
      <sec id="sec-3-3">
        <title>3.4. Translation</title>
        <p>The transliterated data in the Dravidian language was translated to English using IndicTrans [44]
from AI4Bharat<sup>7</sup>. This Transformer NMT model, built on PyTorch Fairseq<sup>8</sup> [45, 46], is trained on
the Samanantar dataset, a large parallel corpus containing 46.9 million sentences. The model is
reported to produce state-of-the-art BLEU [47] scores for 11 Indian languages. The translations
produced by the IndicTrans baseline model on the transliterated dataset were used. Translated data was
used because the code-mixed dataset contains a large amount of English,
and most pre-trained models are trained on a large number of English sentences.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>In this section, the F1 scores of the BERT and ULMFiT models for the sentiment analysis of
the Kannada, Tamil and Malayalam datasets are compared and analysed. The
weighted F1 scores are tabulated in Table 4. The models are fine-tuned on the training dataset
(TRA), the combination of the transliterated dataset and TRA (TRAI), the combination of the translated dataset
and TRA (TRAA), and all three merged.</p>
      <p>[Table 4: weighted F1 scores; dataset rows: Train (TRA), Transliterate + TRA (TRAI), Translate + TRA (TRAA), Merged (TRA+TRAI+TRAA)]</p>
      <p><sup>6</sup> https://github.com/AI4Bharat/IndianNLP-Transliteration</p>
      <p><sup>7</sup> https://github.com/AI4Bharat/indicTrans</p>
      <p><sup>8</sup> https://github.com/pytorch/fairseq</p>
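      <p>For readers unfamiliar with the metric, the weighted F1 score averages the per-class F1 scores using each class’s share of the true labels as its weight. The sketch below is a plain-Python illustration with made-up labels; it is not the evaluation script of the shared task.</p>
      <preformat preformat-type="code">
```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n / len(y_true)) * f1
    return score

# Toy labels, for illustration only.
y_true = ["pos", "pos", "pos", "neg", "mixed"]
y_pred = ["pos", "pos", "neg", "neg", "pos"]
score = weighted_f1(y_true, y_pred)
```
      </preformat>
      <p>Because the weights follow class frequency, a frequent class such as “positive” dominates the score, which matters when reading results on imbalanced data.</p>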
      <p>Table 4 shows that ULMFiT obtains better F1 scores than BERT
with BiLSTM layers on the majority of the datasets. The transfer learning
techniques used by ULMFiT, namely discriminative fine-tuning, slanted triangular learning rates
and gradual unfreezing, appear to be responsible for its strong F1 scores. Discriminative
fine-tuning allows each layer to be fine-tuned with a different learning rate. Gradual
unfreezing improves on this by unfreezing only the last layer in the first epoch and then unfreezing
the remaining layers one by one in later epochs. Except on the Tamil data, BERT gives results
comparable to ULMFiT for the other languages.</p>
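      <p>Discriminative fine-tuning and gradual unfreezing, as described in the ULMFiT paper [29], can be sketched as two small schedules. The layer count, the top learning rate and the decay factor of 2.6 are taken from that paper or chosen for illustration; they are assumptions, not the settings reported here.</p>
      <preformat preformat-type="code">
```python
def discriminative_lrs(num_layers, top_lr=0.01, decay=2.6):
    """Learning rate per layer group, index 0 = lowest layer.

    Each lower layer uses the rate of the layer above divided by `decay`
    (2.6 is the value suggested in the ULMFiT paper).
    """
    return [top_lr / decay ** (num_layers - 1 - i) for i in range(num_layers)]

def trainable_layers(epoch, num_layers):
    """Layer indices that are unfrozen in a given 1-based epoch.

    Epoch 1 trains only the top layer; each later epoch unfreezes one more
    layer below it until the whole network is trainable.
    """
    first = max(num_layers - epoch, 0)
    return list(range(first, num_layers))

# Four groups as an illustration: three LSTM layers plus a classifier head.
lrs = discriminative_lrs(4)
plan = {epoch: trainable_layers(epoch, 4) for epoch in range(1, 6)}
```
      </preformat>
      <p>The plan starts with only the classifier head trainable and reaches the full network by the fourth epoch, while lower layers always receive smaller learning rates than the layers above them.</p>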
      <p>ULMFiT fine-tuned on the TRA dataset gives the best F1 score of 0.639 for the Kannada task.
It is followed by BERT fine-tuned on the TRAA dataset with 0.623. The other models gave similar
results. It is surprising that the models achieved F1 scores akin to those for the other languages,
considering the limited size of the dataset. ULMFiT surpasses BERT by a large
margin on the Tamil task. The class imbalance in the Tamil dataset could be a
reason for this: of the 4,402 sentences in
the test dataset, 2,830 are “positive”, while only 210 are “not-Tamil”. This imbalance causes variation in the
results. ULMFiT on TRAI and TRAA gave similar F1 scores of 0.658 and 0.651 respectively.
ULMFiT trained on each of the four datasets gave comparable results for the Malayalam task, with
TRAI giving the best score of 0.706. BERT trained on TRAI gave a competitive score of 0.6933
for the same task.</p>
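      <p>The effect of this imbalance can be made concrete with a short calculation. The sketch below uses the two counts quoted above (2,830 “positive” sentences out of 4,402) to compute what a trivial classifier that always predicts the majority class would score; the function name is illustrative.</p>
      <preformat preformat-type="code">
```python
def majority_baseline(majority_count, total):
    """Accuracy and weighted F1 when every comment gets the majority label."""
    share = majority_count / total          # precision of the majority class
    f1_majority = 2 * share / (1 + share)   # recall is 1, so F1 = 2p / (1 + p)
    weighted_f1 = share * f1_majority       # all other classes score F1 = 0
    return share, weighted_f1

accuracy, w_f1 = majority_baseline(2830, 4402)
```
      </preformat>
      <p>This gives an accuracy of about 0.64 and a weighted F1 of about 0.50, so a sizeable part of any weighted F1 score on this test set can be earned from the majority class alone.</p>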
      <p>The basic observation from comparing the datasets is that the four are closely
matched. Most commonly, however, the TRAI dataset
has the upper hand. The most plausible explanation
is that the dataset is code-mixed, with the Dravidian text written in Roman
script; when that text is converted to the native script, the model fine-tunes well. With
the original data also present, the model fine-tunes on the English text too. However,
we cannot be entirely sure of the accuracy of the transliterations from the IndianNLP-Transliteration
tool. Another drawback of transliterating code-mixed sentences is that English and other-language
words also get transliterated to the Kannada/Tamil/Malayalam script, and the model might
not recognize such words at all. Dravidian languages can be complex: there
may be several ways to transliterate a comment written in Roman script, and a
slight variation can change the meaning entirely [48]. To tackle these issues, we merge the
transliterated dataset with the TRA data so that the model learns the other languages
in the code-mixed data too.</p>
      <p>The TRAA and merged datasets prove to be ineffective, giving low F1 scores. The
TRAA dataset is not significantly behind the TRAI data, however, which suggests there is scope to
improve its accuracy with further research. Although IndicTrans, one of the best models for
machine translation of Indian languages, was employed, we cannot rely entirely
on the translations of the transliterated data. Cleaning the TRAA data by removing
sentences that fail to make sense, and fine-tuning the IndicTrans model on a suitable
parallel corpus before translating, could yield better F1 scores for the TRAA dataset.
The combination of the three datasets, however, performs poorly in most cases. The likely causes are the
repetition of the same sentences in different forms, which seems to prevent the model from learning
anything useful, and the inaccuracies in the TRAA and TRAI datasets, which add up to reduce
the F1 scores even further.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Sentiment analysis of social media comments has emerged as one of the most notable tasks in
natural language processing (NLP). To obtain good F1 scores for sentiment analysis
of social media comments in the code-mixed Dravidian languages Kannada, Tamil and Malayalam,
we experimented with ULMFiT and the Transformer-based mBERT fine-tuned on the TRA,
TRAI, TRAA and merged datasets. ULMFiT gave the best F1 scores for all three
languages. For Kannada, this was on the TRA dataset, while TRAI proved effective for Tamil and
Malayalam. This paper also introduces the TRAA dataset, which can be developed further in
future work.</p>
      <p>
[13] B. R. Chakravarthi, V. Muralidaran, R. Priyadharshini, J. P. McCrae, Corpus creation for sentiment analysis in code-mixed Tamil-English text, in: Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), European Language Resources association, Marseille, France, 2020, pp. 202–210. URL: https://www.aclweb.org/anthology/2020.sltu-1.28.
[14] F. Neri, C. Aliprandi, F. Capeci, M. Cuadros, T. By, Sentiment analysis on social media, in: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2012, pp. 919–926. doi:10.1109/ASONAM.2012.164.
[15] B. R. Chakravarthi, HopeEDI: A multilingual hope speech detection dataset for equality, diversity, and inclusion, in: Proceedings of the Third Workshop on Computational Modeling of People’s Opinions, Personality, and Emotion’s in Social Media, 2020, pp. 41–53.
[16] Y. Chen, Y. Zhou, S. Zhu, H. Xu, Detecting offensive language in social media to protect adolescent online safety, in: 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, IEEE, 2012, pp. 71–80.
[17] S. U. Hegde, A. Hande, R. Priyadharshini, S. Thavareesan, R. Sakuntharaj, S. Thangasamy, B. Bharathi, B. R. Chakravarthi, Do images really do the talking? analysing the significance of images in tamil troll meme classification, 2021. arXiv:2108.03886.
[18] U. Barman, A. Das, J. Wagner, J. Foster, Code mixing: A challenge for language identification in the language of social media, in: Proceedings of the first workshop on computational approaches to code switching, 2014, pp. 13–23.
[19] A. Hande, R. Priyadharshini, A. Sampath, K. P. Thamburaj, P. Chandran, B. R. Chakravarthi, Hope speech detection in under-resourced kannada language, 2021. arXiv:2108.04616.
[20] S. Thara, P. Poornachandran, Code-mixing: A brief survey, in: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2018, pp. 2382–2388. doi:10.1109/ICACCI.2018.8554413.
[21] K. Regmi, J. Naidoo, P. Pilkington, Understanding the processes of translation and transliteration in qualitative research, International Journal of Qualitative Methods 9 (2010) 16–26.
[22] P. Kalyan, D. Reddy, A. Hande, R. Priyadharshini, R. Sakuntharaj, B. R. Chakravarthi, Iiitt at case 2021 task 1: Leveraging pretrained language models for multilingual protest detection, in: CASE, 2021.
[23] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[24] B. R. Chakravarthi, R. Priyadharshini, S. Thavareesan, D. Chinnappa, D. Thenmozhi, E. Sherly, J. P. McCrae, A. Hande, R. Ponnusamy, S. Banerjee, C. Vasantharajan, Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, CEUR, 2021.
[25] R. Priyadharshini, B. R. Chakravarthi, S. Thavareesan, D. Chinnappa, D. Thenmozi, E. Sherly, Overview of the dravidiancodemix 2021 shared task on sentiment detection in tamil, malayalam, and kannada, in: Forum for Information Retrieval Evaluation, FIRE 2021, Association for Computing Machinery, 2021.
[26] B. Krishnamurti, The dravidian languages, Cambridge University Press, 2003.
[27] A. Hande, R. Priyadharshini, B. R. Chakravarthi, KanCMD: Kannada CodeMixed dataset for sentiment analysis and offensive language detection, in: Proceedings of the Third Workshop on Computational Modeling of People’s Opinions, Personality, and Emotion’s in Social Media, Association for Computational Linguistics, Barcelona, Spain (Online), 2020, pp. 54–63. URL: https://www.aclweb.org/anthology/2020.peoples-1.6.
[28] B. R. Chakravarthi, HopeEDI: A multilingual hope speech detection dataset for equality, diversity, and inclusion, in: Proceedings of the Third Workshop on Computational Modeling of People’s Opinions, Personality, and Emotion’s in Social Media, Association for Computational Linguistics, Barcelona, Spain (Online), 2020, pp. 41–53. URL: https://aclanthology.org/2020.peoples-1.5.
[29] J. Howard, S. Ruder, Universal language model fine-tuning for text classification, 2018. arXiv:1801.06146.
[30] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, 2019. arXiv:1810.04805.
[31] J. P. C. Chiu, E. Nichols, Named entity recognition with bidirectional lstm-cnns, 2016. arXiv:1511.08308.
[32] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, Łukasz Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, J. Dean, Google’s neural machine translation system: Bridging the gap between human and machine translation, 2016. arXiv:1609.08144.
[33] T. Pires, E. Schlinger, D. Garrette, How multilingual is multilingual BERT?, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 4996–5001. URL: https://aclanthology.org/P19-1493. doi:10.18653/v1/P19-1493.
[34] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Huggingface’s transformers: State-of-the-art natural language processing, 2020. arXiv:1910.03771.
[35] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2017. arXiv:1412.6980.
[36] Z. Zhang, M. R. Sabuncu, Generalized cross entropy loss for training deep neural networks with noisy labels, 2018. arXiv:1805.07836.
[37] A. F. Agarap, Deep learning using rectified linear units (relu), 2019. arXiv:1803.08375.
[38] S. Merity, N. S. Keskar, R. Socher, Regularizing and optimizing lstm language models, 2017. arXiv:1708.02182.
[39] A. Hande, K. Puranik, K. Yasaswini, R. Priyadharshini, S. Thavareesan, A. Sampath, K. Shanmugavadivel, D. Thenmozhi, B. R. Chakravarthi, Offensive language identification in low-resourced code-mixed dravidian languages using pseudo-labeling, 2021. arXiv:2108.12177.
[40] A. Sherstinsky, Fundamentals of recurrent neural network (rnn) and long short-term memory (lstm) network, Physica D: Nonlinear Phenomena 404 (2020) 132306. URL: http://dx.doi.org/10.1016/j.physd.2019.132306. doi:10.1016/j.physd.2019.132306.
[41] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in neural information processing systems, 2017, pp. 5998–6008.
[42] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, 2016. arXiv:1409.0473.
[43] M. Thomas, C. Latha, Sentimental analysis of transliterated text in malayalam using recurrent neural networks, Journal of Ambient Intelligence and Humanized Computing (2020) 1–8.
[44] G. Ramesh, S. Doddapaneni, A. Bheemaraj, M. Jobanputra, R. AK, A. Sharma, S. Sahoo, H. Diddee, M. J, D. Kakwani, N. Kumar, A. Pradeep, K. Deepak, V. Raghavan, A. Kunchukuttan, P. Kumar, M. S. Khapra, Samanantar: The largest publicly available parallel corpora collection for 11 indic languages, 2021. arXiv:2104.05596.
[45] M. Ott, S. Edunov, A. Baevski, A. Fan, S. Gross, N. Ng, D. Grangier, M. Auli, fairseq: A fast, extensible toolkit for sequence modeling, 2019. arXiv:1904.01038.
[46] K. Puranik, A. Hande, R. Priyadharshini, T. Durairaj, A. Sampath, K. Thamburaj, B. R. Chakravarthi, Attentive fine-tuning of transformers for translation of low-resourced languages @loresmt 2021, 2021.
[47] K. Papineni, S. Roukos, T. Ward, W. J. Zhu, Bleu: a method for automatic evaluation of machine translation (2002). doi:10.3115/1073083.1073135.
[48] A. Kumar, R. Cotterell, L. Padró, A. Oliver, Morphological analysis of the dravidian language family, 2017, pp. 217–222. doi:10.18653/v1/E17-2035.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rambocas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <article-title>Marketing Research: The Role Of Sentiment Analysis</article-title>
          ,
          <source>FEP Working Papers</source>
          <volume>489</volume>
          ,
          <institution>Universidade do Porto, Faculdade de Economia do Porto</institution>
          ,
          <year>2013</year>
          . URL: https://ideas.repec.org/p/por/fepwps/489.html.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Nasukawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <article-title>Sentiment analysis: Capturing favorability using natural language processing</article-title>
          ,
          <year>2003</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>77</lpage>
          .
          doi:<pub-id pub-id-type="doi">10.1145/945645.945658</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Keith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fuentes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Meneses</surname>
          </string-name>
          ,
          <article-title>A hybrid approach for sentiment analysis applied to paper</article-title>
          ,
          <source>in: Proceedings of ACM SIGKDD Conference</source>
          , Halifax, Nova Scotia, Canada,
          <year>2017</year>
          , p.
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Anees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shaikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shaikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaikh</surname>
          </string-name>
          ,
          <article-title>Survey paper on sentiment analysis: Techniques and challenges</article-title>
          ,
          <source>EasyChair</source>
          2516-2314
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Puranik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Evaluating pretrained transformer-based models for covid-19 fake news detection</article-title>
          ,
          <source>in: 2021 5th International Conference on Computing Methodologies and Communication (ICCMC)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>766</fpage>
          -
          <lpage>772</lpage>
          .
          doi:<pub-id pub-id-type="doi">10.1109/ICCMC51019.2021.9418446</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Puranik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>IIITT@LT-EDI-EACL2021-Hope speech detection: There is always hope in transformers</article-title>
          ,
          <year>2021</year>
          .
          arXiv:2104.09066
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yasaswini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Puranik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>IIITT@DravidianLangTech-EACL2021: Transfer learning for offensive language detection in Dravidian languages</article-title>
          ,
          <source>in: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, Association for Computational Linguistics</source>
          , Kyiv,
          <year>2021</year>
          , pp.
          <fpage>187</fpage>
          -
          <lpage>194</lpage>
          . URL: https://aclanthology.org/2021.dravidianlangtech-1.25.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Jada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yasaswini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Prabakaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sampath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thangasamy</surname>
          </string-name>
          ,
          <article-title>IIIT@Dravidian-CodeMix-FIRE2021: Transformer Model based Sentiment Analysis in Dravidian Languages</article-title>
          ,
          <source>in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation</source>
          , CEUR,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Puranik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Domain identification of scientific articles using transfer learning and ensembles</article-title>
          ,
          <source>in: Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2021 Workshops, WSPA, MLMEIN, SDPRA, DARAI, and AI4EPT, Delhi, India, May 11, 2021, Proceedings 25</source>
          , Springer International Publishing,
          <year>2021</year>
          , pp.
          <fpage>88</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. U.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Benchmarking multi-task learning for sentiment analysis and offensive language identification in under-resourced Dravidian languages</article-title>
          ,
          <year>2021</year>
          .
          arXiv:2108.03867
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Oikonomidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fouskas</surname>
          </string-name>
          ,
          <article-title>Is Social Media Paying Its Money?</article-title>
          ,
          <year>2019</year>
          , pp.
          <fpage>999</fpage>
          -
          <lpage>1006</lpage>
          .
          doi:10.1007/978-3-030-12453-3_115.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>A sentiment analysis dataset for code-mixed Malayalam-English</article-title>
          ,
          <source>in: Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), European Language Resources Association</source>
          , Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>184</lpage>
          . URL: https://www.aclweb.org/anthology/2020.sltu-1.25.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>