=Paper= {{Paper |id=Vol-3159/T6-7 |storemode=property |title=Transliterate or Translate? Sentiment Analysis of Code-Mixed Text in Dravidian Languages |pdfUrl=https://ceur-ws.org/Vol-3159/T6-7.pdf |volume=Vol-3159 |authors=Karthik Puranik,Bharathi B,Senthil Kumar B |dblpUrl=https://dblp.org/rec/conf/fire/PuranikBB21 }} ==Transliterate or Translate? Sentiment Analysis of Code-Mixed Text in Dravidian Languages== https://ceur-ws.org/Vol-3159/T6-7.pdf
Transliterate or translate? Sentiment analysis of
code-mixed text in Dravidian languages
Karthik Puranika , Bharathi Bb and Senthil Kumar Bb
a
    Indian Institute of Information Technology Tiruchirappalli
b
    Computer Science and Engineering, SSN College of Engineering, Chennai


                                         Abstract
                                         Sentiment analysis of social media posts and comments for various marketing and emotional purposes is
                                         gaining recognition. With the increasing presence of code-mixed content in various native languages,
                                         dedicated research is needed to produce promising results. This paper makes a small contribution to that
                                         effort in the form of sentiment analysis of code-mixed social media comments in the popular Dravidian
                                         languages Kannada, Tamil and Malayalam. It describes our work for the shared task conducted by
                                         Dravidian-CodeMix at FIRE 2021, employing pre-trained models like ULMFiT and multilingual BERT
                                         fine-tuned on the code-mixed dataset, a transliteration (TRAI) of that dataset, English translations
                                         (TRAA) of the TRAI data, and the combination of all three. The results are recorded in this paper,
                                         where the best models stood 4th, 5th and 10th in the Tamil, Kannada and Malayalam tasks respectively.

                                         Keywords
                                         Transformers, Transliteration, Machine Translation, Sentiment analysis




1. Introduction
Sentiment analysis is a popular technique for analysing and evaluating textual content to learn
the attitude and thoughts expressed in it [1]. The term “sentiment analysis” was first introduced
by Nasukawa and Yi [2]. The method is widely employed in the marketing sector to gauge the opinion
of customers on a certain product without reading all the feedback. Natural language
processing (NLP) automates such wearisome tasks as analysing feedback. Several related
tasks, such as sentiment classification, sentiment extraction, opinion summarisation and subjectivity
detection, can also be performed [3] for various applications like spam email detection [4], fake
news detection [5], hate and hope speech detection [6], finding inappropriate texts in social
media [7, 8] and many others [9, 10]. This paper concentrates on the sentiment analysis of
code-mixed social media comments in Dravidian languages.
   Social media is known to us as a virtual space to share our opinions and communicate.
It is also the largest hub for marketing: the platforms provide spaces for brands to
advertise products and target interested customers, which is their prime source of income
[11]. In order to market the right product to each user, social media platforms monitor
user activity and comments [12, 13]. This enables them to know the
FIRE 2021: Forum for Information Retrieval Evaluation, December 13-17, 2021, India
Envelope-Open karthikp18c@iiitt.ac.in (K. Puranik); bharathib@ssn.edu.in (B. B); senthil@ssn.edu.in (S. K. B)
Orcid 0000-0002-0877-7063 (K. Puranik); 0000-0001-7279-5357 (B. B); 0000-0003-0835-5271 (S. K. B)
                                       © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                       CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
user’s sentiment towards a product and company [14]. Another crucial application of sentiment
analysis is automatically spotting comments or posts which are offensive, abusive or spread
hatred on social media platforms [15]. Social media is a free space, and no restrictions can be
imposed on the comments or posts being circulated. However, such comments can certainly be
detected and moderated to protect underage users and those who are vulnerable to being offended
[16, 17].
   Social media features multilingual speakers from all over the world, and people use many
variations while expressing their thoughts [18]. Native speakers writing in the Roman script
is the most common scenario, owing to the easy accessibility and customary use of Roman-script
keyboards on mobile phones and desktops [19]. However, some users write
in the native script too. Finally, there is the case of code-mixing, where two or more languages
are merged with respect to script or usage [20]. Sentiment analysis becomes difficult for
such texts. In this paper, the method of transliterating the text is applied. Transliteration is the
process of converting a text from one script to another while maintaining the pronunciation
[21, 22]. This brings uniformity to the text and helps the model learn better. However,
due to the presence of English text in the code-mixed dataset, we also make a slight effort
to translate [23] the text transliterated into the native language back to English and train the model
with it.
   This paper presents our work for the shared task Dravidian-CodeMix1 at FIRE 2021
[24, 25]. The task was to detect the sentiment of sentences in three of the major Dravidian
languages [26]: Kannada, Tamil and Malayalam. Our models stood 5th, 4th and 10th
respectively in the shared task. The code for the models, along with the transliterated and
translated datasets, is provided in the link2 .


2. Dataset
The dataset provided by the organizers of the shared task was used to train the models
[12, 13, 27, 28]. It contains annotated sentences obtained by cleaning YouTube comments3 .
The sentences are highly code-mixed and contain inter-sentential, intra-sentential and
tag switching, which are prevalent in code-mixed data, and are classified into five classes:
positive, negative, unknown state, mixed feelings and not the intended language. The train,
development and test distribution can be viewed in Table 1.

                             Split             Kannada       Tamil   Malayalam
                             Training             6,213     35,657       15,889
                             Development            692      3,963        1,767
                             Test                   768      4,403        1,963
                             Total                7,673     44,023       19,619


Table 1
Train-Development-Test Data Distribution
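As a quick sanity check, the split sizes in Table 1 can be verified to sum to the stated totals:

```python
# Split sizes copied from Table 1; each language's splits should sum
# to the Total row of the table.
splits = {
    "Kannada":   {"train": 6_213,  "dev": 692,   "test": 768},
    "Tamil":     {"train": 35_657, "dev": 3_963, "test": 4_403},
    "Malayalam": {"train": 15_889, "dev": 1_767, "test": 1_963},
}
totals = {lang: sum(parts.values()) for lang, parts in splits.items()}
# totals == {"Kannada": 7673, "Tamil": 44023, "Malayalam": 19619}
```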

    1
      https://dravidian-codemix.github.io/2021/index.html
    2
      https://github.com/karthikpuranik11/FIRE2021
    3
      https://www.youtube.com/
  Further, the transliterations (TRAI) of the code-mixed training (TRA) dataset into the respective
Dravidian scripts were used. Small preprocessing steps were applied, such as removing the
language tags and brackets and discarding all sentences labelled “not-language”. The
English translations (TRAA) of these transliterations were also used as part of this research,
since it was evident that English was the most widely used language after the Dravidian language. A
few comments represented in the TRA, TRAI and TRAA datasets, belonging to the five classes,
are tabulated in Table 2.
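A minimal sketch of the preprocessing described above, assuming the comments arrive as (text, label) pairs; the "[en]"-style bracketed tag format is an assumption about how the transliterated files mark spans:

```python
import re

def preprocess(rows):
    """Drop 'not-language' comments and strip bracketed language tags.

    `rows` holds (comment, label) pairs; the bracketed-tag pattern is
    an assumption about the transliterator's output format.
    """
    cleaned = []
    for text, label in rows:
        if label.lower().startswith("not-"):      # e.g. not-Kannada
            continue
        text = re.sub(r"\[[^\]]*\]", " ", text)   # remove [tags] and brackets
        cleaned.append((" ".join(text.split()), label))
    return cleaned

sample = [("idu [en] super", "Positive"), ("hello world", "not-Kannada")]
cleaned = preprocess(sample)   # [("idu super", "Positive")]
```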


3. Methodology
Based on previous research, two of the most promising pre-trained models, ULMFiT [29]
and BERT [30] with bidirectional LSTM layers [31], were used to determine the sentiment
of the sentences. These models were fine-tuned separately on the training data provided by
the organizers, the transliterated data combined with the training data, the translated data
combined with the training data, and the combination of all three datasets.

3.1. BERT
Bidirectional Encoder Representations from Transformers (BERT) is one of the most popular
transformer-based models, trained extensively on the Wikipedias of 104 languages with a
shared WordPiece [32] vocabulary of about 110k tokens. Unprecedented pre-training methods like
Next Sentence Prediction (NSP) and Masked Language Modelling (MLM) successfully capture a
deeper context of the languages. For this task, bert-base-multilingual-cased [33]
from HuggingFace4 [34] has been used. It comprises 12 layers, 12 attention heads and about
110M parameters.
   This model was further concatenated with bidirectional LSTM layers, which are known to
enrich the representations being fed forward. The bidirectional layers read the embeddings in both
directions, which boosts the captured context and, in our experiments, the F1 scores. Training was
done with an Adam optimizer [35], a learning rate of 2e-5 and the cross-entropy loss function
[36, 37] for a total of 5 epochs. The various parameters employed in the BERT+BiLSTM model
can be viewed in Table 3.
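A minimal PyTorch sketch of the BiLSTM head described above, with sizes following Table 3; the mBERT encoder is stubbed with random 768-dimensional token embeddings to keep the sketch self-contained, and last-time-step pooling is an assumption rather than a detail reported in the paper:

```python
import torch
import torch.nn as nn

class BiLSTMHead(nn.Module):
    """Classification head placed over BERT token embeddings.

    Sizes follow Table 3 (256 LSTM units, dropout 0.4, ReLU, 5 classes);
    in the paper the 768-d inputs come from bert-base-multilingual-cased,
    stubbed below with random tensors.
    """
    def __init__(self, bert_dim=768, lstm_units=256, n_classes=5):
        super().__init__()
        self.bilstm = nn.LSTM(bert_dim, lstm_units, batch_first=True,
                              bidirectional=True)
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Sequential(nn.Linear(2 * lstm_units, lstm_units),
                                nn.ReLU(),
                                nn.Linear(lstm_units, n_classes))

    def forward(self, token_embeddings):
        out, _ = self.bilstm(token_embeddings)   # (B, T, 2 * lstm_units)
        pooled = self.dropout(out[:, -1, :])     # last time step (assumption)
        return self.fc(pooled)                   # (B, n_classes)

head = BiLSTMHead()
logits = head(torch.randn(2, 128, 768))   # batch of 2, max length 128
```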

3.2. ULMFiT
Universal Language Model Fine-tuning (ULMFiT) was one of the first transfer learning
methods to produce state-of-the-art results on NLP tasks. It was trained on very large corpora
like Wikitext-1035 , with around 103M tokens. It employs three novel techniques for fine-
tuning language models for various NLP tasks: discriminative fine-tuning, slanted
triangular learning rates (STLR) and gradual unfreezing. The AWD-LSTM language model [38, 39],
a standard LSTM consisting of 3 layers with 1,150 hidden units per layer and an embedding
size of 400, without any attention and with carefully tuned dropout, is generally used. An Adam

   4
       https://huggingface.co/
   5
       https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/
Table 2
Examples of the code-mixed sentences with their transliterations and translations in Kannada, Tamil
and Malayalam


optimizer with a starting learning rate of 1e-8, an end learning rate of 1e-2 and a dropout of
0.5 is used.
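Discriminative fine-tuning and gradual unfreezing can be sketched in plain PyTorch; a toy three-layer stack stands in for the 3-layer AWD-LSTM, and the 2.6 decay factor follows Howard and Ruder [29] rather than a value reported here:

```python
import torch
import torch.nn as nn

# Toy 3-layer stack standing in for the 3-layer AWD-LSTM (illustrative only).
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 2))
layers = list(model)[::-1]          # last (task-specific) layer first

# Discriminative fine-tuning: each layer gets its own learning rate,
# decayed by 2.6 per layer towards the earlier, more general layers.
base_lr = 1e-2
opt = torch.optim.Adam(
    [{"params": layer.parameters(), "lr": base_lr / (2.6 ** i)}
     for i, layer in enumerate(layers)])

# Gradual unfreezing: only the last layer is trainable in the first
# epoch; one more layer is unfrozen in each subsequent epoch.
for p in model.parameters():
    p.requires_grad = False
for epoch, layer in enumerate(layers):
    for p in layer.parameters():
        p.requires_grad = True
    # ... run one training epoch here ...
```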
                               Parameter                             Value
                               Number of LSTM units                     256
                               Dropout                                   0.4
                               Activation Function                    ReLU
                               Max Len                                  128
                               Batch Size                                32
                               Optimizer                           AdamW
                               Learning Rate                           2e-5
                               Loss Function                  cross-entropy
                               Number of epochs                            5

Table 3
Parameters for the BERT+BiLSTM model.


3.3. Transliteration
The IndianNLP-Transliteration6 tool from AI4Bharat was used to obtain the transliterations of the
training dataset. This deep transliteration tool can transliterate from the Roman script to several low-
resourced Indian languages. The architecture majorly consists of Recurrent Neural Networks
(RNN) [40] with encoders and decoders [41]. The decoder produces the top-k predictions, which are
then re-ranked to pick the most probable word [42]. It is observed that most of the
Dravidian-language sentences in the code-mixed dataset are written in the Roman
script. The multilingual pre-trained models, largely trained on these Dravidian languages in
their original scripts, might find it hard to comprehend such sentences. Transliterating them
back to the original script could possibly improve the accuracy [43].
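The decoder's top-k re-ranking step can be illustrated with a toy example; the candidate list and the frequency-based scorer are hypothetical stand-ins for the tool's learned re-ranker:

```python
def rerank(candidates, score):
    """Return the most probable word among the decoder's top-k outputs."""
    return max(candidates, key=score)

# Hypothetical top-3 native-script candidates for the Roman-script
# Kannada word "chennagide", with stand-in probabilities.
scores = {"ಚೆನ್ನಾಗಿದೆ": 0.90, "ಚೆನಗಿದೆ": 0.06, "ಚನ್ನಗಿದೆ": 0.04}
best = rerank(list(scores), scores.get)
```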

3.4. Translation
The transliterated data in the Dravidian languages is translated to English using IndicTrans [44]
from AI4Bharat7 . This PyTorch Fairseq8 [45, 46] based Transformer NMT model is trained on
the large parallel Samanantar corpus, containing 46.9 million sentence pairs. The model is
known to produce state-of-the-art BLEU [47] scores for 11 Indian languages. The translations
given by the IndicTrans baseline model on the transliterated dataset were used. The translated
data was used because of the excessive presence of English in the code-mixed dataset, and
because most pre-trained models are trained on large numbers of English sentences.


4. Results
In this section, the F1 scores of the BERT and ULMFiT models for the sentiment analysis of the
Kannada, Tamil and Malayalam datasets are compared, and suitable analyses are recorded. The
weighted F1 scores are tabulated in Table 4. The models are fine-tuned on the training dataset

   6
     https://github.com/AI4Bharat/IndianNLP-Transliteration
   7
     https://github.com/AI4Bharat/indicTrans
   8
     https://github.com/pytorch/fairseq
(TRA), the combination of the transliterated dataset and TRA (TRAI), the combination of the
translated dataset and TRA (TRAA), and all three merged.
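The weighted F1 metric reported below can be sketched in plain Python (equivalent to scikit-learn's f1_score with average='weighted'):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted F1 as reported in Table 4: per-class F1 averaged with
    weights proportional to each class's support in y_true."""
    support = Counter(y_true)
    total, score = len(y_true), 0.0
    for cls, n in support.items():
        tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
        pred = sum(p == cls for p in y_pred)
        prec = tp / pred if pred else 0.0
        rec = tp / n
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (n / total) * f1
    return score

score = weighted_f1(["pos", "pos", "neg"], ["pos", "neg", "neg"])
```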

    Table 4
    Weighted F1-scores of sentiment analysis on the test datasets, where P: Precision, R: Recall
    and F1: F1 score.
    Dataset                                                  Kannada
                                               BERT                           ULMFiT
                                         P         R        F1           P         R         F1
    Train (TRA)                     0.5952    0.6185    0.6040      0.6547    0.6276     0.6389
    Transliterate + TRA (TRAI)      0.5587    0.6133    0.5831       0.6239    0.6081     0.6150
    Translate + TRA (TRAA)          0.6176    0.6367    0.6231       0.6078    0.5990     0.6031
    Merged (TRA+TRAI+TRAA)          0.6079    0.6172    0.6113       0.6172    0.5885     0.5993
                                                                 Tamil
                                               BERT                           ULMFiT
                                         P         R        F1           P         R         F1
    Train (TRA)                     0.5291    0.5572    0.5308       0.6544    0.6229     0.6362
    Transliterate + TRA (TRAI)      0.5334    0.5502    0.5366      0.6889    0.6372     0.6583
    Translate + TRA (TRAA)          0.5284    0.5427    0.5310       0.6694    0.6379     0.6514
    Merged (TRA+TRAI+TRAA)          0.5298    0.5570    0.5367       0.6629    0.6306     0.6432
                                                            Malayalam
                                               BERT                           ULMFiT
                                         P         R        F1           P         R         F1
    Train (TRA)                     0.6238    0.6733    0.6457       0.7084    0.6937     0.6990
    Transliterate + TRA (TRAI)      0.6874    0.7018    0.6933      0.7139    0.7013     0.7062
    Translate + TRA (TRAA)          0.5976    0.7142    0.6467       0.7086    0.6901     0.6970
    Merged (TRA+TRAI+TRAA)          0.6822    0.6927    0.6863       0.7041    0.6952     0.6984

    First, it is clear from Table 4 that ULMFiT achieves better F1 scores than BERT
concatenated with BiLSTM layers on the majority of the datasets. The unique transfer learning
techniques used by ULMFiT, namely discriminative fine-tuning, slanted triangular learning rates
and gradual unfreezing, seem to produce exceptional F1 scores. Discriminative
fine-tuning allows each layer to be fine-tuned with a different learning rate. Gradual
unfreezing improves this further by keeping only the last layer unfrozen in the first epoch and
unfreezing one more layer in each subsequent epoch. Except for the Tamil data, BERT gives results
comparable to ULMFiT for the other languages.
    ULMFiT fine-tuned on the TRA dataset gives the best F1-score of 0.639 for the Kannada task,
followed by BERT fine-tuned on the TRAA dataset with 0.623. The other models gave similar
results. It is surprising how the models managed to give F1 scores akin to the other languages,
considering the limited size of the dataset. ULMFiT surpasses BERT by a large margin
on the Tamil task. The class imbalance in the Tamil dataset could be one
reason: the “positive” comments number 2,830 out of the 4,402 sentences in
the test dataset, while “not-Tamil” accounts for just 210. This imbalance causes a variation in the
results. ULMFiT on TRAI and TRAA gave nearly identical F1 scores of 0.658 and 0.651 respectively.
ULMFiT gave comparable results across all four datasets for the Malayalam task, with
TRAI giving the best score of 0.706. BERT trained on TRAI gave a competitive score of 0.6933
for the same task.
   The basic observation when comparing the various datasets is the close contention between
the four configurations. Most commonly, however, the TRAI dataset
has the upper hand. The most plausible explanation is that the dataset is
code-mixed, with the Dravidian text written in the Roman script: when that text is converted to
the native script, the model fine-tunes well, and with the original data also present, the model
fine-tunes on the English text too. However, we cannot be entirely sure of the accuracy of the
transliterations from the IndianNLP-Transliteration tool. Another drawback of transliterating
the code-mixed sentences is that the English and other-language words also get transliterated to
the Kannada/Tamil/Malayalam script; such words might not be recognized by the model at all.
Dravidian languages can be complex, and there may be several ways in which a comment in the
Roman script can be transliterated, where a slight variation can change the meaning entirely [48].
To tackle these issues, we merge the transliterated dataset with the TRA data so that the model
learns the other languages in the code-mixed data too.
   The TRAA and the merged datasets prove less effective, as seen from their lower F1 scores.
Still, the TRAA dataset is not significantly behind the TRAI data, which suggests there is scope to
increase the accuracy with further research. Though IndicTrans, one of the best models for
machine translation of Indian languages, has been employed, we surely cannot rely entirely
on the translations of the transliterated data. Further cleaning of the TRAA data, by removing
sentences that fail to make any sense, and fine-tuning the IndicTrans model on a suitable
parallel corpus before translating, could yield better F1 scores for the TRAA dataset.
The combination of the three datasets, however, fails in most cases: the repetition of sentences
in different forms seems to prevent the model from learning anything productively, and the
inaccuracies in the TRAA and TRAI datasets add up to reduce the F1 scores even further.


5. Conclusion
Sentiment analysis of social media comments has emerged as one of the most notable tasks in
natural language processing (NLP). After careful experimentation with ULMFiT and mBERT
fine-tuned on the TRA, TRAI, TRAA and merged datasets for the sentiment analysis of social
media comments in the code-mixed Dravidian languages Kannada, Tamil and Malayalam,
ULMFiT proved to give the best F1 scores for all three languages. For Kannada, it was on the
TRA dataset, while TRAI proved effective for Tamil and Malayalam. This paper introduces the
use of the TRAA dataset, which can be built upon in future work.
References
 [1] M. Rambocas, J. Gama, Marketing Research: The Role Of Sentiment Analysis, FEP Working
     Papers 489, Universidade do Porto, Faculdade de Economia do Porto, 2013. URL: https:
     //ideas.repec.org/p/por/fepwps/489.html.
 [2] T. Nasukawa, J. Yi, Sentiment analysis: Capturing favorability using natural language
     processing, 2003, pp. 70–77. doi:10.1145/945645.945658.
 [3] B. Keith, E. Fuentes, C. Meneses, A hybrid approach for sentiment analysis applied to
     paper reviews, in: Proceedings of ACM SIGKDD Conference, Halifax, Nova Scotia, Canada,
     2017, p. 10.
 [4] A. F. Anees, A. Shaikh, A. Shaikh, S. Shaikh, Survey paper on sentiment analysis: Tech-
     niques and challenges, EasyChair Preprints, ISSN 2516-2314 (2020).
 [5] A. Hande, K. Puranik, R. Priyadharshini, S. Thavareesan, B. R. Chakravarthi, Evaluating
     pretrained transformer-based models for covid-19 fake news detection, in: 2021 5th
     International Conference on Computing Methodologies and Communication (ICCMC),
     2021, pp. 766–772. doi:10.1109/ICCMC51019.2021.9418446.
 [6] K. Puranik, A. Hande, R. Priyadharshini, S. Thavareesan, B. R. Chakravarthi, Iiitt@lt-
     edi-eacl2021-hope speech detection: There is always hope in transformers, 2021.
     arXiv:2104.09066.
 [7] K. Yasaswini, K. Puranik, A. Hande, R. Priyadharshini, S. Thavareesan, B. R. Chakravarthi,
     IIITT@DravidianLangTech-EACL2021: Transfer learning for offensive language detection
     in Dravidian languages, in: Proceedings of the First Workshop on Speech and Language
     Technologies for Dravidian Languages, Association for Computational Linguistics, Kyiv,
     2021, pp. 187–194. URL: https://aclanthology.org/2021.dravidianlangtech-1.25.
 [8] P. K. Jada, D. S. Reddy, K. Yasaswini, C. Prabakaran, A. Sampath, S. Thangasamy,
     IIIT@Dravidian-CodeMix-FIRE2021: Transformer Model based Sentiment Analysis in
     Dravidian Languages, in: Working Notes of FIRE 2021 - Forum for Information Retrieval
     Evaluation, CEUR, 2021.
 [9] A. Hande, K. Puranik, R. Priyadharshini, B. R. Chakravarthi, Domain identification of
     scientific articles using transfer learning and ensembles, in: Trends and Applications in
     Knowledge Discovery and Data Mining: PAKDD 2021 Workshops, WSPA, MLMEIN, SD-
     PRA, DARAI, and AI4EPT, Delhi, India, May 11, 2021 Proceedings 25, Springer International
     Publishing, 2021, pp. 88–97.
[10] A. Hande, S. U. Hegde, R. Priyadharshini, R. Ponnusamy, P. K. Kumaresan, S. Thavareesan,
     B. R. Chakravarthi, Benchmarking multi-task learning for sentiment analysis and offensive
     language identification in under-resourced dravidian languages, 2021. arXiv:2108.03867.
[11] T. Oikonomidis, K. Fouskas, Is Social Media Paying Its Money?, 2019, pp. 999–1006.
     doi:10.1007/978-3-030-12453-3_115.
[12] B. R. Chakravarthi, N. Jose, S. Suryawanshi, E. Sherly, J. P. McCrae, A sentiment analysis
     dataset for code-mixed Malayalam-English, in: Proceedings of the 1st Joint Workshop on
     Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration
     and Computing for Under-Resourced Languages (CCURL), European Language Resources
     association, Marseille, France, 2020, pp. 177–184. URL: https://www.aclweb.org/anthology/
     2020.sltu-1.25.
[13] B. R. Chakravarthi, V. Muralidaran, R. Priyadharshini, J. P. McCrae, Corpus creation for
     sentiment analysis in code-mixed Tamil-English text, in: Proceedings of the 1st Joint
     Workshop on Spoken Language Technologies for Under-resourced languages (SLTU)
     and Collaboration and Computing for Under-Resourced Languages (CCURL), European
     Language Resources association, Marseille, France, 2020, pp. 202–210. URL: https://www.
     aclweb.org/anthology/2020.sltu-1.28.
[14] F. Neri, C. Aliprandi, F. Capeci, M. Cuadros, T. By, Sentiment analysis on social media, in:
     2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and
     Mining, 2012, pp. 919–926. doi:10.1109/ASONAM.2012.164.
[15] B. R. Chakravarthi, Hopeedi: A multilingual hope speech detection dataset for equality,
     diversity, and inclusion, in: Proceedings of the Third Workshop on Computational
     Modeling of People’s Opinions, Personality, and Emotion’s in Social Media, 2020, pp.
     41–53.
[16] Y. Chen, Y. Zhou, S. Zhu, H. Xu, Detecting offensive language in social media to protect
     adolescent online safety, in: 2012 International Conference on Privacy, Security, Risk and
     Trust and 2012 International Confernece on Social Computing, IEEE, 2012, pp. 71–80.
[17] S. U. Hegde, A. Hande, R. Priyadharshini, S. Thavareesan, R. Sakuntharaj, S. Thangasamy,
     B. Bharathi, B. R. Chakravarthi, Do images really do the talking? analysing the significance
     of images in tamil troll meme classification, 2021. arXiv:2108.03886.
[18] U. Barman, A. Das, J. Wagner, J. Foster, Code mixing: A challenge for language iden-
     tification in the language of social media, in: Proceedings of the first workshop on
     computational approaches to code switching, 2014, pp. 13–23.
[19] A. Hande, R. Priyadharshini, A. Sampath, K. P. Thamburaj, P. Chandran, B. R. Chakravarthi,
     Hope speech detection in under-resourced kannada language, 2021. arXiv:2108.04616.
[20] S. Thara, P. Poornachandran, Code-mixing: A brief survey, in: 2018 International
     Conference on Advances in Computing, Communications and Informatics (ICACCI), 2018,
     pp. 2382–2388. doi:10.1109/ICACCI.2018.8554413.
[21] K. Regmi, J. Naidoo, P. Pilkington, Understanding the processes of translation and translit-
     eration in qualitative research, International Journal of Qualitative Methods 9 (2010)
     16–26.
[22] P. Kalyan, D. Reddy, A. Hande, R. Priyadharshini, R. Sakuntharaj, B. R. Chakravarthi,
     Iiitt at case 2021 task 1: Leveraging pretrained language models for multilingual protest
     detection, in: CASE, 2021.
[23] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align
     and translate, arXiv preprint arXiv:1409.0473 (2014).
[24] B. R. Chakravarthi, R. Priyadharshini, S. Thavareesan, D. Chinnappa, D. Thenmozhi,
     E. Sherly, J. P. McCrae, A. Hande, R. Ponnusamy, S. Banerjee, C. Vasantharajan, Findings
     of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text, in: Working Notes
     of FIRE 2021 - Forum for Information Retrieval Evaluation, CEUR, 2021.
[25] R. Priyadharshini, B. R. Chakravarthi, S. Thavareesan, D. Chinnappa, D. Thenmozi,
     E. Sherly, Overview of the dravidiancodemix 2021 shared task on sentiment detection
     in tamil, malayalam, and kannada, in: Forum for Information Retrieval Evaluation, FIRE
     2021, Association for Computing Machinery, 2021.
[26] B. Krishnamurti, The dravidian languages, Cambridge University Press, 2003.
[27] A. Hande, R. Priyadharshini, B. R. Chakravarthi, KanCMD: Kannada CodeMixed dataset
     for sentiment analysis and offensive language detection, in: Proceedings of the Third
     Workshop on Computational Modeling of People’s Opinions, Personality, and Emotion’s
     in Social Media, Association for Computational Linguistics, Barcelona, Spain (Online),
     2020, pp. 54–63. URL: https://www.aclweb.org/anthology/2020.peoples-1.6.
[28] B. R. Chakravarthi, HopeEDI: A multilingual hope speech detection dataset for equality,
     diversity, and inclusion, in: Proceedings of the Third Workshop on Computational
     Modeling of People’s Opinions, Personality, and Emotion’s in Social Media, Association
     for Computational Linguistics, Barcelona, Spain (Online), 2020, pp. 41–53. URL: https:
     //aclanthology.org/2020.peoples-1.5.
[29] J. Howard, S. Ruder, Universal language model fine-tuning for text classification, 2018.
     arXiv:1801.06146.
[30] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional
     transformers for language understanding, 2019. arXiv:1810.04805.
[31] J. P. C. Chiu, E. Nichols, Named entity recognition with bidirectional lstm-cnns, 2016.
     arXiv:1511.08308.
[32] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao,
     Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, Łukasz Kaiser, S. Gouws,
     Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith,
     J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, J. Dean, Google’s neural machine
     translation system: Bridging the gap between human and machine translation, 2016.
     arXiv:1609.08144.
[33] T. Pires, E. Schlinger, D. Garrette, How multilingual is multilingual BERT?, in: Pro-
     ceedings of the 57th Annual Meeting of the Association for Computational Linguistics,
     Association for Computational Linguistics, Florence, Italy, 2019, pp. 4996–5001. URL:
     https://aclanthology.org/P19-1493. doi:10.18653/v1/P19-1493.
[34] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
     M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao,
     S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Huggingface’s transformers: State-of-the-art
     natural language processing, 2020. arXiv:1910.03771.
[35] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2017. arXiv:1412.6980.
[36] Z. Zhang, M. R. Sabuncu, Generalized cross entropy loss for training deep neural networks
     with noisy labels, 2018. arXiv:1805.07836.
[37] A. F. Agarap, Deep learning using rectified linear units (relu), 2019. arXiv:1803.08375.
[38] S. Merity, N. S. Keskar, R. Socher, Regularizing and optimizing lstm language models, 2017.
     arXiv:1708.02182.
[39] A. Hande, K. Puranik, K. Yasaswini, R. Priyadharshini, S. Thavareesan, A. Sampath, K. Shan-
     mugavadivel, D. Thenmozhi, B. R. Chakravarthi, Offensive language identification in low-
     resourced code-mixed dravidian languages using pseudo-labeling, 2021. arXiv:2108.12177.
[40] A. Sherstinsky, Fundamentals of recurrent neural network (rnn) and long short-term
     memory (lstm) network, Physica D: Nonlinear Phenomena 404 (2020) 132306. URL:
     http://dx.doi.org/10.1016/j.physd.2019.132306. doi:10.1016/j.physd.2019.132306.
[41] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polo-
     sukhin, Attention is all you need, in: Advances in neural information processing systems,
     2017, pp. 5998–6008.
[42] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align
     and translate, 2016. arXiv:1409.0473.
[43] M. Thomas, C. Latha, Sentimental analysis of transliterated text in malayalam using
     recurrent neural networks, Journal of Ambient Intelligence and Humanized Computing
     (2020) 1–8.
[44] G. Ramesh, S. Doddapaneni, A. Bheemaraj, M. Jobanputra, R. AK, A. Sharma, S. Sahoo,
     H. Diddee, M. J, D. Kakwani, N. Kumar, A. Pradeep, K. Deepak, V. Raghavan, A. Kunchukut-
     tan, P. Kumar, M. S. Khapra, Samanantar: The largest publicly available parallel corpora
     collection for 11 indic languages, 2021. arXiv:2104.05596.
[45] M. Ott, S. Edunov, A. Baevski, A. Fan, S. Gross, N. Ng, D. Grangier, M. Auli, fairseq: A fast,
     extensible toolkit for sequence modeling, 2019. arXiv:1904.01038.
[46] K. Puranik, A. Hande, R. Priyadharshini, T. Durairaj, A. Sampath, K. Thamburaj, B. R.
     Chakravarthi, Attentive fine-tuning of transformers for translation of low-resourced
     languages @loresmt 2021, 2021.
[47] K. Papineni, S. Roukos, T. Ward, W. J. Zhu, Bleu: a method for automatic evaluation of
     machine translation (2002). doi:10.3115/1073083.1073135.
[48] A. Kumar, R. Cotterell, L. Padró, A. Oliver, Morphological analysis of the dravidian
     language family, 2017, pp. 217–222. doi:10.18653/v1/E17-2035.