Overview of the Shared Task on Sentiment Analysis
and Homophobia Detection of YouTube Comments
in Code-Mixed Dravidian Languages
Kogilavani Shanmugavadivel1, Malliga Subramanian1, Prasanna Kumar Kumaresan2,
Bharathi Raja Chakravarthi3, B Bharathi4,
Subalalitha Chinnaudayar Navaneethakrishnan5, Lavanya Sambath Kumar5,
Thomas Mandl6, Rahul Ponnusamy7, Vasanth Palanikumar8 and Manoj J Balaji9
1 Kongu Engineering College, Tamil Nadu, India
2 Indian Institute of Information Technology and Management-Kerala, India
3 Insight Centre for Data Analytics, National University of Ireland, Galway
4 SSN College of Engineering, Tamil Nadu, India
5 SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
6 University of Hildesheim, Germany
7 Techvantage Analytics, Kerala, India
8 Chennai Institute of Technology, Tamil Nadu, India
9 WorldQuant University, New Orleans, Louisiana


              Abstract
              Sentiment analysis is the task of identifying subjective opinions or emotional responses about a given
              topic. Sentiment analysis of social media posts, which are primarily code-mixed for Dravidian languages,
              is attracting growing interest. Homophobia detection is the task of identifying homophobic,
              transphobic, and non-anti-LGBT+ content in YouTube comments. In this paper, we present
              an overview of the findings and results of the shared task on sentiment analysis and homophobia
              detection in code-mixed Dravidian languages, organized as part of FIRE 2022. For shared task A, the
              participants were provided with training, development, and test datasets of code-mixed text in Dravidian
              languages (Tamil-English, Malayalam-English, and Kannada-English). The goal of shared task A
              was to classify code-mixed YouTube comments as positive, negative, neutral, or mixed emotions. For
              shared task B, the participants were provided with training, development, and test datasets in English,
              Malayalam, and Tamil. The goal of shared task B was to classify the text as homophobic,
              transphobic, or neither. A total of 95 participants registered for the shared task; 13 teams submitted
              results for task A, and 10 teams submitted results for task B. The submitted systems
              were evaluated in terms of macro F1-score. The datasets for this challenge are openly available
              on the competition website1.

              Keywords
              Sentiment analysis, Homophobia detection, Code-Mixed Dravidian Language, Machine Learning, Deep
              Learning


1 https://codalab.lisn.upsaclay.fr/competitions/5310
1. Introduction
The proliferation of social media platforms such as Twitter and YouTube, together with the
anonymity they afford, allows more people than ever before to exercise their right to freedom
of expression. This leads to an increase in user-generated content, which can include opinions,
sentiments, reviews of products and movies, likes and dislikes regarding an event or news, and
much more. At the same time, these platforms are exploited to spread violence through
objectionable content, such as threats and remarks directed at individuals, groups, or
organizations [1, 2, 3, 4]. Naturally, comments, posts, and articles can mean different things
to different people around the world.
   People frequently take advantage of this freedom to make comments that promote hatred
and toxic behavior. YouTube has become an extremely popular platform because of the ease
with which users can share content (videos, posts, and shots) and like, share, and comment on
it. The downside is that this creates more room for overt forms of cyberbullying and online
harassment [5], which often has a significant influence on the lives of the affected individuals
and communities [6].
   The field of Natural Language Processing (NLP) has seen an increase in the use of shared
tasks [7, 8] in an effort to identify such exploits. Researchers and academicians have become
interested in developing models for these shared tasks. This article summarizes the research
works submitted for the shared task on Sentiment Analysis and Homophobia detection of
YouTube comments in Code-Mixed Dravidian Languages [9]. Sentiment analysis and identifying
homophobic and transphobic comments make up the two subtasks included in this shared task.
Below we provide a brief explanation of these subtasks.
   Sentiment analysis is a subtask of NLP that uses computational methods to analyze, process,
and better understand the emotions behind a user’s text or interaction [10]. It sorts users’
opinions into different groups. It lets organizations learn from large amounts of unstructured
data and adapt their strategies to better suit their target market. One of its subtasks is to find
subjective opinions or emotional reactions to a given topic [11]. Over the last two decades,
both academia and industry have been doing research in this area.
   There is an increasing demand for sentiment detection on social media texts, which are largely
code-mixed for Dravidian languages. Tamil, Malayalam, and Kannada belong to the Dravidian
Shared task on Sentiment Analysis and Homophobia detection of YouTube comments in Code-Mixed Dravidian Languages,
FIRE 2022
Email: kogilavani@kongu.ac.in (K. Shanmugavadivel); mallinishanth72@gmail.com (M. Subramanian);
prasanna.mi20@iiitmk.ac.in (P. K. Kumaresan); bharathi.raja@insight-centre.org (B. R. Chakravarthi);
bharathib@ssn.edu.in (B. Bharathi); subalalitha@gmail.com (S. C. Navaneethakrishnan);
sklavanyasambath@gmail.com (L. S. Kumar); mandl@uni-hildesheim.de (T. Mandl);
rahulponnusamy160032@gmail.com (R. Ponnusamy); vasanthpcse2019@citchennai.net (V. Palanikumar);
manojbalaji1@gmail.com (M. J. Balaji)
                                    © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
languages [12, 13]. In the shared task, sentiment analysis aims to determine the sentiment a
text expresses and to assign the text to a predefined class. This subtask includes YouTube
comments written in Tamil, Malayalam, and Kannada and labeled as "positive," "negative,"
"mixed," or "unknown." The Tamil dataset has an extra class called "Non-Tamil" for comments
that are not written in Tamil [14]. Participants were also given test datasets to evaluate their
proposed classification models.
   The term "LGBTQ+ community" [15] designates a group or community of individuals who
identify as lesbian, gay, bisexual, transgender, or queer, including all gender identities and
sexual orientations not expressly covered by LGBTQ. The term "homophobia" describes
hostility toward those who identify as homosexual, transgender, or queer. LGBTQ individuals
may experience significant psychological stress due to homophobia and transphobia, which can
prevent them from participating in typical social activities and could result in severe mental
illness. To keep cyberspace clean, build a friendly and healthy online community, and increase
awareness of the unfair treatment of LGBTQ groups, it is crucial to identify and remove
homophobic and transphobic content as soon as it appears [16]. Related work has sought to
spread positivity about the LGBTQIA community by building a Tamil dataset about the
community and identifying the offensive and non-offensive terminology in it [17][18].
   The second subtask, identifying such hateful comments, was designed to help address this
issue. Datasets for detecting homophobic and transphobic comments in three languages,
Tamil, English, and Malayalam, are presented as part of this subtask. In addition, a
code-mixed Tamil-English dataset is provided. The class labels are Homophobic, Transphobic,
and Non-anti-LGBT+ content.
   Participants in the shared task were given access to the training and validation data, complete
with labels, and the test data, which did not contain any labels. These participants in the shared
task built machine learning and deep learning models for the two subtasks and then submitted
their predictions for the labels that should be applied to the test data.
   This article presents an overview of the shared task on Sentiment Analysis and Homophobia
detection of YouTube comments in Code-Mixed Dravidian Languages. It discusses the
various models submitted to the shared task and the results of the participating teams. The rest
of the article is organized as follows: Section 2 describes the shared task. Section 3 discusses
the datasets. Section 4 provides information about the task setting. Section 5 summarizes the
systems and methodologies used by each participating team for both subtasks
and highlights the features of each model. The analysis of the results and findings of the
methodologies submitted by the participants is presented in Section 6. Concluding remarks
are presented in Section 7.


2. Task Description
The goal of the proposed shared task is to perform Sentiment Analysis and Homophobia
detection of social media comments in Code-Mixed Dravidian Languages. Sentiment analysis
is the task of identifying subjective opinions or emotional responses about a given topic.
Homophobia and Transphobia detection identifies homophobia, transphobia, and non-anti-
LGBT+ content from the given corpus.
2.1. Shared Task-A
Shared task A aims to identify the sentiment polarity (positive, negative, neutral, or mixed
emotions) of code-mixed comments/posts in Tamil-English, Malayalam-English, and
Kannada-English collected from social media. The participants were provided with training,
development, and test datasets in these three code-mixed languages. The datasets were
annotated at the comment/post level. A comment/post may contain more than one sentence,
but the average number of sentences per comment/post in the corpus is one. The participants
could choose to classify one or more code-mixed languages. Leaderboard results were
published for each code-mixed language.
Some sample sentences from the datasets and their annotations are provided below.

2.2. Shared Task-B
Shared task B aims to identify whether a comment in the given corpus is homophobic,
transphobic, or neither. Homophobia and transphobia are forms of toxic language directed at
LGBTQ+ individuals and are regarded as hate speech. In this shared task, participants were
provided with comments extracted from social media platforms and had to predict whether
each comment was homophobic or transphobic in nature. The seed data for this task is the
Homophobia/Transphobia Detection dataset [9], a collection of comments from YouTube.
This dataset consists of manually annotated comments indicating whether the text is
homophobic/transphobic or not. The participants were provided with training, development,
and test datasets in English, Malayalam, and Tamil. Some sample
sentences from the datasets and their annotations are provided below.


3. Datasets
The corpus for shared task A consists of 67,554 social media comments in three code-mixed
languages: 40,267 comments in Tamil-English, 19,616 in Malayalam-English, and 7,671 in
Kannada-English. The corpus for shared task B consists of 20,150 social media comments in
Tamil, English, Malayalam, and English+Tamil: 3,977 comments in Tamil, 4,946 in English,
5,193 in Malayalam, and 6,034 in English+Tamil.
   Table 1 shows the corpus statistics for task A, and Table 2 represents the corpus details of
task B in terms of language. The annotated datasets were divided into training, development,
and test sets to contain approximately 80%, 10% and 10% of the total number of comments. The
corpus statistics were calculated using the NLTK tool [19]. The class distribution is
imbalanced, as in the related hope speech corpora, where non-hope speech comments
outnumber hope speech comments [20, 21, 22]. The datasets are therefore skewed toward
some classes, which the participants had to consider when developing their
classification systems.
Table 1
Number of comments in each dataset used for Task A
                                           Task - A
                         Language      Train     Dev    Test    Total
                         Tamil         35656    3962     650    40268
                         Malayalam     15888    1766    1962    19616
                         Kannada        6212     691     768     7671

Table 2
Number of comments in each dataset used for Task B
                                           Task - B
                         Language         Train     Dev    Test    Total
                         Tamil             2662     666     649     3977
                         Malayalam         3114     866    1213     5193
                         English           3164     792     990     4946
                         Tamil-English     3861     966    1207     6034
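
Because the class distribution is skewed, one common mitigation is to weight each class inversely to its frequency during training. The following sketch is illustrative only: the label counts are invented placeholders, not the statistics of the shared-task corpora, and the rule n_samples / (n_classes * count) is one widely used heuristic rather than a method prescribed by the organizers or confirmed for any participating system.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical toy labels (NOT the real corpus distribution).
toy_labels = ["Positive"] * 6 + ["Negative"] * 3 + ["Mixed_feelings"] * 1
weights = balanced_class_weights(toy_labels)
for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

Rare classes receive larger weights, so errors on them contribute more to a weighted training loss.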


4. Task Setting
4.1. Training Phase
During the training phase, we provided participants with labeled training and development
data that they could use to train and validate their models. We released the data for all the
languages, and the participants could decide whether to develop models for more than one
language. The goal of this phase was to provide the participants with sufficient data to
perform cross-validation for their preliminary evaluations and hyper-parameter tuning. This
ensured that participants were ready for the assessment before the unlabeled test data was
released. In this phase, 95 participants registered for the shared task and
downloaded the datasets.
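
The cross-validation mentioned above can be sketched as follows. This is a minimal, assumed setup rather than a procedure required by the task: the function names, the choice of k = 5, and the toy labels are all hypothetical. The folds are stratified so that each fold keeps roughly the same class mix, which matters for imbalanced data.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Assign each example index to one of k folds, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a similar class mix.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        validation = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        yield train, validation

# Hypothetical toy labels: 10 "Positive" and 5 "Negative" comments.
toy_labels = ["Positive"] * 10 + ["Negative"] * 5
splits = list(cross_validation_splits(toy_labels, k=5))
for train, validation in splits:
    print(len(train), "training examples,", len(validation), "validation examples")
```

A model would be trained on each training split and scored on the matching validation split, and the averaged score used to compare hyper-parameter settings.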

4.2. Testing Phase
During the testing phase, the participants were given test data without labels. Each participating
team was allowed to make as many submissions as it wished, and the best result was used
for the leaderboard ranking. The submitted outputs were compared with the gold
standard labels, and the ranking was based on the best macro F1-score. For shared task A,
8, 11, and 13 teams submitted results for Tamil-English, Malayalam-English, and
Kannada-English, respectively. For shared task B, 8, 8, 9, and 8 teams submitted results
for Tamil, English, Malayalam,
and English+Tamil, respectively.
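
To make the ranking metric concrete, the sketch below computes a macro F1-score from gold and predicted labels. It is a plain reimplementation for illustration, not the organizers' scoring script; in particular, averaging over the classes that occur in the gold labels is an assumption, and the label strings and toy predictions are invented.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over the classes present in the gold labels."""
    classes = sorted(set(gold))
    f1_scores = []
    for cls in classes:
        tp = sum(1 for g, p in zip(gold, predicted) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, predicted) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, predicted) if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical toy data: H = homophobic, T = transphobic, N = non-anti-LGBT+.
gold = ["H", "H", "T", "N", "N", "N"]
system = ["H", "N", "T", "N", "N", "H"]
majority = ["N"] * len(gold)  # baseline that always predicts the largest class

system_score = macro_f1(gold, system)
majority_score = macro_f1(gold, majority)
print(f"system: {system_score:.3f}, majority baseline: {majority_score:.3f}")
```

Because every class counts equally, a system that only predicts the majority class scores poorly under macro F1 even when its accuracy looks acceptable, which suits imbalanced datasets.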


5. Systems
The systems used by the participants cover a broad spectrum of machine learning algorithms,
deep learning algorithms, and transformer-based models. The machine learning algorithms
include Logistic Regression, Passive Aggressive, Support Vector Machine, Naive Bayes,
Gradient Boosting, Random Forest, and Stacking and Voting Ensemble models
[23][24]. Count vectorizer and TF-IDF were used for feature representation. Bidirectional
LSTM, Multi-Layered Perceptron, and fastText+LightGBM are the deep learning approaches
chosen by the participants. Transformer-based models such as MPNet, SBERT, XLM-RoBERTa,
Indic BERT, and LaBSE were also applied to the given tasks [25][26][27][28].
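
As an illustration of the TF-IDF feature representation mentioned above, the sketch below builds word-level TF-IDF vectors in plain Python. It does not reproduce any participant's pipeline: whitespace tokenization, the smoothed IDF variant, L2 normalization, and the romanized toy comments are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return (idf, vectors): IDF per term and one L2-normalised TF-IDF dict per document."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed IDF variant (an assumption): log((1 + N) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenised:
        raw = {term: count * idf[term] for term, count in Counter(tokens).items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        vectors.append({t: v / norm for t, v in raw.items()} if norm else {})
    return idf, vectors

# Invented romanized code-mixed comments, for illustration only.
toy_comments = ["padam super", "padam mokka", "super bgm"]
idf, vectors = tfidf_vectors(toy_comments)
for comment, vector in zip(toy_comments, vectors):
    print(comment, {term: round(weight, 3) for term, weight in vector.items()})
```

Terms that occur in fewer comments receive a higher IDF, so they dominate the vector of the comment in which they appear. A count vectorizer corresponds to using the raw term counts without the IDF factor.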
   Among the top three systems of Task A for the Tamil-English dataset, the Bidirectional
LSTM performed well compared with the machine learning and transformer approaches [29].
For the Malayalam-English and Kannada-English datasets, the transformer models mBERT
and XLM-RoBERTa performed better than the
other models [30].
   Among the top three systems of Task B, for Tamil-English, LSTM performed well,
followed by XLM-RoBERTa [31][32][33]. For Malayalam-English, a deep learning model again
performed better than the other models, followed by machine learning models and
fastText [34]. For English-only texts, fastText, XLM-RoBERTa,
and Indic BERT performed well.


6. Results and Discussion
The language-wise rank lists of the participants for Task A are given in Tables 3, 4, and 5.
Table 3 shows the rank list for the Tamil track, Table 4 for the Malayalam track, and Table 5
for the Kannada track. The language-wise rank lists for Task B are given in Tables 6, 7, 8,
and 9. Table 6 shows the rank list for the Tamil track, Table 7 for the English track,
Table 8 for the Malayalam track, and
Table 9 for the Tamil-English track.
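
The ranks in Table 6 are consistent with dense ranking by macro F1-score, in which tied teams share a rank and the next rank is not skipped. This is an inference from that table rather than a documented rule, and other tables may break ties differently. A minimal sketch using the Table 6 scores:

```python
def dense_rank(scores):
    """Map each team to a rank; equal scores share a rank and no rank is skipped."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of_score = {score: position for position, score in enumerate(distinct, start=1)}
    return {team: rank_of_score[score] for team, score in scores.items()}

# Macro F1-scores copied from Table 6 (Task B, Tamil track).
table6 = {
    "mucs": 0.366, "fnet": 0.327, "CITK": 0.290, "IRLab@IITBHU": 0.289,
    "qwerty": 0.234, "SSN-CSE-2022": 0.234, "BharataNLP": 0.234, "nlpzip": 0.228,
}
ranks = dense_rank(table6)
for team, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, team, table6[team])
```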

Table 3
Rank list for Task A: Tamil track
                       Team Name     Precision    Recall    F1-score    Rank
                       SRMNLP            0.340     0.330       0.270       1
                       BharathNLP        0.190     0.220       0.190       2
                       bilstm            0.220     0.190       0.190       3
                       SSN-CSE           0.220     0.260       0.170       4
                       Sentiment         0.240     0.220       0.170       5
                       MUCS              0.240     0.190       0.160       6
                       Fnet              0.150     0.130       0.130       7
                       JPMCAI            0.020     0.160       0.020       8

  Even for the top-ranked systems, the precision and recall values are not high.
This indicates the need for much more robust pre-processing
methods and models to interpret and classify code-mixed data in Dravidian languages.
For the Tamil tracks of Task A and Task B, the transformer
models yielded lower precision scores than deep learning models such as LSTM. The
Table 4
Rank list for Task A: Malayalam track
                      Team Name      Precision   Recall    F1-score    Rank
                      IRLAB              0.670    0.670       0.660       1
                      Fnet               0.660     0.620       0.640      2
                      Sentiment          0.620     0.630       0.630      3
                      MUCS               0.610     0.610       0.610      4
                      NITK               0.600     0.600       0.600      5
                      SRMNLP             0.610     0.550       0.570      6
                      lone_warrior       0.520     0.590       0.520      7
                      bilstm             0.490     0.580       0.500      8
                      BharathNLP         0.160     0.270       0.200      9
                      JPMCAI             0.340     0.200       0.140     10
                      SSN-CSE            0.090     0.140       0.110     11

Table 5
Rank list for Task A: Kannada track
                       Team Name       Precision    Recall    F1-score    Rank
                       IRLAB               0.560     0.560       0.550       1
                       Sentiment           0.520     0.500       0.510       2
                       lone_warrior        0.470     0.510       0.480       3
                       NITK                0.480     0.500       0.480       4
                       Fnet                0.500     0.490       0.480       5
                       AI Defenders        0.490     0.480       0.480       6
                       SRMNLP              0.540     0.440       0.460       7
                       JPMCAI              0.550     0.430       0.450       8
                       MUCS                0.470     0.460       0.440       9
                       bilstm              0.480     0.500       0.430      10
                       QWERTY              0.460     0.350       0.350      11
                       BharataNLP          0.290     0.330       0.300      12
                       SSN-CSE             0.120     0.170       0.110      13

Table 6
Rank list for Task B: Tamil track
                                 Team Name         F1-score    Rank
                             mucs [33]                 0.366      1
                             fnet [25]                 0.327      2
                             CITK                      0.290      3
                             IRLab@IITBHU [30]         0.289      4
                             qwerty [23]               0.234      5
                             SSN-CSE-2022 [24]         0.234      5
                             BharataNLP [28]           0.234      5
                             nlpzip [26]               0.228      6


transformer models generally outperform deep learning algorithms like LSTM owing to their
multi-headed self-attention mechanism, through which they try to model the code-mixed data
Table 7
Rank list for Task B: English track
                                 Team Name          F1-score    Rank
                          BharataNLP [28]               0.493      1
                          fnet [25]                     0.486      2
                          nlpzip [26]                   0.462      3
                          mucs [33]                     0.374      4
                          IRLab@IITBHU [30]             0.337      5
                          qwerty [23]                   0.332      6
                          SSN-CSE-2022 [24]             0.322      7
                          kongu.eng-21MSR002 [27]       0.319      8

Table 8
Rank list for Task B: Malayalam track
                                 Team Name       F1-score    Rank
                             Nitk [34]               0.974      1
                             qwerty [23]             0.943      2
                             BharataNLP [28]         0.942      3
                             CITK                    0.860      4
                             mucs [33]               0.750      5
                             fnet [25]               0.696      6
                             nlpzip [26]             0.542      7
                             IRLab@IITBHU [30]       0.427      8
                             SSN-CSE-2022 [24]       0.296      9

Table 9
Rank list for Task B: Tamil-English track
                                 Team Name       F1-score    Rank
                             mucs [33]               0.580      1
                             fnet [25]               0.555      2
                             CITK                    0.477      3
                             nlpzip [26]             0.393      4
                             qwerty [23]             0.344      5
                             IRLab@IITBHU [30]       0.333      6
                             SSN-CSE-2022 [24]       0.316      7
                             BharataNLP [28]         0.316      8


better than the LSTM-based models [17], but surprisingly the transformer-based models have
not given good results.
  This may be alleviated if the models are fine-tuned on more code-mixed data
and complemented with robust pre-processing strategies such as transliteration, translation,
and spell checking to handle the code-mixed data. A detailed error analysis of the test set
would give a more precise idea of the error scenarios and the pre-processing strategy to choose.
Furthermore, the scores produced by the top models are not high: the top-ranked team in the
Task A Tamil track achieved a precision of only 0.34. Since the top-ranked
models are LSTM-based, the scores might be improved by choosing a suitable embedding
mechanism instead of relying on TF-IDF and count vectorizer feature representations.
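
Pre-processing steps such as transliteration first need to know which tokens of a code-mixed comment are in a native script and which are romanized. The sketch below tags each token by Unicode block; it is a hypothetical helper, not part of any submitted system. The block ranges for Tamil, Kannada, and Malayalam follow the Unicode Standard, while the function names, the "Latin"/"Other" labels, and the example comment are assumptions.

```python
# Unicode blocks for the three Dravidian scripts used in the shared task.
SCRIPT_RANGES = {
    "Tamil": (0x0B80, 0x0BFF),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}

def script_of(token):
    """Return the dominant script of a token: a Dravidian script, 'Latin', or 'Other'."""
    counts = {}
    for char in token:
        code = ord(char)
        label = None
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code <= high:
                label = name
                break
        if label is None and char.isascii() and char.isalpha():
            label = "Latin"
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get) if counts else "Other"

def tag_comment(comment):
    """Split a comment on whitespace and tag every token with its script."""
    return [(token, script_of(token)) for token in comment.split()]

# Invented code-mixed example: one Tamil-script word and one romanized word.
tagged = tag_comment("படம் super")
print(tagged)
```

Tokens tagged "Latin" could then be routed to a transliteration or spell-checking step, and native-script tokens left unchanged. Romanized Dravidian words and English words both appear as "Latin" here, so a further language-identification step would still be needed to separate them.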
  For Malayalam and Kannada, the transformer models outperformed the
deep learning models in Task A, which reflects the current state of the art. In Task B for
Malayalam, however, the transformer-based models did not outperform the deep learning models.
This can be addressed in the same way as described above for Task A.


7. Conclusion
To summarize, this shared task has two subtasks: sentiment analysis and
homophobia/transphobia detection. The shared task aims to promote research on Dravidian
languages. The sentiment analysis subtask aims to determine the sentiment expressed in a text
and to assign the text to a predefined class. The second subtask focuses on detecting hateful
comments against the LGBTQ community. In total, 13 teams submitted results for sentiment
analysis and 10 teams submitted results for homophobia/transphobia detection. The
participants developed various models based on machine learning and deep learning, and the
submissions were ranked by model performance. Compared with the other systems, the
transformer models achieved higher performance in several tracks, notably the
Malayalam-English and Kannada-English tracks of Task A.


References
 [1] A. A. Siegel, online hate speech v2, 2019. URL: https://alexandra-siegel.com/wp-content/
     uploads/2019/08/Siegel_Online_Hate_Speech_v2.pdf.
 [2] S. Thavareesan, S. Mahesan, Sentiment lexicon expansion using word2vec and fasttext for
     sentiment prediction in tamil texts, in: 2020 Moratuwa Engineering Research Conference
     (MERCon), IEEE, 2020, pp. 272–276.
 [3] S. Thavareesan, S. Mahesan, Sentiment analysis in tamil texts: A study on machine
     learning techniques and feature representation, in: 2019 14th Conference on Industrial
     and Information Systems (ICIIS), IEEE, 2019, pp. 320–325.
 [4] S. Thavareesan, S. Mahesan, Word embedding-based part of speech tagging in tamil texts,
     in: 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS),
     IEEE, 2020, pp. 478–482.
 [5] R. Priyadharshini, B. R. Chakravarthi, C. N. Subalalitha, T. Durairaj, M. Subramanian,
     K. Shanmugavadivel, S. U. Hegde, P. K. Kumaresan, Findings of the shared task on
     Abusive Comment Detection in Tamil, in: Proceedings of the Second Workshop on Speech
     and Language Technologies for Dravidian Languages. Association for Computational
     Linguistics, 2022.
 [6] B. R. Chakravarthi, R. Priyadharshini, R. Ponnusamy, P. K. Kumaresan, K. Sampath,
     D. Thenmozhi, S. Thangasamy, R. Nallathambi, J. P. McCrae, Dataset for identification of
     homophobia and transphobia in multilingual youtube comments, arXiv preprint
     arXiv:2109.00227 (2021).
 [7] B. R. Chakravarthi, V. Muralidaran, R. Priyadharshini, C. N. Subalalitha, J. P. McCrae,
     M. Á. García, S. M. Jiménez-Zafra, R. Valencia-García, P. Kumaresan, R. Ponnusamy, et al.,
     Overview of the shared task on hope speech detection for equality, diversity, and inclusion,
     in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity
     and Inclusion, 2022, pp. 378–388.
 [8] B. R. Chakravarthi, V. Muralidaran, Findings of the shared task on hope speech detection
     for equality, diversity, and inclusion, in: Proceedings of the first workshop on language
     technology for equality, diversity and inclusion, 2021, pp. 61–72.
 [9] K. Shanmugavadivel, M. Subramanian, P. K. Kumaresan, B. R. Chakravarthi, B. Bharathi,
     C. N. Subalalitha, S. K. Lavanya, T. Mandl, R. Ponnusamy, V. Palanikumar, B. Manoj J,
     Overview of the Shared Task on Sentiment Analysis and Homophobia Detection of YouTube
     Comments in Code-Mixed Dravidian Languages, in: Working Notes of FIRE 2022 - Forum
     for Information Retrieval Evaluation, CEUR, 2022.
[10] A. Sampath, T. Durairaj, B. R. Chakravarthi, R. Priyadharshini, C. N. Subalalitha, K. Shan-
     mugavadivel, S. Thavareesan, S. Thangasamy, P. Krishnamurthy, A. Hande, et al., Findings
     of the shared task on Emotion Analysis in Tamil, in: Proceedings of the Second Workshop
     on Speech and Language Technologies for Dravidian Languages, 2022, pp. 279–285.
[11] K. Shanmugavadivel, S. H. Sampath, P. Nandhakumar, P. Mahalingam, M. Subramanian,
     P. K. Kumaresan, R. Priyadharshini, An analysis of machine learning models for sentiment
     analysis of Tamil code-mixed data, Computer Speech & Language (2022) 101407.
[12] R. Anita, C. Subalalitha, An approach to cluster tamil literatures using discourse connec-
     tives, in: 2019 IEEE 1st International Conference on Energy, Systems and Information
     Processing (ICESIP), IEEE, 2019, pp. 1–4.
[13] C. Subalalitha, E. Poovammal, Automatic bilingual dictionary construction for tirukural,
     Applied Artificial Intelligence 32 (2018) 558–567.
[14] B. R. Chakravarthi, V. Muralidaran, R. Priyadharshini, J. P. McCrae, Corpus cre-
     ation for sentiment analysis in code-mixed Tamil-English text, in: Proceedings of the
     1st Joint Workshop on Spoken Language Technologies for Under-resourced languages
     (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL),
     European Language Resources association, Marseille, France, 2020, pp. 202–210. URL:
     https://aclanthology.org/2020.sltu-1.28.
[15] B. R. Chakravarthi, R. Priyadharshini, T. Durairaj, J. P. McCrae, P. Buitelaar, P. Kumaresan,
     R. Ponnusamy, Overview of the shared task on homophobia and transphobia detection in
     social media comments, in: Proceedings of the Second Workshop on Language Technology
     for Equality, Diversity and Inclusion, 2022, pp. 369–377.
[16] N. Moyano, M. del Mar Sanchez-Fuentes, Homophobic bullying at schools: A systematic
     review of research, prevalence, school-related predictors and consequences, Aggression
     and violent behavior 53 (2020) 101441.
[17] S. K. Lavanya, C. N. Subalalitha, Building Tamil Text Dataset on LGBTQIA and Offensive
     Language Detection using Multilingual BERT, in: 2022 International Conference on
     Inventive Computation Technologies (ICICT), IEEE, 2022, pp. 489–496.
[18] M. Subramanian, R. Ponnusamy, S. Benhur, K. Shanmugavadivel, A. Ganesan, D. Ravi, G. K.
     Shanmugasundaram, R. Priyadharshini, B. R. Chakravarthi, Offensive language detection
     in Tamil YouTube comments by adapters and cross-domain knowledge transfer, Computer
     Speech & Language 76 (2022) 101404.
[19] S. Bird, Nltk: the natural language toolkit, in: Proceedings of the COLING/ACL 2006
     Interactive Presentation Sessions, 2006, pp. 69–72.
[20] B. R. Chakravarthi, Hope speech detection in youtube comments, Social Network Analysis
     and Mining 12 (2022) 1–19.
[21] B. R. Chakravarthi, Multilingual hope speech detection in english and dravidian languages,
     International Journal of Data Science and Analytics 14 (2022) 389–406.
[22] B. R. Chakravarthi, A. Hande, R. Ponnusamy, P. K. Kumaresan, R. Priyadharshini, How
     can we detect homophobia and transphobia? experiments in a multilingual code-mixed
     setting for social media governance, International Journal of Information Management
     Data Insights 2 (2022) 100119.
[23] S. Saumya, V. Jha, S. Biradar, Sentiment and Homophobia Detection on YouTube using
     Ensemble Machine Learning Techniques, in: Working Notes of FIRE 2022 - Forum for
     Information Retrieval Evaluation, CEUR, 2022.
[24] J. Varsha, B. Bharathi, A. Meenakshi, Sentiment Analysis and Homophobia detection of
     YouTube comments in Code-Mixed Dravidian Languages using machine learning and
     transformer models, in: Working Notes of FIRE 2022 - Forum for Information Retrieval
     Evaluation, CEUR, 2022.
[25] F. Nilsson, S. S. Al-Azzawi, G. Kovács, Leveraging Sentiment Data for the Detection of
     Homophobic/Transphobic Content in a Multi-Task, Multi-Lingual Setting Using Trans-
     formers, in: Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation,
     CEUR, 2022.
[26] S. Venkatesan, S. Donepudi, P. P, T. Durairaj, Homophobia and Transphobia Detection of
     Youtube Comments in Code-Mixed Dravidian Languages using Deep learning, in: Working
     Notes of FIRE 2022 - Forum for Information Retrieval Evaluation, CEUR, 2022.
[27] D. Manikandan, M. Subramanian, K. Shanmugavadivel, A System For Detecting Abusive
     Contents Against LGBT Community Using Deep Learning Based Transformer Models, in:
     Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation, CEUR, 2022.
[28] M. B. J, C. Hs, A Study on Sentimental Analysis, Homophobia-Transphobia Detection for
     Dravidian Languages, in: Working Notes of FIRE 2022 - Forum for Information Retrieval
     Evaluation, CEUR, 2022.
[29] S. K. Lavanya, F. N. Muhammad, Sentiment Analysis of YouTube comments in Dravidian
     Code-Mixed Language using Deep Neural Network, in: Working Notes of FIRE 2022 -
     Forum for Information Retrieval Evaluation, CEUR, 2022.
[30] S. Chanda, A. Mishra, S. Pal, Sentiment Analysis and Homophobia detection of Code-
     Mixed Dravidian Languages leveraging pre-trained model and word-level language tag, in:
     Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation, CEUR, 2022.
[31] S. K. Lavanya, S. Sivaprasath, Homophobia, Transphobia Detection in Tamil, Malayalam,
     English Languages using Logistic Regression and Code-Mixed Data using AWD_LSTM, in:
     Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation, CEUR, 2022.
[32] S. K. Lavanya, A. A. Samuel, A Sequential DNN for Sentiment Analysis of Dravidian
     Code-Mixed Language Comments on YouTube, in: Working Notes of FIRE 2022 - Forum
     for Information Retrieval Evaluation, CEUR, 2022.
[33] A. Hegde, H. Shashirekha, Leveraging Dynamic Meta Embedding for Sentiment Analysis
     and Detection of Homophobic/Transphobic Content in Code-mixed Dravidian Languages,
     in: Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation, CEUR, 2022.
[34] S. Ugursandi, A. Kumar M, Sentiment Analysis and Homophobia detection of YouTube
     comments, in: Working Notes of FIRE 2022 - Forum for Information Retrieval Evaluation,
     CEUR, 2022.
[35] R. Sakuntharaj, S. Mahesan, A novel hybrid approach to detect and correct spelling in
     tamil text, in: 2016 IEEE international conference on information and automation for
     sustainability (ICIAfS), IEEE, 2016, pp. 1–6.
[36] R. Sakuntharaj, S. Mahesan, Use of a novel hash-table for speeding-up suggestions for
     misspelt tamil words, in: 2017 IEEE international conference on industrial and information
     systems (ICIIS), IEEE, 2017, pp. 1–5.
[37] H. Visuwalingam, R. Sakuntharaj, R. G. Ragel, Part of speech tagging for tamil language
     using deep learning, in: 2021 IEEE 16th International Conference on Industrial and
     Information Systems (ICIIS), IEEE, 2021, pp. 157–161.