<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>COMPARATIVE ANALYSIS FOR OFFENSIVE LANGUAGE IDENTIFICATION OF TAMIL TEXT USING SVM AND LOGISTIC CLASSIFIER</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Prabhu Ram. N</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agalya T</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meeradevi.T</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vibin Mammen Vinod</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gothainayaki.A</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anusha S</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Electronics and Communication Engineering, Kongu Engineering College</institution>
          ,
          <addr-line>Erode, TamilNadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <fpage>3</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>Social media platforms such as Twitter, Facebook, and YouTube enable fast communication between people. Social media text is largely made up of code-mixed comments, posts, and reactions, whose content may be offensive or non-offensive. It is therefore necessary to classify YouTube comments, posts, and reactions as offensive or non-offensive. Because offensive comments and posts can provoke strong reactions against a person or group in society, governments have a responsibility to identify them on social media before they reach a larger audience. In India, multilingual practice leads users to write code-mixed comments and posts on social media, which makes automatic offensive text classification difficult. The Dravidian code-mixed dataset is used to train machine learning models to classify text as offensive or non-offensive. The text is transformed into numerical data using the TFIDF method, based on the relative occurrence of terms in the training and testing datasets. However, a model trained on an imbalanced dataset may be biased towards a particular class, so the dataset is balanced using the SMOTE method. SVM and logistic classifiers are trained on it. The F1 score is analysed, and it is observed that predictions from the balanced dataset are better than those from the imbalanced dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Multilingual</kwd>
        <kwd>SMOTE</kwd>
        <kwd>TFIDF</kwd>
        <kwd>SVM</kwd>
        <kwd>Logistic classifier</kwd>
        <kwd>NLP</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In 2021 there are 3.78 billion social media users worldwide. Social media makes
communication easier and faster across the world and connects people with one another.
Platforms such as Facebook, YouTube, and Twitter give us the freedom to express opinions in public.
They may also allow some bad actors to spread fake news and offensive content. Offensive
language on social platforms is one of the most dangerous of these activities, so people have to
be protected from such hateful activity on social media. The main challenge
on social media is to identify offensive text content and delete the problematic posts.
Research on safety and security in social media has grown substantially in the last decade.
In many countries, such as the United Kingdom, Canada, and France, these activities are punishable [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Social networks have introduced policies to restrict offensive speech that targets people on the basis of
race, gender, etc. A subtle instance of hate speech in a sentence may be considered hateful or not
depending on the person who interprets it. Social media texts include both
multilingual text and code-mixed text. Code-mixing is the phenomenon of mixing a second language into
the first language, or mixing foreign languages into the structure of the native language;
for example, Tamil words written in English script. Multilingual text is the
combination of multiple languages in a single sentence, for example Tamil and English words
each written in their native script in a single sentence. Natural Language Processing (NLP) provides
techniques to address this problem. NLP is a field of artificial intelligence
concerned with understanding and analysing the context of human language.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>Hate speech identification through sentiment analysis is one of the current research fields in
Natural Language Processing. Solutions follow either a machine learning approach or a
lexicon based approach. The machine learning approach involves collecting annotated data,
pre-processing the collected text, transforming it into input vectors by a
vectorisation technique, and training a machine learning model to classify it. The lexicon based
approach is widely used in sentiment analysis, where sentiment values are collected from resources such as WordNet and
SentiWordNet and are used for classification. The lexicon based approach does not require
labelling, which is a time consuming process.</p>
      <p>
        Hate speech identification has been studied on monolingual English datasets [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] and on code-mixed datasets for
Tamil and Malayalam. Feature extraction is performed by various methods such as
Hash Vectorizer [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Count Vectorizer, TFIDF (Term Frequency Inverse Document Frequency) [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3,
4, 5, 6</xref>
        ], and word embeddings, including customized word embeddings, CBOW, Skip-gram, word2vec,
doc2vec, and fastText [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The TFIDF vectorizer and the Count vectorizer are the most commonly used vectorisation
algorithms that are not based on neural networks. TFIDF performs well
with smaller vocabularies, but larger datasets produce more features; modifying the
IDF (Inverse Document Frequency) weighting can manage the feature size with minimal computation time [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Neural
network based vectorisation methods such as word2vec, doc2vec, and fastText are used on code-mixed
datasets. Among these, fastText vectorisation performs better than the other neural network based
vectorisation methods [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Neural network based classification architectures such as the sub-word
level LSTM model, Hierarchical LSTM model, BERT, XLM-RoBERTa, LSTM, GRU, and XLNet [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10,
11, 12</xref>
        ] have been used. Machine learning based classification models such as Support Vector
Machine (SVM), Logistic Regression (LR), Random Forest Classifier (RFC) [
        <xref ref-type="bibr" rid="ref13 ref3 ref4">3, 4, 13, 14, 15, 16</xref>
        ]
and K-Nearest Neighbour (KNN) [17] are also used. The SVM model performs better on the code-mixed Tamil
dataset than the other machine learning models. Deep learning models such as RNN [
        <xref ref-type="bibr" rid="ref11">11, 18</xref>
        ] and MLP
are also used for classification [19, 20] to improve prediction. Predictive
models are evaluated by accuracy, F1-score, precision, and recall [14, 15, 17]. In hate speech
identification on code-mixed data, trained models have reduced prediction accuracy because of imbalanced
datasets. Section 3 describes the methodology. Section 4 describes the experimental setup for
training the SVM and logistic classifier models in different hyper-parameter configurations. The
conversion of the imbalanced dataset into a balanced dataset using the SMOTE method is also described
in Section 4. Section 5 presents the results and discussion, and Section 6 presents the
conclusion.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Text Pre-processing</title>
        <p>The steps of the methodology are described in detail in the following subsections.
Pre-processing involves the removal of special characters, such as reaction smileys and punctuation,
using a standard package. The vocabulary size is reduced after the removal of special
characters. For English, tokens are converted into their equivalent base form
by stemming and lemmatization. However, for Dravidian languages, such processes
are not possible. The stream of text is converted into tokens of unigrams,
bigrams, and n-grams by the process called tokenisation.</p>
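        <p>The pre-processing and tokenisation steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the package used in this work; the function names clean_text and word_ngrams, the choice to drop every Unicode punctuation and symbol character, and the sample comment are assumptions made for the example.</p>
        <preformat>
```python
import unicodedata


def clean_text(text):
    """Lowercase the text and replace punctuation and symbols (including emoji) with spaces."""
    kept = []
    for ch in text.lower():
        category = unicodedata.category(ch)
        # Assumption: Unicode categories P* (punctuation) and S* (symbols, emoji)
        # are treated as the "special characters" to remove.
        if category.startswith("P") or category.startswith("S"):
            kept.append(" ")
        else:
            kept.append(ch)
    # Collapse the runs of whitespace left behind by the removed characters.
    return " ".join("".join(kept).split())


def word_ngrams(text, max_n=3):
    """Return all word n-grams of the cleaned text for n = 1 .. max_n."""
    tokens = clean_text(text).split()
    grams = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


# Hypothetical code-mixed comment, used only for illustration.
sample = "Semma padam!!! 😂 vera level"
cleaned = clean_text(sample)
grams = word_ngrams(sample)
print(cleaned)
print(grams)
```
        </preformat>
        <p>For a comment of four tokens this yields four unigrams, three bigrams, and two trigrams.</p>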
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Vectorisation</title>
        <p>The text after pre-processing is vectorised. The vectorisation method used is TFIDF (Term
Frequency Inverse Document Frequency), which represents the text data as equivalent
numerical data. TFIDF gives higher weight to words that are distinctive to a document.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Training Model</title>
        <p>The logistic regression and SVM models are trained on the tri-gram based TFIDF vectorisation of the
training dataset. The aim of the task is to classify the text into the offensive or not-offensive class.
Logistic regression (LR) and Support Vector Machine (SVM) are supervised machine learning
algorithms used for classification, and they are well suited to binary classification.</p>
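        <p>The TFIDF weighting that feeds the classifiers can be illustrated with a small worked sketch. It assumes the textbook definition, term frequency multiplied by log(N / df), and uses unigrams for brevity. Library implementations such as sklearn's TfidfVectorizer apply a smoothed IDF and L2 normalisation by default, so their values differ. The function name tfidf and the three toy documents are assumptions made for the example.</p>
        <preformat>
```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenised = [doc.split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, used only for illustration.
docs = ["padam semma", "padam mokka", "semma semma mass"]
weights = tfidf(docs)
for doc, row in zip(docs, weights):
    print(doc, {term: round(value, 3) for term, value in row.items()})
```
        </preformat>
        <p>In the second document, the term that appears in only one document receives a higher weight than the term shared with another document, which is the behaviour the vectorisation relies on.</p>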
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Making Balanced dataset</title>
        <p>
          A dataset may be balanced or imbalanced. A balanced dataset contains an equal number
of offensive and not-offensive labels. An imbalanced dataset is one in which
one of the labels is far more frequent than the other. A dataset with 1153 offensive and 4724 not-offensive samples is
an example of an imbalanced dataset. This imbalance may cause the model to fit
the majority class, which may give poorer prediction results. Several methods can turn
an imbalanced dataset into a balanced one:
• Oversampling
• Undersampling
• SMOTE (Synthetic Minority Oversampling Technique)
        </p>
        <p>Algorithm 1: SMOTE algorithm</p>
        <preformat>
 1: procedure SMOTE(X, y)                       ◁ SMOTE of X data array and y target array
 2:   neigh ← 5                                 ◁ Number of nearest neighbours
 3:   n_jobs ← 4                                ◁ Number of cores on execution
 4:   n ← len(X)                                ◁ Number of input samples
 5:   d ← count(y_maj) − count(y_min)           ◁ Difference between the numbers of majority and minority class samples
 6:   λ ← rand(0, 1)                            ◁ Scalar multiplicative value
 7:   while i ≤ n and y_i ∈ y_min and d ≠ 0 do
 8:     X_inn ← [X_i[1], X_i[2], ..., X_i[neigh]]   ◁ neigh nearest neighbour samples
 9:     X_inew ← X_i + λ × (X_inn − X_i)        ◁ X_i, X_inn ∈ y_min
10:     d ← d − 1
11:     i ← i + 1
12:   end while
13:   return X_aug                              ◁ Augmented data
14: end procedure
        </preformat>
        <p>Oversampling duplicates actual minority samples from the dataset. Undersampling
removes actual majority samples from the dataset. Neither approach adds
any new information to the dataset. SMOTE synthetically generates new feature vectors
for the minority class [21, 22, 23]. The balanced dataset is generated according to Algorithm 1.</p>
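        <p>The interpolation idea behind SMOTE can be sketched in a few lines of standard-library Python. This is an illustrative simplification under stated assumptions, not the imblearn implementation used in this work: the function name smote, the tiny two-dimensional minority set, the fixed random seed, and the choice to draw a fresh random factor for every synthetic sample are all assumptions made for the example. For the class counts quoted above, balancing would require 4724 − 1153 = 3571 synthetic minority samples.</p>
        <preformat>
```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest minority neighbours of x (Euclidean distance), excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()  # random factor in [0, 1)
        # The new point lies on the line segment between x and its chosen neighbour.
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Hypothetical minority-class feature vectors, used only for illustration.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=5)
print(new_points)
```
        </preformat>
        <p>Because every synthetic point is a convex combination of two existing minority samples, it adds variation to the minority class rather than repeating a stored sample, which is what distinguishes SMOTE from plain oversampling.</p>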
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Evaluating Model</title>
        <p>The trained model is evaluated on the test dataset. The metrics used to evaluate the
model are accuracy, F1-score, precision, and recall. Accuracy alone is insufficient
to identify the best fitted model, because the model may be biased towards certain classes;
such bias can be identified using the F1-score.</p>
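        <p>A small worked example shows why accuracy alone can hide class bias. The sketch below computes accuracy, per-class F1-score, and the macro and support-weighted averages in plain Python. The function name f1_report, the label names, and the toy predictions (a classifier that always predicts the majority class) are assumptions made for illustration and are not results from this work.</p>
        <preformat>
```python
from collections import Counter


def f1_report(y_true, y_pred):
    """Return accuracy, per-class F1, macro-average F1 and support-weighted F1."""
    labels = sorted(set(y_true))
    support = Counter(y_true)  # number of true instances per class
    f1 = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support[label]
        total = precision + recall
        f1[label] = 2 * precision * recall / total if total else 0.0
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    macro = sum(f1.values()) / len(labels)
    weighted = sum(f1[label] * support[label] for label in labels) / len(y_true)
    return accuracy, f1, macro, weighted


# Hypothetical imbalanced test set: 8 not-offensive and 2 offensive samples,
# with a biased model that predicts the majority class every time.
y_true = ["not"] * 8 + ["off"] * 2
y_pred = ["not"] * 10
accuracy, f1, macro, weighted = f1_report(y_true, y_pred)
print(accuracy, f1, macro, weighted)
```
        </preformat>
        <p>Here accuracy is 0.8 although the offensive class is never detected: its F1-score is 0, and the macro average falls to about 0.44 while the weighted average stays near 0.71.</p>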
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <p>The dataset provided by HASOC-Dravidian CodeMix FIRE 2021 [24] for the offensive language
detection task is split into training samples and testing samples, as described in Table 1.
Table 2 describes the number of known vocabulary items in the training set and the number of unknown
vocabulary items in the cross validation and test datasets relative to the known vocabulary
of the training samples. The samples are labelled as offensive, not-offensive, or
not-tamil. Since the "not-tamil" label occurs only rarely in the given dataset,
samples with this label are dropped in the text pre-processing stage. 30% of the training
samples are held out as the cross validation set. Since the samples are imbalanced, it is necessary
to balance the training samples using the SMOTE method. The imblearn package
for Python is used to perform SMOTE [21].</p>
      <p>[Dataset statistics table, row labels: Not Offensive Class, Offensive Class, Not Tamil, Average Length of sentence, Maximum Length of sentence, Minimum Length of sentence]</p>
      <p>The training samples are used to train the logistic classifier and the SVM classifier with certain
parameters. The logistic classifier and SVM models can be trained using an open source Python
package such as sklearn. The parameter values set for the logistic classifier and the SVM classifier
are tabulated in Table 3. The parameter C is the inverse of the regularization strength. If the
value of C is larger, the SVM classifier minimises the number of misclassified samples, thereby
making the margin of the decision boundary smaller.<sup>1</sup></p>
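      <p>The role of C can be made concrete with the standard soft-margin SVM objective, given here as a reminder rather than as the exact formulation solved by the package used in this work. The symbols w (weight vector), b (bias), the slack variables, the label encoding of −1 and +1, and the sample count n are notation introduced for this illustration.</p>
      <preformat>
```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```
      </preformat>
      <p>A larger C makes each margin violation more costly, so the optimiser tolerates fewer misclassified training samples and accepts a narrower margin; a smaller C regularises more strongly and allows a wider margin.</p>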
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussions</title>
      <p>The trained logistic classifier and SVM classifier models are evaluated on the labelled test samples using
accuracy, support-weighted average F1-score, precision, and recall, as reported in Table 4
and Table 5. The macro average computes each metric per class and takes the unweighted mean,
without taking class imbalance into account. The weighted average
weights the metric of each class by its number of true instances.
In Table 4, the F1-score of the offensive class improves to 0.253, and the overall weighted
average F1-score of the logistic classifier increases from 73.8% to 76.1% when the dataset is
balanced by SMOTE. Similarly, for the SVM classifier, as shown in Table 5, the F1-score of the offensive class
improves to 0.23, and the overall weighted average F1-score increases from 74.9% to 77.1%
when the dataset is balanced by SMOTE. The number of unknown vocabulary items described in Table 2 is
high in the cross validation set and the testing set compared with the training dataset. This leads to
misclassification and to only average accuracy.</p>
      <p><sup>1</sup> https://github.com/GothainayakiA/Hatesppech.git</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The task of identifying offensive language in the dataset given in HASOC-Dravidian CodeMix
FIRE 2021 [24] is performed using TFIDF vectorisation and by training a logistic
classifier model and an SVM classifier model. It is observed that models trained with
imbalanced samples give predictions biased towards one specific class. Hence, to reduce this
bias, an oversampling technique is used to generate new
labelled samples from the existing dataset. The generated balanced dataset is used to train the logistic
classifier and the SVM classifier. It is concluded that the average weighted
F1-score improves by 2.3 and 2.2 percentage points with the logistic classifier model and the SVM classifier model
respectively. However, unknown vocabulary may occur in the cross validation and
test sets; contextual word representations may be
applied to handle it. In future, SMOTE can be applied to representations from pre-trained models such as word2vec and fastText,
as well as custom-trained word vectorisation models, and the classifier can be trained using sequential
neural networks such as RNN, LSTM, and GRU.
      </p>
      <ref-list>
        <ref>
          <mixed-citation>[13] N. P. Ram, V. M. Vinod, V. Mekala, M. Manimegalai, A fast and energy efficient path planning algorithm for offline navigation using SVM classifier, International Journal of Scientific and Technology Research 9 (2020) 2082–2086.</mixed-citation>
        </ref>
        <ref id="ref14">
          <mixed-citation>[14] A. Hande, R. Priyadharshini, B. R. Chakravarthi, KanCMD: Kannada CodeMixed Dataset for Sentiment Analysis and Offensive Language Detection, Proceedings of the Third Workshop on Computational Modeling of People’s Opinions, Personality, and Emotion’s in Social Media (2020) 54–63. URL: https://www.aclweb.org/anthology/2020.peoples-1.6.</mixed-citation>
        </ref>
        <ref id="ref15">
          <mixed-citation>[15] B. R. Chakravarthi, N. Jose, S. Suryawanshi, E. Sherly, J. P. McCrae, A sentiment analysis dataset for code-mixed Malayalam-English (2020) 177–184. URL: https://www.aclweb.org/anthology/2020.sltu-1.25.</mixed-citation>
        </ref>
        <ref id="ref16">
          <mixed-citation>[16] N. P. Ram, K. Sandhiya, V. M. Vinod, V. Mekala, Offline navigation: GPS based assisting system in Sathuragiri forests using machine learning, in: 2018 International Conference on Intelligent Computing and Communication for Smart World (I2C2SW), 2018, pp. 326–331. doi:10.1109/I2C2SW45816.2018.8997523.</mixed-citation>
        </ref>
        <ref id="ref17">
          <mixed-citation>[17] B. R. Chakravarthi, V. Muralidaran, R. Priyadharshini, J. P. McCrae, Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text (2020) 202–210. URL: http://arxiv.org/abs/2006.00206. doi:10.5281/zenodo.4015253. arXiv:2006.00206.</mixed-citation>
        </ref>
        <ref id="ref18">
          <mixed-citation>[18] B. R. Chakravarthi, R. Priyadharshini, V. Muralidaran, S. Suryawanshi, N. Jose, E. Sherly, J. P. McCrae, Overview of the track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text, ACM International Conference Proceeding Series (2020) 21–24. doi:10.1145/3441501.3441515.</mixed-citation>
        </ref>
        <ref id="ref19">
          <mixed-citation>[19] Z. Al-Makhadmeh, A. Tolba, Automatic hate speech detection using killer natural language processing optimizing ensemble deep learning approach, Computing 102 (2020) 501–522. URL: https://doi.org/10.1007/s00607-019-00745-0. doi:10.1007/s00607-019-00745-0.</mixed-citation>
        </ref>
        <ref id="ref20">
          <mixed-citation>[20] A. Al-Hassan, H. Al-Dossari, Detection of Hate Speech in Social Networks: a Survey on Multilingual Corpus (2019) 83–100. doi:10.5121/csit.2019.90208.</mixed-citation>
        </ref>
        <ref id="ref21">
          <mixed-citation>[21] G. Lemaître, F. Nogueira, C. K. Aridas, Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning, Journal of Machine Learning Research 18 (2017) 1–5. URL: http://jmlr.org/papers/v18/16-365.html.</mixed-citation>
        </ref>
        <ref id="ref22">
          <mixed-citation>[22] G. Douzas, F. Bacao, Geometric SMOTE a geometrically enhanced drop-in replacement for SMOTE, Information Sciences 501 (2019) 118–135. URL: https://doi.org/10.1016/j.ins.2019.06.007. doi:10.1016/j.ins.2019.06.007.</mixed-citation>
        </ref>
        <ref id="ref23">
          <mixed-citation>[23] S. Kiyohara, T. Miyata, T. Mizoguchi, Prediction of grain boundary structure and energy by machine learning 18 (2015) 1–5. URL: http://arxiv.org/abs/1512.03502. doi:10.1126/sciadv.1600746. arXiv:1512.03502.</mixed-citation>
        </ref>
        <ref id="ref24">
          <mixed-citation>[24] B. R. Chakravarthi, P. K. Kumaresan, R. Sakuntharaj, A. K. Madasamy, S. Thavareesan, P. B, S. Chinnaudayar Navaneethakrishnan, J. P. McCrae, T. Mandl, Overview of the HASOC-DravidianCodeMix Shared Task on Offensive Language Detection in Tamil and Malayalam, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, CEUR, 2021.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warmsley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Macy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>Automated hate speech detection and the problem of offensive language</article-title>
          ,
          <source>in: Proceedings of the International AAAI Conference on Web and Social Media</source>
          , volume
          <volume>11</volume>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kumaraguru</surname>
          </string-name>
          ,
          <article-title>Automating fake news detection system using multilevel voting model</article-title>
          ,
          <source>Soft Computing</source>
          <volume>24</volume>
          (
          <year>2020</year>
          )
          <fpage>9049</fpage>
          -
          <lpage>9069</lpage>
          . URL: https://doi.org/10.1007/s00500-019-04436-y. doi:10.1007/s00500-019-04436-y.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Muneer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Fati</surname>
          </string-name>
          ,
          <article-title>A comparative analysis of machine learning techniques for cyberbullying detection on twitter</article-title>
          ,
          <source>Future Internet</source>
          <volume>12</volume>
          (
          <year>2020</year>
          ). URL: https://www.mdpi.com/1999-5903/12/11/187. doi:10.3390/fi12110187.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mundada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <article-title>KBCNMUJAL@HASOC-DravidianCodeMixFIRE2020: Using machine learning for detection of hate speech and offensive code-mixed social media text</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>2826</volume>
          (
          <year>2020</year>
          )
          <fpage>351</fpage>
          -
          <lpage>361</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Corpus creation for sentiment analysis in code-mixed Tamil-English text</article-title>
          (
          <year>2020</year>
          )
          <fpage>202</fpage>
          -
          <lpage>210</lpage>
          . URL: https://www.aclweb.org/anthology/2020.sltu-1.28.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Swaminathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Ganesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pandiyarajan</surname>
          </string-name>
          ,
          <article-title>HRS-TECHIE@Dravidian-CodeMix and HASOC-FIRE2020: Sentiment analysis and hate speech identification using machine learning, deep learning and ensemble models</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>2826</volume>
          (
          <year>2020</year>
          )
          <fpage>241</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Mandalam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Sentiment Analysis of Dravidian Code Mixed Data</article-title>
          ,
          <source>Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages</source>
          (
          <year>2021</year>
          )
          <fpage>46</fpage>
          -
          <lpage>54</lpage>
          . URL: https://www.aclweb.org/anthology/2021.dravidianlangtech-1.6.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Manochandar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Punniyamoorthy</surname>
          </string-name>
          ,
          <article-title>Scaling feature selection method for enhancing the classification performance of Support Vector Machines in text mining</article-title>
          ,
          <source>Computers and Industrial Engineering</source>
          <volume>124</volume>
          (
          <year>2018</year>
          )
          <fpage>139</fpage>
          -
          <lpage>156</lpage>
          . URL: https://doi.org/10.1016/j.cie.2018.07.008. doi:10.1016/j.cie.2018.07.008.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sreelakshmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Premjith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Soman</surname>
          </string-name>
          ,
          <article-title>Detection of Hate Speech Text in Hindi-English Code-mixed Data</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>171</volume>
          (
          <year>2020</year>
          )
          <fpage>737</fpage>
          -
          <lpage>744</lpage>
          . URL: https://doi.org/10.1016/j.procs.2020.04.080. doi:10.1016/j.procs.2020.04.080.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T. Y.</given-names>
            <surname>Santosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. V.</given-names>
            <surname>Aravind</surname>
          </string-name>
          ,
          <article-title>Hate speech detection in Hindi-English code-mixed social media text</article-title>
          ,
          <source>ACM International Conference Proceeding Series</source>
          (
          <year>2019</year>
          )
          <fpage>310</fpage>
          -
          <lpage>313</lpage>
          . doi:10.1145/3297001.3297048.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jose</surname>
          </string-name>
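          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>M</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,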
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Hariharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sherly</surname>
          </string-name>
          ,
          <article-title>Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada</article-title>
          (
          <year>2021</year>
          )
          <fpage>133</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jayapal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thavareesan</surname>
          </string-name>
          ,
          <article-title>NUIG-Shubhanker@Dravidian-CodeMix-FIRE2020: Sentiment analysis of code-mixed Dravidian text using XLNet</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Ram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M.</given-names>
            <surname>Vinod</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mekala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Manimegalai</surname>
          </string-name>
          ,
          <article-title>A fast and energy efficient path</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>