<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation, December</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Offensive Language in Bengali, Bodo, and Assamese using Word Unigrams, Char N-grams, Classical Machine Learning, and Deep Learning Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Avigail Stekel</string-name>
          <email>Stekel@g.jct.ac.il</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Avital Prives</string-name>
          <email>avitalprives@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yaakov HaCohen-Kerner</string-name>
          <email>kerner@jct.ac.il</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, Jerusalem College of Technology</institution>
          ,
          <addr-line>Jerusalem 9116001</addr-line>
          ,
          <country country="IL">Israel</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>5</fpage>
      <lpage>18</lpage>
      <abstract>
        <p>In this paper, we, the JCT team, describe our submissions for the HASOC 2023 track. We participated in Task 4, which addresses the problem of hate speech and offensive language identification in three languages: Bengali, Bodo, and Assamese. We developed different models using five classical supervised machine learning methods: multinomial Naive Bayes (MNB), support vector classifier, random forest, logistic regression (LR), and multi-layer perceptron. Our models were applied to word unigrams and/or character n-gram features. In addition, we applied two versions of relevant deep learning models. Our best model for Assamese is an MNB model with 5-gram features, which achieves a macro-averaged F1-score of 0.6988. Our best model for Bengali is an MNB model with 6-gram features, which achieves a macro-averaged F1-score of 0.66497. Our best submission for Bodo is an LR model with all word unigrams in the training set. This model obtained a macro-averaged F1-score of 0.85074 and was ranked in the shared 2nd-3rd place out of 20 teams. Our result is lower by only 0.00576 than the result of the team ranked in 1st place. Our GitHub repository is avigailst/co2023 (github.com).</p>
      </abstract>
      <kwd-group>
        <kwd>Char n-grams</kwd>
        <kwd>hate speech</kwd>
        <kwd>offensive language</kwd>
        <kwd>supervised machine learning</kwd>
        <kwd>word unigrams</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        "Offensive language" lacks a universally agreed-upon definition. In the study of Jay and Janschewitz
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], offensive language is characterized as encompassing vulgar, pornographic, and hateful expressions.
Xu and Zhu [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] observed that the interpretation of offensive language is subjective, as individuals can
perceive the same content differently. Xu and Zhu adopted the Internet Content Rating Association's
(ICRA) description of offensive language, categorizing it as text containing profanity, sexually explicit
material, racism, graphic violence, or any content that might be deemed offensive based on social,
religious, cultural, or moral standards. Another widely accepted interpretation of offensive language is
any explicit or implicit form of attack or insult directed at an individual or group.
      </p>
      <p>The prevalent use of offensive language constitutes a significant challenge within online communities
and among their users. Instances of offensive language proliferate rapidly across social networks like
Twitter, Facebook, and blog posts. This trend detrimentally impacts the credibility of these online
communities, hindering their expansion and causing user detachment.</p>
      <p>Distinguishing offensive language and hate speech from non-offensive language
and non-hate speech is a complex endeavor due to several factors. First, hate speech does not always rely
on offensive slurs, and offensive language does not consistently convey hatred. Second, there exists a
wide array of implicit and explicit methods to verbally target individuals or groups. Third, the brevity of
certain tweets adds to the challenge. Finally, the presence of incoherent tweets further complicates
matters.</p>
      <p>2023 Copyright for this paper by its authors. CEUR ceur-ws.org</p>
      <p>
        A recent outcome arising from addressing this challenge has been the establishment of several
competitions focused on identifying various forms of offensive language across diverse languages,
including but not limited to English, German, Hindi, Tamil, Marathi, and Malayalam. Notable instances
of these contests include HASOC 2019 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], HASOC 2020 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], HASOC 2021, HASOC 2022,
SemEval-2019 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and SemEval-2020 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Within these competitions, natural language processing
(NLP) and machine learning (ML) models have proven effective at detecting offensive
language.
      </p>
      <p>Particularly vulnerable user segments, such as the elderly, children, youth, women, and certain
minority groups, are exposed to various risks stemming from encountering offensive content. These risks
encompass emotions like fear, panic, and animosity directed at specific individuals or communities,
potentially resulting in adverse effects on their mental and physical well-being.</p>
      <p>The rationale behind researching the detection of offensive language is quite evident. A clear need
exists for top-tier systems capable of identifying offensive language posts, curbing their dissemination,
and alerting appropriate authorities. The implementation of such systems stands to enhance the
safeguarding and security of individuals, particularly in contexts closely tied to their physical and mental
health.</p>
      <p>The structure of the rest of the paper is as follows. Section 2 introduces the general background
concerning offensive language. Section 3 describes the HASOC 2023 Subtask 4. In Section 4, we present
the applied models and their experimental results. Section 5 summarizes, concludes, and suggests ideas
for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        According to the United Nations (UN) definition [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], hate speech is "any type of communication in
speech, writing or behavior that attacks or uses derogatory or discriminatory language in reference to a
person or group on the basis of who they are, in other words, on the basis of their religion, ethnicity,
nationality, race, color, origin, gender or other identity factor." Some studies [
        <xref ref-type="bibr" rid="ref8 ref9">8-9</xref>
        ] characterized hate
speech as messages marked by hostility and aggression, often referred to as flames. In more recent studies
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10-12</xref>
        ], there has been a shift toward using the term "cyberbullying" to describe these harmful online
behaviors. Nevertheless, within the Natural Language Processing (NLP) community, a range of terms is
employed to encompass the realm of hate speech, including discrimination, flaming, abusive language,
profanity, toxic discourse, or derogatory comments [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. These various terms collectively encompass the
multifaceted nature of offensive and harmful speech in the digital sphere.
      </p>
      <p>
        Most of the studies in the field of hate and offensive speech recognition have primarily centered on
widely spoken languages, such as English, while the challenges posed by less-represented languages,
including Assamese, Bodo, and Bengali, have garnered increased attention. Notable studies have delved
into these challenges by examining the nuances of identifying hate speech and offensive content in these
languages. For instance, Ishmam et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] introduced an ML-based model as well as a Gated Recurrent
Unit (GRU)-based deep neural network model for classifying users' comments on Facebook pages in the
Bengali language. Baruah et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] suggested multinomial naive Bayes (MNB) and support vector
machine (SVM) with various word embedding and n-gram models as classification algorithms to detect
offensive language in Assamese text. These investigations serve as pioneering efforts in developing
culturally sensitive solutions for detecting hate and offensive speech across linguistically diverse
landscapes.
      </p>
      <p>
        HaCohen-Kerner and his students have experience from previous workshops that dealt with offensive
language detection [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19">16-19</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Description</title>
      <p>The Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages
(HASOC) 2023 track includes four tasks. We took part in Task 4, which aims to detect hate speech in the
Bengali, Bodo, and Assamese languages. It is a binary classification task. Each dataset (for the three
languages) consists of a list of sentences with their corresponding class: hate or offensive (HOF) or not
hate (NOT). Data is primarily collected from Twitter, Facebook, or YouTube comments. The macro
averaged F1-score is the result measure of this task.</p>
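      <p>As an illustration only, the macro-averaged F1-score used as the result measure can be computed as in the following minimal Python sketch: the F1-score is calculated separately for the HOF and NOT classes and the two values are averaged with equal weight. The function names and the toy label lists are hypothetical and are not taken from the task data or from the official evaluation script.</p>
      <preformat><![CDATA[
```python
def f1_for_class(y_true, y_pred, label):
    """F1-score of a single class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("HOF", "NOT")):
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy gold labels and predictions (hypothetical).
y_true = ["HOF", "HOF", "HOF", "NOT", "NOT", "NOT"]
y_pred = ["HOF", "HOF", "NOT", "NOT", "NOT", "HOF"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```
]]></preformat>
      <p>Because both classes contribute equally to the average, a model that ignores the smaller class is penalized even when its overall accuracy is high.</p>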
      <p>
        The overview of the HASOC Sub-track at FIRE 2021 is described in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Additional information
about Subtask 4 in Assamese, Bengali, and Bodo is described in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The HASOC 2023 train and test
datasets for Bengali, Bodo, and Assamese are located at [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Applied Models and Their Experimental Results</title>
      <p>We used the given training and test datasets (see the end of the previous section). Because we joined
the competition late, we did not apply any preprocessing methods. We applied
five classical supervised ML methods: Multinomial Naive Bayes (MNB), Random Forest (RF), Support
Vector Classifier (SVC), Multi-Layer Perceptron (MLP), and Logistic Regression (LR) using classical
features such as word unigrams and character n-grams.</p>
      <p>MNB is a statistical ML algorithm based on the Bayes theorem (Kim et al., 2006). MNB assumes that
the features (i.e., attributes) are conditionally independent given the target class, and ignores all
dependencies among features. MNB estimates the probabilities of each class and the probabilities of each
feature given the class and uses these probabilities to make predictions.</p>
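      <p>The estimation and prediction steps of MNB described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in our experiments: the function names and the toy documents are hypothetical, and add-one (Laplace) smoothing with alpha = 1 is assumed so that unseen words do not produce zero probabilities.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Estimate log P(class) and log P(word | class) from tokenized documents."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Smoothed denominator: all tokens of the class plus alpha per vocabulary word.
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((token_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict_mnb(model, tokens):
    """Return the class with the highest posterior log-score."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Conditional independence: the per-word log-probabilities simply add up.
        # Words outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy training data (hypothetical English placeholders).
train_docs = [["you", "idiot"], ["stupid", "idiot"],
              ["nice", "day"], ["have", "a", "nice", "day"]]
train_labels = ["HOF", "HOF", "NOT", "NOT"]
model = train_mnb(train_docs, train_labels)
prediction = predict_mnb(model, ["idiot", "you"])
print(prediction)
```
]]></preformat>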
      <p>
        RF is an ensemble learning method for classification and regression [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Ensemble methods use
multiple learning algorithms to obtain improved predictive performance compared to what can be
obtained from any of the constituent learning algorithms. RF operates by constructing a multitude of
decision trees at training time and outputting classification for the case at hand. RF combines Breiman’s
“bagging” (Bootstrap aggregating) idea [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and a random selection of features introduced by Ho [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] to
construct a forest of decision trees.
      </p>
      <p>
        SVC is a variant of the support vector machine (SVM) ML method [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] implemented in SciKit-Learn.
SVC uses LibSVM [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], which is a fast implementation of the SVM method. SVM is a supervised ML
method that classifies vectors in a feature space into one of two sets, given training data. It operates by
constructing the optimal hyperplane dividing the two sets, either in the original feature space or in higher
dimensional kernel space.
      </p>
      <p>
        MLP is a deep, artificial neural network [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. This model is based on a network of computational
units, called perceptrons, interconnected in a feed-forward way. The network is composed of layers of
perceptrons, where each unit has directed connections to the neurons of the subsequent layer. Usually, these
units apply a sigmoid function, called the activation function, to the input they receive and feed the next layer
with the output of the function. This model is especially useful when the data is not linearly
separable.
      </p>
      <p>
        LR [
        <xref ref-type="bibr" rid="ref29 ref30">29-30</xref>
        ] is a linear classification model. It is known also as maximum entropy regression (MaxEnt),
logit regression, and the log-linear classifier. In this model, the probabilities describing the possible
outcome of a single trial are modeled using a logistic function. Generally, a sigmoid function is used as
a predictive function. LR can be used both for binary classification and multi-class classification.
      </p>
      <p>
        BERT [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] (Bidirectional Encoder Representations from Transformers) is a transformer-based model
that was trained on a massive corpus of text data, allowing it to learn rich representations of the
relationships between words and their meaning. These representations can be fine-tuned for specific NLP
tasks, e.g., text classification (TC), by tokenizing the text and converting it to numerical representations using pre-trained
tokenizers. These representations are fed into the pre-trained BERT model to obtain contextualized
representations of the input text (Chi et al., 2019). These representations can be thought of as a
fixed-length vector, which is then passed through a fully connected neural network (NN) for classification. One
key advantage of using BERT for TC is that it can handle contextual information effectively.
      </p>
      <p>
        BanglaBERT [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] is a language model designed to understand and process the Bengali language, also
known as Bangla. It's part of the BERT (Bidirectional Encoder Representations from Transformers)
family of models that have proven effective in various natural language processing tasks. The purpose of
BanglaBERT [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] is to facilitate various language-related tasks in Bengali, even in scenarios where
there's limited training data available (low-resource settings). By pre-training on a vast corpus of Bengali
text, BanglaBERT learns to represent the nuances of the language and can be fine-tuned for specific tasks
such as text classification, sentiment analysis, and more. This enables more effective natural language
understanding and processing for the Bengali language.
      </p>
      <p>The system architecture we used is described in Figure 1. This figure shows the procedure we
performed on the input sentence and the use of the algorithms mentioned above.</p>
      <p>
        The applied ML methods used the following tools and information sources:
● The Python 3.8 programming language [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ].
● Sklearn – a Python library for ML methods [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ].
● NumPy – a Python library that provides fast numerical and algebraic computation [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ].
● Pandas – a Python library for data analysis. It provides data structures for efficiently storing large
datasets and tools for working with them [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ].
● PyTorch – an open-source ML framework for building, training, and deploying neural network models.
      </p>
      <p>In our experiments, we tested dozens of TC models for each language. We applied the models to the
given training set. During the experiments, we checked what happens when we use all the existing words,
and also what happens when we take only common words that appear in at least two or three documents.</p>
      <p>Tables 1-3 present the F-Measure results of our baseline models for Bengali, Bodo, and Assamese,
respectively. As mentioned above, we applied five different supervised ML methods: multinomial
Naive Bayes, support vector classifier, random forest, logistic regression, and multi-layer perceptron
using their default values. For these baseline models, we use only word unigrams that occur in at least 2
documents in the training set.</p>
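      <p>A baseline of this kind can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions rather than our exact code: the toy sentences are hypothetical English placeholders, the whitespace token pattern is an assumption (the default scikit-learn token pattern can split words in Indic scripts at combining vowel signs), and random_state is fixed only for reproducibility. The setting min_df=2 keeps only word unigrams that occur in at least two training documents.</p>
      <preformat><![CDATA[
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy data (hypothetical placeholders, not the HASOC data).
texts = ["you are stupid", "you are an idiot", "stupid idiot",
         "have a nice day", "what a nice day", "nice weather today"]
labels = ["HOF", "HOF", "HOF", "NOT", "NOT", "NOT"]


def build_baselines():
    """One pipeline per classifier, each with otherwise default settings."""
    classifiers = {
        "MNB": MultinomialNB(),
        "SVC": SVC(),
        "RF": RandomForestClassifier(random_state=0),
        "LR": LogisticRegression(),
        "MLP": MLPClassifier(random_state=0),
    }
    return {
        # Word unigrams split on whitespace (assumption), kept only if they
        # appear in at least two documents of the training set.
        name: make_pipeline(CountVectorizer(token_pattern=r"\S+", min_df=2), clf)
        for name, clf in classifiers.items()
    }


fitted = {name: pipe.fit(texts, labels) for name, pipe in build_baselines().items()}
vocabulary = sorted(fitted["LR"].named_steps["countvectorizer"].vocabulary_)
print(vocabulary)
```
]]></preformat>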
      <p>In our initial experiments, we randomly split each competition training dataset into two subsets: a training
subset (80% of the original training set) and a test subset (20% of the original training set).
In the training subset, 27,570 words appear in three or more tweets in the Assamese language, 1,648 words
appear in three or more tweets in the Bengali language, and 1,066 words appear in three or more tweets in
the Bodo language.</p>
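      <p>A random 80/20 split of this kind, followed by a macro-averaged F1 evaluation on the held-out 20%, might look like the sketch below. The synthetic texts, the choice of MNB as the example classifier, and the fixed random_state are assumptions made only for illustration.</p>
      <preformat><![CDATA[
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for one language's training set (hypothetical).
texts = [f"offensive sample {i}" for i in range(10)] + \
        [f"harmless sample {i}" for i in range(10)]
labels = ["HOF"] * 10 + ["NOT"] * 10

# Random split: 80% training subset, 20% held-out test subset.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)
predictions = model.predict(test_texts)

# Macro averaging weights the HOF and NOT classes equally.
macro_f1 = f1_score(test_labels, predictions, average="macro")
print(len(train_texts), len(test_texts), macro_f1)
```
]]></preformat>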
      <p>In Tables 1-3, we present the best baseline results in Bengali, Bodo, and Assamese respectively. The best
baseline result in each table is highlighted in bold font.</p>
      <p>We ran the baseline models on different numbers of words and reached the results described in the
above tables; some configurations performed better and some worse. In the Bodo language, using LR with 1,000 word
unigrams<sup>2</sup>, we reached an F-Measure of 0.775795. In the other languages, the results were lower.</p>
      <p>Later, we applied character n-gram series for n values between 3 and 7. We also ran combinations of
different sizes of bags of words (BOWs) with different character n-gram series. Combining BOW and
character n-gram features increased the F-Measure for the Assamese and Bengali languages
to 0.6988 and 0.66497, respectively.</p>
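      <p>One possible way to combine a word-level BOW with character n-grams is to concatenate the two count matrices before classification, as in the following scikit-learn sketch. It is not necessarily how our features were assembled: the step names, the toy sentences, the whitespace token pattern, and the choice of exactly n = 5 (rather than a range of n values) are assumptions.</p>
      <preformat><![CDATA[
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline


def build_model(n=5):
    """MNB over concatenated word-unigram and character n-gram counts."""
    features = FeatureUnion([
        # All word unigrams, split on whitespace (assumption).
        ("word_unigrams", CountVectorizer(analyzer="word", token_pattern=r"\S+")),
        # Character n-grams of exactly length n (assumption).
        ("char_ngrams", CountVectorizer(analyzer="char", ngram_range=(n, n))),
    ])
    return Pipeline([("features", features), ("mnb", MultinomialNB())])


# Toy data (hypothetical placeholders, not the HASOC data).
texts = ["you are stupid", "you are an idiot", "stupid idiot",
         "have a nice day", "what a nice day", "nice weather today"]
labels = ["HOF", "HOF", "HOF", "NOT", "NOT", "NOT"]

model = build_model(n=5).fit(texts, labels)
feature_matrix = model.named_steps["features"].transform(texts)
predictions = list(model.predict(["stupid idiot", "have a nice day"]))
print(feature_matrix.shape, predictions)
```
]]></preformat>
      <p>Character n-grams can capture spelling variants and sub-word cues that word unigrams miss, which may explain why such combinations helped in the two languages above.</p>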
      <p>We also applied two types of BERT models: all-language BERT, which is a general BERT model that is not
adapted to a specific language, and a Bengali BERT model, also called Bert2, which is a BERT model adapted
to the Bengali language. In the Assamese language, we reached a result of 0.66967 by running BERT and
MNB<sup>3</sup>; in the Bengali language, we reached a result of 0.609 with the Bert2 model; and in the
Bodo language, we reached a result of 0.73 by running BERT and MLP. We also applied MLP<sup>4</sup>, which
yielded a result of 0.7952 for the Bodo language and weaker results for the other languages.</p>
      <p>For each language, we submitted various models, including the top three models according to their
F-Measure results. Our best F-Measure results in the competition were as follows: Assamese (F-Measure =
0.6988, 10th place) using MNB with all word and character 5-gram features, Bengali (F-Measure =
0.66497, 12th place) using MNB with all word and 6-gram features, and Bodo (F-Measure = 0.85074, 2nd place).
Our best submission was the model we built for offensive language identification in Bodo using LR. This
model was ranked in 2nd place out of 20 teams. Our result is lower by only 0.00576 than the result
(0.8565) of the team that was placed in 1st place.</p>
      <p><sup>2</sup> Only words that appear in two or more documents in the training set.</p>
      <p><sup>3</sup> https://www.ic.unicamp.br/~rocha/teaching/2011s1/mc906/aulas/naive-bayes.pdf</p>
      <p><sup>4</sup> https://www.researchgate.net/profile/FranciscoEscobar/publication/320692297_Geomatic_Approaches_for_Modeling_Land_Change_Scenarios_An_Introduction/links/5e0da50a92851c8364ab9b63/Geomatic-Approaches-for-Modeling-Land-Change-Scenarios-An-Introduction.pdf#page=458</p>
      <p>
        An interesting phenomenon is that in two languages (Assamese and Bengali), the MNB method was
found to be the best among the five classical learning methods and the two variants of BERT. In the third
language (Bodo), LR was found to be the best ML method, although a number of good MNB models
were also found for Bodo. MNB is a popular classifier for many text classification tasks due to its
simplicity, computational efficiency, relatively good predictive performance, and trivial scaling to large-scale
tasks [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Summary, Conclusions, and Future Work</title>
      <p>In this paper, we, the JCT team, described our submitted models for Subtask 4 of the HASOC 2023
competition, which addresses the problem of hate speech and offensive language identification in three
languages: Bengali, Bodo, and Assamese. We applied classical ML methods (MNB, SVC, MLP, RF, and LR)
and deep learning methods (two BERT variants). The classical ML methods were applied to various combinations of character
n-gram features (for n values from 1 to 7) and word unigrams.</p>
      <p>Two interesting phenomena were discovered. First, in Bodo, a classical learning
method such as LR was enough to obtain a high result and a shared 2nd-3rd place. Second, in the Assamese and
Bengali languages, classical learning methods such as RF, LR, and SVC did not yield good
enough results, and it was precisely the naive MNB model that produced the best results.</p>
      <p>The HOF and NOT classes are unbalanced. In the Assamese language, the HOF group is about 16%
larger than the NOT group, while in the Bengali language, the NOT group is about 19.5% larger than the
HOF group, and in the Bodo language, the HOF group is 19% larger than the NOT group. In future
research, we can apply oversampling in order to balance the classes. Oversampling is a technique used in
machine learning to balance the class distribution by increasing the frequency of the minority class in the
training dataset.</p>
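      <p>A simple form of oversampling, random duplication of minority-class examples until all classes are equally frequent, can be sketched as follows. The function name, the toy data, and the fixed seed are hypothetical, and other oversampling strategies exist. Under this sketch, oversampling would be applied to the training subset only, so that duplicated examples do not leak into the evaluation data.</p>
      <preformat><![CDATA[
```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class items until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement only as many extra items as the class is missing.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy unbalanced training subset (hypothetical).
texts = ["hof 1", "hof 2", "hof 3", "hof 4", "hof 5", "not 1", "not 2"]
labels = ["HOF"] * 5 + ["NOT"] * 2
balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```
]]></preformat>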
      <p>Additional ideas for future research are: (1) parameter tuning, also known as hyperparameter tuning,
which is the process of finding the best combination of hyperparameters for an ML model to achieve
optimal performance on a specific task or dataset, (2) application of various preprocessing methods [39],
and (3) definition and application of style-based and content-based features and combinations of them
[40].</p>
    </sec>
    <sec id="sec-6">
      <title>6. References</title>
      <p>[39] Y. HaCohen-Kerner, D. Miller, and Y. Yigal, The influence of preprocessing on text classification
using a bag-of-words representation, PLoS ONE 15(5), 2020, e0232525.</p>
      <p>[40] Y. HaCohen-Kerner, H. Beck, E. Yehudai, M. Rosenstein, and D. Mughaz, Cuisine: Classification
using stylistic feature sets and/or name-based feature sets, Journal of the American Society for
Information Science and Technology 61(8), 2010, 1644-1657.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Jay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Janschewitz</surname>
          </string-name>
          ,
          <article-title>The pragmatics of swearing</article-title>
          ,
          <source>Journal of Politeness Research 4</source>
          ,
          <year>2008</year>
          ,
          <fpage>267</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Filtering offensive language in online communities using grammatical relations</article-title>
          .
          <source>In Proceedings of the Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mandlia</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2019: Hate speech and offensive content identification in Indo-European languages</article-title>
          .
          <source>In Proceedings of the 11th Forum for Information Retrieval Evaluation</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            , A.
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German</article-title>
          .
          <source>In Forum for Information Retrieval Evaluation</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Farra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval)</article-title>
          ,
          <year>2019</year>
          , arXiv preprint arXiv:1903.08983.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karadzhov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Çöltekin</surname>
          </string-name>
          ,
          <article-title>SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020)</article-title>
          ,
          <year>2020</year>
          , arXiv preprint arXiv:2006.07235.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <collab>United Nations Office of the High Commissioner for Human Rights</collab>
          . (n.d.).
          <source>Hate Speech</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Spertus</surname>
          </string-name>
          ,
          <article-title>Smokey: Automatic recognition of hostile messages</article-title>
          , in: AAAI/IAAI,
          <year>1997</year>
          , pp.
          <fpage>1058</fpage>
          -
          <lpage>1065</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kaufer</surname>
          </string-name>
          ,
          <article-title>Flaming: A white paper</article-title>
          , Department of English, Carnegie Mellon University,
          <source>Retrieved July 20</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.S.</given-names>
            <surname>Jun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bellmore</surname>
          </string-name>
          ,
          <article-title>Learning from bullying traces in social media</article-title>
          , in:
          <source>Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>656</fpage>
          -
          <lpage>666</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hosseinmardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Mattson</surname>
          </string-name>
          ,
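          <string-name>
            <given-names>R. I.</given-names>
            <surname>Rafiq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Han</surname>
          </string-name>
          ,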
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <article-title>Detection of cyberbullying incidents on the Instagram social network</article-title>
          ,
          <year>2015</year>
          , arXiv preprint arXiv:1503.03909
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Squicciarini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Rajtmajer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Griffin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Caragea</surname>
          </string-name>
          ,
          <article-title>Content-driven detection of cyberbullying on the Instagram social network</article-title>
          , in:
          <source>IJCAI</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>3952</fpage>
          -
          <lpage>3958</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <article-title>A survey on hate speech detection using natural language processing</article-title>
          , in:
          <source>Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ishmam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sadia</surname>
          </string-name>
          .
          <article-title>Hateful speech detection in public Facebook pages for the Bengali language</article-title>
          ,
          <source>2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA)</source>
          , IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>N.</given-names>
            <surname>Baruah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Arjun</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Mandira</surname>
          </string-name>
          ,
          <article-title>Detection of Hate Speech in Assamese Text</article-title>
          ,
          <source>International Conference on Communication and Computational Technologies</source>
          . Singapore: Springer Nature Singapore,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>HaCohen-Kerner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ben-David</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Didi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rochman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Shayovitz</surname>
          </string-name>
          .
          <article-title>JCTICOL at SemEval-2019 Task 6: Classifying offensive language in social media using deep learning methods, word/character n-gram features, and preprocessing methods</article-title>
          . In
          <source>Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>645</fpage>
          -
          <lpage>651</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Uzan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>HaCohen-Kerner</surname>
          </string-name>
          .
          <article-title>JCT at SemEval-2020 Task 12: Offensive language detection in tweets using preprocessing methods, character and word n-grams</article-title>
          . In
          <source>Proceedings of the Fourteenth Workshop on Semantic Evaluation</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2017</fpage>
          -
          <lpage>2022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Uzan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>HaCohen-Kerner</surname>
          </string-name>
          .
          <article-title>Detecting Hate Speech Spreaders on Twitter using LSTM and BERT in English and Spanish</article-title>
          . CLEF,
          <year>2021</year>
          , pp.
          <fpage>2178</fpage>
          -
          <lpage>2185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>HaCohen-Kerner</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Uzan</surname>
          </string-name>
          .
          <article-title>Detecting Offensive Language in English, Hindi, and Marathi using Classical Supervised Machine Learning Methods and Word/Char N-grams</article-title>
          .
          <source>Forum for Information Retrieval Evaluation (FIRE), CEUR-WS.org</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ranasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Senapati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Dmonte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Satapara</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC Subtracks at FIRE 2023: Hate speech and offensive content identification in Assamese, Bengali, Bodo, Gujarati, and Sinhala</article-title>
          . In
          <source>Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2023</source>
          , Goa, India, December 15-18,
          <year>2023</year>
          , ACM.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Senapati</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Annihilate Hates (Task 4, HASOC 2023): Hate Speech Detection in Assamese, Bengali, and Bodo Languages</article-title>
          , in:
          <source>Working Notes FIRE 2023 - Forum for Information Retrieval Evaluation</source>
          , CEUR, December 15-18,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Senapati</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Pal</surname>
          </string-name>
          , Annihilate Hates Datasets, URL: https://sites.google.com/view/hasoc-2023-annihilate-hates/home.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <article-title>Random forests</article-title>
          ,
          <source>Machine Learning</source>
          <volume>45</volume>
          (
          <issue>1</issue>
          )
          ,
          <year>2001</year>
          ,
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <article-title>Bagging predictors</article-title>
          ,
          <source>Machine Learning</source>
          <volume>24</volume>
          (
          <issue>2</issue>
          )
          ,
          <year>1996</year>
          ,
          <fpage>123</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <article-title>Random decision forests</article-title>
          ,
          in:
          <source>Proceedings of 3rd International Conference on Document Analysis and Recognition</source>
          ,
          <year>1995</year>
          , Vol.
          <volume>1</volume>
          , pp.
          <fpage>278</fpage>
          -
          <lpage>282</lpage>
          , IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <article-title>Support-vector networks</article-title>
          ,
          <source>Machine Learning</source>
          <volume>20</volume>
          ,
          <year>1995</year>
          ,
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>LIBSVM: a library for support vector machines</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology (TIST)</source>
          <volume>2</volume>
          ,
          <year>2011</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <article-title>Multilayer perceptron, fuzzy sets, classification</article-title>
          ,
          <source>IEEE Transactions on Neural Networks</source>
          <volume>3</volume>
          (
          <issue>5</issue>
          ),
          <year>1992</year>
          ,
          <fpage>683</fpage>
          -
          <lpage>697</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Cox</surname>
            ,
            <given-names>D. R.</given-names>
          </string-name>
          <article-title>The regression analysis of binary sequences</article-title>
          .
          <source>Journal of the Royal Statistical Society Series B: Statistical Methodology</source>
          ,
          <volume>20</volume>
          (
          <issue>2</issue>
          ),
          <fpage>215</fpage>
          -
          <lpage>232</lpage>
          .
          <year>1958</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Hosmer</surname>
            <suffix>Jr</suffix>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lemeshow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. X.</given-names>
            <surname>Sturdivant</surname>
          </string-name>
          ,
          <source>Applied logistic regression</source>
          , Vol.
          <volume>398</volume>
          , John Wiley &amp; Sons,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>B.</given-names>
            <surname>Abhik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tahmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Wasi</given-names>
            <surname>Uddin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. Md</given-names>
            <surname>Saiful</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Anindya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.M.</given-names>
            <surname>Sohel</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Rifat</surname>
          </string-name>
          ,
          <article-title>BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla</article-title>
          .
          <source>arXiv preprint arXiv:2101.00204</source>
          .
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kowsher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.A.</given-names>
            <surname>Sami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.J.</given-names>
            <surname>Prottasha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.S.</given-names>
            <surname>Arefin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.K.</given-names>
            <surname>Dhar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Koshiba</surname>
          </string-name>
          ,
          <article-title>Bangla-BERT: transformer-based efficient model for transfer learning and language understanding</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>10</volume>
          ,
          <fpage>91855</fpage>
          -
          <lpage>91870</lpage>
          .
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Van Rossum</surname>
            ,
            <given-names>Guido</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Fred L.</given-names>
            <surname>Drake</surname>
          </string-name>
          .
          <source>Introduction to Python 3: Python documentation manual part 1</source>
          . CreateSpace,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Buitinck</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louppe</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mueller</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>API design for machine learning software: experiences from the scikit-learn project</article-title>
          .
          <year>2013</year>
          . arXiv preprint arXiv:1309.0238
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>C.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Millman</surname>
            ,
            <given-names>K.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Der Walt</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gommers</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Virtanen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wieser</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kern</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <article-title>Array programming with NumPy</article-title>
          .
          <source>Nature</source>
          ,
          <volume>585</volume>
          (
          <issue>7825</issue>
          ), pp.
          <fpage>357</fpage>
          -
          <lpage>362</lpage>
          .
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <surname>McKinney</surname>
            ,
            <given-names>Wes.</given-names>
          </string-name>
          <article-title>Data structures for statistical computing in Python</article-title>
          .
          <source>Proceedings of the 9th Python in Science Conference</source>
          . Vol.
          <volume>445</volume>
          . No.
          <issue>1</issue>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.R.</given-names>
            <surname>Bouckaert</surname>
          </string-name>
          ,
          <article-title>Naive Bayes for text classification with unbalanced classes</article-title>
          . In
          <source>Knowledge Discovery in Databases: PKDD 2006: 10th European Conference on Principles and Practice of Knowledge Discovery in Databases</source>
          , Berlin, Germany, September 18-22, 2006, Proceedings, pp.
          <fpage>503</fpage>
          -
          <lpage>510</lpage>
          , Springer Berlin Heidelberg,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>