<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>QutNocturnal@HASOC'19: CNN for Hate Speech and Offensive Content Identification in Hindi Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hindi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Electrical Engineering and Computer Science Queensland University of Technology</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe our top-ranked solution to Task 1 for Hindi in the HASOC contest organised by FIRE 2019. The task is to identify hate speech and offensive language in Hindi. More specifically, it is a binary classification problem where a system is required to classify tweets into two classes: (a) Hate and Offensive (HOF) and (b) Not Hate or Offensive (NOT). In contrast to the popular idea of pretraining word vectors (a.k.a. word embedding) with a large corpus from a general domain such as Wikipedia, we used a relatively small collection of relevant tweets (i.e. random and sarcasm tweets in Hindi and Hinglish) for pretraining. We trained a Convolutional Neural Network (CNN) on top of the pretrained word vectors. This approach allowed us to be ranked first for this task out of all teams. Our approach could easily be adapted to other applications where the goal is to predict the class of a text when the provided context is limited.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate Speech</kwd>
        <kwd>CNN</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The "Hate Speech and Offensive Content Identification in Indo-European
Languages" track (HASOC) is one of the tracks in the FIRE 2019 conference [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Task
1 in this track is the identification of Hate speech and Offensive (HOF) language in
English, German and Hindi in social media posts. In this paper, we describe
our approach to the solution of Task 1 in Hindi. The goal is to label a tweet
written in Hindi as HOF if it contains any form of non-acceptable language such
as hate speech, aggression or profanity; otherwise it is labelled as NOT. There
has been significant research on hate speech and offensive content identification
in several languages, especially in English [
        <xref ref-type="bibr" rid="ref2 ref24 ref25 ref3 ref6">3, 2, 6, 25, 24</xref>
        ]. However, there is a lack
of work in most other languages, and the urgency of such
research in other languages is now being recognised. Recently, SemEval 2019 Task 5 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] was carried out
on detecting hate speech against immigrants and women in Spanish and
English messages extracted from Twitter, the GermEval Shared Task [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] was carried
out on the identification of offensive language in German-language tweets, and
TRAC-1 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] conducted a shared task on aggression identification in Hindi and
English. In the same spirit, HASOC Task 1 for Hindi aims to assess the quality of
hate speech and offensive content identification technology in Hindi.
      </p>
      <p>The training dataset comprises 4665 labelled tweets in Hindi. It was
created from Twitter, and participants were allowed to use external
datasets for this task. In the competition setup, the testing dataset
comprises 1319 unlabelled tweets that were also created from Twitter. The testing
dataset and leaderboard were kept unknown to participants until the results were
announced. Competitors had to split the training set to obtain a validation set and
use that validation set throughout the competition to compare models. The testing
set was only used at the end of the competition for the final leaderboard.</p>
      <p>
        The proposed approach relies on very little feature engineering and
preprocessing compared to many existing approaches. Section 2 discusses our
top-ranked model building approach. It consists of two steps: (a) pretraining word
vectors using a relevant collection of unlabelled tweets and (b) training a
Convolutional Neural Network (CNN) model using the labelled training set on top of
the pretrained word vectors. Section 3 describes other sophisticated alternative
models that we tried. Though these models did not perform as well as
our winning model in this track, their performance provides further
insight into how to use machine learning models for identifying hate speech and
offensive language in Hindi. Section 4 provides experimental results comparing
and analysing our various models on both the testing set and the validation set. The
source code of our model can be found online at [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 The Winning Model: QutNocturnal</title>
      <p>2.1 Data Collection</p>
      <p>Labelled Contest Dataset. The goal of Task 1 for Hindi is to predict the
class (HOF or NOT) of a given tweet written in Hindi. Out of 4665 labelled
tweets in the training set, 2469 (52.92%) are HOF and 2196 (47.07%) are NOT.
We randomly held out 20% of the training data as a validation set. We used ten-fold
cross-validation on the remaining training data for hyperparameter tuning.</p>
      <p>
        Unlabelled External Dataset. It is difficult to separate abusive tweets
from tweets that are sarcastic, joking, or contain abusive keywords in a
non-abusive context [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Lexical detection methods tend to have low accuracy [
        <xref ref-type="bibr" rid="ref23 ref6">6,
23</xref>
        ] because they classify a tweet as abusive if it contains any abusive keywords.
Tweets are also significantly noisy and do not follow a standard language format.
For example, words in tweets are often misspelled, altered, written in Roman
letters, or drawn from local dialects or foreign languages. To transfer the knowledge of
these contexts to the CNN-based deep learning model, we pretrained word vectors
using about 0.5 million relevant tweets. More specifically, we collected 494,311 random
tweets in Hindi (i.e. the topic of discussion can be anything) using TrISMA<sup>3</sup> and 5251
sarcasm tweets in Hinglish [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] (i.e. sarcasm in the Hindi language but written in
Roman letters) from [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] for pretraining.
      </p>
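      <p>The data split described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed random seed and the unstratified shuffle are hypothetical choices, not details taken from the released code.</p>
      <preformat>
```python
import random


def holdout_split(n_items, holdout_fraction=0.2, seed=0):
    """Randomly split range(n_items) into (train_idx, valid_idx)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seed is an assumption, for repeatability
    n_valid = int(round(n_items * holdout_fraction))
    return indices[n_valid:], indices[:n_valid]


def k_fold(indices, k=10):
    """Yield (fit_idx, eval_idx) pairs; every index is evaluated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, eval_idx in enumerate(folds):
        fit_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit_idx, eval_idx


# 4665 labelled tweets: 20% held out for validation, ten folds on the rest.
train_idx, valid_idx = holdout_split(4665)
folds = list(k_fold(train_idx, k=10))
```
</preformat>
      <p>With 4665 labelled tweets, this holds out 933 tweets for validation and leaves 3732 tweets for ten-fold cross-validation.</p>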
      <p>
        Preprocessing. We replaced person mentions (e.g. @someone) with xxatp,
URL occurrences with xxurl, the source of a modified retweet with xxrtm and the source of
an unmodified retweet with xxrtu. We fixed repeated characters (e.g. goooood)
in words and removed common invalid characters (e.g. &lt;br/&gt;, &lt;unk&gt;, @ @,
etc.). We used HTML unescaping to replace hexadecimal escape sequences with the
characters that they represent. We used the multi-language spaCy model<sup>4</sup> to
lemmatize words and a lightweight stemmer for the Hindi language [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] for stemming the
words.
      </p>
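      <p>A minimal sketch of part of this preprocessing, using only the Python standard library, is given below. The regular expressions, and the choice to shorten a repeated character to two occurrences, are assumptions made for illustration; retweet markers (xxrtm, xxrtu), lemmatisation and stemming are not covered.</p>
      <preformat>
```python
import html
import re

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
REPEAT = re.compile(r"(.)\1{2,}")  # a character repeated three or more times


def preprocess(tweet):
    """Apply HTML unescaping and the xxurl / xxatp replacements to one tweet."""
    text = html.unescape(tweet)        # decode HTML escape sequences
    text = URL.sub("xxurl", text)      # replace URLs before mentions
    text = MENTION.sub("xxatp", text)  # de-identify person mentions
    text = REPEAT.sub(r"\1\1", text)   # assumption: keep two of a repeated character
    return text
```
</preformat>
      <p>For instance, under these assumptions the input "@someone goooood https://t.co/abc" becomes "xxatp good xxurl".</p>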
      <p>2.2 Word Embedding</p>
      <p>
        Embedding models quantify semantic similarities between words based on the
distributional property that a word is characterised by the company it keeps.
These models quantify semantic properties of words by mapping co-occurring
words close to each other in a Euclidean space. Given a sizeable corpus, these
models can effectively learn a high-quality word embedding from the co-occurrence
of words in the corpus. A word embedding maps each word from the vocabulary
to a vector of real numbers. Mikolov et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] proposed two popular models
for word embedding based on the feed-forward neural network: Skip-gram and
Continuous Bag-of-Words, as shown in Figure 1.
      </p>
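      <p>In embedding models, a sliding window of a fixed size moves along the text of
a corpus. For a given position of the sliding window, let the word in the middle
be the current word <inline-formula><tex-math>w_i</tex-math></inline-formula> and the words on its left and right within the sliding window
be the context words <inline-formula><tex-math>C</tex-math></inline-formula>. The continuous bag-of-words model predicts the current
word <inline-formula><tex-math>w_i</tex-math></inline-formula> from the surrounding context words <inline-formula><tex-math>C</tex-math></inline-formula>, i.e. <inline-formula><tex-math>p(w_i \mid C)</tex-math></inline-formula>. In contrast, the
skip-gram model uses the current word <inline-formula><tex-math>w_i</tex-math></inline-formula> to predict the surrounding context
words <inline-formula><tex-math>C</tex-math></inline-formula>, i.e. <inline-formula><tex-math>p(C \mid w_i)</tex-math></inline-formula>. For example, suppose that the current
position of the sliding window in Figure 1 contains the phrase tum sirf chutiya kat ti
ho. In continuous bag-of-words, the context words {tum, sirf, kat, ti, ho} can be
used to predict the current word {chutiya}, whereas, in skip-gram, the current
word {chutiya} can be used to predict the context words {tum, sirf, kat, ti, ho}.</p>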
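      <p>The difference between the two models can be made concrete by listing the training examples that a sliding window produces. The sketch below is illustrative only: the function names are hypothetical, and the window is taken to cover two words on each side of the current word.</p>
      <preformat>
```python
def cbow_examples(tokens, half_window=2):
    """Yield (context_words, current_word): the context predicts the current word."""
    for i, current in enumerate(tokens):
        left = tokens[max(0, i - half_window):i]
        right = tokens[i + 1:i + 1 + half_window]
        yield left + right, current


def skipgram_examples(tokens, half_window=2):
    """Yield (current_word, context_word): the current word predicts each context word."""
    for context, current in cbow_examples(tokens, half_window):
        for word in context:
            yield current, word


tokens = ["w1", "w2", "w3", "w4", "w5"]
cbow = list(cbow_examples(tokens))
skipgram = list(skipgram_examples(tokens))
```
</preformat>
      <p>For the middle position, continuous bag-of-words yields one example with four context words and one target, whereas skip-gram yields four separate (current word, context word) pairs.</p>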
      <p>
        The objective of model training is to find a word embedding that maximises
<inline-formula><tex-math>p(w_i \mid C)</tex-math></inline-formula> or <inline-formula><tex-math>p(C \mid w_i)</tex-math></inline-formula> over a corpus. In each step of training, each word is either
(a) pulled closer to the words that co-occur with it or (b) pushed away from
all the words that do not co-occur with it. A softmax or approximate softmax
function can be used to achieve this objective [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. At the end of the training,
the embedding brings closer not only the words that explicitly co-occur
in a training dataset, but also the words that implicitly co-occur. For example,
if <inline-formula><tex-math>w_1</tex-math></inline-formula> explicitly co-occurs with <inline-formula><tex-math>w_2</tex-math></inline-formula> and <inline-formula><tex-math>w_2</tex-math></inline-formula> explicitly co-occurs with <inline-formula><tex-math>w_3</tex-math></inline-formula>, then the
model can bring closer not only <inline-formula><tex-math>w_1</tex-math></inline-formula> to <inline-formula><tex-math>w_2</tex-math></inline-formula>, but also <inline-formula><tex-math>w_1</tex-math></inline-formula> to <inline-formula><tex-math>w_3</tex-math></inline-formula>.
      </p>
      <p>
        Fig. 1: Continuous Bag-of-Words and Skip-gram Word Embedding Models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. [Figure: two panels, Continuous bag-of-words and Skip-gram, each with Input, Projection and Output layers over a sliding window of words <inline-formula><tex-math>w_{i-2}, w_{i-1}, w_i, w_{i+1}, w_{i+2}</tex-math></inline-formula>.]
      </p>
      <p><sup>3</sup> https://research.qut.edu.au/dmrc/projects/trisma-tracking-infrastructure-forsocial-media-analysis/ <sup>4</sup> https://spacy.io/models/xx</p>
      <sec id="sec-2-5">
        <p>
          We use the continuous bag-of-words model in this contest because, in our
experiments, it was faster and slightly more accurate for words that appear
frequently. We implemented this model using the Word2Vec module
in the Gensim Python library. We set the word vector dimension to 200, the
minimum word count to 2, the number of iterations in pretraining to 10, the sliding
window size to 5 and the maximum vocabulary count to 0. We ran this model on
the unlabelled external dataset described in Section 2.1 to obtain the pretrained word
vectors. Our pretrained word vectors and the corresponding Python code to use them
in a classifier are available online at [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
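        <p>As a rough illustration, the settings above might map onto Gensim as follows. This sketch assumes the Gensim 4.x parameter names (vector_size, epochs); older releases used size and iter. Reading a maximum vocabulary count of 0 as "no limit" (max_vocab_size=None) is also an assumption, and the function name is hypothetical. The released code [1] remains the authoritative reference.</p>
        <preformat>
```python
# Hyperparameters reported in the text, expressed with Gensim 4.x names.
CBOW_PARAMS = {
    "vector_size": 200,      # word vector dimension
    "min_count": 2,          # minimum word count
    "epochs": 10,            # number of pretraining iterations
    "window": 5,             # sliding window size
    "sg": 0,                 # 0 selects continuous bag-of-words, 1 selects skip-gram
    "max_vocab_size": None,  # assumption: a reported value of 0 means no limit
}


def pretrain_word_vectors(tokenised_tweets):
    """Train CBOW vectors on an iterable of token lists (requires Gensim 4.x)."""
    from gensim.models import Word2Vec  # imported lazily so the settings load without Gensim

    model = Word2Vec(sentences=tokenised_tweets, **CBOW_PARAMS)
    return model.wv
```
</preformat>
        <p>Calling pretrain_word_vectors on the preprocessed, tokenised external tweets would return keyed vectors that can be looked up by word.</p>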
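        <p>2.3 Model Architecture</p>
        <p>
          The proposed architecture of our top-ranked CNN model to identify hate speech
and offensive language in Hindi is given in Figure 2. This is an empirically
customised and regulated version of the architecture that we used in our prior
work on misogynistic tweet identification on Twitter [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In this architecture, we
use word embedding to represent each word <inline-formula><tex-math>w</tex-math></inline-formula> as an <inline-formula><tex-math>n</tex-math></inline-formula>-dimensional word vector
<inline-formula><tex-math>w \in \mathbb{R}^{n}</tex-math></inline-formula>. We represent a tweet <inline-formula><tex-math>t</tex-math></inline-formula> with <inline-formula><tex-math>m</tex-math></inline-formula> words as a matrix <inline-formula><tex-math>t \in \mathbb{R}^{m \times n}</tex-math></inline-formula>. We apply the
convolution operation to the tweet matrix with a stride of one. Each convolution
operation applies a filter <inline-formula><tex-math>f_i \in \mathbb{R}^{h \times n}</tex-math></inline-formula> of size <inline-formula><tex-math>h</tex-math></inline-formula>. Empirically, based on the accuracy
improvement in ten-fold cross-validation, 256 filters are used for each <inline-formula><tex-math>h \in \{3, 4\}</tex-math></inline-formula> and
512 filters for <inline-formula><tex-math>h \in \{5\}</tex-math></inline-formula>. The convolution is a function <inline-formula><tex-math>c(f_i, t) = r(f_i \cdot t_{k:k+h-1})</tex-math></inline-formula>,
where <inline-formula><tex-math>t_{k:k+h-1}</tex-math></inline-formula> is the <inline-formula><tex-math>k</tex-math></inline-formula>th vertical slice of the tweet matrix from position <inline-formula><tex-math>k</tex-math></inline-formula> to
<inline-formula><tex-math>k + h - 1</tex-math></inline-formula>, <inline-formula><tex-math>f_i</tex-math></inline-formula> is the given filter and <inline-formula><tex-math>r</tex-math></inline-formula> is a Rectified Linear Unit (ReLU) function
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The function <inline-formula><tex-math>c(f_i, t)</tex-math></inline-formula> produces a feature <inline-formula><tex-math>c_k</tex-math></inline-formula>, similar to an n-gram, for each slice
<inline-formula><tex-math>k</tex-math></inline-formula>, resulting in <inline-formula><tex-math>m - h + 1</tex-math></inline-formula> features. The max-pooling operation [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] is applied over
these features and the maximum value is taken, i.e. <inline-formula><tex-math>\hat{c}_i = \max(c(f_i, t))</tex-math></inline-formula>.
Max-pooling captures the most important feature for each filter. As there are a total
of 1024 filters (256+256+512) in the proposed model, the 1024 most important
features are learned from the convolution layer.
        </p>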
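        <p>The convolution and max-pooling step for a single filter can be sketched with NumPy as below. The sketch assumes that the product of a filter and a slice is an element-wise product summed to a scalar, and it omits any bias term; the function and variable names are hypothetical.</p>
        <preformat>
```python
import numpy as np

# Number of filters per filter size h, as reported in the text.
FILTERS = {3: 256, 4: 256, 5: 512}


def conv_max_feature(tweet, filt):
    """Return (max-pooled feature, number of slices) for one filter.

    tweet has shape (m, n); filt has shape (h, n) with h no larger than m.
    """
    h = filt.shape[0]
    m = tweet.shape[0]
    features = [
        max(0.0, float(np.sum(filt * tweet[k:k + h])))  # ReLU of one slice response
        for k in range(m - h + 1)                        # stride of one
    ]
    return max(features), len(features)


tweet = np.arange(12.0).reshape(6, 2)  # toy tweet: m = 6 words, n = 2 dimensions
filt = np.ones((3, 2))                 # toy filter of size h = 3
feature, n_slices = conv_max_feature(tweet, filt)
```
</preformat>
        <p>For the toy input, a filter of size 3 over 6 words gives 6 - 3 + 1 = 4 slices, and max-pooling keeps only the largest response. Repeating this for every filter in FILTERS yields the 1024 pooled features.</p>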
        <p>Then, we pass these features to a fully connected hidden layer with 256
perceptrons that use the ReLU activation function. This fully connected hidden
layer learns the complex non-linear interactions between the features from the
convolution layer and generates 256 new higher-level features. Finally, we pass
these 256 higher-level features to the output layer with a single perceptron that
uses the sigmoid activation function. The perceptron in the output layer generates
the probability of the tweet being HOF rather than NOT.</p>
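        <p>A forward pass through these two layers can be sketched with NumPy as follows. The weight shapes follow the layer sizes given in the text; the parameter names and the zero-valued toy weights are placeholders for illustration only.</p>
        <preformat>
```python
import numpy as np


def classify(features, w_hidden, b_hidden, w_out, b_out):
    """Return the sigmoid output for one tweet from its 1024 pooled features."""
    hidden = np.maximum(0.0, features @ w_hidden + b_hidden)  # 256 ReLU units
    logit = float(hidden @ w_out + b_out)                     # single output unit
    return 1.0 / (1.0 + np.exp(-logit))                       # sigmoid


# Toy parameters with the shapes described in the text.
features = np.zeros(1024)
probability = classify(
    features,
    w_hidden=np.zeros((1024, 256)),
    b_hidden=np.zeros(256),
    w_out=np.zeros(256),
    b_out=0.0,
)
```
</preformat>
        <p>With all-zero weights the output is exactly 0.5, the undecided point of the sigmoid; trained weights move it towards 0 or 1.</p>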
        <p>
          In this architecture (Figure 2), a proportion of units are randomly
dropped out from each layer except the output. This is done to prevent co-adaptation of
units in a layer and to reduce overfitting. Based on the best empirical results, we drop out 50% of the units from the
input layer, the filters of size 3 and the fully connected hidden layer.
Only 20% of the units are dropped out from the filters of size 4
and 5. Python code for this model is available online at [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
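        <p>The dropout rates above can be collected in one place, together with a small NumPy sketch of how dropout acts on a layer during training. The rescaling by 1 / (1 - rate), often called inverted dropout, is an assumption about the implementation, and the dictionary keys are hypothetical names.</p>
        <preformat>
```python
import numpy as np

# Dropout rates per layer, as reported in the text.
DROPOUT_RATES = {
    "input": 0.5,
    "filters_h3": 0.5,
    "filters_h4": 0.2,
    "filters_h5": 0.2,
    "hidden": 0.5,
}


def dropout(units, rate, rng):
    """Zero a random proportion of units and rescale the rest (training time only)."""
    keep = rng.random(units.shape) >= rate  # True for units that survive
    return units * keep / (1.0 - rate)      # assumption: inverted-dropout rescaling


rng = np.random.default_rng(0)
dropped = dropout(np.ones(1000), DROPOUT_RATES["hidden"], rng)
```
</preformat>
        <p>With a rate of 0.5, each unit is either zeroed or doubled, so the expected activation is unchanged.</p>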
        <p>In addition to the CNN, we tried the following alternative models.</p>
        <list list-type="bullet">
          <list-item>
            <p>
              Long Short-Term Memory Network (LSTM) [
              <xref ref-type="bibr" rid="ref9">9</xref>
              ]. We implemented the LSTM with
100 units, 50% dropout, a binary cross-entropy loss function, the Adam optimiser
and sigmoid activation.
            </p>
          </list-item>
          <list-item>
            <p>
              Feedforward Deep Neural Network (DNN) [
              <xref ref-type="bibr" rid="ref7">7</xref>
              ]. We implemented the DNN with
five hidden layers, each containing eight units, 50% dropout applied to
the input layer and the first two hidden layers, softmax activation and a learning rate of 0.04.
We manually tuned the hyperparameters of all neural-network-based
models (CNN, LSTM, DNN) based on cross-validation.
            </p>
          </list-item>
          <list-item>
            <p>
              Non-NN models, including Support Vector Machines (SVM) [
              <xref ref-type="bibr" rid="ref8">8</xref>
              ], Random
Forest (RF) [
              <xref ref-type="bibr" rid="ref13">13</xref>
              ], XGBoost (XGB) [
              <xref ref-type="bibr" rid="ref5">5</xref>
              ], Multinomial Naive Bayes (MNB)
[
              <xref ref-type="bibr" rid="ref12">12</xref>
              ], k-Nearest Neighbours (kNN) [
              <xref ref-type="bibr" rid="ref21">21</xref>
              ] and Ridge Classifier (RC) [
              <xref ref-type="bibr" rid="ref10">10</xref>
              ]. We
automatically tuned the hyperparameters of all these models using ten-fold
cross-validation and GridSearch from scikit-learn. Among all the models, only
CNN and LSTM use transfer learning.
            </p>
          </list-item>
        </list>
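        <p>The automatic tuning of the non-NN models can be illustrated with scikit-learn's GridSearchCV. The sketch below is an assumption-laden example for one model only (Multinomial Naive Bayes): the bag-of-words features, the alpha grid and the macro-F1 scoring are illustrative choices, not settings reported in the text.</p>
        <preformat>
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def tune_mnb(texts, labels, folds=10):
    """Grid-search the smoothing parameter of MNB with k-fold cross-validation."""
    pipeline = Pipeline([
        ("bow", CountVectorizer()),  # assumption: plain bag-of-words counts
        ("mnb", MultinomialNB()),
    ])
    grid = {"mnb__alpha": [0.1, 0.5, 1.0]}  # hypothetical search grid
    search = GridSearchCV(pipeline, grid, cv=folds, scoring="f1_macro")
    search.fit(texts, labels)
    return search.best_params_, search.best_score_
```
</preformat>
        <p>The same pattern applies to the other baselines by swapping the final pipeline step and its parameter grid.</p>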
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4 Experimental Results</title>
      <p>A total of nine machine learning models, including the winning customised CNN
model, were trained to identify hate speech and offensive language in Hindi. We
used transfer learning of word vectors for both CNN and LSTM. The word
vectors were pre-trained on a collection of relevant tweets and tuned with the
training dataset during the model training.</p>
      <p>4.1 Results</p>
      <p>The experimental results comparing the models on our custom validation set are given
in Table 1. The detailed results of the winning CNN model on the test dataset are
given in Table 2.<sup>5</sup></p>
      <p>Experimental results on both the validation and test sets show that CNN outperforms
all other models. CNN is able to outperform LSTM and the other baseline models
because of the specific nature of tweets. For example, tweets can be highly
condensed and indirect texts (e.g. satire), may not follow the standard sequence of
the language and may be full of noise.</p>
      <p><sup>5</sup> In the absence of any other information except the email message about the top-team
performance, we are not able to provide comparative results for the other submitted
teams. We will update this table with the performance of the other teams once
we receive information from the track organisers.</p>
      <p>Traditional models (e.g. SVM, XGBoost, RF, kNN, etc.) are based on the
bag-of-words assumption. The bag-of-words (or bag-of-phrases) representation cannot
capture sequences and patterns that are very important for identifying hate speech
and offensive content in tweets. For example, if a tweet ends with "if you know
what I mean", there is a high chance that it is an offensive tweet, even though
the individual words are innocent.</p>
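      <p>The loss of word order under the bag-of-words assumption is easy to demonstrate. In the sketch below, which uses two made-up sentences and a hypothetical helper name, the two texts have identical bags of words even though their word sequences differ.</p>
      <preformat>
```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


first = "you know what i mean"
second = "i know what you mean"

same_bag = bag_of_words(first) == bag_of_words(second)  # identical word counts
same_sequence = first.split() == second.split()         # different word order
```
</preformat>
      <p>A model that only sees the counts cannot distinguish these two sentences, whereas convolution filters and recurrent models operate on the ordered sequence.</p>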
      <p>
        An LSTM model is popularly used in natural language processing research
because of its effectiveness in handling sequences in text datasets. Empirical
results in Table 1 show that it was the second-best model. However,
the sequence in a tweet can be highly impacted by noise [
        <xref ref-type="bibr" rid="ref23 ref3">3, 23</xref>
        ]; consequently,
LSTM finds it difficult to identify the class. On the other hand, CNN can identify
many small and large patterns in a tweet; if some of them are impacted by noise,
it can still use the other patterns to identify the class.
      </p>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusion</title>
      <p>We introduced an effective method for the task of hate speech and offensive
content identification in Hindi. We proposed a custom CNN architecture built on
word vectors pre-trained on a relevant corpus from the task-specific domain.
The proposed model was the top-ranked model in this task under the track.
We conducted a series of experiments using state-of-the-art models.
Experimental results show that the contexts of hate speech and offensive
content can be captured through transfer learning of word embeddings (a.k.a. word
vectors) and that those contexts can significantly improve the performance of hate
speech and offensive content identification. We also observed that when
transfer learning through word vectors is utilised, CNN performs better than LSTM
because of the noisy nature of tweets. CNN can identify many small and large
patterns in a tweet; if some of them are altered by noise, it can still use the other
patterns to identify the class of the tweet. On the other hand, LSTM uses the
sequence of a tweet to identify its class, but noise in the tweet can alter the
sequence and make it hard for LSTM to identify the class.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Python code and pretrained word vectors of qutnocturnal-hasoc2019</article-title>
          . https://github.com/mdabashar/QutNocturnal-Hasoc2019, accessed:
          <fpage>04</fpage>
          -
          <lpage>10</lpage>
          -2019
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Badjatiya</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varma</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Deep learning for hate speech detection in tweets</article-title>
          .
          <source>In: Proceedings of the 26th International Conference on World Wide Web Companion</source>
          . pp.
          <volume>759</volume>
          –
          <fpage>760</fpage>
          .
          <string-name>
            <surname>International World Wide Web Conferences Steering Committee</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bashar</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nayak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzor</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weir</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Misogynistic tweet detection: Modelling cnn with small datasets</article-title>
          .
          <source>In: Australasian Conference on Data Mining</source>
          . pp.
          <volume>3</volume>
          –
          <fpage>16</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fersini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nozza</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pardo</surname>
            ,
            <given-names>F.M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanguinetti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          . pp.
          <volume>54</volume>
          –
          <issue>63</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Xgboost: A scalable tree boosting system</article-title>
          .
          <source>In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining</source>
          . pp.
          <volume>785</volume>
          –
          <fpage>794</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Davidson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warmsley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Automated hate speech detection and the problem of offensive language</article-title>
          .
          <source>arXiv preprint arXiv:1703.04009</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Glorot</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Understanding the difficulty of training deep feedforward neural networks</article-title>
          .
          <source>In: Proceedings of the thirteenth international conference on artificial intelligence and statistics</source>
          . pp.
          <volume>249</volume>
          –
          <issue>256</issue>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hearst</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osuna</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Platt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schölkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Support vector machines</article-title>
          .
          <source>IEEE Intelligent Systems and their applications 13(4)</source>
          ,
          <volume>18</volume>
          –
          <fpage>28</fpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation 9(8)</source>
          ,
          <volume>1735</volume>
          –
          <fpage>1780</fpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hoerl</surname>
            ,
            <given-names>A.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kennard</surname>
          </string-name>
          , R.W.:
          <article-title>Ridge regression: applications to nonorthogonal problems</article-title>
          .
          <source>Technometrics</source>
          <volume>12</volume>
          (
          <issue>1</issue>
          ),
          <volume>69</volume>
          –
          <fpage>82</fpage>
          (
          <year>1970</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ojha</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Benchmarking aggression identification in social media</article-title>
          .
          <source>In: Proceedings of TRAC</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>D.D.:</given-names>
          </string-name>
          <article-title>Naive (bayes) at forty: The independence assumption in information retrieval</article-title>
          .
          <source>In: European conference on machine learning</source>
          . pp.
          <volume>4</volume>
          –
          <fpage>15</fpage>
          . Springer (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Liaw</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiener</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Classification and regression by randomforest</article-title>
          .
          <source>R news 2(3)</source>
          ,
          <volume>18</volume>
          –
          <fpage>22</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mathur</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sawhney</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahata</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Detecting offensive tweets in hindi-english code-switched language</article-title>
          .
          <source>In: Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media</source>
          . pp.
          <fpage>18</fpage>
          –
          <lpage>26</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>3111</fpage>
          –
          <lpage>3119</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Modha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>
          .
          <source>In: Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          :
          <article-title>Rectified linear units improve restricted Boltzmann machines</article-title>
          .
          <source>In: Proceedings of the 27th international conference on machine learning (ICML-10)</source>
          . pp.
          <fpage>807</fpage>
          –
          <lpage>814</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Ramanathan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A lightweight stemmer for Hindi</article-title>
          .
          <source>In: Workshop on Computational Linguistics for South-Asian Languages, EACL</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khandelwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhtar</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shrivastava</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A corpus of English-Hindi code-mixed tweets for sarcasm detection</article-title>
          .
          <source>arXiv preprint arXiv:1805.11869</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Tolias</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sicre</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jegou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Particular object retrieval with integral max-pooling of CNN activations</article-title>
          .
          <source>arXiv preprint arXiv:1511.05879</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saul</surname>
            ,
            <given-names>L.K.</given-names>
          </string-name>
          :
          <article-title>Distance metric learning for large margin nearest neighbor classification</article-title>
          .
          <source>Journal of Machine Learning Research 10(Feb)</source>
          ,
          <fpage>207</fpage>
          –
          <lpage>244</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siegel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruppenhofer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Overview of the GermEval 2018 shared task on the identification of offensive language</article-title>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Xiang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rose</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Detecting offensive tweets via topical feature discovery over a large scale Twitter corpus</article-title>
          .
          <source>In: Proceedings of the 21st ACM international conference on Information and knowledge management</source>
          . pp.
          <fpage>1980</fpage>
          –
          <lpage>1984</lpage>
          . ACM (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farra</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Predicting the type and target of offensive posts in social media</article-title>
          .
          <source>arXiv preprint arXiv:1902.09666</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenthal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farra</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval)</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>75</fpage>
          –
          <lpage>86</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>