<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IIITG-ADBU at HASOC 2019: Automated Hate Speech and Offensive Content Detection in English and Code-Mixed Hindi Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arup Baruah</string-name>
          <email>arup.baruah@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ferdous Ahmed Barbhuiya</string-name>
          <email>ferdous@iiitg.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kuntal Dey</string-name>
          <email>kuntadey@in.ibm.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Comp. Sc. &amp; Engg.</institution>
          ,
          <addr-line>IIIT Guwahati</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IBM Research India</institution>
          ,
          <addr-line>New Delhi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the results obtained by Logistic Regression (LR), Support Vector Machine (SVM), Bi-directional Long Short-Term Memory (BiLSTM), and Neural Network (NN) models for subtask A of the shared task "Hate Speech and Offensive Content Identification in Indo-European Languages" (HASOC), for English and code-mixed Hindi text. The models were trained on Embeddings from Language Models (ELMo), Glove, and fastText embeddings, and on TF-IDF features of character and word n-grams. On the official run, our best models for Hindi and English obtained F1 scores of 81.05 and 74.62, placing 4th and 8th, respectively, in the official ranking.</p>
      </abstract>
      <kwd-group>
        <kwd>Hate Speech</kwd>
        <kwd>Logistic Regression</kwd>
        <kwd>Support Vector Machine</kwd>
        <kwd>Bi-directional Long Short-Term Memory</kwd>
        <kwd>Glove</kwd>
        <kwd>fastText</kwd>
        <kwd>ELMo</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Social media has made it easier for people to communicate with one
another, and publishing content that reaches a vast number of people has
become very easy. However, alongside the constructive dialog taking place on
social media, content that is hateful, offensive, or profane is also being
published. Such content is harmful to society: there is evidence that hateful
content published via social media has fueled communal riots in different
parts of the world.</p>
      <p>
        There has been growing interest in the research community in using
machine learning and natural language processing techniques to automatically
detect hateful and offensive content. As a step in this direction, the shared
task "Hate Speech and Offensive Content Identification in Indo-European
Languages" (HASOC) was organized [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This paper presents the results
obtained by our models for subtask A of HASOC, whose goal is to detect
whether or not a given tweet is free from hateful and offensive content.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Automated detection of o ensive, hateful, abusive, aggressive, and profane text
has seen the use of rule-based, traditional machine learning, and deep learning
techniques. Risch and Krestel [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] used an LR classifier to detect abusive language,
with features such as word and character n-grams, word2vec embeddings, and
word and character counts. Waseem [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] used SVM and LR
classifiers to detect racist or sexist content. Nobata et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] used a regression
model to detect abusive content. Djuric et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] used an LR classifier to detect
hate speech, using comment embeddings among other features. Serra et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] used a character-based RNN to detect hate speech in
tweets. Gamback and Sikdar [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] used a CNN to detect racist and sexist content.
Badjatiya et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] experimented with LR, SVM, Gradient Boosted Decision
Trees (GBDT), CNN, LSTM, and FastText-based models. Hate speech
detection in code-mixed Hindi-English data has been studied by Mathur et
al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Santosh and Aravind [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and Kamble and Joshi [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Dataset</title>
      <p>Each tweet in the Subtask A dataset of HASOC is labeled according to
whether or not it is free from hateful, offensive, and profane content. Trial,
train, and test datasets were released for the subtask. Table 1 shows the
details of the dataset for both English and Hindi. As the table shows, the
English trial dataset had a higher percentage of hateful, offensive, or
profane content than the English train dataset. For Hindi, the distribution of
hate and non-hate content was identical in the trial and train datasets. The
Hindi dataset was also more balanced than the English dataset.</p>
      <p>We observed that the performance of the models used in this study
improved when the English trial and train datasets were combined for training.
However, combining the Hindi trial and train datasets decreased performance.
Thus, only the train dataset was used to train the Hindi models.</p>
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <sec id="sec-4-1">
        <title>Preprocessing</title>
        <p>We experimented with removing URLs, hashtags, and mentions from the
English dataset. However, removing each of them degraded the performance of
our models. Thus, for our final models, the dataset was used as provided,
without any preprocessing.</p>
        <p>In our study, we used Embeddings from Language Models (ELMo), Glove,
and fastText embeddings. The Glove and fastText embeddings were used to train
our BiLSTM model, and ELMo was used to train a simple neural network
classifier. The 200-dimensional Glove embeddings pre-trained on Twitter data
were used for the English models only. The 300-dimensional pre-trained
fastText embeddings for English and Hindi were used to train models for both
languages.</p>
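        <p>The preprocessing ablation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the regular expressions (<monospace>URL_RE</monospace>, <monospace>HASHTAG_RE</monospace>, <monospace>MENTION_RE</monospace>) and the helper <monospace>strip_tokens</monospace> are hypothetical, because the exact removal rules are not given. The unprocessed variant corresponds to what the final models used.</p>
        <preformat>
```python
import re

# Hypothetical patterns for the three token types that were tried;
# the exact rules used in the study are not stated.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def strip_tokens(text: str, urls: bool = False, hashtags: bool = False,
                 mentions: bool = False) -> str:
    """Remove the selected token types and collapse leftover whitespace."""
    if urls:
        text = URL_RE.sub(" ", text)
    if hashtags:
        text = HASHTAG_RE.sub(" ", text)
    if mentions:
        text = MENTION_RE.sub(" ", text)
    return " ".join(text.split())


tweet = "@user this is #awful see https://t.co/xyz"
# One ablation variant per token type, plus the unprocessed baseline.
variants = {
    "none": strip_tokens(tweet),
    "no_urls": strip_tokens(tweet, urls=True),
    "no_hashtags": strip_tokens(tweet, hashtags=True),
    "no_mentions": strip_tokens(tweet, mentions=True),
}
for name, text in variants.items():
    print(f"{name:12} {text}")
```
        </preformat>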
        <p>For ELMo embeddings, we fine-tuned the ELMo module provided by
TensorFlow Hub. This module returns the ELMo embedding of each word of a
sentence as well as a vector for the complete sentence. We used the
1024-dimensional sentence vector to train a neural network classifier.</p>
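        <p>As an illustration of how a sentence vector can be derived from per-word vectors, the sketch below mean-pools the vectors of the real (non-padding) tokens. It assumes the sentence vector is a mean over the contextualized word vectors, which is how the TF Hub ELMo module describes its <monospace>default</monospace> output. The toy 4-dimensional vectors and the helper <monospace>mean_pool</monospace> are hypothetical stand-ins for the 1024-dimensional ELMo vectors.</p>
        <preformat>
```python
from typing import List

Vector = List[float]


def mean_pool(word_vectors: List[Vector], mask: List[int]) -> Vector:
    """Average the word vectors whose mask entry is 1 (padding is 0)."""
    kept = [vec for vec, m in zip(word_vectors, mask) if m == 1]
    if not kept:
        raise ValueError("sentence has no real tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy 4-d "contextual" word vectors; the last position is padding and
# must not affect the sentence vector.
words = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 0.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
print(mean_pool(words, mask=[1, 1, 0]))  # [2.0, 1.0, 1.0, 2.0]
```
        </preformat>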
      </sec>
      <sec id="sec-4-2">
        <title>Models</title>
        <p>We used Logistic Regression (LR), Support Vector Machine (SVM),
Bidirectional Long Short-Term Memory (BiLSTM), an ELMo-based Neural Network
(NN), and an ensemble of the ELMo-based NN and a character-based LR
classifier. The classifiers are described below.</p>
        <p>Logistic Regression: The LR classifier was used for both the English
and Hindi datasets, with L2 regularization and the hyperparameter C set to
1.2. The classifier was trained using TF-IDF features of word n-grams
(n = 1 to 3), character n-grams (n = 1 to 6), and the combination of the two.</p>
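        <p>The feature sets above can be sketched without dependencies as follows: word n-grams with n from 1 to 3, character n-grams with n from 1 to 6, and TF-IDF weighting with a smoothed IDF, ln((1 + N)/(1 + df)) + 1, and L2-normalised rows. This weighting mirrors the defaults of scikit-learn's <monospace>TfidfVectorizer</monospace>. Whether the study used that library, and its tokenisation details, are assumptions. The <monospace>w:</monospace> and <monospace>c:</monospace> prefixes in the hypothetical helper <monospace>features</monospace> keep the two feature families apart when they are combined.</p>
        <preformat>
```python
import math
from collections import Counter
from typing import Dict, List


def word_ngrams(text: str, lo: int = 1, hi: int = 3) -> List[str]:
    """Word n-grams for n in [lo, hi] over whitespace tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, lo: int = 1, hi: int = 6) -> List[str]:
    """Character n-grams for n in [lo, hi], spaces included."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(lo, hi + 1)
            for i in range(len(text) - n + 1)]


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Raw term counts times smoothed IDF, then L2-normalised per document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        rows.append({t: v / norm for t, v in weights.items()})
    return rows


def features(text: str) -> List[str]:
    """Combined feature set: prefixed word (1-3) and char (1-6) n-grams."""
    return (["w:" + g for g in word_ngrams(text)]
            + ["c:" + g for g in char_ngrams(text)])


tweets = ["You are great", "You are awful"]
rows = tfidf([features(t) for t in tweets])
# "w:you" occurs in both tweets, so it is weighted below "w:great".
print(rows[0]["w:great"] > rows[0]["w:you"])  # True
```
        </preformat>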
        <p>Support Vector Machine: The SVM classifier was used for both the
English and Hindi datasets. It used a linear kernel, L2 regularization, and
the hyperparameter C set to 1.0, and it was trained on the same TF-IDF
features as the LR classifier.</p>
        <p>
          Bi-directional Long Short-Term Memory: The BiLSTM model used in
this study is based on the architecture from Baruah et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The architecture of the model is shown in Fig. 1. It consisted of
a BiLSTM layer and two Dense layers. The BiLSTM layer had 100 units and used
a recurrent dropout of 0.10, and a dropout of 0.25 was applied to its output.
Global max pooling was applied to the output of the BiLSTM layer. The Dense
layer that followed had 100 units and used the ReLU activation function, and a
dropout of 0.25 was applied to its output as well. The final Dense layer had 1
unit and used the sigmoid activation function. The model was trained with the
Adam optimizer and the binary cross-entropy loss function.
        </p>
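        <p>As a worked check of the layer sizes above, the sketch below counts the trainable parameters above the embedding layer. It assumes a Keras-style LSTM with one bias vector per gate, forward and backward outputs concatenated before pooling, and no parameters in the dropout and global max pooling layers. The embedding layer is excluded because its size depends on the vocabulary. The function names are illustrative.</p>
        <preformat>
```python
def lstm_params(input_dim: int, units: int) -> int:
    """Keras-style LSTM: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Fully connected layer: weight matrix plus bias vector."""
    return input_dim * units + units


def bilstm_classifier_params(embed_dim: int, lstm_units: int = 100,
                             hidden_units: int = 100) -> int:
    """Trainable parameters above the embedding layer.

    Dropout and global max pooling have no parameters; the forward and
    backward outputs are assumed to be concatenated (2 * lstm_units wide).
    """
    bilstm = 2 * lstm_params(embed_dim, lstm_units)
    pooled_dim = 2 * lstm_units
    return (bilstm
            + dense_params(pooled_dim, hidden_units)  # Dense(100), ReLU
            + dense_params(hidden_units, 1))          # Dense(1), sigmoid


for name, dim in [("Glove, 200-d", 200), ("fastText, 300-d", 300)]:
    print(name, bilstm_classifier_params(dim))
# Glove, 200-d 261001
# fastText, 300-d 341001
```
        </preformat>
        <p>Most of the weights sit in the BiLSTM layer, and moving from 200-d to 300-d input vectors adds 80,000 parameters there, while the Dense head stays the same size.</p>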
        <p>Separate versions of the model were trained using 200-dimensional
Glove embeddings, 300-dimensional English fastText embeddings, and
300-dimensional Hindi fastText embeddings.</p>
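        <p>One common way to feed pre-trained vectors to a BiLSTM is an embedding matrix indexed by vocabulary ID. The sketch below assumes that index 0 is reserved for padding and that words missing from the pre-trained vocabulary get an all-zero row. Neither convention is stated in the paper, and the toy 3-dimensional vectors stand in for the 200-d Glove or 300-d fastText vectors.</p>
        <preformat>
```python
from typing import Dict, List


def build_embedding_matrix(vocab: Dict[str, int],
                           pretrained: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """Row i holds the pre-trained vector of the word with index i.

    Index 0 is reserved for padding; out-of-vocabulary words keep an
    all-zero row (an assumed convention).
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for word, idx in vocab.items():
        vec = pretrained.get(word)
        if vec is not None:
            matrix[idx] = list(vec)
    return matrix


# Hypothetical toy data: "yaar" has no pre-trained vector.
pretrained = {"hate": [0.1, 0.2, 0.3], "love": [0.3, 0.2, 0.1]}
vocab = {"hate": 1, "love": 2, "yaar": 3}
matrix = build_embedding_matrix(vocab, pretrained, dim=3)
for row in matrix:
    print(row)
```
        </preformat>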
        <p>ELMo-based Neural Network: The architecture of the ELMo-based
neural network is shown in Fig. 1. It consisted of an ELMo embedding layer
followed by two Dense layers. The first Dense layer had 256 units and used the
ReLU activation function; the second had 1 unit and used the sigmoid
activation function. The network was trained on the 1024-dimensional tweet
vector obtained from the ELMo embedding layer.</p>
        <p>Ensemble: The architecture of the ensemble model is shown in Fig. 1.
It combines the ELMo-based NN classifier and the character n-gram based LR
classifier: the predictions of the two classifiers were averaged to obtain the
final prediction.</p>
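        <p>The averaging step of the ensemble can be sketched as below. It assumes that both classifiers output the probability of the hateful/offensive class, for example the sigmoid output of the NN and the positive-class probability of the LR model. It also assumes that the averaged score is thresholded at 0.5. The threshold, the encoding 1 = hateful/offensive, and the helper name are assumptions.</p>
        <preformat>
```python
from typing import List


def average_ensemble(p_nn: List[float], p_lr: List[float],
                     threshold: float = 0.5) -> List[int]:
    """Average two aligned lists of positive-class probabilities.

    Returns 1 (assumed: hateful/offensive) when the mean score reaches the
    threshold, else 0.
    """
    if len(p_nn) != len(p_lr):
        raise ValueError("prediction lists must be aligned")
    return [int((a + b) / 2 >= threshold) for a, b in zip(p_nn, p_lr)]


p_nn = [0.9, 0.2, 0.45]  # e.g. sigmoid outputs of the ELMo-based NN
p_lr = [0.3, 0.4, 0.7]   # e.g. positive-class probabilities of the char LR
print(average_ensemble(p_nn, p_lr))  # [1, 0, 1]
```
        </preformat>
        <p>The third example shows the intended effect: a tweet the NN alone would call non-hateful (0.45) is flipped by a confident LR score.</p>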
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>As mentioned in Section 3, the English models were trained on the
combined trial and train datasets, and the Hindi models on the train dataset
only. For validation, a stratified split was performed: 20% of the dataset was
reserved for validation and the remaining 80% was used to train the models.
Table 2 and Table 3 present the results obtained by our models on the English
and Hindi validation datasets, respectively.</p>
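      <p>A stratified 80/20 split keeps the class ratio of each part equal to that of the full dataset. The sketch below is a hypothetical stand-in for whatever splitting utility was actually used. The seed and the deliberately imbalanced toy labels are illustrative.</p>
      <preformat>
```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], val_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_idx, val_idx) with each class split in the same ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_fraction)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


labels = [1] * 30 + [0] * 70   # toy labels, 30% positive
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 80 20
```
      </preformat>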
      <p>As Table 2 shows, for English the BiLSTM model trained on pre-trained
fastText embeddings performed the best on all the metrics considered, with a
macro F1 score of 63.59. The second-best F1 score, 63.61, was obtained by the
ensemble of the ELMo-based NN and the character n-gram based LR model. On its
own, the ELMo-based NN classifier performed the worst of all the models, with
an F1 score of 60.26, although it had the second-best precision (64.99).
Among the LR models, the one trained on both character and word n-grams
performed the best, with an F1 score of 63.23. The performance of all the SVM
models was almost identical.</p>
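      <p>The F1 scores above are macro-averaged: each is the unweighted mean of the per-class F1 scores, so the minority class weighs as much as the majority class. A minimal sketch of the computation follows. The scores in the tables are on a 0 to 100 scale, so multiply by 100. The toy labels are hypothetical. scikit-learn's <monospace>f1_score</monospace> with <monospace>average="macro"</monospace> computes the same quantity.</p>
      <preformat>
```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int],
                 cls: int) -> float:
    """F1 of one class treated as the positive class (0.0 if never hit)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(round(100 * macro_f1(y_true, y_pred), 2))  # 73.33
```
      </preformat>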
      <p>Table 3 shows that for Hindi, the SVM model trained on character
n-grams performed the best on all the metrics considered, with an F1 score of
82.73. Word n-gram based models (both LR and SVM) did not perform well on the
Hindi dataset. The BiLSTM model trained on Hindi fastText embeddings performed
the worst, with an F1 score of only 54.15. A possible reason for this poor
performance is that the dataset is code-mixed and contains English words,
whereas the embeddings used were for Hindi only.</p>
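      <p>The coverage explanation can be checked directly by measuring what fraction of token occurrences have a pre-trained vector. In the toy example below, which uses a hypothetical vocabulary and tweet, the Hindi words are covered by a Hindi vocabulary while the English words of the code-mixed tweet are not.</p>
      <preformat>
```python
from typing import Iterable, Set


def coverage(tokens: Iterable[str], vocab: Set[str]) -> float:
    """Fraction of token occurrences that have a pre-trained vector."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok in vocab) / len(tokens)


# Hypothetical Hindi embedding vocabulary and a code-mixed tweet.
hindi_vocab = {"यह", "देश", "है"}
tweet_tokens = ["यह", "देश", "is", "great"]
print(coverage(tweet_tokens, hindi_vocab))  # 0.5
```
      </preformat>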
      <p>For Hindi, as Table 5 shows, the character-based LR and SVM models
performed equally well in predicting the non-hate category, while the
character-based SVM models were slightly better at predicting the hate
category. Both word-based LR and SVM models performed poorly in predicting the
non-hate category.</p>
      <p>Table 6 shows that the ELMo-based NN model was the best of all the
models at predicting the non-hate category. However, it was poor at predicting
the hate category, which is why it was paired with the character-based LR
model in our ensemble. The fastText-based BiLSTM model was the second best at
predicting the non-hate category, and it was much better than the ELMo-based
NN model at predicting the hate category.</p>
      <p>Based on these results obtained on the validation dataset, we selected the
following models for submission: fastText based BiLSTM (English Run 1), our
ensemble model (English Run 2), character and word n-gram based LR (English
Run 3), character n-gram based SVM (Hindi Run 1), character n-gram based
LR (Hindi Run 2), and character and word n-gram based SVM (Hindi Run 3).</p>
      <p>The official results for our models are listed in Table 7 and Table 8.
Because we made an error when submitting the results for run 3 of the English
language, the results for this run are missing. For English, our best
performing model on the test dataset was the fastText-based BiLSTM model,
which obtained a macro F1 score of 74.62 and the 8th position among 79
submissions. For Hindi, our best performing models were the character-based LR
and SVM models, with F1 scores of 81.05 and 80.98, respectively; they obtained
the 4th and 5th positions in the official ranking among the 37 submissions for
Hindi. Table 9 shows the confusion matrices of our models for the official
run.</p>
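      <p>A confusion matrix like those in Table 9 can be tabulated by counting (true, predicted) label pairs. The label strings and predictions below are hypothetical and are not taken from the official run.</p>
      <preformat>
```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Count (true label, predicted label) pairs, including zero cells."""
    counts = Counter(zip(y_true, y_pred))
    return {(t, p): counts[(t, p)] for t in labels for p in labels}


labels = ["hate", "non-hate"]
y_true = ["hate", "hate", "non-hate", "non-hate", "non-hate"]
y_pred = ["hate", "non-hate", "non-hate", "non-hate", "hate"]
cm = confusion_matrix(y_true, y_pred, labels)
for t in labels:
    print(t, [cm[(t, p)] for p in labels])
# hate [1, 1]
# non-hate [1, 2]
```
      </preformat>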
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Hate speech and offensive content on social media are potentially
dangerous for society. As part of the HASOC shared task, this study used LR,
SVM, BiLSTM, and NN models for the automated detection of hate speech and
offensive content, with features such as word and character n-grams and Glove,
fastText, and ELMo embeddings. Our best models obtained F1 scores of 74.62 and
81.05 on the English and Hindi datasets, respectively. This study did not use
features such as dependency relations or part-of-speech tags; further
experiments could check whether such features improve the performance of the
classifiers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Badjatiya</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varma</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Deep Learning for Hate Speech Detection in Tweets</article-title>
          .
          <source>In: WWW 2017</source>
          . pp.
          <volume>759</volume>
          –
          <fpage>760</fpage>
          .
          <string-name>
            <surname>Perth</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Baruah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbhuiya</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dey</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bi-directional LSTM for Hate Speech Detection</article-title>
          .
          <source>In: 13th International Workshop on Semantic Evaluation</source>
          . pp.
          <volume>317</volume>
          –
          <fpage>376</fpage>
          .
          <string-name>
            <surname>Minneapolis</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Djuric</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morris</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grbovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radosavljevic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhamidipati</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Hate Speech Detection with Comment Embeddings</article-title>
          .
          <source>In: WWW 2015</source>
          . pp.
          <volume>29</volume>
          –
          <fpage>30</fpage>
          .
          <string-name>
            <surname>Florence</surname>
          </string-name>
          ,
          <string-name>
            <surname>Italy</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gamback</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sikdar</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Using Convolutional Neural Networks to Classify Hate-Speech</article-title>
          .
          <source>In: ALW1 at ACL 2017</source>
          . pp.
          <volume>85</volume>
          –
          <fpage>90</fpage>
          .
          <string-name>
            <surname>Vancouver</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kamble</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Hate Speech Detection from Code-mixed Hindi-English Tweets Using Deep Learning Models</article-title>
          .
          <source>In: 15th International Conference on Natural Language Processing</source>
          . pp.
          <volume>155</volume>
          –
          <fpage>160</fpage>
          .
          <string-name>
            <surname>Punjab</surname>
          </string-name>
          ,
          <string-name>
            <surname>India</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mathur</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sawhney</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahata</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Detecting offensive tweets in Hindi-English code-switched language</article-title>
          .
          <source>In: Sixth International Workshop on Natural Language Processing for Social Media</source>
          . pp.
          <volume>18</volume>
          –
          <fpage>26</fpage>
          .
          <string-name>
            <surname>Melbourne</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Modha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>
          .
          <source>In: Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nobata</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tetreault</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehdad</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Abusive Language Detection in Online User Content</article-title>
          .
          <source>In: WWW 2016</source>
          . pp.
          <volume>145</volume>
          –
          <fpage>153</fpage>
          .
          <string-name>
            <surname>Montreal</surname>
          </string-name>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Risch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krestel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Delete or not Delete? Semi-Automatic Comment Moderation for the Newsroom</article-title>
          .
          <source>In: TRAC-1 at COLING 2018</source>
          . pp.
          <volume>166</volume>
          –
          <fpage>176</fpage>
          .
          Santa Fe, USA (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Santosh</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aravind</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Hate Speech Detection in Hindi-English Code-Mixed Social Media Text</article-title>
          .
          <source>In: ACM India Joint International Conference on Data Science and Management of Data</source>
          . pp.
          <volume>310</volume>
          –
          <fpage>313</fpage>
          .
          <string-name>
            <surname>Kolkata</surname>
          </string-name>
          ,
          <string-name>
            <surname>India</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Serra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leontiadis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spathis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stringhini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blackburn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vakali</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words</article-title>
          .
          <source>In: ALW1 at ACL 2017</source>
          . pp.
          <volume>36</volume>
          –
          <fpage>40</fpage>
          .
          <string-name>
            <surname>Vancouver</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Waseem</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter</article-title>
          .
          <source>In: NLP+CSS at EMNLP 2016</source>
          . pp.
          <volume>138</volume>
          –
          <fpage>142</fpage>
          . Austin, USA (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>