<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IIT Bombay at HASOC 2019: Supervised Hate Speech and Offensive Content Detection in Indo-European Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Urmi Saha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abhijeet Dubey</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pushpak Bhattacharyya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Technology</institution>
          ,
          <addr-line>Bombay</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Text classification is a classical problem in NLP with impactful applications. An important business application is hate speech detection in online data. With an enormous amount of social media data generated continuously across the world, detecting hate speech is considered a very challenging NLP task. In this paper, we describe our approaches to three shared tasks on hate speech and offensive content identification in Indo-European languages (Mandl et al. [<xref ref-type="bibr" rid="ref9">9</xref>]). We describe both statistical machine learning-based and deep learning-based approaches and compare them. We observe that convolutional neural networks perform quite well on the classification tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>machine learning</kwd>
        <kwd>neural networks</kwd>
        <kwd>hate speech</kwd>
        <kwd>feature engineering</kwd>
        <kwd>word embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        Social media has become an important communication medium, and its technology enables a piece of information to spread quickly. With the exponential growth in social media users, public platforms are often used to express satisfaction or grievance about a product, service, or experience, and a huge amount of such data is generated continuously across the world. Unfortunately, these data often contain offensive words that can be considered hate speech. The anonymity and mobility provided by social media platforms have made breeding and spreading hate speech effortless (Zhang et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]), which can eventually lead to cybercrime.
      </p>
      <p>
        The term "hate speech" has been formally defined as "any communication that disparages a person or a group on the basis of some characteristics (to be referred to as types of hate or hate classes) such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics" (Nockleby [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). Many countries, including the United Kingdom, Canada, and France, have laws prohibiting hate speech, which tends to be defined as speech that targets minority groups in a way that could promote violence or social disorder (Davidson et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). Constructing countermeasures against such online speech requires, as a first step, the correct identification of hate speech. Analyzing this huge volume of social media data has therefore become important for many NLP tasks: hate speech detection is critical for applications such as controversial event extraction, building AI chatbots, content recommendation, and sentiment analysis (Badjatiya et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
      </p>
      <p>In this paper, we use tweets in English, Hindi, and German and perform three classification tasks on them:
        <list list-type="bullet">
          <list-item><p>Sub-task A: Hate and Offensive vs. Non Hate-Offensive</p></list-item>
          <list-item><p>Sub-task B: Hate speech, Offensive, and Profane</p></list-item>
          <list-item><p>Sub-task C: Targeted Insult and Untargeted</p></list-item>
        </list>
      </p>
      <p>We perform these classification tasks using both a statistical machine learning method and a deep learning method: SVM and CNN, respectively. For the SVM, we perform feature engineering on our datasets before creating the feature vectors. For the CNN, we create embedding vectors using different word embeddings and use them as input to our model.</p>
      <p>The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 describes our approaches in detail. Section 4 outlines the experimental setup, and Section 5 presents the results of our experiments. Finally, Section 6 concludes the paper and discusses future work.</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        Traditional machine learning methods have performed quite well in classification tasks, and there has been extensive work on classification with feature engineering. Nobata et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] experimented with several n-gram-based, syntactic, and distributional semantic features and showed that character n-grams contribute the most for an online gradient descent learner. Sood et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] trained several Support Vector Machine classifiers and showed that classification performance keeps improving as the dataset grows, but more slowly once it exceeds 1,500 items.
      </p>
      <p>
        With the constant generation of huge amounts of data, neural networks are increasingly replacing statistical machine learning models. Gehrmann et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] compare rule-based and deep learning-based models on ten different patient phenotyping tasks and show that CNNs outperform the other phenotyping algorithms on all of them.
      </p>
      <p>
        Word embeddings are quite influential in neural network classifiers. Akhtar et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] show that sentiment-embedded vectors make deep learning architectures highly efficient. More recently, Xu et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] present a CrossNet model based on a context encoding layer, which learns from a source target to analyze an unseen but related destination target. Djuric et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] describe how low-dimensional representations of comments can be learnt with neural language models and fed into a classification algorithm. Similarly, in our work, we use domain-specific word embeddings trained on tweets for our convolutional neural network.
      </p>
      <p>
        Hate speech detection in English has been an extensive area of research. Xiang et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] created offensive-language topic clusters using Logistic Regression over a set of 860,071 tweets, supplemented with a dictionary of 339 offensive words. Wulczyn et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. Beyond English, substantial work has been done on hate speech detection in other languages too. Alfina et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] detect hate speech in Indonesian. Kamble et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] use domain-specific embeddings for hate speech detection in English-Hindi code-mixed tweets. Our task involves hate speech and offensive content detection in Hindi and German tweets as well as English tweets. We use these domain-specific embeddings for our convolutional neural network model.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Approaches</title>
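      <p>We implement three different approaches for the given tasks:
        <list list-type="bullet">
          <list-item><p>Support Vector Machine without feature engineering</p></list-item>
          <list-item><p>Support Vector Machine with feature engineering</p></list-item>
          <list-item><p>Convolutional Neural Network</p></list-item>
        </list>
      </p>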
      <sec id="sec-4-1">
        <title>SVM without feature engineering</title>
        <p>In this approach, we first clean the given tweets using the following steps:
          <list list-type="order">
            <list-item><p>removing blank rows, if any;</p></list-item>
            <list-item><p>replacing every digit with 0;</p></list-item>
            <list-item><p>replacing URLs with the token &lt;url&gt;;</p></list-item>
            <list-item><p>converting the text to lowercase;</p></list-item>
            <list-item><p>tokenizing each tweet into words;</p></list-item>
            <list-item><p>removing stop words and performing stemming.</p></list-item>
          </list>
        </p>
        <p>After cleaning, the tweets are encoded as feature vectors and used directly as input to the SVM. Results of this method are reported in Section 5.</p>
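        <p>The cleaning steps above can be sketched in Python as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' code: the regular expressions, the ASCII-only tokenizer, the tiny stop-word set, and the crude suffix stripper <monospace>crude_stem</monospace> are hypothetical stand-ins for the NLTK stop-word list and stemmer that the paper relies on.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

# Hypothetical stand-ins for NLTK's stop-word list and stemmer, kept tiny so the
# sketch runs without extra dependencies.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "this", "in"}
SUFFIXES = ("ing", "ed", "ly", "s")

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumed ASCII-only tokenizer: the URL placeholder, or a run of letters,
# digits, '#', '@' or apostrophes (Hindi text would need a different pattern).
TOKEN_RE = re.compile(r"<url>|[a-z0-9#@']+")


def crude_stem(token: str) -> str:
    """Strip one common suffix; a crude placeholder for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def clean_tweets(tweets: list[str]) -> list[list[str]]:
    cleaned = []
    for tweet in tweets:
        if not tweet or not tweet.strip():  # 1. skip blank rows
            continue
        text = re.sub(r"\d", "0", tweet)    # 2. replace every digit with 0
        text = URL_RE.sub("<url>", text)    # 3. replace URLs with <url>
        text = text.lower()                 # 4. lowercase
        tokens = TOKEN_RE.findall(text)     # 5. tokenize into words
        # 6. remove stop words, then stem the remaining tokens
        tokens = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
        cleaned.append(tokens)
    return cleaned
```
]]></preformat>
        <p>For example, the input <monospace>"Check this out https://t.co/ab12 in 2019!!"</monospace> yields the tokens check, out, &lt;url&gt;, and 0000, while a whitespace-only row is dropped entirely.</p>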
      </sec>
      <sec id="sec-4-2">
        <title>SVM with feature engineering</title>
        <p>In this approach, we perform feature engineering before creating the feature vectors for the SVM. We select a handful of features that carry relevant information about a tweet. For example, if a user includes one or more angry emoticons in a tweet, the tweet is more likely to express hatred towards something.</p>
        <p>We select the following features and include them in our feature vector:
          <list list-type="bullet">
            <list-item><p>emoticons: we create a dictionary of emoticons grouped into happy, sad, anger, fear, surprise, disgust, and other classes, and count how many emoticons of each class occur in a tweet.</p></list-item>
            <list-item><p>hashtags: we extract the words used in a tweet's hashtags. Users often summarize their opinion in a hashtag, which can help identify the emotion expressed in the tweet.</p></list-item>
            <list-item><p>intensifiers: words like exceptionally, incredibly, awful, and insanely are often used to emphasize a descriptive word; we call them intensifiers.</p></list-item>
            <list-item><p>negations: words like never, no, nothing, and nowhere are used to negate the meaning of a piece of text.</p></list-item>
            <list-item><p>hate words: we use a list of hate words and count their occurrences in a tweet.</p></list-item>
          </list>
        </p>
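        <p>The handcrafted features listed above could be computed per tweet roughly as follows. This sketch assumes whitespace tokenization and uses small, hypothetical lexicons (only four of the seven emoticon classes are shown, and <monospace>HATE_WORDS</monospace> holds placeholder tokens), since the paper's actual dictionaries are not reproduced here. Hashtag words become sparse count features named <monospace>hashtag=word</monospace>.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import Counter

# Hypothetical miniature lexicons; the paper's real dictionaries are not shown here.
EMOTICONS = {
    "happy": {":)", ":-)", ":D"},
    "sad": {":(", ":-("},
    "anger": {">:(", ">:-("},
    "surprise": {":O", ":o"},
}
INTENSIFIERS = {"exceptionally", "incredibly", "awful", "insanely"}
NEGATIONS = {"never", "no", "nothing", "nowhere", "not"}
HATE_WORDS = {"hateword1", "hateword2"}  # placeholders, not a real hate-word list
HASHTAG_RE = re.compile(r"#(\w+)")


def tweet_features(tweet: str) -> dict[str, int]:
    tokens = tweet.split()  # assumption: whitespace tokenization
    lowered = [t.lower() for t in tokens]
    feats: dict[str, int] = {}
    # emoticons: number of emoticons of each class in the tweet
    for label, faces in EMOTICONS.items():
        feats[f"emo_{label}"] = sum(t in faces for t in tokens)
    # hashtags: one count feature per lowercased hashtag word
    for word, count in Counter(h.lower() for h in HASHTAG_RE.findall(tweet)).items():
        feats[f"hashtag={word}"] = count
    # intensifiers, negations and hate words: simple lexicon hits
    feats["n_intensifiers"] = sum(t in INTENSIFIERS for t in lowered)
    feats["n_negations"] = sum(t in NEGATIONS for t in lowered)
    feats["n_hate_words"] = sum(t in HATE_WORDS for t in lowered)
    return feats
```
]]></preformat>
        <p>A flat dictionary of this kind can be turned into a numeric vector for the SVM with a dictionary vectorizer, for example scikit-learn's <monospace>DictVectorizer</monospace>.</p>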
        <p>Other features include character n-grams and word n-grams, among others.</p>
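        <p>Character and word n-gram counts can be extracted with a few lines of Python. The n-gram ranges used as defaults below are illustrative assumptions; the paper does not state which values were used.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count every character n-gram of length n_min..n_max in the text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(tokens: list[str], n_max: int = 2) -> Counter:
    """Count word n-grams of length 1..n_max, joined with single spaces."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts
```
]]></preformat>
        <p>In practice, scikit-learn's <monospace>CountVectorizer</monospace> (with <monospace>analyzer="char"</monospace> or <monospace>analyzer="word"</monospace> and an <monospace>ngram_range</monospace>) produces the same kind of counts as a sparse matrix.</p>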
      </sec>
      <sec id="sec-4-3">
        <title>Convolutional Neural Network</title>
        <p>In the deep neural network approach, we tokenize each input sentence and look up the embedding of each word. Our hate speech dataset consists of tweets in three different languages; for English, we use domain-specific word embeddings trained on Twitter data (see Section 4 for the embeddings used for each language).</p>
        <p>We pass the embeddings through convolutional layers with multiple filter widths and feature maps. After each convolutional layer, we perform max-over-time pooling before passing the result through a final fully connected layer with softmax output. We train this model by minimizing the categorical cross-entropy loss.</p>
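        <p>The forward pass described above (embedding lookup, convolutions with several filter widths, max-over-time pooling, and a softmax output scored with categorical cross-entropy) is sketched below in NumPy. This is a single-layer illustration under assumed sizes (8 filters per width, 100-dimensional embeddings, two classes, random weights) and an assumed ReLU activation, not the authors' exact two-layer configuration; training would additionally require backpropagating the loss, which a deep learning framework's automatic differentiation would normally handle.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def conv_max_over_time(emb, filters, bias):
    """emb: (T, d) word vectors; filters: (k, w, d); bias: (k,).
    Returns k values: each filter's feature map max-pooled over time."""
    w = filters.shape[1]
    T = emb.shape[0]  # assumes T >= w (pad short tweets beforehand)
    windows = np.stack([emb[t:t + w] for t in range(T - w + 1)])     # (T-w+1, w, d)
    feature_maps = np.einsum("twd,kwd->tk", windows, filters) + bias  # (T-w+1, k)
    feature_maps = np.maximum(feature_maps, 0.0)  # ReLU (assumed activation)
    return feature_maps.max(axis=0)               # max-over-time pooling


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def cnn_forward(token_ids, E, conv_params, W_out, b_out):
    emb = E[token_ids]  # embedding lookup: (T, d)
    pooled = [conv_max_over_time(emb, F, b) for F, b in conv_params]
    h = np.concatenate(pooled)  # one pooled value per filter per width
    return softmax(W_out @ h + b_out)  # fully connected layer + softmax


def cross_entropy(probs, label):
    return -np.log(probs[label] + 1e-12)  # categorical cross-entropy, one example


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
vocab, d, k, n_classes = 50, 100, 8, 2
E = rng.normal(scale=0.1, size=(vocab, d))
conv_params = [(rng.normal(scale=0.1, size=(k, w, d)), np.zeros(k)) for w in (3, 4, 5)]
W_out = rng.normal(scale=0.1, size=(n_classes, 3 * k))
b_out = np.zeros(n_classes)

probs = cnn_forward(np.array([3, 14, 15, 9, 26, 5]), E, conv_params, W_out, b_out)
loss = cross_entropy(probs, label=1)
```
]]></preformat>
        <p>Each filter width contributes one pooled value per filter, so the fully connected layer sees a fixed-length vector (here 3 × 8 = 24 values) regardless of tweet length, provided the tweet is at least as long as the widest filter.</p>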
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental Setup</title>
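      <p>For the statistical machine learning-based approaches, we use an SVM with an RBF kernel and C = 1.0, selected using grid search, as well as a random forest with 50 estimators. We use the NLTK library (<ext-link ext-link-type="uri" xlink:href="https://www.nltk.org/">https://www.nltk.org/</ext-link>) for all data processing steps in our SVM model.</p>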
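      <p>For reference, the RBF kernel and the kernel SVM decision function that this configuration relies on can be written directly in Python. The sketch is illustrative: the support vectors, dual coefficients, and intercept in the example are hypothetical numbers, and the <monospace>scale_gamma</monospace> helper reproduces scikit-learn's default <monospace>gamma="scale"</monospace> heuristic, which is an assumption since the paper does not report the kernel width.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def scale_gamma(X):
    """gamma = 1 / (n_features * Var(X)) over all entries (scikit-learn's 'scale')."""
    values = [v for row in X for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (len(X[0]) * var)


def svm_decision(x, support_vectors, dual_coefs, intercept, gamma):
    """f(x) = sum_i dual_coef_i * K(s_i, x) + b, where dual_coef_i = y_i * alpha_i
    and 0 <= alpha_i <= C; the sign of f(x) gives the predicted class."""
    return sum(
        c * rbf_kernel(s, x, gamma) for s, c in zip(support_vectors, dual_coefs)
    ) + intercept


# Hypothetical toy model with two support vectors (not learned from HASOC data).
gamma = scale_gamma([[0.0, 2.0], [2.0, 0.0]])  # -> 0.5
score = svm_decision([0.0, 0.0], [[0.0, 0.0], [1.0, 1.0]], [1.0, -1.0], 0.0, gamma)
label = 1 if score > 0 else 0
```
]]></preformat>
      <p>The constant C does not appear in the decision function itself; it bounds each dual coefficient during training, so a grid search over C changes which support vectors are learned rather than the kernel. In scikit-learn, a comparable setup would be <monospace>GridSearchCV</monospace> over <monospace>SVC(kernel="rbf")</monospace>, with <monospace>RandomForestClassifier(n_estimators=50)</monospace> for the random forest.</p>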
      <p>
        For our deep learning-based approach, we use a CNN with 2 convolutional layers, a total of 128 filters of size 5, and max-pooling sizes of 5 and 30. We also use multiple filter widths (3, 4, and 5). We use different word embeddings for each language as input to the model:
        <list list-type="bullet">
          <list-item><p>English: domain-specific word embeddings trained on Twitter data (Kamble et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ])</p></list-item>
          <list-item><p>Hindi: embeddings trained on the Hindi corpus available from CFILT (<ext-link ext-link-type="uri" xlink:href="http://www.cfilt.iitb.ac.in/">http://www.cfilt.iitb.ac.in/</ext-link>)</p></list-item>
          <list-item><p>German: FastText embeddings trained on Europarl</p></list-item>
        </list>
      </p>
      <p>FastText uses sub-word information, representing each word through its character n-grams. This helps in our case, since tweets are often informal and sub-word information is important for extracting their semantics.</p>
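      <p>The sub-word idea can be illustrated with the character n-gram scheme FastText is known for: each word is wrapped in the boundary symbols &lt; and &gt;, split into n-grams of length 3 to 6, and represented through the vectors of those n-grams. In the sketch below, the n-gram table is hypothetical, and vectors are combined by simple averaging, which is an assumption about the combination step rather than a description of the FastText library's internals.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def fasttext_subwords(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Character n-grams of the word wrapped in '<' and '>' boundary symbols."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def subword_vector(word: str, ngram_vectors: dict[str, list[float]], dim: int) -> list[float]:
    """Average the vectors of the word's n-grams found in a (hypothetical) table."""
    found = [ngram_vectors[g] for g in fasttext_subwords(word) if g in ngram_vectors]
    if not found:
        return [0.0] * dim  # no known sub-words: fall back to a zero vector
    return [sum(column) / len(found) for column in zip(*found)]


# Hypothetical two-dimensional n-gram table, for illustration only.
table = {"<wh": [1.0, 0.0], "her": [0.0, 1.0]}
```
]]></preformat>
      <p>Because the informal spelling wherrre shares the n-grams &lt;wh and her with where, it receives the same non-zero vector under this toy table, whereas a purely word-level lookup would return nothing for it.</p>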
      <p>We experiment with word embeddings of different dimensions and find that 100-dimensional word embeddings perform best.</p>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>The class labels for the three tasks are encoded as follows:
        <list list-type="bullet">
          <list-item><p>Task 1: 0 - HOF, 1 - NOT</p></list-item>
          <list-item><p>Task 2: 0 - PRFN, 1 - OFFN, 2 - HATE, 3 - NONE</p></list-item>
          <list-item><p>Task 3: 0 - TIN, 1 - UNT, 2 - NONE</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In this paper, we present our approaches for the three tasks of HASOC 2019. We describe feature engineering for our statistical machine learning-based approaches, in which a list of hate words for each of the three languages is used to create the feature vectors of our model. We also describe a deep learning-based approach that uses domain-specific word embeddings, and find that word embeddings trained on a particular domain perform better on text from the same domain. We also see an improvement in the text classification performance of convolutional neural networks when using domain-specific word embeddings.</p>
      <p>In this paper, we show that statistical machine learning-based methods perform well on classification tasks when the dataset is not very large. However, with data constantly generated on social media platforms, such tasks often need to handle a wide variety of content with fast processing, and deep learning-based methods perform better on such massive data.</p>
      <p>
        Our work is an effort to improve existing methods for detecting hate speech in social media comments. With growing social media usage across the world, hate speech is being generated and propagated at high speed. Due to the massive scale of the web, methods that automatically detect hate speech are required (Schmidt and Wiegand [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]). Both Facebook and Twitter have responded to criticism for not doing enough to prevent hate speech on their sites by instituting policies that prohibit the use of their platforms for attacks on people based on characteristics like race, ethnicity, gender, and sexual orientation, or for threats of violence towards others (Bird et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). Hate speech detection is an urgent problem to solve if an increase in cybercrime is to be avoided.
      </p>
      <p>In future work, we will add more engineered features and compare the results with the current model. We will also run our deep neural network on a larger dataset using domain-specific word embeddings and observe its performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Akhtar</surname>, <given-names>M.S.</given-names></string-name>,
          <string-name><surname>Kumar</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Ekbal</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Bhattacharyya</surname>, <given-names>P.</given-names></string-name>:
          <article-title>A hybrid deep learning architecture for sentiment analysis</article-title>.
          In: <source>Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers</source>.
          pp. <fpage>482</fpage>–<lpage>493</lpage>
          (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Alfina</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Mulia</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Fanany</surname>, <given-names>M.I.</given-names></string-name>,
          <string-name><surname>Ekanata</surname>, <given-names>Y.</given-names></string-name>:
          <article-title>Hate speech detection in the Indonesian language: A dataset and preliminary study</article-title>.
          In: <source>2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS)</source>.
          pp. <fpage>233</fpage>–<lpage>238</lpage>.
          <publisher-name>IEEE</publisher-name>
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Badjatiya</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Gupta</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Gupta</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Varma</surname>, <given-names>V.</given-names></string-name>:
          <article-title>Deep learning for hate speech detection in tweets</article-title>.
          In: <source>Proceedings of the 26th International Conference on World Wide Web Companion</source>.
          pp. <fpage>759</fpage>–<lpage>760</lpage>.
          <publisher-name>International World Wide Web Conferences Steering Committee</publisher-name>
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Bird</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Klein</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Loper</surname>, <given-names>E.</given-names></string-name>:
          <source>Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit</source>.
          <publisher-name>O'Reilly Media, Inc.</publisher-name>
          (<year>2009</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Davidson</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Warmsley</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Macy</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Weber</surname>, <given-names>I.</given-names></string-name>:
          <article-title>Automated hate speech detection and the problem of offensive language</article-title>.
          In: <source>Eleventh International AAAI Conference on Web and Social Media</source>
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Djuric</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Zhou</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Morris</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Grbovic</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Radosavljevic</surname>, <given-names>V.</given-names></string-name>,
          <string-name><surname>Bhamidipati</surname>, <given-names>N.</given-names></string-name>:
          <article-title>Hate speech detection with comment embeddings</article-title>.
          In: <source>Proceedings of the 24th International Conference on World Wide Web</source>.
          pp. <fpage>29</fpage>–<lpage>30</lpage>.
          <publisher-name>ACM</publisher-name>
          (<year>2015</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Gehrmann</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Dernoncourt</surname>, <given-names>F.</given-names></string-name>,
          <string-name><surname>Li</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Carlson</surname>, <given-names>E.T.</given-names></string-name>,
          <string-name><surname>Wu</surname>, <given-names>J.T.</given-names></string-name>,
          <string-name><surname>Welt</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Foote</surname> <suffix>Jr</suffix>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Moseley</surname>, <given-names>E.T.</given-names></string-name>,
          <string-name><surname>Grant</surname>, <given-names>D.W.</given-names></string-name>,
          <string-name><surname>Tyler</surname>, <given-names>P.D.</given-names></string-name>,
          et al.:
          <article-title>Comparing rule-based and deep learning models for patient phenotyping</article-title>.
          <source>arXiv preprint arXiv:1703.08705</source>
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Kamble</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Joshi</surname>, <given-names>A.</given-names></string-name>:
          <article-title>Hate speech detection from code-mixed Hindi-English tweets using deep learning models</article-title>.
          <source>arXiv preprint arXiv:1811.05145</source>
          (<year>2018</year>)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Modha</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Mandl</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Majumder</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Patel</surname>, <given-names>D.</given-names></string-name>:
          <article-title>Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>.
          In: <source>Proceedings of the 11th annual meeting of the Forum for Information Retrieval Evaluation</source>
          (December <year>2019</year>)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><surname>Nobata</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Tetreault</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Thomas</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Mehdad</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Chang</surname>, <given-names>Y.</given-names></string-name>:
          <article-title>Abusive language detection in online user content</article-title>.
          In: <source>Proceedings of the 25th International Conference on World Wide Web</source>.
          pp. <fpage>145</fpage>–<lpage>153</lpage>.
          <publisher-name>International World Wide Web Conferences Steering Committee</publisher-name>
          (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><surname>Nockleby</surname>, <given-names>J.T.</given-names></string-name>:
          <article-title>Hate speech</article-title>.
          <source>Encyclopedia of the American Constitution</source>
          <volume>3</volume>, <fpage>1277</fpage>–<lpage>1279</lpage>
          (<year>2000</year>)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><surname>Schmidt</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Wiegand</surname>, <given-names>M.</given-names></string-name>:
          <article-title>A survey on hate speech detection using natural language processing</article-title>.
          In: <source>Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media</source>.
          pp. <fpage>1</fpage>–<lpage>10</lpage>
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><surname>Sood</surname>, <given-names>S.O.</given-names></string-name>,
          <string-name><surname>Churchill</surname>, <given-names>E.F.</given-names></string-name>,
          <string-name><surname>Antin</surname>, <given-names>J.</given-names></string-name>:
          <article-title>Automatic identification of personal insults on social news sites</article-title>.
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>63</volume>(<issue>2</issue>), <fpage>270</fpage>–<lpage>285</lpage>
          (<year>2012</year>)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name><surname>Wulczyn</surname>, <given-names>E.</given-names></string-name>,
          <string-name><surname>Thain</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Dixon</surname>, <given-names>L.</given-names></string-name>:
          <article-title>Ex machina: Personal attacks seen at scale</article-title>.
          In: <source>Proceedings of the 26th International Conference on World Wide Web</source>.
          pp. <fpage>1391</fpage>–<lpage>1399</lpage>.
          <publisher-name>International World Wide Web Conferences Steering Committee</publisher-name>
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name><surname>Xiang</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Fan</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Wang</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Hong</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Rose</surname>, <given-names>C.</given-names></string-name>:
          <article-title>Detecting offensive tweets via topical feature discovery over a large scale Twitter corpus</article-title>.
          In: <source>Proceedings of the 21st ACM International Conference on Information and Knowledge Management</source>.
          pp. <fpage>1980</fpage>–<lpage>1984</lpage>.
          <publisher-name>ACM</publisher-name>
          (<year>2012</year>)
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name><surname>Xu</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Paris</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Nepal</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Sparks</surname>, <given-names>R.</given-names></string-name>:
          <article-title>Cross-target stance classification with self-attention networks</article-title>.
          <source>arXiv preprint arXiv:1805.06593</source>
          (<year>2018</year>)
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name><surname>Zhang</surname>, <given-names>Z.</given-names></string-name>,
          <string-name><surname>Luo</surname>, <given-names>L.</given-names></string-name>:
          <article-title>Hate speech detection: A solved problem? The challenging case of long tail on Twitter</article-title>.
          <source>Semantic Web</source> (Preprint),
          <fpage>1</fpage>–<lpage>21</lpage>
          (<year>2018</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>